Re: [PATCH V3] swiotlb: Split up single swiotlb lock
On 7/8/2022 1:07 AM, Christoph Hellwig wrote:
> Thanks, this looks much better. I think there is a small problem with
> how default_nareas is set - we need to use 0 as the default so that
> an explicit command line value of 1 works.
>
> Also have you checked the interaction with swiotlb_adjust_size in
> detail?

Yes, the patch was tested in a Hyper-V SEV VM, which always calls
swiotlb_adjust_size() to adjust the bounce buffer size according to the
memory size. It will round up the bounce buffer size to the next power
of 2 if the memory size is not a power of 2.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
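The point about a zero default can be illustrated with a small user-space sketch. It is only an illustration under stated assumptions: resolve_nareas() and roundup_pow2() are hypothetical helpers, not functions from the patch, and the fallback to the possible CPU count follows the V4 commit message rather than any code shown in this thread.

```c
#include <assert.h>

/* Hypothetical helper (not from the patch): round v up to a power of two. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper: choose the effective number of areas.
 * A value of 0 means "nothing was given on the command line", so an
 * explicit 1 stays distinguishable from the unset default.
 */
static unsigned int resolve_nareas(unsigned int cmdline_nareas,
				   unsigned int possible_cpus)
{
	unsigned int n = cmdline_nareas ? cmdline_nareas : possible_cpus;

	return roundup_pow2(n);
}
```

With a default of 1 instead of 0, the second and third cases in a call such as resolve_nareas(1, 6) versus resolve_nareas(0, 6) could not be told apart, which is the problem raised in the review.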
[PATCH V4] swiotlb: Split up single swiotlb lock
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is freed into the area where the memory was originally
allocated from.

The area number can be set via the swiotlb kernel parameter and
defaults to the number of possible CPUs. If the number of possible CPUs
is not a power of 2, the area number is rounded up to the next power
of 2.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen
Signed-off-by: Tianyu Lan
---
Change since v3:
	* Make area number to be zero by default

Change since v2:
	* Use possible cpu number to adjust iotlb area number

Change since v1:
	* Move struct io_tlb_area to swiotlb.c
	* Fix some coding style issue.
---
 .../admin-guide/kernel-parameters.txt | 4 +-
 include/linux/swiotlb.h | 5 +
 kernel/dma/swiotlb.c | 230 +++---
 3 files changed, 198 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 				 wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..5f898c5e9f19 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:	%true if the pool is used for memory allocation
+ * @nareas:	The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +104,9 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int nareas;
+	unsigned int area_nslabs;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..9f547d8ab550 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,43 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas;
+
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:	The number of used IO TLB block.
+ * @index:	The slot index to start searching in this area for next round.
+ * @lock:	The lock to protect the above data structures in the map and
+ *		unmap calls.
+ */
+struct io_tlb_area {
+	unsigned long used;
+	unsigned int index;
+	spinlock_t lock;
+};
+
+static void swiotlb_adjust_nareas(unsigned int nareas)
+{
+	if (!is_power_of_2(nareas))
+		nareas = roundup_pow_of_two(nareas);
+
+	default_nareas = nareas;
+
+	pr_info("area num %d.\n", nareas);
+	/*
+	 * Round up number of slabs to the next power of 2.
+	 * The last area is going be smaller than the rest if
+	 * default_nslabs is not power of two.
+	 */
+	if (nareas && !is_power_of_2(default_nslabs)) {
+		default_nslabs = roundup_pow_of_two(default_nslabs);
+		pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+			(default_nslabs << IO_TLB_SHIFT) >> 20);
+	}
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +116,10 @@ setup_io_tlb_np
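The allocation policy described in the commit message (own area first, other areas only on failure, free back into the owning area) can be sketched in user space as follows. This is a simplified model under loud assumptions: the names sketch_alloc(), sketch_free() and area_find_slot() are hypothetical, the per-area spinlock is only mentioned in comments, and mapping a CPU to its area with `cpu & (NAREAS - 1)` is an assumption that relies on the area count being a power of 2; the excerpt above is truncated before the real allocation code.

```c
#include <assert.h>
#include <stdbool.h>

#define NAREAS		4	/* assumed to be a power of two */
#define AREA_NSLABS	2	/* deliberately tiny so spill-over is easy to see */

static bool slot_used[NAREAS * AREA_NSLABS];
static unsigned int area_used[NAREAS];	/* per-area "used" counter */

/* Search one area only. In the kernel this would run under that area's lock. */
static int area_find_slot(unsigned int area)
{
	unsigned int base = area * AREA_NSLABS;
	unsigned int i;

	for (i = 0; i < AREA_NSLABS; i++) {
		if (!slot_used[base + i]) {
			slot_used[base + i] = true;
			area_used[area]++;
			return (int)(base + i);
		}
	}
	return -1;
}

/* Try the CPU's own area first, then walk the remaining areas in order. */
static int sketch_alloc(unsigned int cpu)
{
	unsigned int start = cpu & (NAREAS - 1);
	unsigned int i;

	for (i = 0; i < NAREAS; i++) {
		int slot = area_find_slot((start + i) & (NAREAS - 1));

		if (slot >= 0)
			return slot;
	}
	return -1;	/* every area is full */
}

/* Free into the area the slot came from, whichever CPU frees it. */
static void sketch_free(int slot)
{
	unsigned int area = (unsigned int)slot / AREA_NSLABS;

	slot_used[slot] = false;
	area_used[area]--;
}
```

Because each area keeps its own counter and search state, two CPUs that map to different areas never touch the same data in the common case, which is where the reduction in lock contention comes from.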
[PATCH V3] swiotlb: Split up single swiotlb lock
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is freed into the area where the memory was originally
allocated from.

The area number can be set via the swiotlb kernel parameter and
defaults to the number of possible CPUs. If the number of possible CPUs
is not a power of 2, the area number is rounded up to the next power
of 2.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen
Signed-off-by: Tianyu Lan
---
Change since v2:
	* Use possible cpu number to adjust iotlb area number

Change since v1:
	* Move struct io_tlb_area to swiotlb.c
	* Fix some coding style issue.
---
 .../admin-guide/kernel-parameters.txt | 4 +-
 include/linux/swiotlb.h | 5 +
 kernel/dma/swiotlb.c | 222 +++---
 3 files changed, 191 insertions(+), 40 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 				 wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..5f898c5e9f19 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:	%true if the pool is used for memory allocation
+ * @nareas:	The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +104,9 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int nareas;
+	unsigned int area_nslabs;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..9e7aeca8faf4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,43 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
+
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:	The number of used IO TLB block.
+ * @index:	The slot index to start searching in this area for next round.
+ * @lock:	The lock to protect the above data structures in the map and
+ *		unmap calls.
+ */
+struct io_tlb_area {
+	unsigned long used;
+	unsigned int index;
+	spinlock_t lock;
+};
+
+static void swiotlb_adjust_nareas(unsigned int nareas)
+{
+	if (!is_power_of_2(nareas))
+		nareas = roundup_pow_of_two(nareas);
+
+	default_nareas = nareas;
+
+	pr_info("area num %d.\n", nareas);
+	/*
+	 * Round up number of slabs to the next power of 2.
+	 * The last area is going be smaller than the rest if
+	 * default_nslabs is not power of two.
+	 */
+	if (nareas > 1) {
+		default_nslabs = roundup_pow_of_two(default_nslabs);
+		pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+			(default_nslabs << IO_TLB_SHIFT) >> 20);
+	}
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +116,10 @@ setup_io_tlb_npages(char *str)
 		default_nslabs =
 			ALIGN
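The effect of the V3 rounding on pool size can be worked through with a small user-space model. This is a sketch, not kernel code: struct pool_cfg, adjust() and pool_size_mb() are hypothetical names, roundup_pow2() stands in for the kernel's roundup_pow_of_two(), and only the `nareas > 1` condition and the `(nslabs << IO_TLB_SHIFT) >> 20` size formula are taken from the patch.

```c
#include <assert.h>

#define IO_TLB_SHIFT 11	/* one slab is 2 KiB, as in include/linux/swiotlb.h */

/* Stand-in for roundup_pow_of_two(); hypothetical user-space helper. */
static unsigned long roundup_pow2(unsigned long v)
{
	unsigned long p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Hypothetical container for the two values the V3 function adjusts. */
struct pool_cfg {
	unsigned long nslabs;
	unsigned int nareas;
};

/* Mirrors the V3 logic: round nareas, then round nslabs only if nareas > 1. */
static struct pool_cfg adjust(unsigned long nslabs, unsigned int nareas)
{
	struct pool_cfg c;

	c.nareas = (unsigned int)roundup_pow2(nareas);
	c.nslabs = c.nareas > 1 ? roundup_pow2(nslabs) : nslabs;
	return c;
}

/* Pool size in MiB, using the same expression as the pr_info() in the patch. */
static unsigned long pool_size_mb(unsigned long nslabs)
{
	return (nslabs << IO_TLB_SHIFT) >> 20;
}
```

For example, a 48 MiB pool (24576 slabs) on a system with 6 possible CPUs ends up with 8 areas and grows to 64 MiB, while the same pool with a single area keeps its size.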
Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
On 7/6/2022 5:02 PM, Christoph Hellwig wrote:
> On Wed, Jul 06, 2022 at 04:57:33PM +0800, Tianyu Lan wrote:
>> Swiotlb_init() is called in the mem_init() of different
>> architectures and memblock free pages are released to the buddy
>> allocator just after calling swiotlb_init() via memblock_free_all().
>
> Yes.
>
>> The mem_init() is called before smp_init().
>
> But why would that matter? cpu_possible_map is set up from
> setup_arch(), which is called before that.

Sorry, I was still focused on the online CPU number, which is only
available after smp_init(). The possible CPU number includes some
offline CPUs. I will give it a try. Thanks for the suggestion.
Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
On 7/6/2022 4:00 PM, Christoph Hellwig wrote:
> On Fri, Jul 01, 2022 at 01:02:21AM +0800, Tianyu Lan wrote:
>>> Can we reorder that initialization? Because I really hate having
>>> to have an arch hook in every architecture.
>>
>> How about using "flags" parameter of swiotlb_init() to pass area
>> number or add new parameter for area number?
>>
>> I just reposted patch 1 since there is just some coding style issue
>> and area number may also set via swiotlb kernel parameter. We still
>> need figure out a good solution to pass area number from
>> architecture code.
>
> What is the problem with calling swiotlb_init after
> nr_possible_cpus() works?

Swiotlb_init() is called in the mem_init() of different architectures,
and memblock free pages are released to the buddy allocator via
memblock_free_all() just after swiotlb_init() is called. The mem_init()
is called before smp_init(). If swiotlb_init() is called after
smp_init(), we can't allocate a large chunk of low-end memory via
memblock_alloc() in swiotlb. Swiotlb_init() would need to be reworked to
allocate memory from the buddy allocator, just like swiotlb_init_late()
does. This would limit the bounce buffer size. Otherwise we need to do
the reorder for all architectures, and there may be some other unknown
issues.

The flags parameter of swiotlb_init() seems to be a good place to pass
the area number in the current code. If the swiotlb area number/flag is
not set, the area number will be one, which keeps the original behavior
of one single global spinlock protecting the io tlb data structure.
Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
On 6/29/2022 10:04 PM, Christoph Hellwig wrote:
> On Mon, Jun 27, 2022 at 11:31:50AM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan
>>
>> When initialize swiotlb bounce buffer, smp_init() has not been
>> called and cpu number can not be got from num_online_cpus().
>> Use the number of lapic entry to set swiotlb area number and
>> keep swiotlb area number equal to cpu number on the x86 platform.
>
> Can we reorder that initialization? Because I really hate having
> to have an arch hook in every architecture.

How about using the "flags" parameter of swiotlb_init() to pass the
area number, or adding a new parameter for the area number?

I just reposted patch 1 since it only had some coding style issues and
the area number may also be set via the swiotlb kernel parameter. We
still need to figure out a good solution to pass the area number from
architecture code.
[PATCH V2 1/1] swiotlb: Split up single swiotlb lock
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is freed into the area where the memory was originally
allocated from.

The area number can be set via swiotlb_adjust_nareas() and the swiotlb
kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen
Signed-off-by: Tianyu Lan
---
Change since v1:
	* Move struct io_tlb_area to swiotlb.c
	* Fix some coding style issue.
---
 .../admin-guide/kernel-parameters.txt | 4 +-
 include/linux/swiotlb.h | 5 +
 kernel/dma/swiotlb.c | 218 ++
 3 files changed, 187 insertions(+), 40 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 				 wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..5f898c5e9f19 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:	%true if the pool is used for memory allocation
+ * @nareas:	The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +104,9 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int nareas;
+	unsigned int area_nslabs;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..421bba62d4f1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,45 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
+
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:	The number of used IO TLB block.
+ * @index:	The slot index to start searching in this area for next round.
+ * @lock:	The lock to protect the above data structures in the map and
+ *		unmap calls.
+ */
+struct io_tlb_area {
+	unsigned long used;
+	unsigned int index;
+	spinlock_t lock;
+};
+
+static void __init swiotlb_adjust_nareas(unsigned int nareas)
+{
+	if (!is_power_of_2(nareas)) {
+		pr_err("swiotlb: Invalid areas parameter %d.\n", nareas);
+		return;
+	}
+
+	default_nareas = nareas;
+
+	pr_info("area num %d.\n", nareas);
+	/*
+	 * Round up number of slabs to the next power of 2.
+	 * The last area is going be smaller than the rest if
+	 * default_nslabs is not power of two.
+	 */
+	if (nareas > 1) {
+		default_nslabs = roundup_pow_of_two(default_nslabs);
+		pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+			(default_nslabs << IO_TLB_SHIFT) >> 20);
+	}
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +118,10 @@ setup_io_tlb_npages(char *str)
 		default_nslabs =
 			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
 	}
+	if (*str == ',')
+		++str;
+	if (isdigit(*str))
+
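The extended parameter format, a slab count optionally followed by a comma and an area count, can be modelled in user space like this. It is a sketch under assumptions: struct swiotlb_opts, parse_swiotlb() and ALIGN_UP() are hypothetical names, strtoul() stands in for the kernel's simple_strtoul(), the force/noforce keywords are not handled, and what the kernel does with the second integer is not visible because the hunk above is truncated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define IO_TLB_SEGSIZE	128UL	/* slabs per segment in mainline swiotlb.h */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical result type; 0 means "not given on the command line". */
struct swiotlb_opts {
	unsigned long nslabs;
	unsigned long nareas;
};

/* Parse "<nslabs>[,<nareas>]" the way the visible part of the hunk does. */
static struct swiotlb_opts parse_swiotlb(const char *str)
{
	struct swiotlb_opts o = { 0, 0 };
	char *end;

	if (isdigit((unsigned char)*str)) {
		/* avoid a tail segment smaller than IO_TLB_SEGSIZE */
		o.nslabs = ALIGN_UP(strtoul(str, &end, 0), IO_TLB_SEGSIZE);
		str = end;
	}
	if (*str == ',')
		++str;
	if (isdigit((unsigned char)*str))
		o.nareas = strtoul(str, &end, 0);
	return o;
}
```

A command line such as `swiotlb=65536,4` would then yield 65536 slabs and 4 areas, while `swiotlb=,8` leaves the slab count at its default and only sets the area count.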
[PATCH 1/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is freed into the area where the memory was originally
allocated from.

The area number can be set via swiotlb_adjust_nareas() and the swiotlb
kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen
Signed-off-by: Tianyu Lan
---
 .../admin-guide/kernel-parameters.txt | 4 +-
 include/linux/swiotlb.h | 27 +++
 kernel/dma/swiotlb.c | 202 ++
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 				 wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:	The number of used IO TLB block.
+ * @index:	The slot index to start searching in this area for next round.
+ * @lock:	The lock to protect the above data structures in the map and
+ *		unmap calls.
+ */
+struct io_tlb_area {
+	unsigned long used;
+	unsigned int index;
+	spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:	%true if the pool is used for memory allocation
+ * @nareas:	The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int nareas;
+	unsigned int area_nslabs;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..17154abdfb34 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +80,10 @@ setup_io_tlb_npages(char *str)
 		default_nslabs =
 			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
 	}
+	if (*str == ',')
+		++str;
+	if (isdigi
[PATCH 0/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

Patch 1 introduces the swiotlb area concept and splits up the single
swiotlb lock.
Patch 2 sets the swiotlb area number from the lapic number.

Tianyu Lan (2):
  swiotlb: Split up single swiotlb lock
  x86/ACPI: Set swiotlb area according to the number of lapic entry in
    MADT

 .../admin-guide/kernel-parameters.txt | 4 +-
 arch/x86/kernel/acpi/boot.c | 3 +
 include/linux/swiotlb.h | 27 +++
 kernel/dma/swiotlb.c | 202 ++
 4 files changed, 197 insertions(+), 39 deletions(-)

-- 
2.25.1
[PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Tianyu Lan

When the swiotlb bounce buffer is initialized, smp_init() has not been
called yet, so the CPU number cannot be obtained from
num_online_cpus(). Use the number of lapic entries to set the swiotlb
area number and keep the swiotlb area number equal to the CPU number on
the x86 platform.

Based-on-idea-by: Andi Kleen
Signed-off-by: Tianyu Lan
---
 arch/x86/kernel/acpi/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 907cc98b1938..7e13499f2c10 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1131,6 +1132,8 @@ static int __init acpi_parse_madt_lapic_entries(void)
 		return count;
 	}
 
+	swiotlb_adjust_nareas(max(count, x2count));
+
 	x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI,
 					acpi_parse_x2apic_nmi, 0);
 	count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI,
-- 
2.25.1
Re: [RFC PATCH V4 1/1] swiotlb: Split up single swiotlb lock
On 6/22/2022 6:54 PM, Christoph Hellwig wrote:
> Thanks, this looks pretty good to me. A few comments below:

Thanks for your review.

> On Fri, Jun 17, 2022 at 10:47:41AM -0400, Tianyu Lan wrote:
>> +/**
>> + * struct io_tlb_area - IO TLB memory area descriptor
>> + *
>> + * This is a single area with a single lock.
>> + *
>> + * @used:	The number of used IO TLB block.
>> + * @index:	The slot index to start searching in this area for next round.
>> + * @lock:	The lock to protect the above data structures in the map and
>> + *		unmap calls.
>> + */
>> +struct io_tlb_area {
>> +	unsigned long used;
>> +	unsigned int index;
>> +	spinlock_t lock;
>> +};
>
> This can go into swiotlb.c.

struct io_tlb_area is used in the struct io_tlb_mem.

>> +void __init swiotlb_adjust_nareas(unsigned int nareas);
>
> And this should be marked static.
>
>> +#define DEFAULT_NUM_AREAS 1
>
> I'd drop this define, the magic 1 and a > 1 comparison seems to
> convey how it is used much better as the checks aren't about default
> or not, but about larger than one.
>
> I also think that we want some good way to size the default, e.g.
> by number of CPUs or memory size.

swiotlb_adjust_nareas() is exposed to platforms to set the area number.
When swiotlb_init() is called, smp_init() has not been called yet, so
the standard API for checking the CPU number (e.g. num_online_cpus())
doesn't work. Platforms may have other ways to get the CPU number (e.g.
x86 may use ACPI MADT table entries to get the CPU number) and set the
area number. I will post a following patch to set the cpu number via
swiotlb_adjust_nareas().

>> +void __init swiotlb_adjust_nareas(unsigned int nareas)
>> +{
>> +	if (!is_power_of_2(nareas)) {
>> +		pr_err("swiotlb: Invalid areas parameter %d.\n", nareas);
>> +		return;
>> +	}
>> +
>> +	default_nareas = nareas;
>> +
>> +	pr_info("area num %d.\n", nareas);
>> +	/* Round up number of slabs to the next power of 2.
>> +	 * The last area is going be smaller than the rest if
>> +	 * default_nslabs is not power of two.
>> +	 */
>
> Please follow the normal kernel comment style with a /* on its own
> line.

OK. Will update.
[RFC PATCH V4 1/1] swiotlb: Split up single swiotlb lock
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is freed into the area where the memory was originally
allocated from.

The area number can be set via swiotlb_adjust_nareas() and the swiotlb
kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen
Signed-off-by: Tianyu Lan
---
 .../admin-guide/kernel-parameters.txt | 4 +-
 include/linux/swiotlb.h | 27 +++
 kernel/dma/swiotlb.c | 202 ++
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8090130b544b..5d46271982d5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5869,8 +5869,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 				 wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:	The number of used IO TLB block.
+ * @index:	The slot index to start searching in this area for next round.
+ * @lock:	The lock to protect the above data structures in the map and
+ *		unmap calls.
+ */
+struct io_tlb_area {
+	unsigned long used;
+	unsigned int index;
+	spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:	%true if the pool is used for memory allocation
+ * @nareas:	The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int nareas;
+	unsigned int area_nslabs;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..139d08068912 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,8 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+#define DEFAULT_NUM_AREAS 1
+
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
@@ -70,6 +72,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = DEFAULT_NUM_AREAS;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +82,10 @@ setup_io_tlb_npages(char *str)
 		default_
Re: [RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc
On 5/27/2022 2:43 AM, Dexuan Cui wrote:
> From: Tianyu Lan
> Sent: Thursday, May 26, 2022 5:01 AM
> ...
> @@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct *w)
>  		nvdev->max_chn = 1;
>  		nvdev->num_chn = 1;
>  	}
> +
> +	/* Allocate bounce buffer.*/
> +	swiotlb_device_allocate(>device, nvdev->num_chn,
> +			10 * IO_TLB_BLOCK_UNIT);
>  }
>
> Looks like swiotlb_device_allocate() is not called if the netvsc
> device has only 1 primary channel and no sub-channel, e.g. in the
> case of single-vCPU VM?

When there is only a single channel, there does not seem to be much of
a performance penalty. But you are right, we should keep the same
behavior for single CPU and multi CPU. Will update in the next version.
Thanks.
[RFC PATCH V3 1/2] swiotlb: Add Child IO TLB mem support
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate IO TLB mem and set up
child IO TLB mem according to its queue number. The swiotlb code
allocates bounce buffers among the child IO TLB mem iteratively.

Introduce the IO TLB block unit (2MB) concept to allocate big bounce
buffers from the default pool for devices. The IO TLB segment (256k) is
too small for a device bounce buffer.

Signed-off-by: Tianyu Lan
---
 include/linux/swiotlb.h | 38 +
 kernel/dma/swiotlb.c | 304 ++--
 2 files changed, 329 insertions(+), 13 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..a48a9d64e3c3 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows device allocates bounce buffer from default io
+ * tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -89,6 +97,11 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:	%true if the pool is used for memory allocation
+ * @num_child:	The child io tlb mem number in the pool.
+ * @child_nslot:The number of IO TLB slot in the child IO TLB mem.
+ * @child_nblock:The number of IO TLB block in the child IO TLB mem.
+ * @child_start:The child index to start searching in the next round.
+ * @block_start:The block index to start searching in the next round.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +115,16 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int num_child;
+	unsigned int child_nslot;
+	unsigned int child_nblock;
+	unsigned int child_start;
+	unsigned int block_index;
+	struct io_tlb_mem *child;
+	struct io_tlb_mem *parent;
+	struct io_tlb_block {
+		unsigned int list;
+	} *block;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
@@ -130,6 +153,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+			    unsigned int area_num,
+			    unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +189,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+void swiotlb_device_free(struct device *dev)
+{
+}
+
+int swiotlb_device_allocate(struct device *dev,
+			    unsigned int area_num,
+			    unsigned long size)
+{
+	return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..7ca22a5a1886 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		unsigned long nslabs, bool late_alloc)
 {
 	void *vaddr = phys_to_virt(start);
-	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+	unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+	unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE;
 
 	mem->nslabs = nslabs;
 	mem->start = start;
@@ -207,7 +208,36 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->force_bounce = true;
 
 	spin_lock_init(&mem->lock);
-	for (i = 0; i < mem->nslabs; i++) {
+
+	if (mem->num_child) {
+		mem->child_nslot = nslabs / mem->num_child;
+		mem->child_nblock = block_num / mem->num_child;
+		mem->child_start = 0;
+
+		/*
+		 * Initialize child IO TLB mem, divide IO TLB pool
+		 * into child number. Reuse parent mem->slot in the
+		 * child mem->slo
[RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc
From: Tianyu Lan

The netvsc driver allocates device IO TLB mem by calling
swiotlb_device_allocate() and sets the number of child IO TLB mems
according to the device queue number. Child IO TLB mem may reduce the
overhead of a single spin lock in the device IO TLB mem among multiple
device queues.

Signed-off-by: Tianyu Lan
---
 drivers/net/hyperv/netvsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 9442f751ad3a..26a8f8f84fc4 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -23,6 +23,7 @@
 #include
 #include
+#include
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
@@ -98,6 +99,7 @@ static void netvsc_subchan_work(struct work_struct *w)
 	struct netvsc_device *nvdev =
 		container_of(w, struct netvsc_device, subchan_work);
 	struct rndis_device *rdev;
+	struct hv_device *hdev;
 	int i, ret;

 	/* Avoid deadlock with device removal already under RTNL */
@@ -108,6 +110,9 @@ static void netvsc_subchan_work(struct work_struct *w)
 	rdev = nvdev->extension;
 	if (rdev) {
+		hdev = ((struct net_device_context *)
+			netdev_priv(rdev->ndev))->device_ctx;
+
 		ret = rndis_set_subchannel(rdev->ndev, nvdev, NULL);
 		if (ret == 0) {
 			netif_device_attach(rdev->ndev);
@@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct *w)
 			nvdev->max_chn = 1;
 			nvdev->num_chn = 1;
 		}
+
+		/* Allocate bounce buffer. */
+		swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
+					10 * IO_TLB_BLOCK_UNIT);
 	}

 	rtnl_unlock();
@@ -769,6 +778,7 @@ void netvsc_device_remove(struct hv_device *device)

 	/* Release all resources */
 	free_netvsc_device_rcu(net_device);
+	swiotlb_device_free(&device->device);
 }

 #define RING_AVAIL_PERCENT_HIWATER 20
--
2.25.1
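For a sense of scale, the size passed to swiotlb_device_allocate() above can be worked out in a small userspace program. This is only a sketch, not kernel code: IO_TLB_SHIFT, IO_TLB_BLOCKSIZE and IO_TLB_BLOCK_UNIT are copied from patch 1/2, IO_TLB_SEGSIZE is assumed to be 128 as in mainline include/linux/swiotlb.h, and device_pool_bytes() and child_pool_bytes() are hypothetical helpers made up for illustration.

```c
#include <assert.h>

/* Constants from include/linux/swiotlb.h plus the additions in patch 1/2. */
#define IO_TLB_SHIFT      11
#define IO_TLB_SIZE       (1UL << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE    128UL	/* assumption: mainline value */
#define IO_TLB_BLOCKSIZE  (8 * IO_TLB_SEGSIZE)
#define IO_TLB_BLOCK_UNIT (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)

/* Hypothetical helper: bytes requested for nblocks IO TLB blocks. */
static unsigned long device_pool_bytes(unsigned long nblocks)
{
	return nblocks * IO_TLB_BLOCK_UNIT;
}

/*
 * Hypothetical helper: bytes each child IO TLB mem gets when the device
 * pool is split evenly among num_chn queues, mirroring the
 * child_nslot = nslabs / num_child division in patch 1/2.
 */
static unsigned long child_pool_bytes(unsigned long nblocks,
				      unsigned int num_chn)
{
	assert(num_chn > 0);
	return device_pool_bytes(nblocks) / num_chn;
}
```

Under these assumptions one block unit is 2MB, so the netvsc call asks for 20MB, and a device with 4 channels would end up with 5MB per child.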
[RFC PATCH V3 0/2] swiotlb: Add child io tlb mem support
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This series adds child IO TLB mem support to reduce spinlock overhead
among a device's queues. Each device may allocate its own IO TLB mem and
set up child IO TLB mems according to its queue number. The number of
child IO TLB mems may be set equal to the device queue number, which
helps to reduce swiotlb spinlock overhead among devices and queues.

Patch 1 introduces the IO TLB block concept and the
swiotlb_device_allocate() API to allocate a per-device swiotlb bounce
buffer. The new API accepts the queue number as the number of child IO
TLB mems to set up in the device's IO TLB mem.

Patch 2 calls the new allocation function in the netvsc driver to
resolve the global spin lock issue.

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

 drivers/net/hyperv/netvsc.c |  10 ++
 include/linux/swiotlb.h     |  38 +
 kernel/dma/swiotlb.c        | 299 ++--
 3 files changed, 334 insertions(+), 13 deletions(-)

--
2.25.1
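The child layout described in this cover letter can be sketched in userspace C. The arithmetic in child_nslot() and child_phys_start() follows the initialisation code in patch 1/2 (child_nslot = nslabs / num_child, and a child start of start + ((i * child_nslot) << IO_TLB_SHIFT)). struct pool is a simplified stand-in for struct io_tlb_mem, and next_child() is only a guess at what "the child index to start searching in the next round" implies, since the loop that walks the children is not shown in this thread. Locking is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11

/* Simplified stand-in for struct io_tlb_mem (illustration only). */
struct pool {
	uint64_t start;			/* physical start of the parent pool */
	unsigned long nslabs;		/* total slots in the parent pool */
	unsigned int num_child;		/* number of child pools (queues) */
	unsigned int child_start;	/* child to try first next time */
};

/* Slots per child: child_nslot = nslabs / num_child. */
static unsigned long child_nslot(const struct pool *p)
{
	assert(p->num_child > 0);
	return p->nslabs / p->num_child;
}

/* Physical start of child i: start + ((i * child_nslot) << IO_TLB_SHIFT). */
static uint64_t child_phys_start(const struct pool *p, unsigned int i)
{
	assert(i < p->num_child);
	return p->start + (((uint64_t)i * child_nslot(p)) << IO_TLB_SHIFT);
}

/*
 * Assumed round-robin policy: hand out the current start child and
 * advance the cursor so the next request begins with the next child.
 */
static unsigned int next_child(struct pool *p)
{
	unsigned int i;

	assert(p->num_child > 0);
	i = p->child_start;
	p->child_start = (i + 1) % p->num_child;
	return i;
}
```

With a 64MB pool (32768 slots of 2KB) split into 4 children, each child covers 8192 slots, i.e. 16MB, and consecutive requests would start their search in children 0, 1, 2, 3, 0, and so on.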
Re: [RFC PATCH V2 1/2] swiotlb: Add Child IO TLB mem support
On 5/16/2022 3:34 PM, Christoph Hellwig wrote:
> I don't really understand how 'childs' fit in here. The code also
> doesn't seem to be usable without patch 2 and a caller of the new
> functions added in patch 2, so it is rather impossible to review.

Hi Christoph:
OK. I will merge the two patches and add a caller patch. The motivation
is to avoid the global spin lock when devices use the swiotlb bounce
buffer, because it introduces overhead in high-throughput cases. In my
test environment, the current code achieves about 24Gb/s network
throughput with SWIOTLB force enabled and about 40Gb/s without SWIOTLB
force. Storage has the same issue.

Per-device IO TLB mem may resolve the global spin lock issue among
devices, but a device may still have multiple queues, and those queues
still need to share one spin lock. This is why child IO TLB mems (IO TLB
areas in the previous patches) were introduced. Each device queue will
have a separate child IO TLB mem with its own spin lock to manage its IO
TLB buffers.

Otherwise, the global spin lock still costs CPU usage during high
throughput, in addition to the performance regression, because each
device queue needs to spin on a different CPU to acquire the global
lock. Child IO TLB mem may also resolve this CPU usage issue.

> Also:
> 1) why is SEV/TDX so different from other cases that need bounce
> buffering to treat it different and we can't work on a general
> scalability improvement

Other cases also have the global spin lock issue, but whether it matters
depends on whether they hit the bottleneck. The CPU usage issue may be
ignored there.

> 2) per previous discussions at how swiotlb itself works, it is
> clear that another option is to just make pages we DMA to
> shared with the hypervisor. Why don't we try that at least
> for larger I/O?

For a confidential VM (both TDX and SEV), we need to use a bounce buffer
to copy between private memory, which the hypervisor can't access
directly, and shared memory. For security reasons, a confidential VM
should not share the IO stack's DMA pages with the hypervisor directly,
to avoid attacks from the hypervisor while the IO stack handles the DMA
data.
[PATCH] swiotlb: Max mapping size takes min align mask into account
From: Tianyu Lan

swiotlb_find_slots() skips slots according to the IO TLB alignment mask
calculated from the min align mask and the original physical address
offset. This affects the max mapping size: the mapping size can't reach
IO_TLB_SEGSIZE * IO_TLB_SIZE when the original offset is non-zero. This
causes a boot failure in a Hyper-V Isolation VM, where swiotlb force is
enabled.

The SCSI layer uses the return value of dma_max_mapping_size() to set
the max segment size, which finally calls swiotlb_max_mapping_size().
The Hyper-V storage driver sets the min align mask to 4k - 1. The SCSI
layer may pass a 256k request buffer with a 0~4k offset, and the Hyper-V
storage driver then can't get a swiotlb bounce buffer via the DMA API,
because swiotlb_find_slots() can't find a 256k bounce buffer with an
offset. Make swiotlb_max_mapping_size() take the min align mask into
account.

Signed-off-by: Tianyu Lan
---
 kernel/dma/swiotlb.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 73a41cec9e38..0d6684ca7eab 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -743,7 +743,18 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,

 size_t swiotlb_max_mapping_size(struct device *dev)
 {
-	return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
+	int min_align_mask = dma_get_min_align_mask(dev);
+	int min_align = 0;
+
+	/*
+	 * swiotlb_find_slots() skips slots according to
+	 * min align mask. This affects max mapping size.
+	 * Take it into account here.
+	 */
+	if (min_align_mask)
+		min_align = roundup(min_align_mask, IO_TLB_SIZE);
+
+	return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE - min_align;
 }

 bool is_swiotlb_active(struct device *dev)
--
2.25.1
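The effect of this change on the reported limit can be checked with a userspace re-statement of the new calculation. This is a sketch under stated assumptions: IO_TLB_SEGSIZE is taken to be 128 (the mainline value, consistent with the 256k segment size mentioned above), round_up_to() is a local replacement for the kernel's roundup() macro, and max_mapping_size() takes the mask directly instead of a struct device.

```c
#include <assert.h>
#include <stddef.h>

#define IO_TLB_SHIFT   11
#define IO_TLB_SIZE    (1UL << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE 128UL	/* assumption: mainline value */

/* Round x up to the next multiple of y, like the kernel's roundup(). */
static size_t round_up_to(size_t x, size_t y)
{
	assert(y > 0);
	return ((x + y - 1) / y) * y;
}

/*
 * Userspace restatement of the patched swiotlb_max_mapping_size():
 * reserve room for the worst-case offset implied by the min align mask.
 */
static size_t max_mapping_size(unsigned int min_align_mask)
{
	size_t min_align = 0;

	if (min_align_mask)
		min_align = round_up_to(min_align_mask, IO_TLB_SIZE);

	return (size_t)IO_TLB_SIZE * IO_TLB_SEGSIZE - min_align;
}
```

With no min align mask the limit stays at 256k. With the Hyper-V storage driver's mask of 4k - 1, the mask rounds up to 4096 and the limit drops to 252k, so the SCSI layer no longer builds a 256k segment that cannot be bounced once an offset is added.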
Re: [RFC PATCH V2 0/2] swiotlb: Add child io tlb mem support
On 5/2/2022 8:54 PM, Tianyu Lan wrote:
> From: Tianyu Lan
>
> Traditionally swiotlb was not performance critical because it was only
> used for slow devices. But in some setups, like TDX/SEV confidential
> guests, all IO has to go through swiotlb. Currently swiotlb only has a
> single lock. Under high IO load with multiple CPUs this can lead to
> significant lock contention on the swiotlb lock.
>
> This patch adds child IO TLB mem support to reduce spinlock overhead
> among a device's queues. Each device may allocate IO TLB mem and set up
> child IO TLB mems according to its queue number. The number of child IO
> TLB mems may be set equal to the device queue number, which helps to
> reduce swiotlb spinlock overhead among devices and queues.
>
> Patch 2 introduces the IO TLB block concept and the
> swiotlb_device_allocate() API to allocate a per-device swiotlb bounce
> buffer. The new API accepts the queue number as the number of child IO
> TLB mems to set up in the device's IO TLB mem.

Gentle ping...

Thanks.

> Tianyu Lan (2):
>   swiotlb: Add Child IO TLB mem support
>   Swiotlb: Add device bounce buffer allocation interface
>
>  include/linux/swiotlb.h |  40 ++
>  kernel/dma/swiotlb.c    | 290 ++--
>  2 files changed, 317 insertions(+), 13 deletions(-)
[RFC PATCH V2 1/2] swiotlb: Add Child IO TLB mem support
From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. This patch adds child IO TLB mem support to resolve spinlock overhead among device's queues. Each device may allocate IO tlb mem and setup child IO TLB mem according to queue number. Swiotlb code allocates bounce buffer among child IO tlb mem iterately. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 7 +++ kernel/dma/swiotlb.c| 97 - 2 files changed, 94 insertions(+), 10 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7ed35dd3de6e..4a3f6a7b4b7e 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force; * @late_alloc:%true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @child_nslot:The number of IO TLB slot in the child IO TLB mem. + * @num_child: The child io tlb mem number in the pool. + * @child_start:The child index to start searching in the next round. 
*/ struct io_tlb_mem { phys_addr_t start; @@ -102,6 +105,10 @@ struct io_tlb_mem { bool late_alloc; bool force_bounce; bool for_alloc; + unsigned int num_child; + unsigned int child_nslot; + unsigned int child_start; + struct io_tlb_mem *child; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index e2ef0864eb1e..32e8f42530b6 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -207,6 +207,26 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->force_bounce = true; spin_lock_init(>lock); + + if (mem->num_child) { + mem->child_nslot = nslabs / mem->num_child; + mem->child_start = 0; + + /* +* Initialize child IO TLB mem, divide IO TLB pool +* into child number. Reuse parent mem->slot in the +* child mem->slot. +*/ + for (i = 0; i < mem->num_child; i++) { + mem->child[i].slots = mem->slots + i * mem->child_nslot; + mem->child[i].num_child = 0; + + swiotlb_init_io_tlb_mem(>child[i], + start + ((i * mem->child_nslot) << IO_TLB_SHIFT), + mem->child_nslot, late_alloc); + } + } + for (i = 0; i < mem->nslabs; i++) { mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); mem->slots[i].orig_addr = INVALID_PHYS_ADDR; @@ -336,16 +356,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask, mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(array_size(sizeof(*mem->slots), nslabs))); - if (!mem->slots) { - free_pages((unsigned long)vstart, order); - return -ENOMEM; - } + if (!mem->slots) + goto error_slots; set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT); swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true); swiotlb_print_info(); return 0; + +error_slots: + free_pages((unsigned long)vstart, order); + return -ENOMEM; } void __init swiotlb_exit(void) @@ -483,10 +505,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index) * Find a suitable number of IO TLB entries size that will fit this 
request and * allocate a buffer from that IO TLB pool. */ -static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, - size_t alloc_size, unsigned int alloc_align_mask) +static int swiotlb_do_find_slots(struct io_tlb_mem *mem, +struct device *dev, phys_addr_t orig_addr, +size_t alloc_size, +unsigned int alloc_align_mask) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; unsigned long boundary_mask = dma_get_seg_boundary(dev); dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, mem->start) & boundary_mask; @@ -565,6 +588,46 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, return index; } +static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, +
[RFC PATCH V2 0/2] swiotlb: Add child io tlb mem support
From: Tianyu Lan

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to reduce spinlock overhead
among a device's queues. Each device may allocate IO TLB mem and set up
child IO TLB mems according to its queue number. The number of child IO
TLB mems may be set equal to the device queue number, which helps to
reduce swiotlb spinlock overhead among devices and queues.

Patch 2 introduces the IO TLB block concept and the
swiotlb_device_allocate() API to allocate a per-device swiotlb bounce
buffer. The new API accepts the queue number as the number of child IO
TLB mems to set up in the device's IO TLB mem.

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  Swiotlb: Add device bounce buffer allocation interface

 include/linux/swiotlb.h |  40 ++
 kernel/dma/swiotlb.c    | 290 ++--
 2 files changed, 317 insertions(+), 13 deletions(-)

--
2.25.1
[RFC PATCH V2 2/2] Swiotlb: Add device bounce buffer allocation interface
From: Tianyu Lan In SEV/TDX Confidential VM, device DMA transaction needs use swiotlb bounce buffer to share data with host/hypervisor. The swiotlb spinlock introduces overhead among devices if they share io tlb mem. Avoid such issue, introduce swiotlb_device_allocate() to allocate device bounce buffer from default io tlb pool and set up child IO tlb mem for queue bounce buffer allocaton according input queue number. Device may have multi io queues and setting up the same number of child io tlb mem may help to resolve spinlock overhead among queues. Introduce IO TLB Block unit(2MB) concepts to allocate big bounce buffer from default pool for devices. IO TLB segment(256k) is too small. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 35 +++- kernel/dma/swiotlb.c| 195 +++- 2 files changed, 225 insertions(+), 5 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 4a3f6a7b4b7e..efd29e884fd7 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -31,6 +31,14 @@ struct scatterlist; #define IO_TLB_SHIFT 11 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT) +/* + * IO TLB BLOCK UNIT as device bounce buffer allocation unit. + * This allows device allocates bounce buffer from default io + * tlb pool. + */ +#define IO_TLB_BLOCKSIZE (8 * IO_TLB_SEGSIZE) +#define IO_TLB_BLOCK_UNIT (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT) + /* default to 64MB */ #define IO_TLB_DEFAULT_SIZE (64UL<<20) @@ -89,9 +97,11 @@ extern enum swiotlb_force swiotlb_force; * @late_alloc:%true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation - * @child_nslot:The number of IO TLB slot in the child IO TLB mem. * @num_child: The child io tlb mem number in the pool. + * @child_nslot:The number of IO TLB slot in the child IO TLB mem. + * @child_nblock:The number of IO TLB block in the child IO TLB mem. * @child_start:The child index to start searching in the next round. 
+ * @block_start:The block index to start searching in the next round. */ struct io_tlb_mem { phys_addr_t start; @@ -107,8 +117,16 @@ struct io_tlb_mem { bool for_alloc; unsigned int num_child; unsigned int child_nslot; + unsigned int child_nblock; unsigned int child_start; + unsigned int block_index; struct io_tlb_mem *child; + struct io_tlb_mem *parent; + struct io_tlb_block { + size_t alloc_size; + unsigned long start_slot; + unsigned int list; + } *block; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; @@ -137,6 +155,10 @@ unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); +int swiotlb_device_allocate(struct device *dev, + unsigned int area_num, + unsigned long size); +void swiotlb_device_free(struct device *dev); #else static inline void swiotlb_init(bool addressing_limited, unsigned int flags) { @@ -169,6 +191,17 @@ static inline bool is_swiotlb_active(struct device *dev) static inline void swiotlb_adjust_size(unsigned long size) { } + +void swiotlb_device_free(struct device *dev) +{ +} + +int swiotlb_device_allocate(struct device *dev, + unsigned int area_num, + unsigned long size) +{ + return -ENOMEM; +} #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 32e8f42530b6..f8a0711cd9de 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, unsigned long nslabs, bool late_alloc) { void *vaddr = phys_to_virt(start); - unsigned long bytes = nslabs << IO_TLB_SHIFT, i; + unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j; + unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE; mem->nslabs = nslabs; mem->start = start; @@ -210,6 +211,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, if (mem->num_child) { 
mem->child_nslot = nslabs / mem->num_child; + mem->child_nblock = block_num / mem->num_child; mem->child_start = 0; /* @@ -219,15 +221,24 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, */ for (i = 0; i < mem->num_child; i++) { mem->child[i].slots = mem->slots + i * mem->child_nslot; - mem->c
Re: [RFC PATCH] swiotlb: Add Child IO TLB mem support
On 4/29/2022 10:21 PM, Tianyu Lan wrote: From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. This patch adds child IO TLB mem support to resolve spinlock overhead among device's queues. Each device may allocate IO tlb mem and setup child IO TLB mem according to queue number. Swiotlb code allocates bounce buffer among child IO tlb mem iterately. Hi Robin and Christoph: According to Robin idea. I draft this patch. Please have a look and check whether it's right diection. Thanks. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 7 +++ kernel/dma/swiotlb.c| 96 - 2 files changed, 93 insertions(+), 10 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7ed35dd3de6e..4a3f6a7b4b7e 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force; * @late_alloc: %true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @child_nslot:The number of IO TLB slot in the child IO TLB mem. + * @num_child: The child io tlb mem number in the pool. + * @child_start:The child index to start searching in the next round. 
*/ struct io_tlb_mem { phys_addr_t start; @@ -102,6 +105,10 @@ struct io_tlb_mem { bool late_alloc; bool force_bounce; bool for_alloc; + unsigned int num_child; + unsigned int child_nslot; + unsigned int child_start; + struct io_tlb_mem *child; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index e2ef0864eb1e..382fa2288645 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -207,6 +207,25 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->force_bounce = true; spin_lock_init(>lock); + + if (mem->num_child) { + mem->child_nslot = nslabs / mem->num_child; + mem->child_start = 0; + + /* +* Initialize child IO TLB mem, divide IO TLB pool +* into child number. Reuse parent mem->slot in the +* child mem->slot. +*/ + for (i = 0; i < mem->num_child; i++) { + mem->num_child = 0; + mem->child[i].slots = mem->slots + i * mem->child_nslot; + swiotlb_init_io_tlb_mem(>child[i], + start + ((i * mem->child_nslot) << IO_TLB_SHIFT), + mem->child_nslot, late_alloc); + } + } + for (i = 0; i < mem->nslabs; i++) { mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); mem->slots[i].orig_addr = INVALID_PHYS_ADDR; @@ -336,16 +355,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask, mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(array_size(sizeof(*mem->slots), nslabs))); - if (!mem->slots) { - free_pages((unsigned long)vstart, order); - return -ENOMEM; - } + if (!mem->slots) + goto error_slots; set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT); swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true); swiotlb_print_info(); return 0; + +error_slots: + free_pages((unsigned long)vstart, order); + return -ENOMEM; } void __init swiotlb_exit(void) @@ -483,10 +504,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index) * Find a suitable number of IO TLB entries size that will fit this request and * 
allocate a buffer from that IO TLB pool. */ -static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, - size_t alloc_size, unsigned int alloc_align_mask) +static int swiotlb_do_find_slots(struct io_tlb_mem *mem, +struct device *dev, phys_addr_t orig_addr, +size_t alloc_size, +unsigned int alloc_align_mask) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; unsigned long boundary_mask = dma_get_seg_boundary(dev); dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, mem->start) & boundary_mask; @@ -565,6 +587,46 @@ static int swiotl
[RFC PATCH] swiotlb: Add Child IO TLB mem support
From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. This patch adds child IO TLB mem support to resolve spinlock overhead among device's queues. Each device may allocate IO tlb mem and setup child IO TLB mem according to queue number. Swiotlb code allocates bounce buffer among child IO tlb mem iterately. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 7 +++ kernel/dma/swiotlb.c| 96 - 2 files changed, 93 insertions(+), 10 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7ed35dd3de6e..4a3f6a7b4b7e 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force; * @late_alloc:%true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @child_nslot:The number of IO TLB slot in the child IO TLB mem. + * @num_child: The child io tlb mem number in the pool. + * @child_start:The child index to start searching in the next round. 
*/ struct io_tlb_mem { phys_addr_t start; @@ -102,6 +105,10 @@ struct io_tlb_mem { bool late_alloc; bool force_bounce; bool for_alloc; + unsigned int num_child; + unsigned int child_nslot; + unsigned int child_start; + struct io_tlb_mem *child; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index e2ef0864eb1e..382fa2288645 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -207,6 +207,25 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->force_bounce = true; spin_lock_init(>lock); + + if (mem->num_child) { + mem->child_nslot = nslabs / mem->num_child; + mem->child_start = 0; + + /* +* Initialize child IO TLB mem, divide IO TLB pool +* into child number. Reuse parent mem->slot in the +* child mem->slot. +*/ + for (i = 0; i < mem->num_child; i++) { + mem->num_child = 0; + mem->child[i].slots = mem->slots + i * mem->child_nslot; + swiotlb_init_io_tlb_mem(>child[i], + start + ((i * mem->child_nslot) << IO_TLB_SHIFT), + mem->child_nslot, late_alloc); + } + } + for (i = 0; i < mem->nslabs; i++) { mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); mem->slots[i].orig_addr = INVALID_PHYS_ADDR; @@ -336,16 +355,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask, mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(array_size(sizeof(*mem->slots), nslabs))); - if (!mem->slots) { - free_pages((unsigned long)vstart, order); - return -ENOMEM; - } + if (!mem->slots) + goto error_slots; set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT); swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true); swiotlb_print_info(); return 0; + +error_slots: + free_pages((unsigned long)vstart, order); + return -ENOMEM; } void __init swiotlb_exit(void) @@ -483,10 +504,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index) * Find a suitable number of IO TLB entries size that will fit this request and * 
allocate a buffer from that IO TLB pool. */ -static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, - size_t alloc_size, unsigned int alloc_align_mask) +static int swiotlb_do_find_slots(struct io_tlb_mem *mem, +struct device *dev, phys_addr_t orig_addr, +size_t alloc_size, +unsigned int alloc_align_mask) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; unsigned long boundary_mask = dma_get_seg_boundary(dev); dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, mem->start) & boundary_mask; @@ -565,6 +587,46 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, return index; } +static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, + size_t a
Re: [RFC PATCH 1/2] swiotlb: Split up single swiotlb lock
On 4/28/2022 10:44 PM, Robin Murphy wrote:
> On 2022-04-28 15:14, Tianyu Lan wrote:
>> From: Tianyu Lan
>>
>> Traditionally swiotlb was not performance critical because it was only
>> used for slow devices. But in some setups, like TDX/SEV confidential
>> guests, all IO has to go through swiotlb. Currently swiotlb only has a
>> single lock. Under high IO load with multiple CPUs this can lead to
>> significant lock contention on the swiotlb lock.
>>
>> This patch splits the swiotlb into individual areas which have their
>> own lock. When there is a swiotlb map/allocate request, the io tlb
>> buffer is allocated from the areas evenly and the allocation is freed
>> back to the associated area. This is to prepare to resolve the
>> overhead of a single spinlock among a device's queues. Each device may
>> have its own io tlb mem and bounce buffer pool.
>>
>> This idea comes from Andi Kleen's patch
>> (https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).
>> Rework it so that it may work for an individual device's io tlb mem.
>> The device driver may determine the area number according to the
>> device queue number.
>
> Rather than introduce this extra level of allocator complexity, how
> about just dividing up the initial SWIOTLB allocation into multiple
> io_tlb_mem instances?
>
> Robin.

Agree. Thanks for the suggestion. That will be more generic, and I will
update it in the next version.

Thanks.
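The "free the allocation back to the associated area" step from the quoted RFC can be illustrated with a little index arithmetic. This is a userspace sketch under assumptions, not the RFC's code: it supposes the pool is cut into num_areas equal, contiguous ranges of area_nslabs slots (the field names come from the RFC's struct io_tlb_mem), so the owning area of a slot is simply slot / area_nslabs. The helper names, the fixed MAX_AREAS bound and the omission of the per-area lock are all simplifications made here.

```c
#include <assert.h>

#define MAX_AREAS 8	/* arbitrary bound for this sketch */

/* Simplified stand-in for struct io_tlb_area; the per-area lock is omitted. */
struct area {
	unsigned long used;	/* slots currently allocated from this area */
};

/* Simplified stand-in for the area-related fields of struct io_tlb_mem. */
struct pool {
	unsigned int num_areas;
	unsigned long area_nslabs;	/* slots per area */
	struct area areas[MAX_AREAS];
};

/* Assumed mapping: areas are equal, contiguous slot ranges. */
static unsigned int area_of_slot(const struct pool *p, unsigned long slot)
{
	assert(p->num_areas > 0 && p->num_areas <= MAX_AREAS);
	assert(p->area_nslabs > 0);
	assert(slot < p->num_areas * p->area_nslabs);
	return (unsigned int)(slot / p->area_nslabs);
}

/* Account nslots allocated starting at first_slot to the owning area. */
static void account_alloc(struct pool *p, unsigned long first_slot,
			  unsigned long nslots)
{
	p->areas[area_of_slot(p, first_slot)].used += nslots;
}

/* Return nslots to the area the allocation originally came from. */
static void account_free(struct pool *p, unsigned long first_slot,
			 unsigned long nslots)
{
	struct area *a = &p->areas[area_of_slot(p, first_slot)];

	assert(a->used >= nslots);
	a->used -= nslots;
}
```

Because the owning area is derived from the slot index alone, the unmap path does not need to remember which CPU or queue did the mapping. In a real implementation each area's counters would be updated under that area's own lock.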
Re: [RFC PATCH 2/2] Swiotlb: Add device bounce buffer allocation interface
On 4/28/2022 10:14 PM, Tianyu Lan wrote:
> From: Tianyu Lan
>
> In an SEV/TDX confidential VM, device DMA transactions need to use the
> swiotlb bounce buffer to share data with the host/hypervisor. The
> swiotlb spinlock introduces overhead among devices if they share io tlb
> mem. To avoid this issue, introduce swiotlb_device_allocate() to
> allocate a device bounce buffer from the default io tlb pool and set up
> areas according to the input queue number. A device may have multiple
> io queues, and setting up the same number of io tlb areas may help to
> resolve spinlock overhead among queues.
>
> Introduce the IO TLB block unit (2MB) concept to allocate big bounce
> buffers from the default pool for devices. The IO TLB segment (256k) is
> too small.

Hi Christoph and Robin Murphy:

From Christoph:
"Yeah. We're almost done removing all knowledge of swiotlb from drivers, so the very last thing I want is an interface that allows a driver to allocate a per-device buffer."

Please have a look at this patch. It provides an API for device drivers
to allocate a per-device buffer. Just providing a per-device bounce
buffer is not enough, because a device may still have multiple queues.
The single io tlb mem has just one spin lock in the current code, and
this introduces overhead among the queues' DMA transactions. So the new
API takes the queue number as the IO TLB area number, and this is why we
still need to create areas in the IO TLB mem.

This new API is the one mentioned in Christoph's comment.

Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 33 kernel/dma/swiotlb.c| 173 +++- 2 files changed, 203 insertions(+), 3 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 489c249da434..380bd1ce3d0f 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -31,6 +31,14 @@ struct scatterlist; #define IO_TLB_SHIFT 11 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT) +/* + * IO TLB BLOCK UNIT as device bounce buffer allocation unit. + * This allows device allocates bounce buffer from default io + * tlb pool.
+ */ +#define IO_TLB_BLOCKSIZE (8 * IO_TLB_SEGSIZE) +#define IO_TLB_BLOCK_UNIT (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT) + /* default to 64MB */ #define IO_TLB_DEFAULT_SIZE (64UL<<20) @@ -72,11 +80,13 @@ extern enum swiotlb_force swiotlb_force; * @index:The slot index to start searching in this area for next round. * @lock: The lock to protect the above data structures in the map and *unmap calls. + * @block_index: The block index to start earching in this area for next round. */ struct io_tlb_area { unsigned long used; unsigned int area_index; unsigned int index; + unsigned int block_index; spinlock_t lock; }; @@ -110,6 +120,7 @@ struct io_tlb_area { * @num_areas: The area number in the pool. * @area_start: The area index to start searching in the next round. * @area_nslabs: The slot number in the area. + * @areas_block_number: The block number in the area. */ struct io_tlb_mem { phys_addr_t start; @@ -126,7 +137,14 @@ struct io_tlb_mem { unsigned int num_areas; unsigned int area_start; unsigned int area_nslabs; + unsigned int area_block_number; + struct io_tlb_mem *parent; struct io_tlb_area *areas; + struct io_tlb_block { + size_t alloc_size; + unsigned long start_slot; + unsigned int list; + } *block; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; @@ -155,6 +173,10 @@ unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); +int swiotlb_device_allocate(struct device *dev, + unsigned int area_num, + unsigned long size); +void swiotlb_device_free(struct device *dev); #else static inline void swiotlb_init(bool addressing_limited, unsigned int flags) { @@ -187,6 +209,17 @@ static inline bool is_swiotlb_active(struct device *dev) static inline void swiotlb_adjust_size(unsigned long size) { } + +void swiotlb_device_free(struct device *dev) +{ +} + +int swiotlb_device_allocate(struct device *dev, + unsigned int area_num, + 
unsigned long size) +{ + return -ENOMEM; +} #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 00a16f540f20..7b95a140694a 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -218,7 +218,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, unsigned long nslabs, bool late_alloc) { void *vaddr = phys_to_virt(start); - unsigned long bytes
[RFC PATCH 2/2] Swiotlb: Add device bounce buffer allocation interface
From: Tianyu Lan In a SEV/TDX Confidential VM, device DMA transactions need to use the swiotlb bounce buffer to share data with the host/hypervisor. The swiotlb spinlock introduces overhead among devices if they share one io tlb mem. To avoid this, introduce swiotlb_device_allocate() to allocate a device bounce buffer from the default io tlb pool and set up areas according to the input queue number. A device may have multiple io queues, and setting up the same number of io tlb areas may help to resolve spinlock overhead among queues. Introduce the IO TLB block unit (2MB) concept to allocate a big bounce buffer from the default pool for devices. The IO TLB segment (256k) is too small. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 33 kernel/dma/swiotlb.c| 173 +++- 2 files changed, 203 insertions(+), 3 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 489c249da434..380bd1ce3d0f 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -31,6 +31,14 @@ struct scatterlist; #define IO_TLB_SHIFT 11 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT) +/* + * IO TLB BLOCK UNIT as device bounce buffer allocation unit. + * This allows device allocates bounce buffer from default io + * tlb pool. + */ +#define IO_TLB_BLOCKSIZE (8 * IO_TLB_SEGSIZE) +#define IO_TLB_BLOCK_UNIT (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT) + /* default to 64MB */ #define IO_TLB_DEFAULT_SIZE (64UL<<20) @@ -72,11 +80,13 @@ extern enum swiotlb_force swiotlb_force; * @index: The slot index to start searching in this area for next round. * @lock: The lock to protect the above data structures in the map and * unmap calls. + * @block_index: The block index to start earching in this area for next round. */ struct io_tlb_area { unsigned long used; unsigned int area_index; unsigned int index; + unsigned int block_index; spinlock_t lock; }; @@ -110,6 +120,7 @@ struct io_tlb_area { * @num_areas: The area number in the pool. * @area_start: The area index to start searching in the next round. 
* @area_nslabs: The slot number in the area. + * @areas_block_number: The block number in the area. */ struct io_tlb_mem { phys_addr_t start; @@ -126,7 +137,14 @@ struct io_tlb_mem { unsigned int num_areas; unsigned int area_start; unsigned int area_nslabs; + unsigned int area_block_number; + struct io_tlb_mem *parent; struct io_tlb_area *areas; + struct io_tlb_block { + size_t alloc_size; + unsigned long start_slot; + unsigned int list; + } *block; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; @@ -155,6 +173,10 @@ unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); +int swiotlb_device_allocate(struct device *dev, + unsigned int area_num, + unsigned long size); +void swiotlb_device_free(struct device *dev); #else static inline void swiotlb_init(bool addressing_limited, unsigned int flags) { @@ -187,6 +209,17 @@ static inline bool is_swiotlb_active(struct device *dev) static inline void swiotlb_adjust_size(unsigned long size) { } + +void swiotlb_device_free(struct device *dev) +{ +} + +int swiotlb_device_allocate(struct device *dev, + unsigned int area_num, + unsigned long size) +{ + return -ENOMEM; +} #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 00a16f540f20..7b95a140694a 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -218,7 +218,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, unsigned long nslabs, bool late_alloc) { void *vaddr = phys_to_virt(start); - unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j; + unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j, k; unsigned int block_list; mem->nslabs = nslabs; @@ -226,6 +226,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->end = mem->start + bytes; mem->index = 0; mem->late_alloc = late_alloc; 
+ mem->area_block_number = nslabs / (IO_TLB_BLOCKSIZE * mem->num_areas); if (swiotlb_force_bounce) mem->force_bounce = true; @@ -233,10 +234,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, for (i = 0, j = 0, k = 0; i < mem->nslabs; i++) { if (!(i % mem->area_nslabs)) { mem->areas[j].index = 0; + mem->area
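The 2MB block and 256k segment sizes quoted in the commit message follow from the macros in the diff. The sketch below is a user-space illustration only, assuming IO_TLB_SEGSIZE is 128 slabs as in include/linux/swiotlb.h; segment_bytes() and blocks_per_area() are hypothetical helper names that do not appear in the patch, and the cast in IO_TLB_BLOCK_UNIT is added here only to keep the arithmetic in unsigned long.

```c
#include <assert.h>

/* Existing values from include/linux/swiotlb.h (assumed unchanged). */
#define IO_TLB_SEGSIZE    128                 /* slabs per segment */
#define IO_TLB_SHIFT      11                  /* one slab is 2 KiB */

/* Block unit as introduced by the patch. */
#define IO_TLB_BLOCKSIZE  (8 * IO_TLB_SEGSIZE)            /* slabs per block */
#define IO_TLB_BLOCK_UNIT ((unsigned long)IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)

/* Hypothetical helper: bytes covered by one IO TLB segment. */
static unsigned long segment_bytes(void)
{
    return (unsigned long)IO_TLB_SEGSIZE << IO_TLB_SHIFT;
}

/*
 * Hypothetical helper mirroring the patch's
 *   mem->area_block_number = nslabs / (IO_TLB_BLOCKSIZE * mem->num_areas);
 * Any slabs that do not fill a whole block are dropped by the division.
 */
static unsigned long blocks_per_area(unsigned long nslabs,
                                     unsigned int num_areas)
{
    assert(num_areas > 0);
    return nslabs / ((unsigned long)IO_TLB_BLOCKSIZE * num_areas);
}
```

With the default 64MB pool (32768 slabs) split into 4 areas, each area would hold 8 blocks of 2MB under these assumptions.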
[RFC PATCH 1/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. This patch splits the swiotlb into individual areas, each with its own lock. On a swiotlb map/allocate request, the io tlb buffer is allocated from the areas in an evenly distributed way, and the allocation is freed back to the area it came from. This prepares for resolving the overhead of a single spinlock shared among a device's queues. Each device may have its own io tlb mem and bounce buffer pool. The idea comes from Andi Kleen's patch (https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45). It is reworked here so that it can also work for an individual device's io tlb mem. The device driver may determine the area number according to the device queue number. Based-on-idea-by: Andi Kleen Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 25 ++ kernel/dma/swiotlb.c| 173 +++- 2 files changed, 162 insertions(+), 36 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7ed35dd3de6e..489c249da434 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -62,6 +62,24 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, #ifdef CONFIG_SWIOTLB extern enum swiotlb_force swiotlb_force; +/** + * struct io_tlb_area - IO TLB memory area descriptor + * + * This is a single area with a single lock. + * + * @used: The number of used IO TLB block. + * @area_index: The index of to tlb area. + * @index: The slot index to start searching in this area for next round. + * @lock: The lock to protect the above data structures in the map and + * unmap calls. 
+ */ +struct io_tlb_area { + unsigned long used; + unsigned int area_index; + unsigned int index; + spinlock_t lock; +}; + /** * struct io_tlb_mem - IO TLB Memory Pool Descriptor * @@ -89,6 +107,9 @@ extern enum swiotlb_force swiotlb_force; * @late_alloc:%true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @num_areas: The area number in the pool. + * @area_start: The area index to start searching in the next round. + * @area_nslabs: The slot number in the area. */ struct io_tlb_mem { phys_addr_t start; @@ -102,6 +123,10 @@ struct io_tlb_mem { bool late_alloc; bool force_bounce; bool for_alloc; + unsigned int num_areas; + unsigned int area_start; + unsigned int area_nslabs; + struct io_tlb_area *areas; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index e2ef0864eb1e..00a16f540f20 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -62,6 +62,8 @@ #define INVALID_PHYS_ADDR (~(phys_addr_t)0) +#define NUM_AREAS_DEFAULT 1 + static bool swiotlb_force_bounce; static bool swiotlb_force_disable; @@ -70,6 +72,25 @@ struct io_tlb_mem io_tlb_default_mem; phys_addr_t swiotlb_unencrypted_base; static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT; +static unsigned long default_area_num = NUM_AREAS_DEFAULT; + +static int swiotlb_setup_areas(struct io_tlb_mem *mem, + unsigned int num_areas, unsigned long nslabs) +{ + if (nslabs < 1 || !is_power_of_2(num_areas)) { + pr_err("swiotlb: Invalid areas parameter %d.\n", num_areas); + return -EINVAL; + } + + /* Round up number of slabs to the next power of 2. +* The last area is going be smaller than the rest if default_nslabs is +* not power of two. 
+*/ + mem->area_start = 0; + mem->num_areas = num_areas; + mem->area_nslabs = nslabs / num_areas; + return 0; +} static int __init setup_io_tlb_npages(char *str) @@ -114,6 +135,8 @@ void __init swiotlb_adjust_size(unsigned long size) return; size = ALIGN(size, IO_TLB_SIZE); default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE); + swiotlb_setup_areas(&io_tlb_default_mem, default_area_num, + default_nslabs); pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20); } @@ -195,7 +218,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, unsigned long nslabs, bool late_alloc) { void *vaddr = phys_to_virt(start); - unsigned long bytes = nslabs << IO_TLB_SHIFT, i; + unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j; + unsigned int block_list; mem-&
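The validation and split done by swiotlb_setup_areas() in the diff above can be tried out in user space. This is a minimal sketch, not the kernel code: struct area_layout and setup_areas() are hypothetical stand-ins for struct io_tlb_mem and swiotlb_setup_areas(), and is_power_of_2() is reimplemented locally rather than taken from linux/log2.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the area fields of struct io_tlb_mem. */
struct area_layout {
    unsigned int num_areas;     /* number of areas in the pool */
    unsigned int area_start;    /* area to start searching from next */
    unsigned long area_nslabs;  /* slabs per area */
};

/* Local reimplementation of the kernel's is_power_of_2(). */
static bool is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Same checks and arithmetic as swiotlb_setup_areas() in the patch. */
static int setup_areas(struct area_layout *l, unsigned int num_areas,
                       unsigned long nslabs)
{
    assert(l != 0);
    if (nslabs < 1 || !is_power_of_2(num_areas))
        return -EINVAL;

    l->area_start = 0;
    l->num_areas = num_areas;
    /* Integer division: slabs beyond a multiple of num_areas are unused. */
    l->area_nslabs = nslabs / num_areas;
    return 0;
}
```

For example, 32768 slabs with 4 areas gives 8192 slabs per area, while an area count of 3 is rejected with -EINVAL.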
[RFC PATCH 0/2] swiotlb: Introduce swiotlb device allocation function
From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. This patchset splits the swiotlb into individual areas, each with its own lock. On a swiotlb map/allocate request, the io tlb buffer is allocated from the areas in an evenly distributed way, and the allocation is freed back to the area it came from. Patch 2 introduces a helper function that allocates a bounce buffer for a device from the default IO TLB pool, using the new IO TLB block unit, and sets up IO TLB areas for the device queues to avoid spinlock overhead. The area number is set by the device driver according to the queue number. In a network test between a traditional VM and a Confidential VM, the throughput improves from ~20Gb/s to ~34Gb/s with this patchset. Tianyu Lan (2): swiotlb: Split up single swiotlb lock Swiotlb: Add device bounce buffer allocation interface include/linux/swiotlb.h | 58 +++ kernel/dma/swiotlb.c| 340 +++- 2 files changed, 362 insertions(+), 36 deletions(-) -- 2.25.1
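The locking scheme described in this cover letter can be illustrated with a small user-space model. The sketch below is an assumption-laden toy, not the kernel implementation: it uses pthread mutexes instead of spinlocks, one slot per allocation, and the names struct pool, pool_alloc() and pool_free() are invented for illustration. It shows the two properties the text describes: allocations rotate across areas, and a free takes only the lock of the area the slot came from.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_AREAS  4   /* must be a power of two for the mask below */
#define AREA_SLOTS 8   /* slots per area, arbitrary for this sketch */

struct area {
    pthread_mutex_t lock;             /* protects used and busy[] */
    unsigned int used;
    unsigned char busy[AREA_SLOTS];
};

struct pool {
    struct area areas[NUM_AREAS];
    atomic_uint area_start;           /* rotating start hint */
};

static void pool_init(struct pool *p)
{
    for (int a = 0; a < NUM_AREAS; a++) {
        pthread_mutex_init(&p->areas[a].lock, NULL);
        p->areas[a].used = 0;
        for (int s = 0; s < AREA_SLOTS; s++)
            p->areas[a].busy[s] = 0;
    }
    atomic_init(&p->area_start, 0);
}

/* Returns a pool-wide slot index, or -1 if every area is full. */
static int pool_alloc(struct pool *p)
{
    unsigned int start = atomic_fetch_add(&p->area_start, 1) & (NUM_AREAS - 1);

    for (unsigned int n = 0; n < NUM_AREAS; n++) {
        unsigned int a = (start + n) & (NUM_AREAS - 1);
        struct area *ar = &p->areas[a];
        int slot = -1;

        pthread_mutex_lock(&ar->lock);      /* only this area is locked */
        for (int s = 0; s < AREA_SLOTS; s++) {
            if (!ar->busy[s]) {
                ar->busy[s] = 1;
                ar->used++;
                slot = s;
                break;
            }
        }
        pthread_mutex_unlock(&ar->lock);

        if (slot >= 0)
            return (int)(a * AREA_SLOTS) + slot;
    }
    return -1;
}

/* Frees a slot back into the area it was allocated from. */
static void pool_free(struct pool *p, int idx)
{
    struct area *ar;

    assert(idx >= 0 && idx < NUM_AREAS * AREA_SLOTS);
    ar = &p->areas[idx / AREA_SLOTS];

    pthread_mutex_lock(&ar->lock);
    ar->busy[idx % AREA_SLOTS] = 0;
    ar->used--;
    pthread_mutex_unlock(&ar->lock);
}
```

Two threads that land on different areas never touch the same lock, which is the contention reduction the patchset is after.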
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 3/1/2022 7:53 PM, Christoph Hellwig wrote: On Fri, Feb 25, 2022 at 10:28:54PM +0800, Tianyu Lan wrote: One more perspective is that one device may have multiple queues and each queue should have an independent swiotlb bounce buffer to avoid spin lock overhead. The number of queues is only available in the device driver. This means the new API needs to be called in the device driver according to the queue number. Well, given how hell bent people are on bounce buffering we might need some scalability work there anyway. According to my test on a local machine with two VMs, a Linux guest without the swiotlb bounce buffer, or with the fix patch from Andi Kleen, can achieve about 40G/s throughput, but it's just 24-25G/s with the current swiotlb code. In addition, the spinlock contention consumes more CPU time.
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/23/2022 5:46 PM, Tianyu Lan wrote: On 2/23/2022 12:00 AM, Christoph Hellwig wrote: On Tue, Feb 22, 2022 at 11:07:19PM +0800, Tianyu Lan wrote: Thanks for your comment. That means we need to expose a swiotlb_device_init() interface to allocate the bounce buffer and initialize the io tlb mem entry. DMA API Current rmem_swiotlb_device_init() only works for platforms with device tree. The new API should be called in the bus driver or a new DMA API. Could you check whether this is the right way before we start the work? Do these VMs use ACPI? We'd probably really want some kind of higher level configuration and not have the drivers request it themselves. Yes, Hyper-V isolation VM uses ACPI. Devices are enumerated via the vmbus host and there is no child device information in the ACPI table. The host driver seems to be the right place to call the new API. Hi Christoph: One more perspective is that one device may have multiple queues and each queue should have an independent swiotlb bounce buffer to avoid spin lock overhead. The number of queues is only available in the device driver. This means the new API needs to be called in the device driver according to the queue number.
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/23/2022 12:00 AM, Christoph Hellwig wrote: On Tue, Feb 22, 2022 at 11:07:19PM +0800, Tianyu Lan wrote: Thanks for your comment. That means we need to expose a swiotlb_device_init() interface to allocate the bounce buffer and initialize the io tlb mem entry. DMA API Current rmem_swiotlb_device_init() only works for platforms with device tree. The new API should be called in the bus driver or a new DMA API. Could you check whether this is the right way before we start the work? Do these VMs use ACPI? We'd probably really want some kind of higher level configuration and not have the drivers request it themselves. Yes, Hyper-V isolation VM uses ACPI. Devices are enumerated via the vmbus host and there is no child device information in the ACPI table. The host driver seems to be the right place to call the new API.
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/22/2022 4:05 PM, Christoph Hellwig wrote: On Mon, Feb 21, 2022 at 11:14:58PM +0800, Tianyu Lan wrote: Sorry. The boot failure is not related to these patches and the issue has been fixed in the latest upstream code. There is a performance bottleneck due to the io tlb mem's spin lock during the performance test. All devices' io queues use the same io tlb mem entry, and the spin lock of the io tlb mem introduces overhead. There is a fix patch from Andi Kleen on GitHub. Could you have a look? https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45 Please post these things to the list. But I suspect the right answer for the "secure" hypervisor case is to use the per-device swiotlb regions that we've recently added. Thanks for your comment. That means we need to expose a swiotlb_device_init() interface to allocate the bounce buffer and initialize the io tlb mem entry. DMA API Current rmem_swiotlb_device_init() only works for platforms with device tree. The new API should be called in the bus driver or a new DMA API. Could you check whether this is the right way before we start the work? Thanks.
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/15/2022 11:32 PM, Tianyu Lan wrote: On 2/14/2022 9:58 PM, Christoph Hellwig wrote: On Mon, Feb 14, 2022 at 07:28:40PM +0800, Tianyu Lan wrote: On 2/14/2022 4:19 PM, Christoph Hellwig wrote: Adding a function to set the flag doesn't really change much. As Robin pointed out last time you should find a way to just call swiotlb_init_with_tbl directly with the memory allocated the way you like it. Or given that we have quite a few of these trusted hypervisor schemes maybe add an argument to swiotlb_init that specifies how to allocate the memory. Thanks for your suggestion. I will try the first approach. Take a look at the SWIOTLB_ANY flag in this WIP branch: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup That being said I'm not sure that either this flag or the existing powerpc code is actually the right thing to do. We still need the 4G limited buffer to support devices with addressing limitations. So I think we need an additional io_tlb_mem instance for the devices without addressing limitations instead. Hi Christoph: Thanks for your patches. I tested these patches in a Hyper-V trusted VM and the system can't boot up. I am debugging and will report back. Sorry. The boot failure is not related to these patches and the issue has been fixed in the latest upstream code. There is a performance bottleneck due to the io tlb mem's spin lock during the performance test. All devices' io queues use the same io tlb mem entry, and the spin lock of the io tlb mem introduces overhead. There is a fix patch from Andi Kleen on GitHub. Could you have a look? https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45 Thanks.
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/14/2022 9:58 PM, Christoph Hellwig wrote: On Mon, Feb 14, 2022 at 07:28:40PM +0800, Tianyu Lan wrote: On 2/14/2022 4:19 PM, Christoph Hellwig wrote: Adding a function to set the flag doesn't really change much. As Robin pointed out last time you should find a way to just call swiotlb_init_with_tbl directly with the memory allocated the way you like it. Or given that we have quite a few of these trusted hypervisor schemes maybe add an argument to swiotlb_init that specifies how to allocate the memory. Thanks for your suggestion. I will try the first approach. Take a look at the SWIOTLB_ANY flag in this WIP branch: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup That being said I'm not sure that either this flag or the existing powerpc code is actually the right thing to do. We still need the 4G limited buffer to support devices with addressing limitations. So I think we need an additional io_tlb_mem instance for the devices without addressing limitations instead. Hi Christoph: Thanks for your patches. I tested these patches in a Hyper-V trusted VM and the system can't boot up. I am debugging and will report back.
Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/14/2022 4:19 PM, Christoph Hellwig wrote: Adding a function to set the flag doesn't really change much. As Robin pointed out last time you should find a way to just call swiotlb_init_with_tbl directly with the memory allocated the way you like it. Or given that we have quite a few of these trusted hypervisor schemes maybe add an argument to swiotlb_init that specifies how to allocate the memory. Thanks for your suggestion. I will try the first approach.
[PATCH V2 2/2] x86/hyperv: Make swiotlb bounce buffer allocation not just from low pages
From: Tianyu Lan In a Hyper-V Isolation VM, the swiotlb bounce buffer size may be up to 1G and, depending on the memory layout, there may not be enough memory between 0 and 4G. Devices in an Isolation VM can use memory above 4G as DMA memory, so call swiotlb_set_alloc_from_low_pages(false) to allow the swiotlb bounce buffer to be allocated outside the 0 to 4G range. Signed-off-by: Tianyu Lan --- arch/x86/kernel/cpu/mshyperv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 5a99f993e639..50ba4622c650 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -343,6 +343,7 @@ static void __init ms_hyperv_init_platform(void) * use swiotlb bounce buffer for dma transaction. */ swiotlb_force = SWIOTLB_FORCE; + swiotlb_set_alloc_from_low_pages(false); #endif } -- 2.25.1
[PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
From: Tianyu Lan Hyper-V Isolation VMs and AMD SEV VMs use the swiotlb bounce buffer to share memory with the hypervisor. Currently the swiotlb bounce buffer is only allocated from 0 to ARCH_LOW_ADDRESS_LIMIT, which defaults to 0xUL. Isolation VMs and AMD SEV VMs need a bounce buffer of up to 1G. The allocation fails when there is not enough memory in the 0 to 4G address space, even though devices may also use memory above 4G as DMA memory. Expose swiotlb_set_alloc_from_low_pages() so that a platform may set the switch to false when it is not necessary to limit the bounce buffer to the 0 to 4G range. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 1 + kernel/dma/swiotlb.c| 18 -- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index f6c3638255d5..2b4f92668bc7 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -39,6 +39,7 @@ enum swiotlb_force { extern void swiotlb_init(int verbose); int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); unsigned long swiotlb_size_or_default(void); +void swiotlb_set_alloc_from_low_pages(bool low); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); extern int swiotlb_late_init_with_default_size(size_t default_size); extern void __init swiotlb_update_mem_attributes(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index f1e7ea160b43..62bf8b5cc3e4 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -73,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +static bool swiotlb_alloc_from_low_pages = true; + phys_addr_t swiotlb_unencrypted_base; /* @@ -116,6 +118,11 @@ void swiotlb_set_max_segment(unsigned int val) max_segment = rounddown(val, PAGE_SIZE); } +void swiotlb_set_alloc_from_low_pages(bool low) +{ + swiotlb_alloc_from_low_pages = low; +} + unsigned long swiotlb_size_or_default(void) { return default_nslabs << IO_TLB_SHIFT; @@ -284,8 +291,15 @@ swiotlb_init(int verbose) if (swiotlb_force == 
SWIOTLB_NO_FORCE) return; - /* Get IO TLB memory from the low pages */ - tlb = memblock_alloc_low(bytes, PAGE_SIZE); + /* +* Get IO TLB memory from the low pages if swiotlb_alloc_from_low_pages +* is set. +*/ + if (swiotlb_alloc_from_low_pages) + tlb = memblock_alloc_low(bytes, PAGE_SIZE); + else + tlb = memblock_alloc(bytes, PAGE_SIZE); + if (!tlb) goto fail; if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose)) -- 2.25.1
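Why a 1G request can fail under the low-pages restriction but succeed without it can be shown with a toy model. The sketch below is not memblock: struct free_range and pick_range() are hypothetical, the free list is made up, and LOW_LIMIT is assumed to be 4G for illustration. It only mirrors the decision the patch makes between memblock_alloc_low() and memblock_alloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_LIMIT (1ULL << 32)   /* assumed 4G boundary for this sketch */

/* Hypothetical description of one free physical range. */
struct free_range {
    uint64_t base;
    uint64_t size;
};

/*
 * First-fit search over the free ranges. When from_low_pages is true the
 * allocation must end at or below LOW_LIMIT, as with memblock_alloc_low();
 * otherwise any range that is large enough is acceptable.
 * Returns the index of the chosen range, or -1 if nothing fits.
 */
static int pick_range(const struct free_range *r, int n, uint64_t bytes,
                      bool from_low_pages)
{
    assert(r != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (r[i].size < bytes)
            continue;
        if (from_low_pages && r[i].base + bytes > LOW_LIMIT)
            continue;
        return i;
    }
    return -1;
}
```

With only 512MB free below 4G and 2G free above it, a 1G request fails in low-pages mode and succeeds once the switch is set to false, while a 64MB request still fits below 4G.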
[PATCH V2 0/2] x86/hyperv/Swiotlb: Add swiotlb_set_alloc_from_low_pages() switch function.
From: Tianyu Lan A Hyper-V Isolation VM may fail to allocate the swiotlb bounce buffer because, in some cases, there is not enough contiguous memory between 0 and 4G. The current swiotlb code allocates the bounce buffer in low memory. This patchset adds a new function, swiotlb_set_alloc_from_low_pages(), to control whether the swiotlb bounce buffer is allocated from low pages or without that limitation. Devices in a Hyper-V Isolation VM may use memory above 4G as DMA memory, so the VM switches the swiotlb allocation mode to avoid running out of contiguous memory in low pages. Tianyu Lan (2): Swiotlb: Add swiotlb_alloc_from_low_pages switch x86/hyperv: Make swiotlb bounce buffer allocation not just from low pages arch/x86/kernel/cpu/mshyperv.c | 1 + include/linux/swiotlb.h| 1 + kernel/dma/swiotlb.c | 18 -- 3 files changed, 18 insertions(+), 2 deletions(-) -- 2.25.1
Re: [PATCH] Netvsc: Call hv_unmap_memory() in the netvsc_device_remove()
On 2/3/2022 1:05 AM, Michael Kelley (LINUX) wrote: From: Tianyu Lan Sent: Tuesday, February 1, 2022 8:32 AM hv_unmap_memory() calls vunmap() internally, which must not be called in interrupt context. The current code calls hv_unmap_memory() in free_netvsc_device(), which is an RCU callback and may be called in interrupt context. This will trigger BUG_ON(in_interrupt()) in vunmap(). Fix it by moving hv_unmap_memory() to netvsc_device_remove(). I think this change can fail to call hv_unmap_memory() in an error case. If netvsc_init_buf() fails after hv_map_memory() succeeds for the receive buffer or for the send buffer, no corresponding hv_unmap_memory() will be done. The failure in netvsc_init_buf() will cause netvsc_connect_vsp() to fail, so netvsc_add_device() will "goto close" where free_netvsc_device() will be called. But free_netvsc_device() no longer calls hv_unmap_memory(), so it won't ever happen. netvsc_device_remove() is never called in this case because netvsc_add_device() failed. Hi Michael: Thanks for your review. Nice catch. I will fix it in the next version.
Re: [PATCH 0/2] x86/hyperv/Swiotlb: Add swiotlb_alloc_from_low_pages switch
On 2/2/2022 4:12 PM, Christoph Hellwig wrote: I think this interface is a little too hacky. In the end all the non-trusted hypervisor schemes (including the per-device swiotlb one) can allocate the memory from everywhere and want to force use of swiotlb. I think we need some kind of proper interface for that instead of setting all kinds of global variables. Hi Christoph: Thanks for your review. I drafted the following patch to export an interface, swiotlb_set_alloc_from_low_pages(). Could you have a look and check whether this looks good to you? diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index f6c3638255d5..2b4f92668bc7 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -39,6 +39,7 @@ enum swiotlb_force { extern void swiotlb_init(int verbose); int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); unsigned long swiotlb_size_or_default(void); +void swiotlb_set_alloc_from_low_pages(bool low); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); extern int swiotlb_late_init_with_default_size(size_t default_size); extern void __init swiotlb_update_mem_attributes(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index f1e7ea160b43..62bf8b5cc3e4 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -73,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +static bool swiotlb_alloc_from_low_pages = true; + phys_addr_t swiotlb_unencrypted_base; /* @@ -116,6 +118,11 @@ void swiotlb_set_max_segment(unsigned int val) max_segment = rounddown(val, PAGE_SIZE); } +void swiotlb_set_alloc_from_low_pages(bool low) +{ + swiotlb_alloc_from_low_pages = low; +} + unsigned long swiotlb_size_or_default(void) { return default_nslabs << IO_TLB_SHIFT; @@ -284,8 +291,15 @@ swiotlb_init(int verbose) if (swiotlb_force == SWIOTLB_NO_FORCE) return; - /* Get IO TLB memory from the low pages */ - tlb = memblock_alloc_low(bytes, PAGE_SIZE); + /* +* Get IO TLB memory from the low 
pages if swiotlb_alloc_from_low_pages +* is set. +*/ + if (swiotlb_alloc_from_low_pages) + tlb = memblock_alloc_low(bytes, PAGE_SIZE); + else + tlb = memblock_alloc(bytes, PAGE_SIZE); + if (!tlb) goto fail; if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
[PATCH] Netvsc: Call hv_unmap_memory() in the netvsc_device_remove()
From: Tianyu Lan hv_unmap_memory() calls vunmap() internally, which must not be called in interrupt context. The current code calls hv_unmap_memory() in free_netvsc_device(), which is an RCU callback and may be called in interrupt context. This will trigger BUG_ON(in_interrupt()) in vunmap(). Fix it by moving hv_unmap_memory() to netvsc_device_remove(). Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver") Signed-off-by: Tianyu Lan --- drivers/net/hyperv/netvsc.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index afa81a9480cc..f989f920d4ce 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -154,19 +154,15 @@ static void free_netvsc_device(struct rcu_head *head) kfree(nvdev->extension); - if (nvdev->recv_original_buf) { - hv_unmap_memory(nvdev->recv_buf); + if (nvdev->recv_original_buf) vfree(nvdev->recv_original_buf); - } else { + else vfree(nvdev->recv_buf); - } - if (nvdev->send_original_buf) { - hv_unmap_memory(nvdev->send_buf); + if (nvdev->send_original_buf) vfree(nvdev->send_original_buf); - } else { + else vfree(nvdev->send_buf); - } bitmap_free(nvdev->send_section_map); @@ -765,6 +761,12 @@ void netvsc_device_remove(struct hv_device *device) netvsc_teardown_send_gpadl(device, net_device, ndev); } + if (net_device->recv_original_buf) + hv_unmap_memory(net_device->recv_buf); + + if (net_device->send_original_buf) + hv_unmap_memory(net_device->send_buf); + /* Release all resources */ free_netvsc_device_rcu(net_device); } -- 2.25.1
[PATCH 2/2] x86/hyperv: Set swiotlb_alloc_from_low_pages to false
From: Tianyu Lan In a Hyper-V Isolation VM, the swiotlb bounce buffer size may be up to 1G and, depending on the memory layout, there may not be enough memory between 0 and 4G. Devices in an Isolation VM can use memory above 4G as DMA memory. Set swiotlb_alloc_from_low_pages to false in the Isolation VM. Signed-off-by: Tianyu Lan --- arch/x86/kernel/cpu/mshyperv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 5a99f993e639..80a0423ac75d 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -343,6 +343,7 @@ static void __init ms_hyperv_init_platform(void) * use swiotlb bounce buffer for dma transaction. */ swiotlb_force = SWIOTLB_FORCE; + swiotlb_alloc_from_low_pages = false; #endif } -- 2.25.1
[PATCH 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
From: Tianyu Lan

Hyper-V Isolation VMs and AMD SEV VMs use the swiotlb bounce buffer to share memory with the hypervisor. The swiotlb bounce buffer is currently allocated only from 0 to ARCH_LOW_ADDRESS_LIMIT, which defaults to 0xffffffffUL. Isolation VMs and AMD SEV VMs need a bounce buffer of up to 1G. The allocation fails when there is not enough contiguous memory in the 0 to 4G address range, even though devices may also use memory above 4G as DMA memory. Expose swiotlb_alloc_from_low_pages so that a platform may set it to false when the bounce buffer does not need to be limited to the 0 to 4G range.

Signed-off-by: Tianyu Lan
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c    | 13 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f6c3638255d5..55c178e8eee0 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -191,5 +191,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 #endif /* CONFIG_DMA_RESTRICTED_POOL */

 extern phys_addr_t swiotlb_unencrypted_base;
+extern bool swiotlb_alloc_from_low_pages;

 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..159fef80f3db 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -73,6 +73,12 @@ enum swiotlb_force swiotlb_force;

 struct io_tlb_mem io_tlb_default_mem;

+/*
+ * Get IO TLB memory from the low pages if swiotlb_alloc_from_low_pages
+ * is set.
+ */
+bool swiotlb_alloc_from_low_pages = true;
+
 phys_addr_t swiotlb_unencrypted_base;

 /*
@@ -284,8 +290,11 @@ swiotlb_init(int verbose)
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
 		return;

-	/* Get IO TLB memory from the low pages */
-	tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+	if (swiotlb_alloc_from_low_pages)
+		tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+	else
+		tlb = memblock_alloc(bytes, PAGE_SIZE);
+
 	if (!tlb)
 		goto fail;
 	if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
--
2.25.1
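To illustrate why the low-pages restriction can make a 1G allocation fail, here is a minimal userspace sketch. It is not kernel code and does not use memblock: struct sketch_range, SKETCH_LOW_LIMIT and sketch_pick_base() are hypothetical names invented for this example, and the first-fit search is an assumption made only to keep the model short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ARCH_LOW_ADDRESS_LIMIT: the last byte below 4G. */
#define SKETCH_LOW_LIMIT UINT64_C(0xffffffff)

/* Hypothetical description of one free, contiguous physical range. */
struct sketch_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Return the base of the first range that can hold `bytes`, or 0 if none
 * fits. With from_low_pages set, the whole allocation must lie at or
 * below SKETCH_LOW_LIMIT, mirroring the low-pages restriction.
 */
static uint64_t sketch_pick_base(const struct sketch_range *ranges, size_t n,
				 uint64_t bytes, bool from_low_pages)
{
	assert(bytes > 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t last;

		if (ranges[i].size < bytes)
			continue;
		last = ranges[i].start + bytes - 1;
		if (from_low_pages && last > SKETCH_LOW_LIMIT)
			continue;
		return ranges[i].start;
	}
	return 0;
}
```

With one 512M free range below 4G and one 4G range starting at 4G, a 1G request fails when from_low_pages is true and succeeds at the 4G base when it is false, which is the situation the commit message describes.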
[PATCH 0/2] x86/hyperv/Swiotlb: Add swiotlb_alloc_from_low_pages switch
From: Tianyu Lan

A Hyper-V Isolation VM may fail to allocate the swiotlb bounce buffer because, in some cases, there is not enough contiguous memory between 0 and 4G. The current swiotlb code allocates the bounce buffer from low memory. This patchset adds a switch, "swiotlb_alloc_from_low_pages", which is true by default; a platform may clear it if necessary. Devices in a Hyper-V Isolation VM can use memory above 4G as DMA memory, so the Hyper-V code sets the switch to false to avoid running out of contiguous memory in the low address range.

Tianyu Lan (2):
  Swiotlb: Add swiotlb_alloc_from_low_pages switch
  x86/hyperv: Set swiotlb_alloc_from_low_pages to false

 arch/x86/kernel/cpu/mshyperv.c |  1 +
 include/linux/swiotlb.h        |  1 +
 kernel/dma/swiotlb.c           | 13 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

--
2.25.1
[PATCH] Swiotlb: Add CONFIG_HAS_IOMEM check around memremap() in the swiotlb_mem_remap()
From: Tianyu Lan

The HAS_IOMEM option may not be selected on some platforms (e.g. s390), which causes a compile error because the memremap() implementation is missing. Fix it by adding a CONFIG_HAS_IOMEM check around memremap() in swiotlb.c.

Reported-by: kernel test robot
Signed-off-by: Tianyu Lan
---
 kernel/dma/swiotlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b36c1cdd0c4f..3de651ba38cc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -167,6 +167,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
 {
 	void *vaddr = NULL;

+#ifdef CONFIG_HAS_IOMEM
 	if (swiotlb_unencrypted_base) {
 		phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;

@@ -175,6 +176,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
 			pr_err("Failed to map the unencrypted memory %pa size %lx.\n",
 			       &paddr, bytes);
 	}
+#endif

 	return vaddr;
 }
--
2.25.1
Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM
On 12/15/2021 6:40 AM, Dave Hansen wrote:

On 12/14/21 2:23 PM, Tom Lendacky wrote:

I don't really understand how this can be more general and *not* get utilized by the existing SEV support.

The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is meant to be used with a (relatively) un-enlightened guest. The idea is that the C-bit in the guest page tables must be 0 for all accesses. It is only the physical address relative to VTOM that determines if the access is encrypted or not. So setting sme_me_mask will actually cause issues when running with this feature. Since all DMA for an SEV-SNP guest must still be to shared (unencrypted) memory, some enlightenment is needed. In this case, memory mapped above VTOM will provide that via the SWIOTLB update. For SEV-SNP guests running with VTOM, they are likely to also be running with the Reflect #VC feature, allowing a "paravisor" to handle any #VCs generated by the guest. See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC" in volume 2 of the AMD APM [1].

Thanks, Tom, that's pretty much what I was looking for. The C-bit normally comes from the page tables. But, the hardware also provides an alternative way to effectively get C-bit behavior without actually setting the bit in the page tables: Virtual Top-of-Memory (VTOM). Right? It sounds like Hyper-V has chosen to use VTOM instead of requiring the guest to do the C-bit in its page tables.

But, the thing that confuses me is when you said: "it (VTOM) is meant to be used with a (relatively) un-enlightened guest". We don't have an unenlightened guest here. We have Linux, which is quite enlightened. Is VTOM being used because there's something that completely rules out using the C-bit in the page tables? What's that "something"?

For an "un-enlightened" guest, another system runs inside the VM to emulate some functions (TSC, timer, interrupts and so on), which avoids extensive changes to the guest OS (Linux/Windows). In a Hyper-V Isolation VM, this system is called the HCL/paravisor. The HCL runs in VMPL0 and Linux runs in VMPL2. This is similar to nested virtualization: the HCL plays a role similar to an L1 hypervisor and emulates general functions (e.g. rdmsr/wrmsr accesses and interrupt injection) that would otherwise need enlightenment in the guest. In an enlightened guest, the Linux kernel has to handle #VC/#VE exceptions directly. In an un-enlightened guest, the HCL handles those exceptions and emulates interrupt injection, which avoids modifying core OS code.

vTOM serves the same purpose. Hyper-V uses vTOM to avoid changing page-table-related code in the OS (both Windows and Linux); the driver code only needs to map memory into the decrypted address space above vTOM. Linux has a generic swiotlb bounce buffer implementation, so swiotlb_unencrypted_base is introduced here to set the shared memory boundary, i.e. vTOM.

A Hyper-V Isolation VM is an un-enlightened guest. Hyper-V doesn't expose the SEV/SME capability to the guest, so the SEV code doesn't actually run. We therefore can't interact with the existing SEV code, which supports enlightened guests without an HCL/paravisor. If other platforms or SEV want to use a similar vTOM feature, swiotlb_unencrypted_base can be reused, so it is a general solution for all platforms, not only SEV and Hyper-V.
Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM
On 12/14/2021 12:45 AM, Dave Hansen wrote:

On 12/12/21 11:14 PM, Tianyu Lan wrote:

In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (e.g. 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared.

This seems to be independently reintroducing some of the SEV infrastructure. Is it really OK that this doesn't interact at all with any existing SEV code? For instance, do we need a new 'swiotlb_unencrypted_base', or should this just be using sme_me_mask somehow?

Hi Dave:

Thanks for your review. Hyper-V provides a para-virtualized confidential computing solution based on AMD SEV functionality and does not expose SEV capabilities to the guest, so sme_me_mask is unset in a Hyper-V Isolation VM. swiotlb_unencrypted_base is a more general solution for the case where encrypted and decrypted memory live in different address spaces, and other platforms may also reuse it.
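The address arithmetic described above ("original physical address + shared_gpa_boundary") can be written out as a small userspace sketch. The function names are hypothetical and exist only for this illustration; treating an address exactly at the boundary as shared is an assumption, since the text only says "below" and "above".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Boundary (vTOM) for a given address bit, e.g. bit 39 as in the example. */
static uint64_t sketch_shared_gpa_boundary(unsigned int bit)
{
	assert(bit < 64);
	return UINT64_C(1) << bit;
}

/* Address through which a private page is reached in the shared space. */
static uint64_t sketch_shared_alias(uint64_t paddr, uint64_t boundary)
{
	assert(paddr < boundary);
	return paddr + boundary;
}

/*
 * Below the boundary is private, above is shared. An address exactly at
 * the boundary is counted as shared here (assumption of this sketch).
 */
static bool sketch_is_shared(uint64_t addr, uint64_t boundary)
{
	return addr >= boundary;
}
```

For a boundary at bit 39, the page at 0x1000 is private and its shared alias is 0x8000001000.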
[PATCH V7 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. rx/tx ring buffer is allocated via vzalloc() and they need to be mapped into unencrypted address space(above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/umap rx /tx ring buffer. Signed-off-by: Tianyu Lan --- Change since v3: * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn() with vmalloc_to_pfn() in the hv_map_memory() Change since v2: * Add hv_map/unmap_memory() to map/umap rx/tx ring buffer. --- arch/x86/hyperv/ivm.c | 28 ++ drivers/hv/hv_common.c| 11 +++ drivers/net/hyperv/hyperv_net.h | 5 ++ drivers/net/hyperv/netvsc.c | 136 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/asm-generic/mshyperv.h| 2 + include/linux/hyperv.h| 5 ++ 8 files changed, 187 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 69c7a57f3307..2b994117581e 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl kfree(pfn_array); return ret; } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. 
+ */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / PAGE_SIZE; i++) + pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 7be173a99f27..3c5cb1f70319 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s return HV_STATUS_INVALID_PARAMETER; } EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + +void __weak *hv_map_memory(void *addr, unsigned long size) +{ + return NULL; +} +EXPORT_SYMBOL_GPL(hv_map_memory); + +void __weak hv_unmap_memory(void *addr) +{ +} +EXPORT_SYMBOL_GPL(hv_unmap_memory); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..cf69da0e296c 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct 
hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..b7ade735a806 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head) int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_original_buf) { + hv_unmap_memory(nvdev->recv_buf); + vfree(nvdev->recv_original_buf); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_original_buf) { + hv_unmap_memory(nvdev->send_buf); + vfree(nvdev->send_original_buf); + } else { + vfree(nvdev-&
[PATCH V7 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ mpb_desc() still needs to be handled. Use DMA API(scsi_dma_map/unmap) to map these memory during sending/receiving packet and return swiotlb bounce buffer dma address. In Isolation VM, swiotlb bounce buffer is marked to be visible to host and the swiotlb force mode is enabled. Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/hv/vmbus_drv.c | 4 drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 3 files changed, 26 insertions(+), 16 deletions(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..ae6ec503399a 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = _dma_mask; + child_device_obj->device.dma_parms = _device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, 
request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; + int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); -
[PATCH V7 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan hyperv Isolation VM requires bounce buffer support to copy data from/to encrypted memory and so enable swiotlb force mode to use swiotlb bounce buffer for DMA transaction. In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Swiotlb bounce buffer code calls set_memory_decrypted() to mark bounce buffer visible to host and map it in extra address space via memremap. Populate the shared_gpa_boundary (vTOM) via swiotlb_unencrypted_base variable. The map function memremap() can't work in the early place (e.g ms_hyperv_init_platform()) and so call swiotlb_update_mem_ attributes() in the hyperv_init(). Signed-off-by: Tianyu Lan --- Change since v6: * Fix compile error when swiotlb is not enabled. Change since v4: * Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes() in the hyperv_init(). Change since v3: * Add comment in pci-swiotlb-xen.c to explain why add dependency between hyperv_swiotlb_detect() and pci_ xen_swiotlb_detect(). * Return directly when fails to allocate Hyper-V swiotlb buffer in the hyperv_iommu_swiotlb_init(). 
--- arch/x86/hyperv/hv_init.c | 12 arch/x86/kernel/cpu/mshyperv.c | 15 ++- 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 24f4a06ac46a..749906a8e068 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -28,6 +28,7 @@ #include #include #include +#include int hyperv_init_cpuhp; u64 hv_current_partition_id = ~0ull; @@ -502,6 +503,17 @@ void __init hyperv_init(void) /* Query the VMs extended capability once, so that it can be cached. */ hv_query_ext_cap(0); + +#ifdef CONFIG_SWIOTLB + /* +* Swiotlb bounce buffer needs to be mapped in extra address +* space. Map function doesn't work in the early place and so +* call swiotlb_update_mem_attributes() here. +*/ + if (hv_is_isolation_supported()) + swiotlb_update_mem_attributes(); +#endif + return; clean_guest_os_id: diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 4794b716ec79..e3a240c5e4f5 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -319,8 +320,20 @@ static void __init ms_hyperv_init_platform(void) pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { static_branch_enable(_type_snp); +#ifdef CONFIG_SWIOTLB + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary; +#endif + } + +#ifdef CONFIG_SWIOTLB + /* +* Enable swiotlb force mode in Isolation VM to +* use swiotlb bounce buffer for dma transaction. +*/ + swiotlb_force = SWIOTLB_FORCE; +#endif } if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) { -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH V7 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan Hyper-V provides Isolation VM for confidential computing support and guest memory is encrypted in it. Places checking cc_platform_has() with GUEST_MEM_ENCRYPT attr should return "True" in Isolation vm. e.g, swiotlb bounce buffer size needs to adjust according to memory size in the sev_setup_arch(). Add GUEST_MEM_ENCRYPT check for Hyper-V Isolation VM. Signed-off-by: Tianyu Lan --- Change since v6: * Change the order in the cc_platform_has() and check sev first. Change since v3: * Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has(). --- arch/x86/kernel/cc_platform.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 03bb2f343ddb..6cb3a675e686 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -11,6 +11,7 @@ #include #include +#include #include static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) @@ -58,12 +59,19 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } +static bool hyperv_cc_platform_has(enum cc_attr attr) +{ + return attr == CC_ATTR_GUEST_MEM_ENCRYPT; +} bool cc_platform_has(enum cc_attr attr) { if (sme_me_mask) return amd_cc_platform_has(attr); + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr); + return false; } EXPORT_SYMBOL_GPL(cc_platform_has); -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
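The check order in cc_platform_has() (SEV first, then Hyper-V isolation) can be modelled in userspace as below. This is a sketch under stated assumptions: the enum, struct and function names are hypothetical, and sketch_amd_has() is only a placeholder, not the real amd_cc_platform_has() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of attributes, for illustration only. */
enum sketch_attr {
	SKETCH_ATTR_MEM_ENCRYPT,
	SKETCH_ATTR_GUEST_MEM_ENCRYPT,
};

/* Hypothetical platform state the dispatch depends on. */
struct sketch_platform {
	uint64_t sme_me_mask;
	bool hv_isolation;
};

/* Placeholder only: the real AMD check has more cases than this. */
static bool sketch_amd_has(enum sketch_attr attr)
{
	return attr == SKETCH_ATTR_MEM_ENCRYPT;
}

/* Mirrors the patch: only the guest memory encryption attribute is true. */
static bool sketch_hyperv_has(enum sketch_attr attr)
{
	return attr == SKETCH_ATTR_GUEST_MEM_ENCRYPT;
}

/* SEV is checked first; Hyper-V isolation is consulted only if it is off. */
static bool sketch_platform_has(const struct sketch_platform *p,
				enum sketch_attr attr)
{
	assert(p != NULL);

	if (p->sme_me_mask)
		return sketch_amd_has(attr);
	if (p->hv_isolation)
		return sketch_hyperv_has(attr);
	return false;
}
```

Because sme_me_mask is unset in a Hyper-V Isolation VM, the second branch is the one that answers there; on a platform with sme_me_mask set, the Hyper-V branch is never reached.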
[PATCH V7 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest memory directly. Hyper-V provides a new host visibility hvcall, and the guest needs to call it to mark memory visible to the host before sharing that memory with the host. For security, network/storage stack memory should not be shared with the host, so bounce buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role, because all data from/to the host is copied between the ring buffer and IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access shared memory via an extra address space specified by the Hyper-V CPUID leaf HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access shared memory is the bounce buffer GPA plus the shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for the netvsc/storvsc drivers in Isolation VMs.

Change since v6:
	* Fix compile error in hv_init.c and mshyperv.c when swiotlb is not enabled.
	* Change the order in the cc_platform_has() and check sev first.

Change since v5:
	* Modify "Swiotlb" to "swiotlb" in commit log.
	* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
	* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes() in the hyperv_init().

Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on. Move calling of set_memory_decrypted() back from swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl() and rmem_swiotlb_device_init().
	* Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has().
	* Add comment in pci-swiotlb-xen.c to explain why add dependency between hyperv_swiotlb_detect() and pci_xen_swiotlb_detect().
	* Return directly when fails to allocate Hyper-V swiotlb buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
	* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra address space.
	* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start) when fail to remap swiotlb memory.

Change since v1:
	* Add Hyper-V Isolation support check in the cc_platform_has() and return true for guest memory encrypt attr.
	* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  swiotlb: Add swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c         |  12 +++
 arch/x86/hyperv/ivm.c             |  28 ++
 arch/x86/kernel/cc_platform.c     |   8 ++
 arch/x86/kernel/cpu/mshyperv.c    |  15 +++-
 drivers/hv/hv_common.c            |  11 +++
 drivers/hv/vmbus_drv.c            |   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c        |  37
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |   6 ++
 include/linux/swiotlb.h           |   6 ++
 kernel/dma/swiotlb.c              |  43 +-
 15 files changed, 294 insertions(+), 22 deletions(-)

--
2.25.1
[PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Acked-by: Christoph Hellwig Signed-off-by: Tianyu Lan --- Change since v3: * Fix boot up failure on the host with mem_encrypt=on. Move calloing of set_memory_decrypted() back from swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl() and rmem_swiotlb_device_init(). Change since v2: * Leave mem->vaddr with phys_to_virt(mem->start) when fail to remap swiotlb memory. Change since v1: * Rework comment in the swiotlb_init_io_tlb_mem() * Make swiotlb_init_io_tlb_mem() back to return void. --- include/linux/swiotlb.h | 6 ++ kernel/dma/swiotlb.c| 43 +++-- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. 
The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..34e6ade4f73c 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr = NULL; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + } + + return vaddr; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) + mem->vaddr = vaddr; + + memset(mem->vaddr, 0, bytes); } static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, @@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID
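The remap-with-fallback flow that this patch adds to swiotlb_update_mem_attributes() can be modelled in userspace as follows. This is a sketch, not the kernel code: struct sketch_pool, the sketch_remap_* helpers and sketch_update_vaddr() are hypothetical names, and the remap callback stands in for memremap(), which cannot be called from userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the pool: physical start and mapped vaddr. */
struct sketch_pool {
	uint64_t start;
	void *vaddr;
};

/* Stand-in for memremap(): returns a mapping or NULL on failure. */
typedef void *(*sketch_remap_fn)(uint64_t paddr, size_t bytes);

/* Backing store that plays the role of the unencrypted alias mapping. */
static char sketch_alias[64];

/* Demo remap helper that succeeds when the request fits the alias buffer. */
static void *sketch_remap_ok(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	return bytes <= sizeof(sketch_alias) ? sketch_alias : NULL;
}

/* Demo remap helper that always fails. */
static void *sketch_remap_fail(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	(void)bytes;
	return NULL;
}

/*
 * Remap only when an unencrypted base is set; if the remap is skipped or
 * fails, keep using the direct mapping, as the v2 changelog describes.
 */
static void sketch_update_vaddr(struct sketch_pool *pool, void *direct_vaddr,
				uint64_t unencrypted_base, size_t bytes,
				sketch_remap_fn remap)
{
	void *vaddr = NULL;

	assert(pool != NULL && direct_vaddr != NULL && remap != NULL);

	if (unencrypted_base)
		vaddr = remap(pool->start + unencrypted_base, bytes);

	pool->vaddr = vaddr ? vaddr : direct_vaddr;
}
```

The point of the fallback is that pool->vaddr is always usable afterwards: platforms that never set an unencrypted base keep the direct mapping, and a failed remap degrades to the same state instead of leaving a NULL pointer.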
Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
On 12/10/2021 9:25 PM, Tianyu Lan wrote:

@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);

-		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+			swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+		}
+
+		/*
+		 * Enable swiotlb force mode in Isolation VM to
+		 * use swiotlb bounce buffer for dma transaction.
+		 */
+		swiotlb_force = SWIOTLB_FORCE;

I'm good with this approach that directly updates the swiotlb settings here rather than in IOMMU initialization code. It's a lot more straightforward.

However, there's an issue if building for X86_32 without PAE, in that the swiotlb module may not be built, resulting in compile and link errors. The swiotlb.h file needs to be updated to provide a stub function for swiotlb_update_mem_attributes(). swiotlb_unencrypted_base probably needs wrapper functions to get/set it, which can be stubs when CONFIG_SWIOTLB is not set. swiotlb_force is a bit of a mess in that it already has a stub definition that assumes it will only be read, and not set. A bit of thinking will be needed to sort that out.

Is it OK to fix the issue by selecting swiotlb when CONFIG_HYPERV is set?

Sorry, please ignore the previous statement. This code doesn't depend on CONFIG_HYPERV. How about putting this code under #ifdef CONFIG_X86_64 or CONFIG_SWIOTLB?
Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
On 12/10/2021 4:09 AM, Michael Kelley (LINUX) wrote: @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void) pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { static_branch_enable(_type_snp); + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary; + } + + /* +* Enable swiotlb force mode in Isolation VM to +* use swiotlb bounce buffer for dma transaction. +*/ + swiotlb_force = SWIOTLB_FORCE; I'm good with this approach that directly updates the swiotlb settings here rather than in IOMMU initialization code. It's a lot more straightforward. However, there's an issue if building for X86_32 without PAE, in that the swiotlb module may not be built, resulting in compile and link errors. The swiotlb.h file needs to be updated to provide a stub function for swiotlb_update_mem_attributes(). swiotlb_unencrypted_base probably needs wrapper functions to get/set it, which can be stubs when CONFIG_SWIOTLB is not set. swiotlb_force is a bit of a mess in that it already has a stub definition that assumes it will only be read, and not set. A bit of thinking will be needed to sort that out. It's ok to fix the issue via selecting swiotlb when CONFIG_HYPERV is set? 
} if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) { diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index b823311eac79..1f037e114dc8 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len, int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context, void (*block_invalidate)(void *context, u64 block_mask)); +#if IS_ENABLED(CONFIG_HYPERV) +int __init hyperv_swiotlb_detect(void); +#else +static inline int __init hyperv_swiotlb_detect(void) +{ + return 0; +} +#endif

I don't think hyperv_swiotlb_detect() is used any longer, so this change should be dropped.

Yes, will update.
Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
On 12/10/2021 4:38 AM, Michael Kelley (LINUX) wrote:

From: Tianyu Lan Sent: Monday, December 6, 2021 11:56 PM

Hyper-V provides Isolation VM which has memory encrypt support. Add hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT attribute.

Signed-off-by: Tianyu Lan --- Change since v3: * Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has(). --- arch/x86/kernel/cc_platform.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 03bb2f343ddb..47db88c275d5 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -11,6 +11,7 @@ #include #include +#include #include static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) @@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } +static bool hyperv_cc_platform_has(enum cc_attr attr) +{ + return attr == CC_ATTR_GUEST_MEM_ENCRYPT; +} bool cc_platform_has(enum cc_attr attr) { + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr); + if (sme_me_mask) return amd_cc_platform_has(attr);

Throughout Linux kernel code, there are about 20 calls to cc_platform_has() with CC_ATTR_GUEST_MEM_ENCRYPT as the argument. The original code (from v1 of this patch set) only dealt with the call in sev_setup_arch(). But with this patch, all the other calls that previously returned "false" will now return "true" in a Hyper-V Isolated VM. I didn't try to analyze all these other calls, so I think there's an open question about whether this is the behavior we want.

CC_ATTR_GUEST_MEM_ENCRYPT has so far been used only for SEV support. A Hyper-V Isolation VM is based on SEV or on software memory encryption, so most of the checks can be reused. The difference is that the SEV code uses the encryption bit in the page table to encrypt and decrypt memory, while Hyper-V uses vTOM. However, the SEV memory encryption mask "sme_me_mask" is unset in a Hyper-V Isolation VM, which reports SEV and SME as unsupported. The remaining checks of the memory encryption bit are still safe, so CC_ATTR_GUEST_MEM_ENCRYPT is reused for Hyper-V.
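To make the effect of the check order concrete, here is a small user-space model of the dispatch in cc_platform_has(). It is only a sketch: struct platform_state, cc_platform_has_sketch() and amd_cc_platform_has_sketch() are hypothetical names, the enum is a reduced local copy, and the AMD branch is a placeholder rather than the real SEV logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced local copy of the attribute list, for illustration only. */
enum cc_attr {
	CC_ATTR_MEM_ENCRYPT,
	CC_ATTR_HOST_MEM_ENCRYPT,
	CC_ATTR_GUEST_MEM_ENCRYPT,
	CC_ATTR_GUEST_STATE_ENCRYPT,
};

/* Hypothetical container for the two inputs the dispatch looks at. */
struct platform_state {
	bool hv_isolation;	/* stand-in for hv_is_isolation_supported() */
	uint64_t sme_me_mask;	/* stand-in for the global sme_me_mask */
};

static bool hyperv_cc_platform_has(enum cc_attr attr)
{
	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
}

/* Placeholder only: the real amd_cc_platform_has() inspects SEV state. */
static bool amd_cc_platform_has_sketch(enum cc_attr attr)
{
	return attr == CC_ATTR_MEM_ENCRYPT;
}

/* Same ordering as the patch: Hyper-V isolation first, then sme_me_mask. */
static bool cc_platform_has_sketch(const struct platform_state *p,
				   enum cc_attr attr)
{
	if (p->hv_isolation)
		return hyperv_cc_platform_has(attr);

	if (p->sme_me_mask)
		return amd_cc_platform_has_sketch(attr);

	return false;
}
```

In this model, every caller passing CC_ATTR_GUEST_MEM_ENCRYPT gets true once hv_isolation is set, even though sme_me_mask is zero, which is the behavior change raised in the review.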
Re: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
On 12/9/2021 4:00 PM, Long Li wrote:

@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY;

Hi Tianyu, This patch (and this patch series) unconditionally adds code for dealing with DMA addresses for all VMs, including non-isolation VMs. Does this add a performance penalty for VMs that don't require isolation?

Hi Long: In a traditional VM, scsi_dma_map() just saves sg->offset to sg->dma_address and does no data copy, because the swiotlb bounce buffer code is not used there. The data copy only takes place in an Isolation VM, where swiotlb_force is set. So there is no additional overhead in a traditional VM. Thanks.
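The "no extra copy unless bouncing is forced" point can be pictured with a small user-space model. This is only a sketch and not the kernel DMA API: struct dma_sim, dma_map_sketch() and BOUNCE_POOL_SIZE are hypothetical names, and the single fixed pool is a deliberate simplification of swiotlb.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_POOL_SIZE 4096	/* hypothetical size of the toy pool */

struct dma_sim {
	bool force_bounce;	/* stand-in for swiotlb force mode being set */
	unsigned char pool[BOUNCE_POOL_SIZE];
	size_t copies;		/* number of bounce copies performed */
};

/*
 * Returns the address the "device" would use, or NULL if the request
 * does not fit in the toy pool.
 */
static void *dma_map_sketch(struct dma_sim *sim, void *buf, size_t len)
{
	if (!sim->force_bounce)
		return buf;	/* direct mapping: address passed through, no copy */

	if (len > BOUNCE_POOL_SIZE)
		return NULL;

	memcpy(sim->pool, buf, len);	/* bounce: copy into the shared pool */
	sim->copies++;
	return sim->pool;
}
```

In the non-forced case the function returns the caller's buffer unchanged and the copy counter stays at zero, which corresponds to the traditional VM case described in the reply.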
Re: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
On 12/9/2021 4:14 AM, Haiyang Zhang wrote: From: Tianyu Lan Sent: Tuesday, December 7, 2021 2:56 AM To: KY Srinivasan ; Haiyang Zhang ; Stephen Hemminger ; wei@kernel.org; Dexuan Cui ; t...@linutronix.de; mi...@redhat.com; b...@alien8.de; dave.han...@linux.intel.com; x...@kernel.org; h...@zytor.com; da...@davemloft.net; k...@kernel.org; j...@linux.ibm.com; martin.peter...@oracle.com; a...@arndb.de; h...@infradead.org; m.szyprow...@samsung.com; robin.mur...@arm.com; Tianyu Lan ; thomas.lenda...@amd.com; Michael Kelley (LINUX) Cc: iommu@lists.linux-foundation.org; linux-a...@vger.kernel.org; linux- hyp...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-s...@vger.kernel.org; net...@vger.kernel.org; vkuznets ; brijesh.si...@amd.com; konrad.w...@oracle.com; h...@lst.de; j...@8bytes.org; parri.and...@gmail.com; dave.han...@intel.com Subject: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. rx/tx ring buffer is allocated via vzalloc() and they need to be mapped into unencrypted address space(above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/umap rx /tx ring buffer. Signed-off-by: Tianyu Lan --- Change since v3: * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn() with vmalloc_to_pfn() in the hv_map_memory() Change since v2: * Add hv_map/unmap_memory() to map/umap rx/tx ring buffer. 
--- arch/x86/hyperv/ivm.c | 28 ++ drivers/hv/hv_common.c| 11 +++ drivers/net/hyperv/hyperv_net.h | 5 ++ drivers/net/hyperv/netvsc.c | 136 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/asm-generic/mshyperv.h| 2 + include/linux/hyperv.h| 5 ++ 8 files changed, 187 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 69c7a57f3307..2b994117581e 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl kfree(pfn_array); return ret; } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. + */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / PAGE_SIZE; i++) + pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 7be173a99f27..3c5cb1f70319 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s return HV_STATUS_INVALID_PARAMETER; } EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + +void __weak *hv_map_memory(void *addr, unsigned long size) +{ + return NULL; +} +EXPORT_SYMBOL_GPL(hv_map_memory); + +void __weak hv_unmap_memory(void *addr) +{ +} +EXPORT_SYMBOL_GPL(hv_unmap_memory); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..cf69da0e296c 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 
total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI 1 #define RETRY_MAX
Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
Hi Borislav: Thanks for your review.

On 12/7/2021 5:47 PM, Borislav Petkov wrote: On Tue, Dec 07, 2021 at 02:55:58AM -0500, Tianyu Lan wrote: From: Tianyu Lan Hyper-V provides Isolation VM which has memory encrypt support. Add hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT attribute.

You need to refresh on how to write commit messages - never say what the patch is doing - that's visible in the diff itself. Rather, you should talk about *why* it is doing what it is doing.

Sure. Will update.

bool cc_platform_has(enum cc_attr attr) { + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr);

Is there any reason for the hv_is_.. check to come before...

Do you mean checking Hyper-V before SEV? If so, there is no special reason.

+ if (sme_me_mask) return amd_cc_platform_has(attr);

... the sme_me_mask check? What's in sme_me_mask on hyperv?

sme_me_mask is unset in this case.
[PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. rx/tx ring buffer is allocated via vzalloc() and they need to be mapped into unencrypted address space(above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/umap rx /tx ring buffer. Signed-off-by: Tianyu Lan --- Change since v3: * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn() with vmalloc_to_pfn() in the hv_map_memory() Change since v2: * Add hv_map/unmap_memory() to map/umap rx/tx ring buffer. --- arch/x86/hyperv/ivm.c | 28 ++ drivers/hv/hv_common.c| 11 +++ drivers/net/hyperv/hyperv_net.h | 5 ++ drivers/net/hyperv/netvsc.c | 136 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/asm-generic/mshyperv.h| 2 + include/linux/hyperv.h| 5 ++ 8 files changed, 187 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 69c7a57f3307..2b994117581e 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl kfree(pfn_array); return ret; } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. 
+ */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / PAGE_SIZE; i++) + pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 7be173a99f27..3c5cb1f70319 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s return HV_STATUS_INVALID_PARAMETER; } EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + +void __weak *hv_map_memory(void *addr, unsigned long size) +{ + return NULL; +} +EXPORT_SYMBOL_GPL(hv_map_memory); + +void __weak hv_unmap_memory(void *addr) +{ +} +EXPORT_SYMBOL_GPL(hv_unmap_memory); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..cf69da0e296c 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct 
hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..b7ade735a806 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head) int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_original_buf) { + hv_unmap_memory(nvdev->recv_buf); + vfree(nvdev->recv_original_buf); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_original_buf) { + hv_unmap_memory(nvdev->send_buf); + vfree(nvdev->send_original_buf); + } else { + vfree(nvdev-&
[PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan hyperv Isolation VM requires bounce buffer support to copy data from/to encrypted memory and so enable swiotlb force mode to use swiotlb bounce buffer for DMA transaction. In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Swiotlb bounce buffer code calls set_memory_decrypted() to mark bounce buffer visible to host and map it in extra address space via memremap. Populate the shared_gpa_boundary (vTOM) via swiotlb_unencrypted_base variable. The map function memremap() can't work in the early place (e.g ms_hyperv_init_platform()) and so call swiotlb_update_mem_ attributes() in the hyperv_init(). Signed-off-by: Tianyu Lan --- Change since v4: * Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes() in the hyperv_init(). Change since v3: * Add comment in pci-swiotlb-xen.c to explain why add dependency between hyperv_swiotlb_detect() and pci_ xen_swiotlb_detect(). * Return directly when fails to allocate Hyper-V swiotlb buffer in the hyperv_iommu_swiotlb_init(). 
--- arch/x86/hyperv/hv_init.c | 10 ++ arch/x86/kernel/cpu/mshyperv.c | 11 ++- include/linux/hyperv.h | 8 3 files changed, 28 insertions(+), 1 deletion(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 24f4a06ac46a..9e18a280f89d 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -28,6 +28,7 @@ #include #include #include +#include int hyperv_init_cpuhp; u64 hv_current_partition_id = ~0ull; @@ -502,6 +503,15 @@ void __init hyperv_init(void) /* Query the VMs extended capability once, so that it can be cached. */ hv_query_ext_cap(0); + + /* +* Swiotlb bounce buffer needs to be mapped in extra address +* space. Map function doesn't work in the early place and so +* call swiotlb_update_mem_attributes() here. +*/ + if (hv_is_isolation_supported()) + swiotlb_update_mem_attributes(); + return; clean_guest_os_id: diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 4794b716ec79..baf3a0873552 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void) pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { static_branch_enable(_type_snp); + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary; + } + + /* +* Enable swiotlb force mode in Isolation VM to +* use swiotlb bounce buffer for dma transaction. 
+*/ + swiotlb_force = SWIOTLB_FORCE; } if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) { diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index b823311eac79..1f037e114dc8 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len, int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context, void (*block_invalidate)(void *context, u64 block_mask)); +#if IS_ENABLED(CONFIG_HYPERV) +int __init hyperv_swiotlb_detect(void); +#else +static inline int __init hyperv_swiotlb_detect(void) +{ + return 0; +} +#endif struct hyperv_pci_block_ops { int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len, -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
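As a worked example of the address arithmetic in this commit message, the sketch below computes the shared alias of a bounce buffer address and classifies addresses against vTOM. It is illustrative only: shared_alias() and is_shared() are hypothetical helpers, the boundary value is just the 39-bit example from the text, and treating the boundary address itself as shared is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example value only: bit 39, matching the "39 bit address line" example. */
#define EXAMPLE_SHARED_GPA_BOUNDARY (1ULL << 39)

/* Access address = original physical address + shared_gpa_boundary. */
static uint64_t shared_alias(uint64_t phys, uint64_t shared_gpa_boundary)
{
	return phys + shared_gpa_boundary;
}

/*
 * Below vTOM is private, above is shared. Counting the boundary itself
 * as shared is an assumption made for this sketch.
 */
static bool is_shared(uint64_t addr, uint64_t vtom)
{
	return addr >= vtom;
}
```

For instance, with the example boundary a bounce page at physical address 0x1000 would be accessed through 0x8000001000, which lies above vTOM and is therefore treated as shared.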
[PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ mpb_desc() still needs to be handled. Use DMA API(scsi_dma_map/unmap) to map these memory during sending/receiving packet and return swiotlb bounce buffer dma address. In Isolation VM, swiotlb bounce buffer is marked to be visible to host and the swiotlb force mode is enabled. Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/hv/vmbus_drv.c | 4 drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 3 files changed, 26 insertions(+), 16 deletions(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..ae6ec503399a 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = _dma_mask; + child_device_obj->device.dma_parms = _device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, 
request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; + int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); -
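The reason for setting the min align mask to HV_HYP_PAGE_SIZE - 1 can be shown with a little arithmetic: the bounce address keeps the low bits of the original address, so the offset within a Hyper-V page is unchanged. The helper below is a hypothetical user-space sketch, not the swiotlb slot allocator, and it assumes the slot base is already aligned to the Hyper-V page size.

```c
#include <stdint.h>

#define HV_HYP_PAGE_SIZE 4096ULL	/* Hyper-V page size (4 KiB) */

/*
 * Hypothetical helper: pick an address inside a bounce slot that keeps
 * the low bits of the original address. slot_base is assumed to be
 * aligned to min_align_mask + 1.
 */
static uint64_t bounce_addr_keep_offset(uint64_t slot_base, uint64_t orig_addr,
					uint64_t min_align_mask)
{
	return slot_base + (orig_addr & min_align_mask);
}
```

With a mask of HV_HYP_PAGE_SIZE - 1, an original address ending in 0x678 maps to a bounce address that also ends in 0x678, so the offset_in_hvpg value computed by the driver still describes the bounced data.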
[PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan

Hyper-V provides Isolation VM which has memory encrypt support. Add hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT attribute.

Signed-off-by: Tianyu Lan --- Change since v3: * Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has(). --- arch/x86/kernel/cc_platform.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 03bb2f343ddb..47db88c275d5 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -11,6 +11,7 @@ #include #include +#include #include static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) @@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } +static bool hyperv_cc_platform_has(enum cc_attr attr) +{ + return attr == CC_ATTR_GUEST_MEM_ENCRYPT; +} bool cc_platform_has(enum cc_attr attr) { + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr); + if (sme_me_mask) return amd_cc_platform_has(attr); -- 2.25.1
[PATCH V6 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Acked-by: Christoph Hellwig Signed-off-by: Tianyu Lan --- Change since v3: * Fix boot up failure on the host with mem_encrypt=on. Move calloing of set_memory_decrypted() back from swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl() and rmem_swiotlb_device_init(). Change since v2: * Leave mem->vaddr with phys_to_virt(mem->start) when fail to remap swiotlb memory. Change since v1: * Rework comment in the swiotlb_init_io_tlb_mem() * Make swiotlb_init_io_tlb_mem() back to return void. --- include/linux/swiotlb.h | 6 ++ kernel/dma/swiotlb.c| 43 +++-- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. 
The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..34e6ade4f73c 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr = NULL; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + } + + return vaddr; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) + mem->vaddr = vaddr; + + memset(mem->vaddr, 0, bytes); } static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, @@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID
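The fallback in swiotlb_update_mem_attributes() (use the remapped address if there is one, otherwise keep the direct mapping, then clear the pool through whichever address was chosen) can be modelled in user space as follows. This is a sketch only: struct pool_sketch, remap_fn, update_mem_attributes_sketch() and remap_always_fails() are hypothetical names, and the callback stands in for swiotlb_mem_remap().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced pool descriptor. */
struct pool_sketch {
	void *vaddr;	/* address used for bounce buffer copies */
};

/* Hypothetical remap hook: returns NULL when no remap happens or it fails. */
typedef void *(*remap_fn)(void *direct, size_t bytes);

static void update_mem_attributes_sketch(struct pool_sketch *mem, void *direct,
					 size_t bytes, remap_fn remap)
{
	mem->vaddr = remap ? remap(direct, bytes) : NULL;
	if (!mem->vaddr)
		mem->vaddr = direct;	/* fall back to the direct mapping */

	memset(mem->vaddr, 0, bytes);	/* clear through the chosen mapping */
}

/* Example hook that models a failed (or unneeded) remap. */
static void *remap_always_fails(void *direct, size_t bytes)
{
	(void)direct;
	(void)bytes;
	return NULL;
}
```

When the hook returns NULL, the pool keeps using the direct address and is still zeroed, mirroring the behavior described in the v2 change note about leaving mem->vaddr at phys_to_virt(mem->start).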
[PATCH V6 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest memory directly. Hyper-V provides a new host visibility hvcall, and the guest needs to call it to mark memory visible to the host before sharing that memory with the host. For security, network/storage stack memory should not be shared with the host, so bounce buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role because all data from/to the host is copied between the ring buffer and IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory via an extra address space which is specified by Hyper-V CPUID HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the shared memory should be the bounce buffer memory GPA plus the shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in Isolation VMs.

This version follows Michael Kelley's suggestion in the following link. https://lkml.org/lkml/2021/11/24/2044

Change since v5:
* Modify "Swiotlb" to "swiotlb" in commit log.
* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes() in the hyperv_init().

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on. Move calling of set_memory_decrypted() back from swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl() and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has().
* Add comment in pci-swiotlb-xen.c to explain why a dependency is added between hyperv_swiotlb_detect() and pci_xen_swiotlb_detect().
* Return directly when failing to allocate the Hyper-V swiotlb buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra address space.
* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start) when failing to remap swiotlb memory.

Change since v1:
* Add Hyper-V Isolation support check in the cc_platform_has() and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
swiotlb: Add swiotlb bounce buffer remap function for HV IVM
x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
hyper-v: Enable swiotlb bounce buffer for Isolation VM
scsi: storvsc: Add Isolation VM support for storvsc driver
net: netvsc: Add Isolation VM support for netvsc driver

arch/x86/hyperv/hv_init.c | 10 +++
arch/x86/hyperv/ivm.c | 28 ++
arch/x86/kernel/cc_platform.c | 8 ++
arch/x86/kernel/cpu/mshyperv.c | 11 ++-
drivers/hv/hv_common.c | 11 +++
drivers/hv/vmbus_drv.c | 4 +
drivers/net/hyperv/hyperv_net.h | 5 ++
drivers/net/hyperv/netvsc.c | 136 +-
drivers/net/hyperv/netvsc_drv.c | 1 +
drivers/net/hyperv/rndis_filter.c | 2 +
drivers/scsi/storvsc_drv.c | 37
include/asm-generic/mshyperv.h | 2 +
include/linux/hyperv.h | 14 +++
include/linux/swiotlb.h | 6 ++
kernel/dma/swiotlb.c | 43 +-
15 files changed, 296 insertions(+), 22 deletions(-)

--
2.25.1
[PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to be marked visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_pagebuffer() still needs to be handled. Use DMA API to map/unmap this memory during sending/receiving packets, and the Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been marked visible to host during boot up. The rx/tx ring buffers are allocated via vzalloc() and they need to be mapped into unencrypted address space (above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer. Signed-off-by: Tianyu Lan --- Change since v3: * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn() with vmalloc_to_pfn() in the hv_map_memory() Change since v2: * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer. --- arch/x86/hyperv/ivm.c | 28 ++ drivers/hv/hv_common.c| 11 +++ drivers/net/hyperv/hyperv_net.h | 5 ++ drivers/net/hyperv/netvsc.c | 136 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/asm-generic/mshyperv.h| 2 + include/linux/hyperv.h| 5 ++ 8 files changed, 187 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 69c7a57f3307..2b994117581e 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl kfree(pfn_array); return ret; } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / PAGE_SIZE; i++) + pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 7be173a99f27..3c5cb1f70319 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s return HV_STATUS_INVALID_PARAMETER; } EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + +void __weak *hv_map_memory(void *addr, unsigned long size) +{ + return NULL; +} +EXPORT_SYMBOL_GPL(hv_map_memory); + +void __weak hv_unmap_memory(void *addr) +{ +} +EXPORT_SYMBOL_GPL(hv_unmap_memory); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..cf69da0e296c 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct 
hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..b7ade735a806 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head) int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_original_buf) { + hv_unmap_memory(nvdev->recv_buf); + vfree(nvdev->recv_original_buf); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_original_buf) { + hv_unmap_memory(nvdev->send_buf); + vfree(nvdev->send_original_buf); + } else { + vfree(nvdev-&
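The page frame arithmetic in hv_map_memory() in the patch above can be sketched in user space as follows. This is only an illustrative model, not kernel code: model_shift_pfns() and MODEL_PAGE_SHIFT are hypothetical names, and a 4 KiB page size is assumed. It shows that each PFN of the vmalloc'ed ring buffer is moved above the boundary by adding shared_gpa_boundary >> PAGE_SHIFT before the new mapping is created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical user-space model of the loop in hv_map_memory():
 * shift every page frame number above the shared GPA boundary (vTOM).
 */
static void model_shift_pfns(const uint64_t *pfns, uint64_t *shared_pfns,
			     size_t count, uint64_t shared_gpa_boundary)
{
	size_t i;

	/* The boundary is expected to be page aligned. */
	assert((shared_gpa_boundary & ((1ULL << MODEL_PAGE_SHIFT) - 1)) == 0);

	for (i = 0; i < count; i++)
		shared_pfns[i] = pfns[i] +
				 (shared_gpa_boundary >> MODEL_PAGE_SHIFT);
}
```

With a boundary at bit 39, for example, every PFN is offset by 1 << 27, so the low PFN bits of the original buffer are preserved in the shared alias.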
[PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan Hyper-V Isolation VM requires bounce buffer support to copy data from/to encrypted memory, so enable swiotlb force mode to use the swiotlb bounce buffer for DMA transactions. In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (e.g. 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. The swiotlb bounce buffer code calls set_memory_decrypted() to mark the bounce buffer visible to host and maps it in extra address space via memremap. Populate the shared_gpa_boundary (vTOM) via the swiotlb_unencrypted_base variable. The map function memremap() can't work in the early place (e.g. ms_hyperv_init_platform()), so call swiotlb_update_mem_attributes() in hyperv_init(). Signed-off-by: Tianyu Lan --- Change since v4: * Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes() in the hyperv_init(). Change since v3: * Add comment in pci-swiotlb-xen.c to explain why a dependency is added between hyperv_swiotlb_detect() and pci_xen_swiotlb_detect(). * Return directly when failing to allocate Hyper-V swiotlb buffer in the hyperv_iommu_swiotlb_init().
--- arch/x86/hyperv/hv_init.c | 10 ++ arch/x86/kernel/cpu/mshyperv.c | 11 ++- include/linux/hyperv.h | 8 3 files changed, 28 insertions(+), 1 deletion(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 24f4a06ac46a..9e18a280f89d 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -28,6 +28,7 @@ #include #include #include +#include int hyperv_init_cpuhp; u64 hv_current_partition_id = ~0ull; @@ -502,6 +503,15 @@ void __init hyperv_init(void) /* Query the VMs extended capability once, so that it can be cached. */ hv_query_ext_cap(0); + + /* +* Swiotlb bounce buffer needs to be mapped in extra address +* space. Map function doesn't work in the early place and so +* call swiotlb_update_mem_attributes() here. +*/ + if (hv_is_isolation_supported()) + swiotlb_update_mem_attributes(); + return; clean_guest_os_id: diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 4794b716ec79..baf3a0873552 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void) pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { static_branch_enable(_type_snp); + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary; + } + + /* +* Enable swiotlb force mode in Isolation VM to +* use swiotlb bounce buffer for dma transaction. 
+*/ + swiotlb_force = SWIOTLB_FORCE; } if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) { diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index b823311eac79..1f037e114dc8 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len, int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context, void (*block_invalidate)(void *context, u64 block_mask)); +#if IS_ENABLED(CONFIG_HYPERV) +int __init hyperv_swiotlb_detect(void); +#else +static inline int __init hyperv_swiotlb_detect(void) +{ + return 0; +} +#endif struct hyperv_pci_block_ops { int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len, -- 2.25.1
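The address rule stated in the commit message above (access address = original physical address + shared_gpa_boundary) can be illustrated with a small stand-alone sketch. model_shared_alias() is a hypothetical helper made up for this illustration and is not a kernel function; the 39-bit boundary used in the usage note is only the example value mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): compute the address through
 * which a guest would reach the host-visible alias of a page, given
 * the shared GPA boundary (vTOM) reported by the hypervisor.
 */
static uint64_t model_shared_alias(uint64_t paddr, uint64_t shared_gpa_boundary)
{
	/* The original address must lie in the private range below vTOM. */
	assert(paddr < shared_gpa_boundary);

	return paddr + shared_gpa_boundary;
}
```

For a boundary of 1ULL << 39, a buffer at physical address 0x1000 would be accessed at 0x8000001000. Because the boundary is a single high bit and the original address is below it, the addition behaves like setting that bit.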
[PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan Hyper-V provides Isolation VM which has memory encrypt support. Add hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT attribute. Signed-off-by: Tianyu Lan --- Change since v3: * Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has(). --- arch/x86/kernel/cc_platform.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 03bb2f343ddb..47db88c275d5 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -11,6 +11,7 @@ #include #include +#include #include static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) @@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } +static bool hyperv_cc_platform_has(enum cc_attr attr) +{ + return attr == CC_ATTR_GUEST_MEM_ENCRYPT; +} bool cc_platform_has(enum cc_attr attr) { + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr); + if (sme_me_mask) return amd_cc_platform_has(attr); -- 2.25.1
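The dispatch order introduced by this patch can be modelled outside the kernel as below. This is a rough sketch under stated assumptions: the enum values and function names prefixed with model_ are hypothetical stand-ins, isolation_vm stands in for hv_is_isolation_supported(), and other_platform_answer stands in for whatever the remaining branches of cc_platform_has() would return. It shows that in an Isolation VM the Hyper-V check is consulted first and answers true only for the guest memory encryption attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's enum cc_attr values. */
enum model_cc_attr {
	MODEL_CC_ATTR_MEM_ENCRYPT,
	MODEL_CC_ATTR_HOST_MEM_ENCRYPT,
	MODEL_CC_ATTR_GUEST_MEM_ENCRYPT,
	MODEL_CC_ATTR_GUEST_STATE_ENCRYPT,
};

/* Mirrors hyperv_cc_platform_has(): only one attribute is reported. */
static bool model_hyperv_cc_platform_has(enum model_cc_attr attr)
{
	return attr == MODEL_CC_ATTR_GUEST_MEM_ENCRYPT;
}

/*
 * Mirrors the ordering in cc_platform_has(): the Isolation VM check
 * runs first, so later platform checks are never reached in that case.
 */
static bool model_cc_platform_has(bool isolation_vm,
				  bool other_platform_answer,
				  enum model_cc_attr attr)
{
	/* Reject values outside the modelled enum. */
	assert(attr <= MODEL_CC_ATTR_GUEST_STATE_ENCRYPT);

	if (isolation_vm)
		return model_hyperv_cc_platform_has(attr);

	return other_platform_answer;
}
```

One consequence of this ordering is that, inside an Isolation VM, attributes other than guest memory encryption are reported as absent regardless of what the later checks would have said.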
[PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to be marked visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_mpb_desc() still needs to be handled. Use DMA API (scsi_dma_map/unmap) to map this memory during sending/receiving packets and return the swiotlb bounce buffer dma address. In Isolation VM, the swiotlb bounce buffer is marked visible to host and the swiotlb force mode is enabled. Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/hv/vmbus_drv.c | 4 drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 3 files changed, 26 insertions(+), 16 deletions(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..ae6ec503399a 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = &vmbus_dma_mask; + child_device_obj->device.dma_parms = &child_device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, 
request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; + int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); -
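The effect of the "dma min align mask" mentioned in the commit message can be illustrated with a small model. This is not the swiotlb implementation: model_bounce_addr() is a hypothetical helper, and MODEL_HV_HYP_PAGE_SIZE is assumed to be 4096. The sketch only shows the property the driver relies on, namely that the bounce address keeps the same offset within a Hyper-V page as the original buffer.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_HV_HYP_PAGE_SIZE 4096ULL /* assumption: 4 KiB Hyper-V pages */

/*
 * Hypothetical model: place a buffer inside a bounce slot so that the
 * low bits selected by min_align_mask match the original address.
 */
static uint64_t model_bounce_addr(uint64_t slot_base, uint64_t orig_addr,
				  uint64_t min_align_mask)
{
	/* The slot itself must be aligned to the mask for this to work. */
	assert((slot_base & min_align_mask) == 0);

	return slot_base + (orig_addr & min_align_mask);
}
```

With a mask of MODEL_HV_HYP_PAGE_SIZE - 1, an original address of 0x12345678 placed in a slot at 0x200000 yields 0x200678, so the in-page offset that storvsc puts into payload->range.offset stays valid for the bounced data.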
[PATCH V6 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (e.g. 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set the unencrypted memory base offset; the platform then calls swiotlb_update_mem_attributes() to remap swiotlb memory to the unencrypted address space. memremap() cannot be called in the early stage, so put the remapping code into swiotlb_update_mem_attributes(). Store the remap address and use it to copy data from/to the swiotlb bounce buffer. Acked-by: Christoph Hellwig Signed-off-by: Tianyu Lan --- Change since v3: * Fix boot up failure on the host with mem_encrypt=on. Move calling of set_memory_decrypted() back from swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl() and rmem_swiotlb_device_init(). Change since v2: * Leave mem->vaddr with phys_to_virt(mem->start) when failing to remap swiotlb memory. Change since v1: * Rework comment in the swiotlb_init_io_tlb_mem() * Make swiotlb_init_io_tlb_mem() back to return void. --- include/linux/swiotlb.h | 6 ++ kernel/dma/swiotlb.c| 43 +++-- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool.
The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..34e6ade4f73c 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr = NULL; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + } + + return vaddr; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) + mem->vaddr = vaddr; + + memset(mem->vaddr, 0, bytes); } static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, @@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID
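The remap-with-fallback behaviour that this patch adds to swiotlb_update_mem_attributes() can be modelled in user space roughly as follows. Everything prefixed with model_ is hypothetical: the remap callback stands in for memremap(), and the two stub callbacks exist only so the sketch can be exercised. It shows that a remap is attempted only when an unencrypted base is set, and that the direct mapping is kept when the remap is skipped or fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for memremap(): returns a mapping or NULL on failure. */
typedef void *(*model_remap_fn)(uint64_t paddr, size_t bytes);

static char model_fake_mapping[64];

/* Stub that pretends the remap succeeded. */
static void *model_remap_ok(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	(void)bytes;
	return model_fake_mapping;
}

/* Stub that pretends the remap failed. */
static void *model_remap_fail(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	(void)bytes;
	return NULL;
}

/*
 * Pick the virtual address used for bounce copies: the remapped alias
 * above unencrypted_base if available, otherwise the direct mapping.
 */
static void *model_pick_vaddr(void *direct_vaddr, uint64_t start,
			      uint64_t unencrypted_base, size_t bytes,
			      model_remap_fn remap)
{
	void *vaddr = NULL;

	assert(direct_vaddr != NULL);

	if (unencrypted_base)
		vaddr = remap(start + unencrypted_base, bytes);

	return vaddr ? vaddr : direct_vaddr;
}
```

Keeping the direct mapping as a fallback means platforms that never set the base see no change in behaviour, which matches the "Change since v2" note about leaving mem->vaddr with phys_to_virt(mem->start).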
Re: [PATCH V4 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
On 12/5/2021 6:31 PM, Juergen Gross wrote:
> On 05.12.21 09:48, Tianyu Lan wrote:
>> On 12/5/2021 4:34 PM, Juergen Gross wrote:
>>> On 05.12.21 09:18, Tianyu Lan wrote:
>>>> From: Tianyu Lan
>>>>
>>>> Hyper-V Isolation VM requires bounce buffer support to copy data from/to encrypted memory, so enable swiotlb force mode to use the swiotlb bounce buffer for DMA transactions.
>>>>
>>>> In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (e.g. 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared.
>>>>
>>>> Hyper-V initializes the swiotlb bounce buffer and the default swiotlb needs to be disabled. pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() enable the default one. To override the setting, hyperv_swiotlb_detect() needs to run before these detect functions, which depend on pci_xen_swiotlb_init(). Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the order.
>>>
>>> Why? Does Hyper-V plan to support Xen PV guests? If not, I don't see the need for adding this change.
>>
>> This is to keep the detect function calling order: the Hyper-V detect callback needs to be called before pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb(). This is the same reason why pci_swiotlb_detect_override() needs to depend on pci_xen_swiotlb_detect(). Hyper-V has the same requirement, so make the xen detect callback depend on the Hyper-V one.
>
> And does this even work without CONFIG_SWIOTLB_XEN, i.e. without pci_xen_swiotlb_detect() being in the system?

Hi Juergen:
Thanks for your review. This is an issue and I just sent out a v5 which decouples the dependency between xen and hyperv.
Re: [PATCH V4 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
On 12/6/2021 10:09 PM, Christoph Hellwig wrote:
> Please spell swiotlb with a lower case s. Otherwise this looks good
>
> Acked-by: Christoph Hellwig
>
> Feel free to carry this in whatever tree is suitable for the rest of the patches.

Sure. Thanks for your ack. I will update "swiotlb" in the next version.
Re: [PATCH V4 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
Hi Christoph:
Thanks for your review.

On 12/6/2021 10:06 PM, Christoph Hellwig wrote:
> On Sun, Dec 05, 2021 at 03:18:10AM -0500, Tianyu Lan wrote:
>> +static bool hyperv_cc_platform_has(enum cc_attr attr)
>> +{
>> +#ifdef CONFIG_HYPERV
>> +	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
>> +#else
>> +	return false;
>> +#endif
>> +}
>
> Can we even end up here without CONFIG_HYPERV?

Yes, I will update in the next version. Thanks.
[PATCH V5 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to be marked visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_mpb_desc() still needs to be handled. Use DMA API (scsi_dma_map/unmap) to map this memory during sending/receiving packets and return the swiotlb bounce buffer dma address. In Isolation VM, the swiotlb bounce buffer is marked visible to host and the swiotlb force mode is enabled. Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/hv/vmbus_drv.c | 4 drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 3 files changed, 26 insertions(+), 16 deletions(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..ae6ec503399a 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = &vmbus_dma_mask; + child_device_obj->device.dma_parms = &child_device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, 
request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; + int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); -
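The page-count computation visible in the storvsc_queuecommand() context above (hvpg_count = HVPFN_UP(offset_in_hvpg + length)) can be sketched as follows. The model_ names are hypothetical re-implementations written for this illustration, assuming a 4 KiB Hyper-V page; they are not the kernel macros themselves. The sketch shows why a transfer that starts near the end of a page can need one more page buffer entry than its length alone suggests.

```c
#include <assert.h>

#define MODEL_HV_HYP_PAGE_SHIFT 12 /* assumption: 4 KiB Hyper-V pages */
#define MODEL_HV_HYP_PAGE_SIZE  (1UL << MODEL_HV_HYP_PAGE_SHIFT)

/* Offset of an address or sg offset within one Hyper-V page. */
static unsigned long model_offset_in_hvpage(unsigned long offset)
{
	return offset & (MODEL_HV_HYP_PAGE_SIZE - 1);
}

/* Round a byte count up to whole Hyper-V pages. */
static unsigned long model_hvpfn_up(unsigned long bytes)
{
	return (bytes + MODEL_HV_HYP_PAGE_SIZE - 1) >> MODEL_HV_HYP_PAGE_SHIFT;
}

/* Number of Hyper-V pages touched by a transfer of the given length. */
static unsigned long model_hvpg_count(unsigned long sg_offset,
				      unsigned long length)
{
	/* A zero-length transfer is not expected on this path. */
	assert(length > 0);

	return model_hvpfn_up(model_offset_in_hvpage(sg_offset) + length);
}
```

For example, 200 bytes starting at in-page offset 4000 straddle a page boundary and therefore count as two pages, while a full page starting at offset 0 counts as one.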
[PATCH V5 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan Hyper-V provides Isolation VM which has memory encrypt support. Add hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT attribute. Signed-off-by: Tianyu Lan --- Change since v3: * Change code style of checking GUEST_MEM attribute in the hyperv_cc_platform_has(). --- arch/x86/kernel/cc_platform.c | 12 1 file changed, 12 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 03bb2f343ddb..7b66793c0f25 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -11,6 +11,7 @@ #include #include +#include #include static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) @@ -58,9 +59,20 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } +static bool hyperv_cc_platform_has(enum cc_attr attr) +{ +#if IS_ENABLED(CONFIG_HYPERV) + return attr == CC_ATTR_GUEST_MEM_ENCRYPT; +#else + return false; +#endif +} bool cc_platform_has(enum cc_attr attr) { + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr); + if (sme_me_mask) return amd_cc_platform_has(attr); -- 2.25.1
[PATCH V5 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to be marked visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_pagebuffer() still needs to be handled. Use DMA API to map/unmap this memory during sending/receiving packets, and the Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been marked visible to host during boot up. The rx/tx ring buffers are allocated via vzalloc() and they need to be mapped into unencrypted address space (above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer. Signed-off-by: Tianyu Lan --- Change since v3: * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn() with vmalloc_to_pfn() in the hv_map_memory() Change since v2: * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer. --- arch/x86/hyperv/ivm.c | 28 ++ drivers/hv/hv_common.c| 11 +++ drivers/net/hyperv/hyperv_net.h | 5 ++ drivers/net/hyperv/netvsc.c | 136 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/asm-generic/mshyperv.h| 2 + include/linux/hyperv.h| 5 ++ 8 files changed, 187 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 69c7a57f3307..2b994117581e 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl kfree(pfn_array); return ret; } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / PAGE_SIZE; i++) + pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 7be173a99f27..3c5cb1f70319 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s return HV_STATUS_INVALID_PARAMETER; } EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + +void __weak *hv_map_memory(void *addr, unsigned long size) +{ + return NULL; +} +EXPORT_SYMBOL_GPL(hv_map_memory); + +void __weak hv_unmap_memory(void *addr) +{ +} +EXPORT_SYMBOL_GPL(hv_unmap_memory); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..cf69da0e296c 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct 
hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..b7ade735a806 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head) int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_original_buf) { + hv_unmap_memory(nvdev->recv_buf); + vfree(nvdev->recv_original_buf); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_original_buf) { + hv_unmap_memory(nvdev->send_buf); + vfree(nvdev->send_original_buf); + } else { + vfree(nvdev-&
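The per-page loop in hv_map_memory() above exists because a vzalloc() buffer is virtually contiguous but not physically contiguous, so each page's frame number has to be shifted into the shared alias individually. A minimal userspace sketch of just that arithmetic follows; the `sketch_` names and the fixed 4 KiB page size are assumptions for illustration, and the input array stands in for what vmalloc_to_pfn() would return per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * For each page, compute the PFN of its alias above the boundary:
 * the private PFN plus the boundary expressed in pages.
 * in[] models the per-page result of vmalloc_to_pfn().
 */
static void sketch_alias_pfns(const uint64_t *in, uint64_t *out,
			      size_t npages, uint64_t shared_gpa_boundary)
{
	size_t i;

	assert(in && out);

	for (i = 0; i < npages; i++)
		out[i] = in[i] + (shared_gpa_boundary >> SKETCH_PAGE_SHIFT);
}
```

With a 39-bit boundary, every output PFN is the input PFN plus 2^27, regardless of how scattered the input frames are.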
[PATCH V5 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via the extra address space above shared_gpa_boundary
(e.g. a 39-bit address line) reported by Hyper-V CPUID
ISOLATION_CONFIG. The physical address used for access is the
original physical address + shared_gpa_boundary. In the AMD SEV-SNP
spec, shared_gpa_boundary is called virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private,
while memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it into the extra
address space via memremap(). Populate the shared_gpa_boundary (vTOM)
via the swiotlb_unencrypted_base variable.

memremap() does not work at an early stage (e.g. in
ms_hyperv_init_platform()), so call swiotlb_update_mem_attributes()
in hyperv_init().

Signed-off-by: Tianyu Lan
---
Change since v4:
	* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
	  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
	  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
	  in the hyperv_init().

Change since v3:
	* Add comment in pci-swiotlb-xen.c to explain why a dependency
	  is added between hyperv_swiotlb_detect() and
	  pci_xen_swiotlb_detect().
	* Return directly on failure to allocate the Hyper-V swiotlb
	  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/hyperv/hv_init.c      | 10 ++
 arch/x86/kernel/cpu/mshyperv.c | 11 ++-
 include/linux/hyperv.h         |  8
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..9e18a280f89d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,15 @@ void __init hyperv_init(void)
 	/* Query the VMs extended capability once, so that it can be cached. */
 	hv_query_ext_cap(0);
+
+	/*
+	 * Swiotlb bounce buffer needs to be mapped in extra address
+	 * space. Map function doesn't work in the early place and so
+	 * call swiotlb_update_mem_attributes() here.
+	 */
+	if (hv_is_isolation_supported())
+		swiotlb_update_mem_attributes();
+
 	return;
 clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..baf3a0873552 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
-		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+			swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+		}
+
+		/*
+		 * Enable swiotlb force mode in Isolation VM to
+		 * use swiotlb bounce buffer for dma transaction.
+		 */
+		swiotlb_force = SWIOTLB_FORCE;
 	}
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b823311eac79..1f037e114dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
 int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
 				void (*block_invalidate)(void *context,
 							 u64 block_mask));
+#if IS_ENABLED(CONFIG_HYPERV)
+int __init hyperv_swiotlb_detect(void);
+#else
+static inline int __init hyperv_swiotlb_detect(void)
+{
+	return 0;
+}
+#endif
 struct hyperv_pci_block_ops {
 	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
-- 
2.25.1
[PATCH V5 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via the extra address space above shared_gpa_boundary
(e.g. a 39-bit address line) reported by Hyper-V CPUID
ISOLATION_CONFIG. The physical address used for access is the
original physical address + shared_gpa_boundary. In the AMD SEV-SNP
spec, shared_gpa_boundary is called virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private,
while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base so that platforms can set the
unencrypted memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory into the
unencrypted address space. memremap() cannot be called at an early
stage, so the remapping code is put into
swiotlb_update_mem_attributes(). Store the remapped address and use
it to copy data from/to the swiotlb bounce buffer.

Signed-off-by: Tianyu Lan
---
Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on.
	  Move calling of set_memory_decrypted() back from
	  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
	  and rmem_swiotlb_device_init().

Change since v2:
	* Leave mem->vaddr with phys_to_virt(mem->start) when remapping
	  the swiotlb memory fails.

Change since v1:
	* Rework comment in the swiotlb_init_io_tlb_mem()
	* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c    | 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:	The end address of the swiotlb memory pool. Used to do a quick
  *		range check to see if the memory was in fact allocated by this
  *		API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ *		may be remapped in the memory encrypted case and store virtual
+ *		address for bounce buffer operation.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
+	void *vaddr;
 	unsigned long nslabs;
 	unsigned long used;
 	unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 struct io_tlb_mem io_tlb_default_mem;
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
 	return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+	void *vaddr = NULL;
+
+	if (swiotlb_unencrypted_base) {
+		phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+		vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+		if (!vaddr)
+			pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+			       paddr, bytes);
+	}
+
+	return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations. This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
 	vaddr = phys_to_virt(mem->start);
 	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
 	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
+
+	mem->vaddr = swiotlb_mem_remap(mem, bytes);
+	if (!mem->vaddr)
+		mem->vaddr = vaddr;
+
+	memset(mem->vaddr, 0, bytes);
 }
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->slots[i].orig_addr = INVALID_PHY
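The remap-with-fallback step in swiotlb_update_mem_attributes() above (use the remapped address if one is available, otherwise keep the direct mapping, then zero whichever was chosen) can be modelled in userspace. This is a sketch under assumptions: all `sketch_` names are hypothetical, a callback stands in for memremap(), and a static buffer stands in for the alias mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical remap callback standing in for memremap(); may return NULL. */
typedef void *(*sketch_remap_fn)(void *direct, size_t bytes);

struct sketch_pool {
	void *direct;	/* models phys_to_virt(mem->start) */
	void *vaddr;	/* address actually used for bounce copies */
	size_t bytes;
};

/* Static buffer that plays the role of the unencrypted alias mapping. */
static char sketch_alias_buf[64];

/* Remap that succeeds when the request fits the stand-in alias buffer. */
static void *sketch_remap_ok(void *direct, size_t bytes)
{
	(void)direct;
	return bytes <= sizeof(sketch_alias_buf) ? sketch_alias_buf : NULL;
}

/* Remap that always fails, e.g. no unencrypted base was configured. */
static void *sketch_remap_fail(void *direct, size_t bytes)
{
	(void)direct;
	(void)bytes;
	return NULL;
}

/* Prefer the remapped address; fall back to the direct mapping on failure. */
static void sketch_update_attrs(struct sketch_pool *pool, sketch_remap_fn remap)
{
	assert(pool && pool->direct);

	pool->vaddr = remap ? remap(pool->direct, pool->bytes) : NULL;
	if (!pool->vaddr)
		pool->vaddr = pool->direct;

	/* Clear the pool through whichever mapping will be used. */
	memset(pool->vaddr, 0, pool->bytes);
}
```

The point of the fallback is that platforms which never set an unencrypted base keep working exactly as before, because `vaddr` then equals the direct mapping.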
[PATCH V5 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and the
guest needs to call it to mark memory visible to the host before
sharing memory with the host. For security, network/storage stack
memory should not be shared with the host, and so bounce buffers are
required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via the extra address space specified by Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc
in Isolation VMs.

This version follows Michael Kelley's suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v4:
	* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
	  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
	  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
	  in the hyperv_init().

Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on.
	  Move calling of set_memory_decrypted() back from
	  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
	  and rmem_swiotlb_device_init().
	* Change code style of checking GUEST_MEM attribute in the
	  hyperv_cc_platform_has().
	* Add comment in pci-swiotlb-xen.c to explain why a dependency
	  is added between hyperv_swiotlb_detect() and
	  pci_xen_swiotlb_detect().
	* Return directly on failure to allocate the Hyper-V swiotlb
	  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
	* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
	  hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into
	  extra address space.
	* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
	  when remapping the swiotlb memory fails.

Change since v1:
	* Add Hyper-V Isolation support check in the cc_platform_has()
	  and return true for guest memory encrypt attr.
	* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c         |  10 +++
 arch/x86/hyperv/ivm.c             |  28 ++
 arch/x86/kernel/cc_platform.c     |  12 +++
 arch/x86/kernel/cpu/mshyperv.c    |  11 ++-
 drivers/hv/hv_common.c            |  11 +++
 drivers/hv/vmbus_drv.c            |   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c        |  37
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |  14 +++
 include/linux/swiotlb.h           |   6 ++
 kernel/dma/swiotlb.c              |  43 +-
 15 files changed, 300 insertions(+), 22 deletions(-)

-- 
2.25.1
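The address rule the cover letter describes (private below the boundary, shared above it, shared alias = GPA + shared_gpa_boundary) comes down to two small helpers. The sketch below is illustrative only: the `sketch_` names are hypothetical, and treating an address exactly equal to the boundary as shared is an assumption made so that the alias of GPA 0 classifies as shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if addr lies in the shared (host-visible) range.
 * Assumption: the boundary address itself counts as shared.
 */
static bool sketch_is_shared(uint64_t addr, uint64_t vtom)
{
	return addr >= vtom;
}

/* Address through which the guest reaches the bounce buffer page at gpa. */
static uint64_t sketch_shared_alias(uint64_t gpa, uint64_t vtom)
{
	assert(gpa < vtom);	/* the buffer's own GPA sits below vTOM */
	return gpa + vtom;
}
```

For example, with a 39-bit boundary (`1ULL << 39`), a buffer at GPA `0x1000` is accessed through `0x8000001000`.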
Re: [PATCH V4 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
On 12/5/2021 4:34 PM, Juergen Gross wrote:
> On 05.12.21 09:18, Tianyu Lan wrote:
>> From: Tianyu Lan
>>
>> hyperv Isolation VM requires bounce buffer support to copy
>> data from/to encrypted memory and so enable swiotlb force
>> mode to use swiotlb bounce buffer for DMA transaction.
>>
>> In Isolation VM with AMD SEV, the bounce buffer needs to be
>> accessed via extra address space which is above shared_gpa_boundary
>> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
>> The access physical address will be original physical address +
>> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
>> spec is called virtual top of memory(vTOM). Memory addresses below
>> vTOM are automatically treated as private while memory above
>> vTOM is treated as shared.
>>
>> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
>> needs to be disabled. pci_swiotlb_detect_override() and
>> pci_swiotlb_detect_4gb() enable the default one. To override
>> the setting, hyperv_swiotlb_detect() needs to run before
>> these detect functions which depends on the pci_xen_swiotlb_init().
>> Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb_detect()
>> to keep the order.
>
> Why? Does Hyper-V plan to support Xen PV guests? If not, I don't see
> the need for adding this change.

This is to keep the calling order of the detect functions: the Hyper-V
detect callback needs to be called before pci_swiotlb_detect_override()
and pci_swiotlb_detect_4gb(). This is the same reason why
pci_swiotlb_detect_override() needs to depend on
pci_xen_swiotlb_detect(). Hyper-V has the same requirement, so the Xen
detect callback is made to depend on the Hyper-V one.
[PATCH V4 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan

Hyper-V provides Isolation VMs, which have memory encryption support.
Add hyperv_cc_platform_has() and return true when the
GUEST_MEM_ENCRYPT attribute is checked.

Signed-off-by: Tianyu Lan
---
Change since v3:
	* Change code style of checking GUEST_MEM attribute in
	  the hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..27c06b32e7c4 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include
 #include
+#include
 #include
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,20 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+#else
+	return false;
+#endif
+}
 bool cc_platform_has(enum cc_attr attr)
 {
+	if (hv_is_isolation_supported())
+		return hyperv_cc_platform_has(attr);
+
 	if (sme_me_mask)
 		return amd_cc_platform_has(attr);
-- 
2.25.1
[PATCH V4 5/5] hv_netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan

In an Isolation VM, all memory shared with the host must be marked
visible to the host via hvcall. vmbus_establish_gpadl() already does
this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V swiotlb bounce buffer DMA address is returned. The swiotlb
bounce buffer has been marked visible to the host during boot.

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.

Signed-off-by: Tianyu Lan
---
Change since v3:
	* Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
	  with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
	* Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c             |  28 ++
 drivers/hv/hv_common.c            |  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl
 	kfree(pfn_array);
 	return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+	unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+				      sizeof(unsigned long), GFP_KERNEL);
+	void *vaddr;
+	int i;
+
+	if (!pfns)
+		return NULL;
+
+	for (i = 0; i < size / PAGE_SIZE; i++)
+		pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+			(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+	vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+	kfree(pfns);
+
+	return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+	vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s
 	return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
 	u32 total_bytes;
 	u32 send_buf_index;
 	u32 total_data_buflen;
+	struct hv_dma_range *dma_range;
 };
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 	/* Receive buffer allocated by us but manages by NetVSP */
 	void *recv_buf;
+	void *recv_original_buf;
 	u32 recv_buf_size; /* allocated bytes */
 	struct vmbus_gpadl recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 	/* Send buffer allocated by us */
 	void *send_buf;
+	void *send_original_buf;
 	u32 send_buf_size;
 	struct vmbus_gpadl send_buf_gpadl_handle;
 	u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX 2000/* >10 sec */
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+		      struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
 	int i;
 	kfree(nvdev->extension);
-	vfree(nvdev->recv_buf);
-	vfree(nvdev->send_buf);
+
+	if (nvdev->recv_original_buf) {
+		hv_unmap_memory(nvdev->recv_buf);
+		vfree(nvdev->recv_original_buf);
+	} else {
+		vfree(nvdev->recv_buf);
+	}
+
+	if (nvdev->send_original_buf) {
+		hv_unmap_memory(nvdev->send_buf);
+		vfree(nvdev->send_original_buf);
+	} else {
+		vfree(nvdev-&
[PATCH V4 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan

In an Isolation VM, all memory shared with the host must be marked
visible to the host via hvcall. vmbus_establish_gpadl() already does
this for the storvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
(scsi_dma_map/unmap) to map this memory when sending/receiving packets
and return the swiotlb bounce buffer DMA address. In an Isolation VM,
the swiotlb bounce buffer is marked visible to the host and swiotlb
force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.

Signed-off-by: Tianyu Lan
---
 drivers/hv/vmbus_drv.c     |  1 +
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h     |  1 +
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0a64ccfafb8b..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2121,6 +2121,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 	hv_debug_add_dev_dir(child_device_obj);
 	child_device_obj->device.dma_mask = &vmbus_dma_mask;
+	child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
 	return 0;
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include
 #include
 #include
+#include
+
 #include
 #include
 #include
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
 				continue;
 			}
 			request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+			scsi_dma_unmap(scmnd);
 		}
 		storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	struct hv_host_device *host_dev = shost_priv(host);
 	struct hv_device *dev = host_dev->dev;
 	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-	int i;
 	struct scatterlist *sgl;
 	unsigned int sg_count;
 	struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	payload_sz = sizeof(cmd_request->mpb);
 	if (sg_count) {
-		unsigned int hvpgoff, hvpfns_to_add;
 		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
 		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-		u64 hvpfn;
+		struct scatterlist *sg;
+		unsigned long hvpfn, hvpfns_to_add;
+		int j, i = 0;
 		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 		payload->range.len = length;
 		payload->range.offset = offset_in_hvpg;
+		sg_count = scsi_dma_map(scmnd);
+		if (sg_count < 0)
+			return SCSI_MLQUEUE_DEVICE_BUSY;
-		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+		for_each_sg(sgl, sg, sg_count, j) {
 			/*
-			 * Init values for the current sgl entry. hvpgoff
-			 * and hvpfns_to_add are in units of Hyper-V size
-			 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-			 * case also handles values of sgl->offset that are
-			 * larger than PAGE_SIZE. Such offsets are handled
-			 * even on other than the first sgl entry, provided
-			 * they are a multiple of PAGE_SIZE.
+			 * Init values for the current sgl entry. hvpfns_to_add
+			 * is in units of Hyper-V size pages. Handling the
+			 * PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+			 * values of sgl->offset that are larger than PAGE_SIZE.
+			 * Such offsets are handled even on other than the first
+			 * sgl entry, provided they are a multiple of PAGE_SIZE.
 			 */
-			hvpgoff = HVPFN_DOWN(sgl->offset);
-			hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-			hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
-						hvpgoff;
+			hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+			hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
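The page-count arithmetic that the comment in the hunk above describes (round the start of a DMA segment down and its end up, both in units of 4 KiB Hyper-V pages) can be tried out in userspace. The diff is cut off mid-expression in this archive, so the sketch below is not a reconstruction of the missing line; it is one plausible way to count the Hyper-V pages a segment touches. All `sketch_` names are hypothetical stand-ins for HVPFN_DOWN()/HVPFN_UP().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HV_HYP_PAGE_SHIFT 12	/* Hyper-V pages are 4 KiB */
#define SKETCH_HV_HYP_PAGE_SIZE  (1ULL << SKETCH_HV_HYP_PAGE_SHIFT)

/* Round an address down to its Hyper-V page frame number. */
static uint64_t sketch_hvpfn_down(uint64_t x)
{
	return x >> SKETCH_HV_HYP_PAGE_SHIFT;
}

/* Round an address up to the next Hyper-V page frame number. */
static uint64_t sketch_hvpfn_up(uint64_t x)
{
	return (x + SKETCH_HV_HYP_PAGE_SIZE - 1) >> SKETCH_HV_HYP_PAGE_SHIFT;
}

/* Number of Hyper-V pages touched by [dma_addr, dma_addr + len). */
static uint64_t sketch_hvpfns_spanned(uint64_t dma_addr, uint64_t len)
{
	assert(len > 0);
	return sketch_hvpfn_up(dma_addr + len) - sketch_hvpfn_down(dma_addr);
}
```

A 4 KiB segment that starts page-aligned spans one Hyper-V page, while the same length starting at offset 0x800 spans two, which is why the in-page offset of the DMA address has to be preserved by the bounce buffer.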
[PATCH V4 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via the extra address space above shared_gpa_boundary
(e.g. a 39-bit address line) reported by Hyper-V CPUID
ISOLATION_CONFIG. The physical address used for access is the
original physical address + shared_gpa_boundary. In the AMD SEV-SNP
spec, shared_gpa_boundary is called virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private,
while memory above vTOM is treated as shared.

Hyper-V initializes the swiotlb bounce buffer, and the default swiotlb
needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override the
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which depend on pci_xen_swiotlb_init(). Make
pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the
order.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it into the extra
address space via memremap(). Populate the shared_gpa_boundary (vTOM)
via the swiotlb_unencrypted_base variable.

memremap() does not work at the early stage where
hyperv_iommu_swiotlb_init() runs, so call
swiotlb_update_mem_attributes() in hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan
---
Change since v3:
	* Add comment in pci-swiotlb-xen.c to explain why a dependency
	  is added between hyperv_swiotlb_detect() and
	  pci_xen_swiotlb_detect().
	* Return directly on failure to allocate the Hyper-V swiotlb
	  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/xen/pci-swiotlb-xen.c | 12 ++-
 drivers/hv/vmbus_drv.c         |  3 ++
 drivers/iommu/hyperv-iommu.c   | 58 ++
 include/linux/hyperv.h         |  8 +
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..8e2ee3ce6374 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -90,7 +91,16 @@ int pci_xen_swiotlb_init_late(void)
 }
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
+/*
+ * Hyper-V initalizes swiotlb bounce buffer and default swiotlb
+ * needs to be disabled. pci_swiotlb_detect_override() and
+ * pci_swiotlb_detect_4gb() enable the default one. To override
+ * the setting, hyperv_swiotlb_detect() needs to run before
+ * these detect functions which depends on the pci_xen_swiotlb_
+ * init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
+ * _detect() to keep the order.
+ */
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
-		  NULL,
+		  hyperv_swiotlb_detect,
 		  pci_xen_swiotlb_init,
 		  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..0a64ccfafb8b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "hyperv_vmbus.h"
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
 	return child_device_obj;
 }
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 	}
 	hv_debug_add_dev_dir(child_device_obj);
+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
 	return 0;
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..44ba24d9e06c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,20 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
+#include
+#include
 #include "irq_remapping.h"
@@ -337,4 +343,56 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
 	.free = hyperv_root_irq_remapping_free,
 };
+static void __init hyperv_iommu_swiotlb_init(void)
+{
+	unsigned long hyperv_io_tlb_size;
+	void *hyperv_io_tlb_start;
+
+	/*
+	 * Allocate Hyper-V swiotlb bounce buffer at early place
+	 * to reserve large contiguous memory.
+	 */
+	hyperv_io_tlb_size = swiotlb_size_or_default();
+	hyperv_io_tlb_start = memblock_alloc(hyperv_io_tlb_size, PAGE_SIZE);
+
+	if (!hyperv_io_tlb_start) {
+		pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
+		return;
+	}
+
+	swiotlb_init_with_tbl(hyperv_io_tlb_start,
+			      hyperv_io_tlb_size >>
[PATCH V4 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Signed-off-by: Tianyu Lan --- Change since v3: * Fix boot up failure on the host with mem_encrypt=on. Move calloing of set_memory_decrypted() back from swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl() and rmem_swiotlb_device_init(). Change since v2: * Leave mem->vaddr with phys_to_virt(mem->start) when fail to remap swiotlb memory. Change since v1: * Rework comment in the swiotlb_init_io_tlb_mem() * Make swiotlb_init_io_tlb_mem() back to return void. --- include/linux/swiotlb.h | 6 ++ kernel/dma/swiotlb.c| 43 +++-- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. 
* @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..34e6ade4f73c 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr = NULL; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + } + + return vaddr; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) + mem->vaddr = vaddr; + + memset(mem->vaddr, 0, bytes); } static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, @@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID_PHY
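The fallback in swiotlb_update_mem_attributes() above (use the remapped alias when one is available, otherwise keep the phys_to_virt() address) can be illustrated outside the kernel. The following is only a userspace sketch: struct pool_sketch, pick_bounce_vaddr(), remap_ok() and remap_fail() are hypothetical stand-ins invented for this illustration, not kernel APIs, and the remap callback merely imitates the role memremap() plays in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pool descriptor named in the patch. */
struct pool_sketch {
	uint64_t start;	/* physical start of the pool */
	void *vaddr;	/* address used for bounce copies */
};

typedef void *(*remap_fn)(uint64_t paddr, size_t bytes);

/* Hypothetical remap helpers: one always succeeds, one always fails. */
static unsigned char alias_buf[64];

static void *remap_ok(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	return bytes <= sizeof(alias_buf) ? alias_buf : NULL;
}

static void *remap_fail(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	(void)bytes;
	return NULL;
}

/*
 * Prefer the remapped alias, but keep the direct mapping when no
 * unencrypted base is set or when the remap attempt fails. The chosen
 * mapping is then cleared, as in the patch.
 */
static void *pick_bounce_vaddr(struct pool_sketch *mem, void *direct,
			       uint64_t unencrypted_base, size_t bytes,
			       remap_fn remap)
{
	void *alias = NULL;

	assert(mem != NULL && direct != NULL && remap != NULL);

	if (unencrypted_base)
		alias = remap(mem->start + unencrypted_base, bytes);

	mem->vaddr = alias ? alias : direct;
	memset(mem->vaddr, 0, bytes);
	return mem->vaddr;
}
```

With a zero base or a failing remap the function returns the direct mapping, so later copies still have a usable address.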
[PATCH V4 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall and the
guest needs to call this hvcall to mark memory visible to the host
before sharing memory with the host. For security, network/storage
stack memory should not be shared with the host, and so bounce buffers
are required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer and
IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space which is specified by Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of the shared
memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs.

This version follows Michael Kelley's suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on.
	  Move calling of set_memory_decrypted() back from
	  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
	  and rmem_swiotlb_device_init().
	* Change code style of checking GUEST_MEM attribute in the
	  hyperv_cc_platform_has().
	* Add comment in pci-swiotlb-xen.c to explain why add
	  dependency between hyperv_swiotlb_detect() and
	  pci_xen_swiotlb_detect().
	* Return directly when fails to allocate Hyper-V swiotlb
	  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
	* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
	  hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra
	  address space.
	* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
	  when fail to remap swiotlb memory.

Change since v1:
	* Add Hyper-V Isolation support check in the cc_platform_has()
	  and return true for guest memory encrypt attr.
	* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  hv_netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/ivm.c             |  28 ++
 arch/x86/kernel/cc_platform.c     |  12 +++
 arch/x86/xen/pci-swiotlb-xen.c    |  12 ++-
 drivers/hv/hv_common.c            |  11 +++
 drivers/hv/vmbus_drv.c            |   4 +
 drivers/iommu/hyperv-iommu.c      |  58 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c        |  37
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |  14 +++
 include/linux/swiotlb.h           |   6 ++
 kernel/dma/swiotlb.c              |  43 +-
 15 files changed, 349 insertions(+), 22 deletions(-)

-- 
2.25.1
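The address rule stated in the cover letter (shared access address = bounce buffer GPA + shared_gpa_boundary) is plain arithmetic and can be checked with a small userspace sketch. The function name shared_alias() and the example boundary at bit 39 are assumptions made for illustration; the real boundary is whatever Hyper-V reports through CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example value: a boundary at bit 39 (the "39 bit address line" case). */
#define EXAMPLE_SHARED_GPA_BOUNDARY (1ULL << 39)

/*
 * Compute the address at which a guest physical address is reachable in
 * the shared (host-visible) address space above the boundary.
 */
static uint64_t shared_alias(uint64_t gpa, uint64_t boundary)
{
	/* The original address is expected to sit below the boundary. */
	assert(gpa < boundary);
	return gpa + boundary;
}
```

For example, with the bit-39 boundary a bounce buffer at GPA 0x1000 would be accessed at 0x8000001000.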
Re: [PATCH V3 5/5] hv_netvsc: Add Isolation VM support for netvsc driver
On 12/4/2021 2:59 AM, Michael Kelley (LINUX) wrote:
> > +
> > +/*
> > + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> > + */
> > +void *hv_map_memory(void *addr, unsigned long size)
> > +{
> > +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
>
> This should be just PAGE_SIZE, as this code is unrelated to communication
> with Hyper-V.

Yes, agree. Will update.
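The point of the review comment is that the array size and the loop bound should use the same page unit. A userspace sketch of that idea follows. SKETCH_PAGE_SHIFT, SKETCH_PAGE_SIZE and fill_shared_pfns() are hypothetical names, and the sketch assumes physically contiguous frames starting at first_pfn, which the real code does not (it looks up each page of a vzalloc() buffer individually).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; one unit is used for every computation. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Fill pfns[] with frame numbers shifted into the shared address space.
 * The entry count is derived from the same page size that the loop and
 * the boundary shift use. Returns the number of entries written.
 */
static size_t fill_shared_pfns(uint64_t *pfns, size_t max, uint64_t first_pfn,
			       size_t size, uint64_t boundary)
{
	size_t count = size / SKETCH_PAGE_SIZE;
	size_t i;

	assert(pfns != NULL && count <= max);

	for (i = 0; i < count; i++)
		pfns[i] = first_pfn + i + (boundary >> SKETCH_PAGE_SHIFT);

	return count;
}
```

When the count and the loop are both derived from one constant, the array cannot be sized in one unit and walked in another.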
Re: [PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
On 12/4/2021 3:17 AM, Michael Kelley (LINUX) wrote:
> > +static void __init hyperv_iommu_swiotlb_init(void)
> > +{
> > +	unsigned long hyperv_io_tlb_size;
> > +	void *hyperv_io_tlb_start;
> > +
> > +	/*
> > +	 * Allocate Hyper-V swiotlb bounce buffer at early place
> > +	 * to reserve large contiguous memory.
> > +	 */
> > +	hyperv_io_tlb_size = swiotlb_size_or_default();
> > +	hyperv_io_tlb_start = memblock_alloc(hyperv_io_tlb_size, PAGE_SIZE);
> > +
> > +	if (!hyperv_io_tlb_start)
> > +		pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
>
> In the error case, won't swiotlb_init_with_tbl() end up panic'ing when
> it tries to zero out the memory? The only real choice here is to
> return immediately after printing the message, and not call
> swiotlb_init_with_tbl().

Yes, agree. Will update.
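The pattern being asked for is "warn, then return before touching the buffer". A userspace sketch of that control flow is below; init_bounce_pool() and alloc_fail() are hypothetical names, the allocator is passed in so that failure can be simulated, and memset() only stands in for the initialization step that zeroes the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator that always fails, to exercise the error path. */
static void *alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Allocate and initialize a pool. On allocation failure, return
 * immediately instead of falling through to the initialization step,
 * which would otherwise write through a NULL pointer.
 */
static bool init_bounce_pool(size_t size, void *(*alloc)(size_t), void **out)
{
	void *buf;

	assert(alloc != NULL && out != NULL);

	buf = alloc(size);
	if (!buf) {
		*out = NULL;
		return false;	/* bail out: nothing to initialize */
	}

	memset(buf, 0, size);	/* stand-in for the step that zeroes the pool */
	*out = buf;
	return true;
}
```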
Re: [PATCH V3 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
On 12/4/2021 4:06 AM, Tom Lendacky wrote:
> > Hi Tom:
> >	Thanks for your test. Could you help to test the following
> > patch and check whether it can fix the issue.
>
> The patch is mangled. Is the only difference where
> set_memory_decrypted() is called?
>
> I de-mangled the patch. No more stack traces with SME active.
>
> Thanks,
> Tom

Hi Tom:
	Thanks a lot for your rework and test. I will update in the next
version. Thanks.
Re: [PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
On 12/2/2021 10:43 PM, Wei Liu wrote:
> On Wed, Dec 01, 2021 at 11:02:54AM -0500, Tianyu Lan wrote:
> [...]
> > diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> > index 46df59aeaa06..30fd0600b008 100644
> > --- a/arch/x86/xen/pci-swiotlb-xen.c
> > +++ b/arch/x86/xen/pci-swiotlb-xen.c
> > @@ -4,6 +4,7 @@
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> > @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
> >  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> >  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> > -		  NULL,
> > +		  hyperv_swiotlb_detect,
>
> It is not immediately obvious why this is needed just by reading the
> code. Please consider copying some of the text in the commit message to
> a comment here.

Thanks for suggestion. Will update.
Re: [PATCH V3 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
On 12/2/2021 10:39 PM, Wei Liu wrote:
> > +static bool hyperv_cc_platform_has(enum cc_attr attr)
> > +{
> > +#ifdef CONFIG_HYPERV
> > +	if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
> > +		return true;
> > +	else
> > +		return false;
>
> This can be simplified as
>
>	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
>
> Wei.

Hi Wei:
	Thanks for your review. Will update.
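The suggested simplification is behavior-preserving, which a small userspace sketch can confirm by comparing both forms over every attribute value. The enum cc_attr_sketch and its members are hypothetical stand-ins for the kernel's enum cc_attr; only the shape of the two functions is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's enum cc_attr. */
enum cc_attr_sketch {
	SKETCH_MEM_ENCRYPT,
	SKETCH_HOST_MEM_ENCRYPT,
	SKETCH_GUEST_MEM_ENCRYPT,
	SKETCH_ATTR_COUNT
};

/* The if/else form from the V3 patch. */
static bool has_attr_verbose(enum cc_attr_sketch attr)
{
	if (attr == SKETCH_GUEST_MEM_ENCRYPT)
		return true;
	else
		return false;
}

/* The form suggested in review: the comparison already yields a bool. */
static bool has_attr_simplified(enum cc_attr_sketch attr)
{
	return attr == SKETCH_GUEST_MEM_ENCRYPT;
}

/* Check that both forms agree for every attribute value. */
static void check_forms_agree(void)
{
	int a;

	for (a = 0; a < SKETCH_ATTR_COUNT; a++)
		assert(has_attr_verbose((enum cc_attr_sketch)a) ==
		       has_attr_simplified((enum cc_attr_sketch)a));
}
```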
Re: [PATCH V3 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
On 12/2/2021 10:42 PM, Tom Lendacky wrote: On 12/1/21 10:02 AM, Tianyu Lan wrote: From: Tianyu Lan In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Signed-off-by: Tianyu Lan This patch results in the following stack trace during a bare-metal boot on my EPYC system with SME active (e.g. mem_encrypt=on): [ 0.123932] BUG: Bad page state in process swapper pfn:108001 [ 0.123942] page:(ptrval) refcount:0 mapcount:-128 mapping: index:0x0 pfn:0x108001 [ 0.123946] flags: 0x17c000(node=0|zone=2|lastcpupid=0x1f) [ 0.123952] raw: 0017c000 88904f2d5e80 88904f2d5e80 [ 0.123954] raw: ff7f [ 0.123955] page dumped because: nonzero mapcount [ 0.123957] Modules linked in: [ 0.123961] CPU: 0 PID: 0 Comm: swapper Not tainted 5.16.0-rc3-sos-custom #2 [ 0.123964] Hardware name: AMD Corporation [ 0.123967] Call Trace: [ 0.123971] [ 0.123975] dump_stack_lvl+0x48/0x5e [ 0.123985] bad_page.cold+0x65/0x96 [ 0.123990] __free_pages_ok+0x3a8/0x410 [ 0.123996] memblock_free_all+0x171/0x1dc [ 0.124005] mem_init+0x1f/0x14b [ 0.124011] start_kernel+0x3b5/0x6a1 [ 0.124016] secondary_startup_64_no_verify+0xb0/0xbb [ 0.124022] I see ~40 of these traces, each for different pfns. 
Thanks, Tom Hi Tom: Thanks for your test. Could you help to test the following patch and check whether it can fix the issue. diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..34e6ade4f73c 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). 
+ */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr = NULL; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + } + + return vaddr; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. This fu
[PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan hyperv Isolation VM requires bounce buffer support to copy data from/to encrypted memory and so enable swiotlb force mode to use swiotlb bounce buffer for DMA transaction. In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Hyper-V initalizes swiotlb bounce buffer and default swiotlb needs to be disabled. pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() enable the default one. To override the setting, hyperv_swiotlb_detect() needs to run before these detect functions which depends on the pci_xen_swiotlb_ init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb _detect() to keep the order. Swiotlb bounce buffer code calls set_memory_decrypted() to mark bounce buffer visible to host and map it in extra address space via memremap. Populate the shared_gpa_boundary (vTOM) via swiotlb_unencrypted_base variable. The map function memremap() can't work in the early place hyperv_iommu_swiotlb_init() and so call swiotlb_update_mem_attributes() in the hyperv_iommu_swiotlb_later_init(). 
Signed-off-by: Tianyu Lan --- arch/x86/xen/pci-swiotlb-xen.c | 3 +- drivers/hv/vmbus_drv.c | 3 ++ drivers/iommu/hyperv-iommu.c | 56 ++ include/linux/hyperv.h | 8 + 4 files changed, 69 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 46df59aeaa06..30fd0600b008 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void) EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late); IOMMU_INIT_FINISH(pci_xen_swiotlb_detect, - NULL, + hyperv_swiotlb_detect, pci_xen_swiotlb_init, NULL); diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..0a64ccfafb8b 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,7 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = _dma_mask; return 0; err_kset_unregister: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e285a220c913..dd729d49a1eb 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -13,14 +13,20 @@ #include #include #include +#include +#include #include #include #include #include +#include +#include #include #include #include +#include +#include #include "irq_remapping.h" @@ -337,4 +343,54 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = { .free = hyperv_root_irq_remapping_free, }; +static void __init hyperv_iommu_swiotlb_init(void) +{ + unsigned long hyperv_io_tlb_size; + void *hyperv_io_tlb_start; + + /* +* Allocate Hyper-V 
swiotlb bounce buffer at early place +* to reserve large contiguous memory. +*/ + hyperv_io_tlb_size = swiotlb_size_or_default(); + hyperv_io_tlb_start = memblock_alloc(hyperv_io_tlb_size, PAGE_SIZE); + + if (!hyperv_io_tlb_start) + pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n"); + + swiotlb_init_with_tbl(hyperv_io_tlb_start, + hyperv_io_tlb_size >> IO_TLB_SHIFT, true); +} + +int __init hyperv_swiotlb_detect(void) +{ + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV)) + return 0; + + if (!hv_is_isolation_supported()) + return 0; + + /* +* Enable swiotlb force mode in Isolation VM to +* use swiotlb bounce buffer for dma transaction. +*/ + if (hv_isolation_type_snp()) + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary; + swiotlb_force = SWIOTLB_FORCE; + return 1; +} + +static void __init hyperv_iommu_swiotlb_later_init(void) +{ + /* +* Swiotlb bounce buffer needs to be mapped in extra address +* space. Map function doesn't work in the early place and so +* ca
[PATCH V3 5/5] hv_netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma adress will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. rx/tx ring buffer is allocated via vzalloc() and they need to be mapped into unencrypted address space(above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/umap rx /tx ring buffer. Signed-off-by: Tianyu Lan --- Change since v2: * Add hv_map/unmap_memory() to map/umap rx/tx ring buffer. --- arch/x86/hyperv/ivm.c | 28 ++ drivers/hv/hv_common.c| 11 +++ drivers/net/hyperv/hyperv_net.h | 5 ++ drivers/net/hyperv/netvsc.c | 136 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/asm-generic/mshyperv.h| 2 + include/linux/hyperv.h| 5 ++ 8 files changed, 187 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 69c7a57f3307..9f78d8f67ea3 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl kfree(pfn_array); return ret; } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. 
+ */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / PAGE_SIZE; i++) + pfns[i] = virt_to_hvpfn(addr + i * PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 7be173a99f27..3c5cb1f70319 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s return HV_STATUS_INVALID_PARAMETER; } EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + +void __weak *hv_map_memory(void *addr, unsigned long size) +{ + return NULL; +} +EXPORT_SYMBOL_GPL(hv_map_memory); + +void __weak hv_unmap_memory(void *addr) +{ +} +EXPORT_SYMBOL_GPL(hv_unmap_memory); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..cf69da0e296c 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void 
netvsc_dma_unmap(struct hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..b7ade735a806 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head) int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_original_buf) { + hv_unmap_memory(nvdev->recv_buf); + vfree(nvdev->recv_original_buf); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_original_buf) { + hv_unmap_memory(nvdev->send_buf); + vfree(nvdev->send_original_buf); + } else { + vfree(nvdev->send_buf); + } + kfree(nvdev->send_section_map); for (i = 0; i < VRSS_CHANNEL_MAX; i++) { @@ -338,6 +3
[PATCH V3 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ mpb_desc() still needs to be handled. Use DMA API(scsi_dma_map/unmap) to map these memory during sending/receiving packet and return swiotlb bounce buffer dma address. In Isolation VM, swiotlb bounce buffer is marked to be visible to host and the swiotlb force mode is enabled. Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/hv/vmbus_drv.c | 1 + drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 3 files changed, 23 insertions(+), 16 deletions(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 0a64ccfafb8b..ae6ec503399a 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2121,6 +2121,7 @@ int vmbus_device_register(struct hv_device *child_device_obj) hv_debug_add_dev_dir(child_device_obj); child_device_obj->device.dma_mask = _dma_mask; + child_device_obj->device.dma_parms = _device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; 
unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; + int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff; - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) - - hvpgoff; + hvpfn = HVPFN_DOWN(sg_dma_address(sg)); + hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) + +
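The hunk above is cut off in this archive, so the exact new expression is not reproduced here. The general calculation it is built around (first Hyper-V page frame of a segment, and how many Hyper-V pages the segment spans) can still be sketched in userspace. The names hvpfn_down(), hvpfn_up() and hvpfns_spanned() and the 4 KiB Hyper-V page size are assumptions for this illustration; they mimic the round-down/round-up shape of the removed lines rather than quoting the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Hyper-V page size of 4 KiB for the sketch. */
#define SKETCH_HV_PAGE_SHIFT 12
#define SKETCH_HV_PAGE_SIZE  (1ULL << SKETCH_HV_PAGE_SHIFT)

/* Frame number containing addr (round down). */
static uint64_t hvpfn_down(uint64_t addr)
{
	return addr >> SKETCH_HV_PAGE_SHIFT;
}

/* Number of frames needed to cover [0, addr) (round up). */
static uint64_t hvpfn_up(uint64_t addr)
{
	return (addr + SKETCH_HV_PAGE_SIZE - 1) >> SKETCH_HV_PAGE_SHIFT;
}

/* How many Hyper-V pages a segment at dma_addr of length len touches. */
static uint64_t hvpfns_spanned(uint64_t dma_addr, uint64_t len)
{
	assert(len > 0);
	return hvpfn_up(dma_addr + len) - hvpfn_down(dma_addr);
}
```

A segment that starts mid-page and crosses a page boundary counts both pages, which is why the start is rounded down and the end is rounded up.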
[PATCH V3 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan

Hyper-V provides Isolation VM which has memory encrypt support. Add
hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT
attribute.

Signed-off-by: Tianyu Lan
---
 arch/x86/kernel/cc_platform.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..f3bb0431f5c5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include
 #include
+#include
 #include

 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,23 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+	if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
+		return true;
+	else
+		return false;
+#else
+	return false;
+#endif
+}

 bool cc_platform_has(enum cc_attr attr)
 {
+	if (hv_is_isolation_supported())
+		return hyperv_cc_platform_has(attr);
+
 	if (sme_me_mask)
 		return amd_cc_platform_has(attr);

-- 
2.25.1
[PATCH V3 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Signed-off-by: Tianyu Lan --- Change since v2: * Leave mem->vaddr with phys_to_virt(mem->start) when fail to remap swiotlb memory. Change since v1: * Rework comment in the swiotlb_init_io_tlb_mem() * Make swiotlb_init_io_tlb_mem() back to return void. --- include/linux/swiotlb.h | 6 ++ kernel/dma/swiotlb.c| 47 - 2 files changed, 48 insertions(+), 5 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. 
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..adb9d06af5c8 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr = NULL; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + } + + return vaddr; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) + mem->vaddr = vaddr; + + memset(mem->vaddr, 0, bytes); } static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, @@ -196,7 +225,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID_PHYS_ADDR; mem->slots[i].alloc_size = 0; } + + /* +* If swiotlb_unencrypted_base is set, the bounce buffer memory will +* be remapped and cleared in swiotlb_update_mem_attributes. +*/ + if (swiotlb_unencrypted_base)
[PATCH V3 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and the
guest needs to call it to mark memory visible to the host before sharing
memory with the host. For security, network/storage stack memory should
not be shared with the host, so bounce buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer and
IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space which is specified by Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs.

This version follows Michael Kelley's suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v2:
	* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
	  hv_map/unmap_memory() to map/unmap the netvsc rx/tx ring into
	  the extra address space.
	* Leave mem->vaddr in swiotlb code set to phys_to_virt(mem->start)
	  when remapping the swiotlb memory fails.

Change since v1:
	* Add Hyper-V Isolation support check in cc_platform_has() and
	  return true for the guest memory encrypt attr.
	* Remove hv isolation check in sev_setup_arch()

Tianyu Lan (5):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  hv_netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/ivm.c             |  28 ++
 arch/x86/kernel/cc_platform.c     |  15
 arch/x86/xen/pci-swiotlb-xen.c    |   3 +-
 drivers/hv/hv_common.c            |  11 +++
 drivers/hv/vmbus_drv.c            |   4 +
 drivers/iommu/hyperv-iommu.c      |  56
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c        |  37
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |  14 +++
 include/linux/swiotlb.h           |   6 ++
 kernel/dma/swiotlb.c              |  47 +--
 15 files changed, 342 insertions(+), 25 deletions(-)

--
2.25.1
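
For readers following the address arithmetic in the cover letter, here is a
minimal user-space sketch (not kernel code) of the two ideas: the access
address is the bounce buffer GPA plus shared_gpa_boundary, and the pool falls
back to the direct mapping when the remap fails. All names below (gpa_t,
bounce_pool, shared_access_gpa, pool_pick_vaddr, the demo_* stubs) are
hypothetical stand-ins for illustration, and the boundary value used in any
call is an assumption, not a value reported by a real hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
typedef uint64_t gpa_t;

struct bounce_pool {
	gpa_t start;	/* guest physical address of the pool */
	void *vaddr;	/* address the guest uses to touch the pool */
};

/* Stand-in for a remap primitive: returns NULL on failure. */
typedef void *(*remap_fn)(gpa_t paddr, size_t bytes);

/* Access address = pool GPA + boundary reported by the hypervisor. */
static gpa_t shared_access_gpa(gpa_t pool_gpa, gpa_t shared_gpa_boundary)
{
	return pool_gpa + shared_gpa_boundary;
}

/*
 * Pick the address used to access the pool: try the remap into the
 * extra address space when a boundary is set, otherwise (or when the
 * remap fails) keep the direct mapping.
 */
static void pool_pick_vaddr(struct bounce_pool *pool, void *direct_vaddr,
			    gpa_t boundary, size_t bytes, remap_fn remap)
{
	void *v = NULL;

	assert(pool != NULL);

	if (boundary && remap)
		v = remap(shared_access_gpa(pool->start, boundary), bytes);

	pool->vaddr = v ? v : direct_vaddr;
}

/* Demo stubs standing in for a successful and a failing remap. */
static unsigned char demo_window[64];

static void *demo_remap_ok(gpa_t paddr, size_t bytes)
{
	(void)paddr;
	return bytes <= sizeof(demo_window) ? demo_window : NULL;
}

static void *demo_remap_fail(gpa_t paddr, size_t bytes)
{
	(void)paddr;
	(void)bytes;
	return NULL;
}
```

Keeping the fallback in one helper mirrors the "Change since v2" note about
leaving mem->vaddr at the direct mapping when the remap fails, so callers can
always clear and use the pool through a single pointer.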