Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver
On 05-07-21, 14:21, Jie Deng wrote: > > On 2021/7/5 10:43, Viresh Kumar wrote: > > On 02-07-21, 12:58, Andy Shevchenko wrote: > > > On Fri, Jul 02, 2021 at 04:46:47PM +0800, Jie Deng wrote: > > > > +static int virtio_i2c_complete_reqs(struct virtqueue *vq, > > > > + struct virtio_i2c_req *reqs, > > > > + struct i2c_msg *msgs, int nr, > > > > + bool fail) > > > > +{ > > > > + struct virtio_i2c_req *req; > > > > + bool failed = fail; > > Jie, you can actually get rid of this variable too. Jut rename fail to > > failed > > and everything shall work as you want. > > > Oh, You are not right. I just found we can't remove this variable. The > "fail" and "failed" have different > > meanings for this function. We need fail to return the result. Ahh, yes. You are right. Maybe rename fail to timedout, it would make it more readable, else fail and failed look very similar. > > > > + unsigned int len; > > > > + int i, j = 0; > > > > + > > > > + for (i = 0; i < nr; i++) { > > > > + /* Detach the ith request from the vq */ > > > > + req = virtqueue_get_buf(vq, ); > > > > + > > > > + /* > > > > +* Condition (req && req == [i]) should always > > > > meet since > > > > +* we have total nr requests in the vq. > > > > +*/ > > > > + if (!failed && (WARN_ON(!(req && req == [i])) || > > > > + (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) > > > > + failed = true; > > > ...and after failed is true, we are continuing the loop, why? > > Actually this function can be called with fail set to true. We proceed as we > > need to call i2c_put_dma_safe_msg_buf() for all buffers we allocated > > earlier. > > > > > > + i2c_put_dma_safe_msg_buf(reqs[i].buf, [i], > > > > !failed); > > > > + if (!failed) > > > > + ++j; > > > Besides better to read j++ the j itself can be renamed to something more > > > verbose. > > > > > > > + } > > > > + return (fail ? -ETIMEDOUT : j); > > > Redundant parentheses. > > > > > > > +} -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 05/12] x86/sev: Use GHCB protocol version 2 if supported
From: Joerg Roedel Check whether the hypervisor supports GHCB version 2 and use it if available. Signed-off-by: Joerg Roedel --- arch/x86/boot/compressed/sev.c | 10 -- arch/x86/include/asm/sev.h | 4 ++-- arch/x86/kernel/sev-shared.c | 17 ++--- 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index 101e08c67296..7f8416f76be7 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -119,16 +119,22 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, /* Include code for early handlers */ #include "../../kernel/sev-shared.c" +static unsigned int ghcb_protocol; + static u64 sev_get_ghcb_proto_ver(void) { - return GHCB_PROTOCOL_MAX; + return ghcb_protocol; } static bool early_setup_sev_es(void) { - if (!sev_es_negotiate_protocol(NULL)) + struct sev_ghcb_protocol_info info; + + if (!sev_es_negotiate_protocol()) sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED); + ghcb_protocol = info.vm_proto; + if (set_page_decrypted((unsigned long)_ghcb_page)) return false; diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index fa5cd05d3b5b..134a7c9d91b6 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -12,8 +12,8 @@ #include #include -#define GHCB_PROTO_OUR 0x0001UL -#define GHCB_PROTOCOL_MAX 1ULL +#define GHCB_PROTOCOL_MIN 1ULL +#define GHCB_PROTOCOL_MAX 2ULL #define GHCB_DEFAULT_USAGE 0ULL #defineVMGEXIT() { asm volatile("rep; vmmcall\n\r"); } diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index 36eaac2773ed..40a1ca81bdb8 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -61,6 +61,7 @@ static void __noreturn sev_es_terminate(unsigned int reason) static bool sev_es_negotiate_protocol(struct sev_ghcb_protocol_info *info) { + unsigned int protocol; u64 val; /* Do the GHCB protocol version negotiation */ @@ -71,14 +72,24 @@ static bool sev_es_negotiate_protocol(struct sev_ghcb_protocol_info *info) if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP) return false; - if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTO_OUR || - GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR) + /* Sanity check untrusted input */ + if (GHCB_MSR_PROTO_MIN(val) > GHCB_MSR_PROTO_MAX(val)) + return false; + + /* Use maximum supported protocol version */ + protocol = min_t(unsigned int, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX); + + /* +* Hypervisor does not support any protocol version required for this +* kernel. +*/ + if (protocol < GHCB_MSR_PROTO_MIN(val)) return false; if (info) { info->hv_proto_min = GHCB_MSR_PROTO_MIN(val); info->hv_proto_max = GHCB_MSR_PROTO_MAX(val); - info->vm_proto = GHCB_PROTO_OUR; + info->vm_proto = protocol; } return true; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 10/12] x86/sev: Add MMIO handling support to boot/compressed/ code
From: Joerg Roedel Move the code for MMIO handling in the #VC handler to sev-shared.c so that it can be used in the decompressor code. The decompressor needs to handle MMIO events for writing to the VGA framebuffer. When the kernel is booted via UEFI the VGA console is not enabled that early, but a kexec boot will enable it and the decompressor needs MMIO support to write to the frame buffer. This also requires to share some code from lib/insn-eval.c. Since insn-eval.c can't be included into the decompressor code directly, move the relevant parts into lib/insn-eval-shared.c and include that file. Signed-off-by: Joerg Roedel --- arch/x86/boot/compressed/sev.c | 43 +- arch/x86/kernel/sev-shared.c| 282 +++ arch/x86/kernel/sev.c | 282 --- arch/x86/lib/insn-eval-shared.c | 805 arch/x86/lib/insn-eval.c| 802 +-- 5 files changed, 1116 insertions(+), 1098 deletions(-) create mode 100644 arch/x86/lib/insn-eval-shared.c diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index 7f8416f76be7..3ffb3d873989 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -26,24 +26,8 @@ struct ghcb boot_ghcb_page __aligned(PAGE_SIZE); struct ghcb *boot_ghcb; -/* - * Copy a version of this function here - insn-eval.c can't be used in - * pre-decompression code. - */ -static bool insn_has_rep_prefix(struct insn *insn) -{ - insn_byte_t p; - int i; - - insn_get_prefixes(insn); - - for_each_insn_prefix(insn, i, p) { - if (p == 0xf2 || p == 0xf3) - return true; - } - - return false; -} +#undef WARN_ONCE +#define WARN_ONCE(condition, format...) /* * Only a dummy for insn_get_seg_base() - Early boot-code is 64bit only and @@ -54,6 +38,17 @@ static unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx) return 0UL; } +/* The decompressor only uses flat segments */ +static int get_seg_base_limit(struct insn *insn, struct pt_regs *regs, + int regoff, unsigned long *base, + unsigned long *limit) +{ + if (base) + *base = 0L; + if (limit) + *limit = ~0L; +} + static inline u64 sev_es_rd_ghcb_msr(void) { unsigned long low, high; @@ -105,6 +100,14 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, return ES_OK; } +static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt, + unsigned long vaddr, phys_addr_t *paddr) +{ + *paddr = (phys_addr_t)vaddr; + + return ES_OK; +} + #undef __init #undef __pa #define __init @@ -115,6 +118,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, /* Basic instruction decoding support needed */ #include "../../lib/inat.c" #include "../../lib/insn.c" +#include "../../lib/insn-eval-shared.c" /* Include code for early handlers */ #include "../../kernel/sev-shared.c" @@ -204,6 +208,9 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code) case SVM_EXIT_CPUID: result = vc_handle_cpuid(boot_ghcb, ); break; + case SVM_EXIT_NPF: + result = vc_handle_mmio(boot_ghcb, ); + break; default: result = ES_UNSUPPORTED; break; diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index 40a1ca81bdb8..a7a0793c4f98 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -558,3 +558,285 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb, return ES_OK; } + +static long *vc_insn_get_reg(struct es_em_ctxt *ctxt) +{ + long *reg_array; + int offset; + + reg_array = (long *)ctxt->regs; + offset= insn_get_modrm_reg_off(>insn, ctxt->regs); + + if (offset < 0) + return NULL; + + offset /= sizeof(long); + + return reg_array + offset; +} + +static long *vc_insn_get_rm(struct es_em_ctxt *ctxt) +{ + long *reg_array; + int offset; + + reg_array = (long *)ctxt->regs; + offset= insn_get_modrm_rm_off(>insn, ctxt->regs); + + if (offset < 0) + return NULL; + + offset /= sizeof(long); + + return reg_array + offset; +} +static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt, +unsigned int bytes, bool read) +{ + u64 exit_code, exit_info_1, exit_info_2; + unsigned long ghcb_pa = __pa(ghcb); + enum es_result res; + phys_addr_t paddr; + void __user *ref; + + ref = insn_get_addr_ref(>insn, ctxt->regs); + if (ref == (void __user *)-1L) + return ES_UNSUPPORTED; + + exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE; + + res
[RFC PATCH 07/12] x86/sev: Setup code to park APs in the AP Jump Table
From: Joerg Roedel The AP Jump Table under SEV-ES contains the reset vector where non-boot CPUs start executing when coming out of reset. This means that a CPU coming out of the AP-reset-hold VMGEXIT also needs to start executing at the reset vector stored in the AP Jump Table. The problem is to find a safe place to put the real-mode code which executes the VMGEXIT and jumps to the reset vector. The code can not be in kernel memory, because after kexec that memory is owned by the new kernel and the code might have been overwritten. Fortunately the AP Jump Table itself is a safe place, because the memory is not owned by the OS and will not be overwritten by a new kernel started through kexec. The table is 4k in size and only the first 4 bytes are used for the reset vector. This leaves enough space for some 16-bit code to do the job and even a small stack. Install 16-bit code into the AP Jump Table under SEV-ES after the APs have been brought up. The code will do an AP-reset-hold VMGEXIT and jump to the reset vector after being woken up. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/realmode.h | 2 + arch/x86/include/asm/sev-ap-jumptable.h | 25 + arch/x86/kernel/sev.c | 105 +++ arch/x86/realmode/Makefile | 9 +- arch/x86/realmode/rmpiggy.S | 6 ++ arch/x86/realmode/sev/Makefile | 41 arch/x86/realmode/sev/ap_jump_table.S | 130 arch/x86/realmode/sev/ap_jump_table.lds | 24 + 8 files changed, 341 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h create mode 100644 arch/x86/realmode/sev/Makefile create mode 100644 arch/x86/realmode/sev/ap_jump_table.S create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index 5db5d083c873..29590a4ddf24 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -62,6 +62,8 @@ extern unsigned long initial_gs; extern unsigned long initial_stack; #ifdef CONFIG_AMD_MEM_ENCRYPT extern unsigned long initial_vc_handler; +extern unsigned char rm_ap_jump_table_blob[]; +extern unsigned char rm_ap_jump_table_blob_end[]; #endif extern unsigned char real_mode_blob[]; diff --git a/arch/x86/include/asm/sev-ap-jumptable.h b/arch/x86/include/asm/sev-ap-jumptable.h new file mode 100644 index ..1c8b2ce779e2 --- /dev/null +++ b/arch/x86/include/asm/sev-ap-jumptable.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * AMD Encrypted Register State Support + * + * Author: Joerg Roedel + */ +#ifndef __ASM_SEV_AP_JUMPTABLE_H +#define __ASM_SEV_AP_JUMPTABLE_H + +#defineSEV_APJT_CS16 0x8 +#defineSEV_APJT_DS16 0x10 + +#define SEV_APJT_ENTRY 0x10 + +#ifndef __ASSEMBLY__ + +struct sev_ap_jump_table_header { + u16 reset_ip; + u16 reset_cs; + u16 gdt_offset; +}; + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_SEV_AP_JUMPTABLE_H */ diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index eedba56b6bac..0946ef732a62 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -19,6 +19,7 @@ #include #include +#include #include #include #include @@ -45,6 +46,9 @@ static struct ghcb __initdata *boot_ghcb; /* Cached AP Jump Table Address */ static phys_addr_t sev_es_jump_table_pa; +/* Whether the AP Jump Table blob was successfully installed */ +static bool sev_ap_jumptable_blob_installed __ro_after_init; + /* #VC handler runtime per-CPU data */ struct sev_es_runtime_data { struct ghcb ghcb_page; @@ -749,6 +753,107 @@ static void __init sev_es_setup_play_dead(void) static inline void sev_es_setup_play_dead(void) { } #endif +/* + * This function make the necessary runtime changes to the AP Jump Table blob. + * For now this only sets up the GDT used while the code executes. The GDT needs + * to contain 16-bit code and data segments with a base that points to AP Jump + * Table page. + */ +static void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa) +{ + struct sev_ap_jump_table_header *header; + struct desc_ptr *gdt_descr; + u64 *ap_jumptable_gdt; + + header = base; + + /* +* Setup 16-bit protected mode code and data segments for AP Jumptable. +* Set the segment limits to 0x to already be compatible with +* real-mode. +*/ + ap_jumptable_gdt = (u64 *)(base + header->gdt_offset); + ap_jumptable_gdt[SEV_APJT_CS16 / 8] = GDT_ENTRY(0x9b, pa, 0x); + ap_jumptable_gdt[SEV_APJT_DS16 / 8] = GDT_ENTRY(0x93, pa, 0x); + + /* Write correct GDT base address into GDT descriptor */ + gdt_descr = (struct desc_ptr *)(base + header->gdt_offset); + gdt_descr->address += pa; +} + +/* + * This function sets up the AP Jump Table blob which contains code which runs + * in 16-bit protected mode to
[RFC PATCH 11/12] x86/sev: Handle CLFLUSH MMIO events
From: Joerg Roedel Handle CLFLUSH instruction to MMIO memory in the #VC handler. The instruction is ignored by the handler, as the Hypervisor is responsible for cache management of emulated MMIO memory. Signed-off-by: Joerg Roedel --- arch/x86/kernel/sev-shared.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index a7a0793c4f98..682fa202444f 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -632,6 +632,15 @@ static enum es_result vc_handle_mmio_twobyte_ops(struct ghcb *ghcb, long *reg_data; switch (insn->opcode.bytes[1]) { + /* CLFLUSH */ + case 0xae: + /* +* Ignore CLFLUSHes - those go to emulated MMIO anyway and the +* hypervisor is responsible for cache management. +*/ + ret = ES_OK; + break; + /* MMIO Read w/ zero-extension */ case 0xb6: bytes = 1; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 08/12] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
From: Joerg Roedel GHCB protocol version 2 adds the MSR-based AP-reset-hold VMGEXIT which does not need a GHCB. Use that to park APs in 16-bit protected mode on the AP Jump Table. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/realmode.h| 3 + arch/x86/kernel/sev.c | 48 ++-- arch/x86/realmode/rm/Makefile | 11 ++-- arch/x86/realmode/rm/header.S | 3 + arch/x86/realmode/rm/sev_ap_park.S | 89 ++ 5 files changed, 144 insertions(+), 10 deletions(-) create mode 100644 arch/x86/realmode/rm/sev_ap_park.S diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index 29590a4ddf24..668de0a8b1ae 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -23,6 +23,9 @@ struct real_mode_header { u32 trampoline_header; #ifdef CONFIG_AMD_MEM_ENCRYPT u32 sev_es_trampoline_start; + u32 sev_real_ap_park_asm; + u32 sev_real_ap_park_seg; + u32 sev_ap_park_gdt; #endif #ifdef CONFIG_X86_64 u32 trampoline_pgd; diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 0946ef732a62..2147ebd0e919 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -695,6 +696,35 @@ static bool __init sev_es_setup_ghcb(void) } #ifdef CONFIG_HOTPLUG_CPU +static void __noreturn sev_jumptable_ap_park(void) +{ + local_irq_disable(); + + write_cr3(real_mode_header->trampoline_pgd); + + /* Exiting long mode will fail if CR4.PCIDE is set. */ + if (boot_cpu_has(X86_FEATURE_PCID)) + cr4_clear_bits(X86_CR4_PCIDE); + + asm volatile("xorq %%r15, %%r15\n" +"xorq %%r14, %%r14\n" +"xorq %%r13, %%r13\n" +"xorq %%r12, %%r12\n" +"xorq %%r11, %%r11\n" +"xorq %%r10, %%r10\n" +"xorq %%r9, %%r9\n" +"xorq %%r8, %%r8\n" +"xorq %%rsi, %%rsi\n" +"xorq %%rdi, %%rdi\n" +"xorq %%rsp, %%rsp\n" +"xorq %%rbp, %%rbp\n" +"ljmpl *%0" : : +"m" (real_mode_header->sev_real_ap_park_asm), +"b" (sev_es_jump_table_pa >> 4)); + unreachable(); +} +STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park); + static void sev_es_ap_hlt_loop(void) { struct ghcb_state state; @@ -731,8 +761,10 @@ static void sev_es_play_dead(void) play_dead_common(); /* IRQs now disabled */ - - sev_es_ap_hlt_loop(); + if (sev_ap_jumptable_blob_installed) + sev_jumptable_ap_park(); + else + sev_es_ap_hlt_loop(); /* * If we get here, the VCPU was woken up again. Jump to CPU @@ -762,8 +794,9 @@ static inline void sev_es_setup_play_dead(void) { } static void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa) { struct sev_ap_jump_table_header *header; + u64 *ap_jumptable_gdt, *sev_ap_park_gdt; struct desc_ptr *gdt_descr; - u64 *ap_jumptable_gdt; + int idx; header = base; @@ -773,8 +806,13 @@ static void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa) * real-mode. */ ap_jumptable_gdt = (u64 *)(base + header->gdt_offset); - ap_jumptable_gdt[SEV_APJT_CS16 / 8] = GDT_ENTRY(0x9b, pa, 0x); - ap_jumptable_gdt[SEV_APJT_DS16 / 8] = GDT_ENTRY(0x93, pa, 0x); + sev_ap_park_gdt = __va(real_mode_header->sev_ap_park_gdt); + + idx = SEV_APJT_CS16 / 8; + ap_jumptable_gdt[idx] = sev_ap_park_gdt[idx] = GDT_ENTRY(0x9b, pa, 0x); + + idx = SEV_APJT_DS16 / 8; + ap_jumptable_gdt[idx] = sev_ap_park_gdt[idx] = GDT_ENTRY(0x93, pa, 0x); /* Write correct GDT base address into GDT descriptor */ gdt_descr = (struct desc_ptr *)(base + header->gdt_offset); diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile index 83f1b6a56449..a7f84d43a0a3 100644 --- a/arch/x86/realmode/rm/Makefile +++ b/arch/x86/realmode/rm/Makefile @@ -27,11 +27,12 @@ wakeup-objs += video-vga.o wakeup-objs+= video-vesa.o wakeup-objs+= video-bios.o -realmode-y += header.o -realmode-y += trampoline_$(BITS).o -realmode-y += stack.o -realmode-y += reboot.o -realmode-$(CONFIG_ACPI_SLEEP) += $(wakeup-objs) +realmode-y += header.o +realmode-y += trampoline_$(BITS).o +realmode-y += stack.o +realmode-y += reboot.o +realmode-$(CONFIG_ACPI_SLEEP) += $(wakeup-objs) +realmode-$(CONFIG_AMD_MEM_ENCRYPT)
[RFC PATCH 04/12] x86/sev: Do not hardcode GHCB protocol version
From: Joerg Roedel Introduce the sev_get_ghcb_proto_ver() which will return the negotiated GHCB protocol version and use it to set the version field in the GHCB. Signed-off-by: Joerg Roedel --- arch/x86/boot/compressed/sev.c | 5 + arch/x86/kernel/sev-shared.c | 5 - arch/x86/kernel/sev.c | 5 + 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index 1a2e49730f8b..101e08c67296 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -119,6 +119,11 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, /* Include code for early handlers */ #include "../../kernel/sev-shared.c" +static u64 sev_get_ghcb_proto_ver(void) +{ + return GHCB_PROTOCOL_MAX; +} + static bool early_setup_sev_es(void) { if (!sev_es_negotiate_protocol(NULL)) diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index 73eeb5897d16..36eaac2773ed 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -28,6 +28,9 @@ struct sev_ghcb_protocol_info { unsigned int vm_proto; }; +/* Returns the negotiated GHCB Protocol version */ +static u64 sev_get_ghcb_proto_ver(void); + static bool __init sev_es_check_cpu_features(void) { if (!has_cpuflag(X86_FEATURE_RDRAND)) { @@ -122,7 +125,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, enum es_result ret; /* Fill in protocol and format specifiers */ - ghcb->protocol_version = GHCB_PROTOCOL_MAX; + ghcb->protocol_version = sev_get_ghcb_proto_ver(); ghcb->ghcb_usage = GHCB_DEFAULT_USAGE; ghcb_set_sw_exit_code(ghcb, exit_code); diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 8084bfd7cce1..5d3422e8b25e 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -498,6 +498,11 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt /* Negotiated GHCB protocol version */ static struct sev_ghcb_protocol_info ghcb_protocol_info __ro_after_init; +static u64 sev_get_ghcb_proto_ver(void) +{ + return ghcb_protocol_info.vm_proto; +} + static noinstr void __sev_put_ghcb(struct ghcb_state *state) { struct sev_es_runtime_data *data; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 06/12] x86/sev: Cache AP Jump Table Address
From: Joerg Roedel Store the physical address of the AP Jump Table in kernel memory so that it does not need to be fetched from the Hypervisor again. Signed-off-by: Joerg Roedel --- arch/x86/kernel/sev.c | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 5d3422e8b25e..eedba56b6bac 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -42,6 +42,9 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE); */ static struct ghcb __initdata *boot_ghcb; +/* Cached AP Jump Table Address */ +static phys_addr_t sev_es_jump_table_pa; + /* #VC handler runtime per-CPU data */ struct sev_es_runtime_data { struct ghcb ghcb_page; @@ -546,12 +549,14 @@ void noinstr __sev_es_nmi_complete(void) __sev_put_ghcb(); } -static u64 get_jump_table_addr(void) +static phys_addr_t get_jump_table_addr(void) { struct ghcb_state state; unsigned long flags; struct ghcb *ghcb; - u64 ret = 0; + + if (sev_es_jump_table_pa) + return sev_es_jump_table_pa; local_irq_save(flags); @@ -567,39 +572,36 @@ static u64 get_jump_table_addr(void) if (ghcb_sw_exit_info_1_is_valid(ghcb) && ghcb_sw_exit_info_2_is_valid(ghcb)) - ret = ghcb->save.sw_exit_info_2; + sev_es_jump_table_pa = (phys_addr_t)ghcb->save.sw_exit_info_2; __sev_put_ghcb(); local_irq_restore(flags); - return ret; + return sev_es_jump_table_pa; } int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { u16 startup_cs, startup_ip; - phys_addr_t jump_table_pa; - u64 jump_table_addr; u16 __iomem *jump_table; + phys_addr_t pa; - jump_table_addr = get_jump_table_addr(); + pa = get_jump_table_addr(); /* On UP guests there is no jump table so this is not a failure */ - if (!jump_table_addr) + if (!pa) return 0; /* Check if AP Jump Table is page-aligned */ - if (jump_table_addr & ~PAGE_MASK) + if (pa & ~PAGE_MASK) return -EINVAL; - jump_table_pa = jump_table_addr & PAGE_MASK; - startup_cs = (u16)(rmh->trampoline_start >> 4); startup_ip = (u16)(rmh->sev_es_trampoline_start - rmh->trampoline_start); - jump_table = ioremap_encrypted(jump_table_pa, PAGE_SIZE); + jump_table = ioremap_encrypted(pa, PAGE_SIZE); if (!jump_table) return -EIO; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests
From: Joerg Roedel Hi, here are changes to enable kexec/kdump in SEV-ES guests. The biggest problem for supporting kexec/kdump under SEV-ES is to find a way to hand the non-boot CPUs (APs) from one kernel to another. Without SEV-ES the first kernel parks the CPUs in a HLT loop until they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence. For virtual machines the CPU reset is emulated by the hypervisor, which sets the vCPU registers back to reset state. This does not work under SEV-ES, because the hypervisor has no access to the vCPU registers and can't make modifications to them. So an SEV-ES guest needs to reset the vCPU itself and park it using the AP-reset-hold protocol. Upon wakeup the guest needs to jump to real-mode and to the reset-vector configured in the AP-Jump-Table. The code to do this is the main part of this patch-set. It works by placing code on the AP Jump-Table page itself to park the vCPU and for jumping to the reset vector upon wakeup. The code on the AP Jump Table runs in 16-bit protected mode with segment base set to the beginning of the page. The AP Jump-Table is usually not within the first 1MB of memory, so the code can't run in real-mode. The AP Jump-Table is the best place to put the parking code, because the memory is owned, but read-only by the firmware and writeable by the OS. Only the first 4 bytes are used for the reset-vector, leaving the rest of the page for code/data/stack to park a vCPU. The code can't be in kernel memory because by the time the vCPU wakes up the memory will be owned by the new kernel, which might have overwritten it already. The other patches add initial GHCB Version 2 protocol support, because kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXT, which is a GHCB protocol version 2 feature. The kexec'ed kernel is also entered via the decompressor and needs MMIO support there, so this patch-set also adds MMIO #VC support to the decompressor and support for handling CLFLUSH instructions. Finally there is also code to disable kexec/kdump support at runtime when the environment does not support it (e.g. no GHCB protocol version 2 support or AP Jump Table over 4GB). The diffstat looks big, but most of it is moving code for MMIO #VC support around to make it available to the decompressor. There is also a video showing the code in action: https://www.youtube.com/watch?v=j1AUJANP7Mk Please review. Thanks, Joerg Joerg Roedel (12): kexec: Allow architecture code to opt-out at runtime x86/kexec/64: Forbid kexec when running as an SEV-ES guest x86/sev: Save and print negotiated GHCB protocol version x86/sev: Do not hardcode GHCB protocol version x86/sev: Use GHCB protocol version 2 if supported x86/sev: Cache AP Jump Table Address x86/sev: Setup code to park APs in the AP Jump Table x86/sev: Park APs on AP Jump Table with GHCB protocol version 2 x86/sev: Use AP Jump Table blob to stop CPU x86/sev: Add MMIO handling support to boot/compressed/ code x86/sev: Handle CLFLUSH MMIO events x86/sev: Support kexec under SEV-ES with AP Jump Table blob arch/x86/boot/compressed/sev.c | 56 +- arch/x86/include/asm/realmode.h | 5 + arch/x86/include/asm/sev-ap-jumptable.h | 25 + arch/x86/include/asm/sev.h | 13 +- arch/x86/kernel/machine_kexec_64.c | 12 + arch/x86/kernel/process.c | 8 + arch/x86/kernel/sev-shared.c| 333 +- arch/x86/kernel/sev.c | 494 ++- arch/x86/lib/insn-eval-shared.c | 805 arch/x86/lib/insn-eval.c| 802 +-- arch/x86/realmode/Makefile | 9 +- arch/x86/realmode/rm/Makefile | 11 +- arch/x86/realmode/rm/header.S | 3 + arch/x86/realmode/rm/sev_ap_park.S | 89 +++ arch/x86/realmode/rmpiggy.S | 6 + arch/x86/realmode/sev/Makefile | 41 ++ arch/x86/realmode/sev/ap_jump_table.S | 130 arch/x86/realmode/sev/ap_jump_table.lds | 24 + include/linux/kexec.h | 2 + kernel/kexec.c | 14 + kernel/kexec_file.c | 9 + 21 files changed, 1765 insertions(+), 1126 deletions(-) create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h create mode 100644 arch/x86/lib/insn-eval-shared.c create mode 100644 arch/x86/realmode/rm/sev_ap_park.S create mode 100644 arch/x86/realmode/sev/Makefile create mode 100644 arch/x86/realmode/sev/ap_jump_table.S create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds base-commit: 8d9d46bbf3b6b7ff8edcac33603ab45c29e0e07f -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 12/12] x86/sev: Support kexec under SEV-ES with AP Jump Table blob
From: Joerg Roedel When the AP Jump Table blob is installed the kernel can hand over the APs from the old to the new kernel. Enable kexec when the AP Jump Table blob has been installed. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/sev.h | 2 ++ arch/x86/kernel/machine_kexec_64.c | 6 +- arch/x86/kernel/sev.c | 12 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index cd14b6e10f12..61910caf2a0d 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -87,6 +87,7 @@ static __always_inline void sev_es_stop_this_cpu(void) if (static_branch_unlikely(_es_enable_key)) __sev_es_stop_this_cpu(); } +bool sev_kexec_supported(void); #else static inline void sev_es_ist_enter(struct pt_regs *regs) { } static inline void sev_es_ist_exit(void) { } @@ -94,6 +95,7 @@ static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { ret static inline void sev_es_nmi_complete(void) { } static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; } static inline void sev_es_stop_this_cpu(void) { } +static bool sev_kexec_supported(void) { return true; } #endif #endif diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index f902cc9cc634..d08eae4f7694 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -26,6 +26,7 @@ #include #include #include +#include #ifdef CONFIG_ACPI /* @@ -626,5 +627,8 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) */ bool arch_kexec_supported(void) { - return !sev_es_active(); + if (!sev_kexec_supported()) + return false; + + return true; } diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 113903ba3095..731bbf941a82 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -901,6 +901,18 @@ static int __init sev_es_setup_ap_jump_table_blob(void) } core_initcall(sev_es_setup_ap_jump_table_blob); +bool sev_kexec_supported(void) +{ + /* +* KEXEC with SEV-ES and more than one CPU is only supported +* when the AP Jump Table is installed. +*/ + if (num_possible_cpus() > 1) + return !sev_es_active() || sev_ap_jumptable_blob_installed; + else + return true; +} + static void __init alloc_runtime_data(int cpu) { struct sev_es_runtime_data *data; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 02/12] x86/kexec/64: Forbid kexec when running as an SEV-ES guest
From: Joerg Roedel For now, kexec is not supported when running as an SEV-ES guest. Doing so requires additional hypervisor support and special code to hand over the CPUs to the new kernel in a safe way. Until this is implemented, do not support kexec in SEV-ES guests. Cc: sta...@vger.kernel.org # v5.10+ Signed-off-by: Joerg Roedel --- arch/x86/kernel/machine_kexec_64.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index c078b0d3ab0e..f902cc9cc634 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -620,3 +620,11 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) */ set_memory_encrypted((unsigned long)vaddr, pages); } + +/* + * Kexec is not supported in SEV-ES guests yet + */ +bool arch_kexec_supported(void) +{ + return !sev_es_active(); +} -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 09/12] x86/sev: Use AP Jump Table blob to stop CPU
From: Joerg Roedel To support kexec under SEV-ES the APs can't be parked with HLT. Upon wakeup the AP needs to find its way to execute at the reset vector set by the new kernel and in real-mode. This is what the AP Jump Table blob provides, so stop the APs the SEV-ES way by calling the AP-reset-hold VMGEXIT from the AP Jump Table. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/sev.h | 7 +++ arch/x86/kernel/process.c | 8 arch/x86/kernel/sev.c | 11 ++- 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 134a7c9d91b6..cd14b6e10f12 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -81,12 +81,19 @@ static __always_inline void sev_es_nmi_complete(void) __sev_es_nmi_complete(); } extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd); +void __sev_es_stop_this_cpu(void); +static __always_inline void sev_es_stop_this_cpu(void) +{ + if (static_branch_unlikely(_es_enable_key)) + __sev_es_stop_this_cpu(); +} #else static inline void sev_es_ist_enter(struct pt_regs *regs) { } static inline void sev_es_ist_exit(void) { } static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; } static inline void sev_es_nmi_complete(void) { } static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; } +static inline void sev_es_stop_this_cpu(void) { } #endif #endif diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 5e1f38179f49..c5da02f9100f 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "process.h" @@ -736,6 +737,13 @@ void stop_this_cpu(void *dummy) if (boot_cpu_has(X86_FEATURE_SME)) native_wbinvd(); for (;;) { + /* +* SEV-ES guests need a special stop routine to support +* kexec. Try this first, if it fails the function will +* return and native_halt() is used. +*/ + sev_es_stop_this_cpu(); + /* * Use native_halt() so that memory contents don't change * (stack usage and variables) after possibly issuing the diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 2147ebd0e919..5fdfa94c823c 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -695,7 +695,6 @@ static bool __init sev_es_setup_ghcb(void) return true; } -#ifdef CONFIG_HOTPLUG_CPU static void __noreturn sev_jumptable_ap_park(void) { local_irq_disable(); @@ -725,6 +724,16 @@ static void __noreturn sev_jumptable_ap_park(void) } STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park); +void __sev_es_stop_this_cpu(void) +{ + /* Only park in the AP Jump Table when the code has been installed */ + if (!sev_ap_jumptable_blob_installed) + return; + + sev_jumptable_ap_park(); +} + +#ifdef CONFIG_HOTPLUG_CPU static void sev_es_ap_hlt_loop(void) { struct ghcb_state state; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 03/12] x86/sev: Save and print negotiated GHCB protocol version
From: Joerg Roedel Save the results of the GHCB protocol negotiation into a data structure and print information about versions supported and used to the kernel log. Signed-off-by: Joerg Roedel --- arch/x86/boot/compressed/sev.c | 2 +- arch/x86/kernel/sev-shared.c | 22 +- arch/x86/kernel/sev.c | 13 - 3 files changed, 34 insertions(+), 3 deletions(-) diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index 670e998fe930..1a2e49730f8b 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -121,7 +121,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, static bool early_setup_sev_es(void) { - if (!sev_es_negotiate_protocol()) + if (!sev_es_negotiate_protocol(NULL)) sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED); if (set_page_decrypted((unsigned long)_ghcb_page)) diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index 9f90f460a28c..73eeb5897d16 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -14,6 +14,20 @@ #define has_cpuflag(f) boot_cpu_has(f) #endif +/* + * struct sev_ghcb_protocol_info - Used to return GHCB protocol + *negotiation details. + * + * @hv_proto_min: Minimum GHCB protocol version supported by Hypervisor + * @hv_proto_max: Maximum GHCB protocol version supported by Hypervisor + * @vm_proto: Protocol version the VM (this kernel) will use + */ +struct sev_ghcb_protocol_info { + unsigned int hv_proto_min; + unsigned int hv_proto_max; + unsigned int vm_proto; +}; + static bool __init sev_es_check_cpu_features(void) { if (!has_cpuflag(X86_FEATURE_RDRAND)) { @@ -42,7 +56,7 @@ static void __noreturn sev_es_terminate(unsigned int reason) asm volatile("hlt\n" : : : "memory"); } -static bool sev_es_negotiate_protocol(void) +static bool sev_es_negotiate_protocol(struct sev_ghcb_protocol_info *info) { u64 val; @@ -58,6 +72,12 @@ static bool sev_es_negotiate_protocol(void) GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR) return false; + if (info) { + info->hv_proto_min = GHCB_MSR_PROTO_MIN(val); + info->hv_proto_max = GHCB_MSR_PROTO_MAX(val); + info->vm_proto = GHCB_PROTO_OUR; + } + return true; } diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index a6895e440bc3..8084bfd7cce1 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -495,6 +495,9 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt /* Include code shared with pre-decompression boot stage */ #include "sev-shared.c" +/* Negotiated GHCB protocol version */ +static struct sev_ghcb_protocol_info ghcb_protocol_info __ro_after_init; + static noinstr void __sev_put_ghcb(struct ghcb_state *state) { struct sev_es_runtime_data *data; @@ -665,7 +668,7 @@ static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt) static bool __init sev_es_setup_ghcb(void) { /* First make sure the hypervisor talks a supported protocol. */ - if (!sev_es_negotiate_protocol()) + if (!sev_es_negotiate_protocol(_protocol_info)) return false; /* @@ -794,6 +797,14 @@ void __init sev_es_init_vc_handling(void) /* Secondary CPUs use the runtime #VC handler */ initial_vc_handler = (unsigned long)kernel_exc_vmm_communication; + + /* +* Print information about supported and negotiated GHCB protocol +* versions. +*/ + pr_info("Hypervisor GHCB protocol version support: min=%u max=%u\n", + ghcb_protocol_info.hv_proto_min, ghcb_protocol_info.hv_proto_max); + pr_info("Using GHCB protocol version %u\n", ghcb_protocol_info.vm_proto); } static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt) -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 01/12] kexec: Allow architecture code to opt-out at runtime
From: Joerg Roedel Allow a runtime opt-out of kexec support for architecture code in case the kernel is running in an environment where kexec is not properly supported yet. This will be used on x86 when the kernel is running as an SEV-ES guest. SEV-ES guests need special handling for kexec to hand over all CPUs to the new kernel. This requires special hypervisor support and handling code in the guest which is not yet implemented. Cc: sta...@vger.kernel.org # v5.10+ Signed-off-by: Joerg Roedel --- include/linux/kexec.h | 2 ++ kernel/kexec.c| 14 ++ kernel/kexec_file.c | 9 + 3 files changed, 25 insertions(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0c994ae37729..400aae677435 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -422,6 +422,8 @@ static inline int kexec_crash_loaded(void) { return 0; } #define kexec_in_progress false #endif /* CONFIG_KEXEC_CORE */ +bool arch_kexec_supported(void); + #endif /* !defined(__ASSEBMLY__) */ #endif /* LINUX_KEXEC_H */ diff --git a/kernel/kexec.c b/kernel/kexec.c index c82c6c06f051..d03134160458 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -195,11 +195,25 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, * that to happen you need to do that yourself. */ +bool __weak arch_kexec_supported(void) +{ + return true; +} + static inline int kexec_load_check(unsigned long nr_segments, unsigned long flags) { int result; + /* +* The architecture may support kexec in general, but the kernel could +* run in an environment where it is not (yet) possible to execute a new +* kernel. Allow the architecture code to opt-out of kexec support when +* it is running in such an environment. +*/ + if (!arch_kexec_supported()) + return -ENOSYS; + /* We only trust the superuser with rebooting the system. */ if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) return -EPERM; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 33400ff051a8..96d08a512e9c 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -358,6 +358,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, int ret = 0, i; struct kimage **dest_image, *image; + /* +* The architecture may support kexec in general, but the kernel could +* run in an environment where it is not (yet) possible to execute a new +* kernel. Allow the architecture code to opt-out of kexec support when +* it is running in such an environment. +*/ + if (!arch_kexec_supported()) + return -ENOSYS; + /* We only trust the superuser with rebooting the system. */ if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) return -EPERM; -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver
On 05-07-21, 14:22, Jie Deng wrote: > On 2021/7/5 12:38, Viresh Kumar wrote: > > On 05-07-21, 11:45, Jie Deng wrote: > > > On 2021/7/5 10:40, Viresh Kumar wrote: > > > > On 02-07-21, 16:46, Jie Deng wrote: > > > > The right way of doing this is is making this function return - Error > > > > on failure > > > > and 0 on success. There is no point returning number of successful > > > > additions > > > > here. > > > > > > We need the number for virtio_i2c_complete_reqs to do cleanup. We don't > > > have > > > to > > > > > > do cleanup "num" times every time. Just do it as needed. > > If you do full cleanup here, then you won't required that at the caller > > site. > > > > > > Moreover, on failures this needs to clean up (free the dmabufs) itself, > > > > just > > > > like you did i2c_put_dma_safe_msg_buf() at the end. The caller > > > > shouldn't be > > > > required to handle the error cases by freeing up resources. > > > > > > This function will return the number of requests being successfully > > > prepared > > > and make sure > > > > > > resources of the failed request being freed. And virtio_i2c_complete_reqs > > > will free the > > > > > > resources of those successful request. > > It just looks cleaner to give such responsibility to each and every > > function, > > i.e. if they fail, they should clean stuff up instead of the caller. That's > > the > > normal philosophy you will find across kernel in most of the cases. > > > > > + /* > > > > > + * Condition (req && req == [i]) should always > > > > > meet since > > > > > + * we have total nr requests in the vq. > > > > > + */ > > > > > + if (!failed && (WARN_ON(!(req && req == [i])) || > > > > > + (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) > > > > What about writing this as: > > > > > > > > if (!failed && (WARN_ON(req != [i]) || > > > > (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) > > > > > > > > We don't need to check req here since if req is NULL, we will not do > > > > req->in_hdr > > > > at all. > > > > > > It's right here just because the [i] will never be NULL in our case. > > > But if you see > > > > > > "virtio_i2c_complete_reqs" as an independent function, you need to check > > > the > > > > > > req. From the perspective of the callee, you can't ask the caller always > > > give you > > > > > > the non-NULL parameters. > > We need to keep this driver optimized in its current form. If you see your > > own > > argument here, then why don't you test vq or msgs for a valid pointer ? And > > even > > reqs. > > > > If we know for certain that this will never happen, then it should be > > optimized. > > But if you see a case where reqs[i] can be NULL here, then it would be fine. > > ot the driver. And we don't need to take care of that. > > > This is still not enough to convince me. So I won't change them for now > until I see it > > is the consensus of the majority. Do you see reqs[i] to ever be NULL here ? If not, then if (req) is like if (true). Why would you want to have something like that ? -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver
On 05-07-21, 12:38, Andy Shevchenko wrote: > Because we do not have "uapi" in the path in /usr/include on the real > system where the linux-headers (or kernel-headers) package is > installed. > > It's still possible that our installation hooks will remove that > "uapi" from the headers, but I think it makes things too complicated. Ahh, right. Yeah, I completely missed that part. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver
On 2021/7/5 12:38, Viresh Kumar wrote: On 05-07-21, 11:45, Jie Deng wrote: On 2021/7/5 10:40, Viresh Kumar wrote: On 02-07-21, 16:46, Jie Deng wrote: The right way of doing this is is making this function return - Error on failure and 0 on success. There is no point returning number of successful additions here. We need the number for virtio_i2c_complete_reqs to do cleanup. We don't have to do cleanup "num" times every time. Just do it as needed. If you do full cleanup here, then you won't required that at the caller site. Moreover, on failures this needs to clean up (free the dmabufs) itself, just like you did i2c_put_dma_safe_msg_buf() at the end. The caller shouldn't be required to handle the error cases by freeing up resources. This function will return the number of requests being successfully prepared and make sure resources of the failed request being freed. And virtio_i2c_complete_reqs will free the resources of those successful request. It just looks cleaner to give such responsibility to each and every function, i.e. if they fail, they should clean stuff up instead of the caller. That's the normal philosophy you will find across kernel in most of the cases. + /* +* Condition (req && req == [i]) should always meet since +* we have total nr requests in the vq. +*/ + if (!failed && (WARN_ON(!(req && req == [i])) || + (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) What about writing this as: if (!failed && (WARN_ON(req != [i]) || (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) We don't need to check req here since if req is NULL, we will not do req->in_hdr at all. It's right here just because the [i] will never be NULL in our case. But if you see "virtio_i2c_complete_reqs" as an independent function, you need to check the req. From the perspective of the callee, you can't ask the caller always give you the non-NULL parameters. We need to keep this driver optimized in its current form. If you see your own argument here, then why don't you test vq or msgs for a valid pointer ? And even reqs. If we know for certain that this will never happen, then it should be optimized. But if you see a case where reqs[i] can be NULL here, then it would be fine. ot the driver. And we don't need to take care of that. This is still not enough to convince me. So I won't change them for now until I see it is the consensus of the majority. Thank you. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver
On 2021/7/5 14:30, Viresh Kumar wrote: This is still not enough to convince me. So I won't change them for now until I see it is the consensus of the majority. Do you see reqs[i] to ever be NULL here ? If not, then if (req) is like if (true). Why would you want to have something like that ? No. Currently, virtio_i2c_complete_reqs is only called by virtio_i2c_xfer, thus we don't have reqs[i] to be NULL. But I think "virtio_i2c_complete_reqs" as an independent function should consider this from a callee perspective. Anyway, if you really think it should be changed, it can be fixed incrementally as Wolfram said. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/2] vdpa: support per virtqueue max queue size
Virtio spec allows the device to specify the per virtqueue max queue size. vDPA needs to adapt to this flexibility. E.g Qemu advertise a small control virtqueue for virtio-net. So this patch adds a index parameter to get_vq_num_max bus operations for the device to report its per virtqueue max queue size. Both VHOST_VDPA_GET_VRING_NUM and VDPA_ATTR_DEV_MAX_VQ_SIZE assume a global maximum size. So we iterate all the virtqueues to return the minimal size in this case. Actually, the VHOST_VDPA_GET_VRING_NUM is not a must for the userspace. Userspace may choose to check the VHOST_SET_VRING_NUM for proving or validating the maximum virtqueue size. Anyway, we can invent a per vq version of VHOST_VDPA_GET_VRING_NUM in the future if it's necessary. Signed-off-by: Jason Wang --- drivers/vdpa/ifcvf/ifcvf_main.c | 2 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- drivers/vdpa/vdpa.c | 22 +- drivers/vdpa/vdpa_sim/vdpa_sim.c | 2 +- drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +- drivers/vhost/vdpa.c | 9 ++--- drivers/virtio/virtio_vdpa.c | 2 +- include/linux/vdpa.h | 5 - 8 files changed, 36 insertions(+), 10 deletions(-) diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index ab0ab5cf0f6e..646b340db2af 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -254,7 +254,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status) ifcvf_set_status(vf, status); } -static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev) +static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev, u16 qid) { return IFCVF_QUEUE_MAX; } diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index dda5dc6f7737..afd6114d07b0 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1584,7 +1584,7 @@ static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callba } #define MLX5_VDPA_MAX_VQ_ENTRIES 256 -static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev) +static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev, u16 idx) { return MLX5_VDPA_MAX_VQ_ENTRIES; } diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index bb3f1d1f0422..d77d59811389 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -239,6 +239,26 @@ void vdpa_unregister_driver(struct vdpa_driver *drv) } EXPORT_SYMBOL_GPL(vdpa_unregister_driver); +/** + * vdpa_get_vq_num_max - get the maximum virtqueue size + * @vdev: vdpa device + */ +u16 vdpa_get_vq_num_max(struct vdpa_device *vdev) +{ + const struct vdpa_config_ops *ops = vdev->config; + u16 s, size = ops->get_vq_num_max(vdev, 0); + int i; + + for (i = 1; i < vdev->nvqs; i++) { + s = ops->get_vq_num_max(vdev, i); + if (s && s < size) + size = s; + } + + return size; +} +EXPORT_SYMBOL_GPL(vdpa_get_vq_num_max); + /** * vdpa_mgmtdev_register - register a vdpa management device * @@ -502,7 +522,7 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq device_id = vdev->config->get_device_id(vdev); vendor_id = vdev->config->get_vendor_id(vdev); - max_vq_size = vdev->config->get_vq_num_max(vdev); + max_vq_size = vdpa_get_vq_num_max(vdev); err = -EMSGSIZE; if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(>dev))) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 98f793bc9376..49e29056f164 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -422,7 +422,7 @@ static void vdpasim_set_config_cb(struct vdpa_device *vdpa, /* We don't support config interrupt */ } -static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa) +static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa, u16 idx) { return VDPASIM_QUEUE_MAX; } diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index c76ebb531212..2926641fb586 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -195,7 +195,7 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 status) vp_vdpa_free_irq(vp_vdpa); } -static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa) +static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid) { return VP_VDPA_QUEUE_MAX; } diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index fb41db3da611..c9ec395b8c42 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -289,11 +289,14 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep) static long vhost_vdpa_get_vring_num(struct vhost_vdpa *v, u16 __user *argp) { - struct vdpa_device *vdpa = v->vdpa; - const struct vdpa_config_ops *ops = vdpa->config; u16
[PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size
This patch switch to read virtqueue size from the capability instead of depending on the hardcoded value. This allows the per virtqueue size could be advertised. Signed-off-by: Jason Wang --- drivers/vdpa/virtio_pci/vp_vdpa.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index 2926641fb586..198f7076e4d9 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -18,7 +18,6 @@ #include #include -#define VP_VDPA_QUEUE_MAX 256 #define VP_VDPA_DRIVER_NAME "vp_vdpa" #define VP_VDPA_NAME_SIZE 256 @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 status) static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid) { - return VP_VDPA_QUEUE_MAX; + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + struct virtio_pci_modern_device *mdev = _vdpa->mdev; + + return vp_modern_get_queue_size(mdev, qid); } static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid, -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size
在 2021/7/5 下午3:26, Michael S. Tsirkin 写道: On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote: This patch switch to read virtqueue size from the capability instead of depending on the hardcoded value. This allows the per virtqueue size could be advertised. Signed-off-by: Jason Wang So let's add an ioctl for this? It's really a bug we don't.. As explained in patch 1. Qemu doesn't use VHOST_VDPA_GET_VRING_NUM actually. Instead it checks the result VHOST_VDPA_SET_VRING_NUM. So I change VHOST_VDPA_GET_VRING_NUM to return the minimal size of all the virtqueues. If you wish we can add a VHOST_VDPA_GET_VRING_NUM2, but I'm not sure it will have a user or not. Thanks --- drivers/vdpa/virtio_pci/vp_vdpa.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index 2926641fb586..198f7076e4d9 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -18,7 +18,6 @@ #include #include -#define VP_VDPA_QUEUE_MAX 256 #define VP_VDPA_DRIVER_NAME "vp_vdpa" #define VP_VDPA_NAME_SIZE 256 @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 status) static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid) { - return VP_VDPA_QUEUE_MAX; + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + struct virtio_pci_modern_device *mdev = _vdpa->mdev; + + return vp_modern_get_queue_size(mdev, qid); } static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid, -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver
On Mon, Jul 5, 2021 at 11:56 AM Viresh Kumar wrote: > On 05-07-21, 11:45, Andy Shevchenko wrote: > > On Mon, Jul 5, 2021 at 11:03 AM Viresh Kumar > > wrote: > > > On 05-07-21, 14:53, Jie Deng wrote: > > > > > > +#include > > > > +#include > > > > > > Both of these need to be the uapi headers as Andy said earlier > > > > They are already since this header _is_ UAPI, > > Ahh, there is some tricky header inclusion there :) > > > what you are suggesting is gonna not work, > > Why ? Because we do not have "uapi" in the path in /usr/include on the real system where the linux-headers (or kernel-headers) package is installed. It's still possible that our installation hooks will remove that "uapi" from the headers, but I think it makes things too complicated. > > although it's correct for in-kernel users of UAPI > > headers. -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > tools/include/* have a lot of abstract layer for building > kernel code from userspace, so reuse or add the abstract > layer in tools/include/ to build the ptr_ring for ringtest > testing. ... > create mode 100644 tools/include/asm/cache.h > create mode 100644 tools/include/asm/processor.h > create mode 100644 tools/include/generated/autoconf.h > create mode 100644 tools/include/linux/align.h > create mode 100644 tools/include/linux/cache.h > create mode 100644 tools/include/linux/slab.h Maybe somebody can change this to be able to include in-tree headers directly? Besides above, had you tested this with `make O=...`? -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size
On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote: > This patch switch to read virtqueue size from the capability instead > of depending on the hardcoded value. This allows the per virtqueue > size could be advertised. > > Signed-off-by: Jason Wang So let's add an ioctl for this? It's really a bug we don't.. > --- > drivers/vdpa/virtio_pci/vp_vdpa.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c > b/drivers/vdpa/virtio_pci/vp_vdpa.c > index 2926641fb586..198f7076e4d9 100644 > --- a/drivers/vdpa/virtio_pci/vp_vdpa.c > +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c > @@ -18,7 +18,6 @@ > #include > #include > > -#define VP_VDPA_QUEUE_MAX 256 > #define VP_VDPA_DRIVER_NAME "vp_vdpa" > #define VP_VDPA_NAME_SIZE 256 > > @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, > u8 status) > > static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid) > { > - return VP_VDPA_QUEUE_MAX; > + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); > + struct virtio_pci_modern_device *mdev = _vdpa->mdev; > + > + return vp_modern_get_queue_size(mdev, qid); > } > > static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid, > -- > 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v13] i2c: virtio: add a virtio i2c frontend driver
Add an I2C bus driver for virtio para-virtualization. The controller can be emulated by the backend driver in any device model software by following the virtio protocol. The device specification can be found on https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html. By following the specification, people may implement different backend drivers to emulate different controllers according to their needs. Co-developed-by: Conghui Chen Signed-off-by: Conghui Chen Signed-off-by: Jie Deng --- Changes v12 -> v13 - Use _BITUL() instead of BIT() - Rename "virtio_i2c_send_reqs" to "virtio_i2c_prepare_reqs" - Optimize the return value of "virtio_i2c_complete_reqs" Changes v11 -> v12 - Do not sent msg_buf for zero-length request. - Send requests to host only if all the number of transfers requested prepared successfully. - Remove the line #include in virtio_i2c.h Changes v10 -> v11 - Remove vi->adap.class = I2C_CLASS_DEPRECATED. - Use #ifdef CONFIG_PM_SLEEP to replace the "__maybe_unused". - Remove "struct mutex lock" in "struct virtio_i2c". - Support zero-length request. - Remove unnecessary logs. - Remove vi->adap.timeout = HZ / 10, just use the default value. - Use BIT(0) to define VIRTIO_I2C_FLAGS_FAIL_NEXT. - Add the virtio_device index to adapter's naming mechanism. drivers/i2c/busses/Kconfig | 11 ++ drivers/i2c/busses/Makefile | 3 + drivers/i2c/busses/i2c-virtio.c | 269 include/uapi/linux/virtio_i2c.h | 41 ++ include/uapi/linux/virtio_ids.h | 1 + 5 files changed, 325 insertions(+) create mode 100644 drivers/i2c/busses/i2c-virtio.c create mode 100644 include/uapi/linux/virtio_i2c.h diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index 10acece..e47616a 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -21,6 +21,17 @@ config I2C_ALI1535 This driver can also be built as a module. If so, the module will be called i2c-ali1535. +config I2C_VIRTIO + tristate "Virtio I2C Adapter" + select VIRTIO + help + If you say yes to this option, support will be included for the virtio + I2C adapter driver. The hardware can be emulated by any device model + software according to the virtio protocol. + + This driver can also be built as a module. If so, the module + will be called i2c-virtio. + config I2C_ALI1563 tristate "ALI 1563" depends on PCI diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index 69e9963..9843756 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -147,4 +147,7 @@ obj-$(CONFIG_I2C_XGENE_SLIMPRO) += i2c-xgene-slimpro.o obj-$(CONFIG_SCx200_ACB) += scx200_acb.o obj-$(CONFIG_I2C_FSI) += i2c-fsi.o +# VIRTIO I2C host controller driver +obj-$(CONFIG_I2C_VIRTIO) += i2c-virtio.o + ccflags-$(CONFIG_I2C_DEBUG_BUS) := -DDEBUG diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c new file mode 100644 index 000..731267d --- /dev/null +++ b/drivers/i2c/busses/i2c-virtio.c @@ -0,0 +1,269 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Virtio I2C Bus Driver + * + * The Virtio I2C Specification: + * https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex + * + * Copyright (c) 2021 Intel Corporation. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/** + * struct virtio_i2c - virtio I2C data + * @vdev: virtio device for this controller + * @completion: completion of virtio I2C message + * @adap: I2C adapter for this controller + * @vq: the virtio virtqueue for communication + */ +struct virtio_i2c { + struct virtio_device *vdev; + struct completion completion; + struct i2c_adapter adap; + struct virtqueue *vq; +}; + +/** + * struct virtio_i2c_req - the virtio I2C request structure + * @out_hdr: the OUT header of the virtio I2C message + * @buf: the buffer into which data is read, or from which it's written + * @in_hdr: the IN header of the virtio I2C message + */ +struct virtio_i2c_req { + struct virtio_i2c_out_hdr out_hdr cacheline_aligned; + uint8_t *bufcacheline_aligned; + struct virtio_i2c_in_hdr in_hdr cacheline_aligned; +}; + +static void virtio_i2c_msg_done(struct virtqueue *vq) +{ + struct virtio_i2c *vi = vq->vdev->priv; + + complete(>completion); +} + +static int virtio_i2c_prepare_reqs(struct virtqueue *vq, + struct virtio_i2c_req *reqs, + struct i2c_msg *msgs, int nr) +{ + struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr; + int i, outcnt, incnt, err = 0; + +
Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver
Hi Jie. On 05-07-21, 14:53, Jie Deng wrote: > diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c > +static int virtio_i2c_complete_reqs(struct virtqueue *vq, > + struct virtio_i2c_req *reqs, > + struct i2c_msg *msgs, int nr, > + bool fail) > +{ > + struct virtio_i2c_req *req; > + bool failed = fail; > + unsigned int len; > + int i, j = 0; > + > + for (i = 0; i < nr; i++) { > + /* Detach the ith request from the vq */ > + req = virtqueue_get_buf(vq, ); Calling this for cases where virtio_i2c_prepare_reqs() itself has failed will always return NULL, should we even try to call this then ? virtqueue_get_buf() returns the next used buffer only, i.e. returned by the host ? > + > + /* > + * Condition (req && req == [i]) should always meet since > + * we have total nr requests in the vq. > + */ > + if (!failed && (WARN_ON(!(req && req == [i])) || Mentioning again for completeness of the review, reqs[i] can never be NULL here though req can be. And even in that case you never need to check req here. Feel free to ignore it if you want, we can always send a fixup patch later and discuss it further :) > + (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) > + failed = true; > + > + i2c_put_dma_safe_msg_buf(reqs[i].buf, [i], !failed); > + if (!failed) > + j++; > + } > + > + return fail ? 0 : j; Since you don't return ETIMEDOUT anymore, you can simply return j now. And so we can work with a single failed argument and don't need both fail and failed. > +} > + > +static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, > int num) > +{ > + struct virtio_i2c *vi = i2c_get_adapdata(adap); > + struct virtqueue *vq = vi->vq; > + struct virtio_i2c_req *reqs; > + unsigned long time_left; > + int ret; > + > + reqs = kcalloc(num, sizeof(*reqs), GFP_KERNEL); > + if (!reqs) > + return -ENOMEM; > + > + ret = virtio_i2c_prepare_reqs(vq, reqs, msgs, num); > + if (ret != num) { > + ret = virtio_i2c_complete_reqs(vq, reqs, msgs, ret, true); > + goto err_free; > + } > + > + reinit_completion(>completion); > + virtqueue_kick(vq); > + time_left = wait_for_completion_timeout(>completion, adap->timeout); > + ret = virtio_i2c_complete_reqs(vq, reqs, msgs, num, !time_left); > + > + if (!time_left) { > + ret = -ETIMEDOUT; > + dev_err(>dev, "virtio i2c backend timeout.\n"); > + } > + > +err_free: > + kfree(reqs); > + return ret; > +} > diff --git a/include/uapi/linux/virtio_i2c.h b/include/uapi/linux/virtio_i2c.h > new file mode 100644 > index 000..df936a2 > --- /dev/null > +++ b/include/uapi/linux/virtio_i2c.h > @@ -0,0 +1,41 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later WITH Linux-syscall-note */ > +/* > + * Definitions for virtio I2C Adpter > + * > + * Copyright (c) 2021 Intel Corporation. All rights reserved. > + */ > + > +#ifndef _UAPI_LINUX_VIRTIO_I2C_H > +#define _UAPI_LINUX_VIRTIO_I2C_H > + > +#include > +#include Both of these need to be the uapi headers as Andy said earlier and they better be in alphabetical order. #include #include Though in your support, I do see a lot of files in uapi/linux using the standard types.h, which looks wrong as that types.h is not a userspace ABI. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver
On Mon, Jul 5, 2021 at 11:03 AM Viresh Kumar wrote: > On 05-07-21, 14:53, Jie Deng wrote: > > +#include > > +#include > > Both of these need to be the uapi headers as Andy said earlier They are already since this header _is_ UAPI, what you are suggesting is gonna not work, although it's correct for in-kernel users of UAPI headers. > and they better > be in alphabetical order. I prefer that as well. > #include > #include -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver
On 2021/7/5 10:43, Viresh Kumar wrote: On 02-07-21, 12:58, Andy Shevchenko wrote: On Fri, Jul 02, 2021 at 04:46:47PM +0800, Jie Deng wrote: +static int virtio_i2c_complete_reqs(struct virtqueue *vq, + struct virtio_i2c_req *reqs, + struct i2c_msg *msgs, int nr, + bool fail) +{ + struct virtio_i2c_req *req; + bool failed = fail; Jie, you can actually get rid of this variable too. Jut rename fail to failed and everything shall work as you want. Oh, You are not right. I just found we can't remove this variable. The "fail" and "failed" have different meanings for this function. We need fail to return the result. + unsigned int len; + int i, j = 0; + + for (i = 0; i < nr; i++) { + /* Detach the ith request from the vq */ + req = virtqueue_get_buf(vq, ); + + /* +* Condition (req && req == [i]) should always meet since +* we have total nr requests in the vq. +*/ + if (!failed && (WARN_ON(!(req && req == [i])) || + (req->in_hdr.status != VIRTIO_I2C_MSG_OK))) + failed = true; ...and after failed is true, we are continuing the loop, why? Actually this function can be called with fail set to true. We proceed as we need to call i2c_put_dma_safe_msg_buf() for all buffers we allocated earlier. + i2c_put_dma_safe_msg_buf(reqs[i].buf, [i], !failed); + if (!failed) + ++j; Besides better to read j++ the j itself can be renamed to something more verbose. + } + return (fail ? -ETIMEDOUT : j); Redundant parentheses. +} ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] vDPA/ifcvf: implement management netlink framework for ifcvf
On Mon, Jul 05, 2021 at 01:04:11PM +0800, Jason Wang wrote: > > +static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id > > *id) > > +{ > > + struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev; > > + struct device *dev = >dev; > > + struct ifcvf_adapter *adapter; > > > adapter is not used. It's used in error handling below. It's not *initialized*. > > > + u32 dev_type; > > + int ret; > > + > > + ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), > > GFP_KERNEL); > > + if (!ifcvf_mgmt_dev) { > > + IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management > > device\n"); > > + return -ENOMEM; > > + } > > + > > + dev_type = get_dev_type(pdev); > > + switch (dev_type) { > > + case VIRTIO_ID_NET: > > + ifcvf_mgmt_dev->mdev.id_table = id_table_net; > > + break; > > + case VIRTIO_ID_BLOCK: > > + ifcvf_mgmt_dev->mdev.id_table = id_table_blk; > > + break; > > + default: > > + IFCVF_ERR(pdev, "VIRTIO ID %u not supported\n", dev_type); > > + ret = -EOPNOTSUPP; > > + goto err; > > + } > > + > > + ifcvf_mgmt_dev->mdev.ops = _vdpa_mgmt_dev_ops; > > + ifcvf_mgmt_dev->mdev.device = dev; > > + ifcvf_mgmt_dev->pdev = pdev; > > + > > + ret = pcim_enable_device(pdev); > > + if (ret) { > > + IFCVF_ERR(pdev, "Failed to enable device\n"); > > + goto err; > > + } > > + > > + ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4), > > +IFCVF_DRIVER_NAME); > > + if (ret) { > > + IFCVF_ERR(pdev, "Failed to request MMIO region\n"); > > + goto err; > > + } > > + > > + ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)); > > + if (ret) { > > + IFCVF_ERR(pdev, "No usable DMA configuration\n"); > > + goto err; > > + } > > + > > + ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev); > > + if (ret) { > > + IFCVF_ERR(pdev, > > + "Failed for adding devres for freeing irq vectors\n"); > > + goto err; > > + } > > + > > + pci_set_master(pdev); > > + > > + ret = vdpa_mgmtdev_register(_mgmt_dev->mdev); > > + if (ret) { > > + IFCVF_ERR(pdev, > > + "Failed to initialize the management interfaces\n"); > > goto err; > > } > > @@ -533,14 +610,21 @@ static int ifcvf_probe(struct pci_dev *pdev, const > > struct pci_device_id *id) > > err: > > put_device(>vdpa.dev); > > + kfree(ifcvf_mgmt_dev); > > return ret; > > } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver
On 05-07-21, 11:45, Andy Shevchenko wrote: > On Mon, Jul 5, 2021 at 11:03 AM Viresh Kumar wrote: > > On 05-07-21, 14:53, Jie Deng wrote: > > > > +#include > > > +#include > > > > Both of these need to be the uapi headers as Andy said earlier > > They are already since this header _is_ UAPI, Ahh, there is some tricky header inclusion there :) > what you are suggesting is gonna not work, Why ? > although it's correct for in-kernel users of UAPI > headers. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_vdpa: reject invalid vq indices
On Thu, Jul 01, 2021 at 01:46:52PM +0200, Vincent Whitchurch wrote: Do not call vDPA drivers' callbacks with vq indicies larger than what the drivers indicate that they support. vDPA drivers do not bounds check the indices. Signed-off-by: Vincent Whitchurch --- drivers/virtio/virtio_vdpa.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c index e28acf482e0c..e9b9dd03f44a 100644 --- a/drivers/virtio/virtio_vdpa.c +++ b/drivers/virtio/virtio_vdpa.c @@ -149,6 +149,9 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index, if (!name) return NULL; + if (index >= vdpa->nvqs) + return ERR_PTR(-ENOENT); + /* Queue shouldn't already be set up. */ if (ops->get_vq_ready(vdpa, index)) return ERR_PTR(-ENOENT); -- 2.28.0 Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v10] i2c: virtio: add a virtio i2c frontend driver
On 29-06-21, 12:43, Wolfram Sang wrote: > > > From the spec: > > > > The case when ``length of \field{write_buf}''=0, and at the same time, > > ``length of \field{read_buf}''=0 doesn't make any sense. > > > > I mentioned this in my first reply and to my understanding I did not get > > a reply that this has changed meanwhile. > > > > Also, this code as mentioned before: > > > + if (!msgs[i].len) > > + break; > > I hope this can extended in the future to allow zero-length messages. If > this is impossible we need to set an adapter quirk instead. Wolfram, I stumbled again upon this while working at the backend implementation. If you look at i2c_smbus_xfer_emulated(), the command is always sent via msgbuf0[0]. Even in the case of I2C_SMBUS_QUICK, where we set msg[0].len = 0, we still send the buf. This is really confusing :( Do I understand correctly that we always need to send msg[0].buf even when msg[0].len is 0 ? If so, it would be difficult to implement this with the current i2c virtio specification, as the msg.len isn't really passed from guest to host, rather it is inferred using the length of the buffer itself. And so we can't really pass a buffer if length is 0. Moreover, the driver uses i2c_get_dma_safe_msg_buf(), which also depends on the length parameter here to allocate the buffer and copy data to it. All in all, the latest version of the driver doesn't work with "i2cdetect -q ". To make it work, I had to add this: diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c index 731267d42292..5b8bd98ae38e 100644 --- a/drivers/i2c/busses/i2c-virtio.c +++ b/drivers/i2c/busses/i2c-virtio.c @@ -73,6 +73,9 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq, sg_init_one(_hdr, [i].out_hdr, sizeof(reqs[i].out_hdr)); sgs[outcnt++] = _hdr; + if (!msgs[i].len) + msgs[i].len = 1; + if (msgs[i].len) { reqs[i].buf = i2c_get_dma_safe_msg_buf([i], 1); if (!reqs[i].buf) which made it I2C_SMBUS_BYTE instead of I2C_SMBUS_QUICK. What should we do here Wolfram? Jie, while wolfram comes back and replies to this, I think you need to switch back to NOT supporting zero length transfer and set update virtio_i2c_func() to return: I2C_FUNC_I2C | (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK); Support for zero-length transfers and I2C_FUNC_SMBUS_QUICK can be added separately. Thanks. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote: > On 2021/7/5 17:56, Andy Shevchenko wrote: > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > >> tools/include/* have a lot of abstract layer for building > >> kernel code from userspace, so reuse or add the abstract > >> layer in tools/include/ to build the ptr_ring for ringtest > >> testing. > > > > ... > > > >> create mode 100644 tools/include/asm/cache.h > >> create mode 100644 tools/include/asm/processor.h > >> create mode 100644 tools/include/generated/autoconf.h > >> create mode 100644 tools/include/linux/align.h > >> create mode 100644 tools/include/linux/cache.h > >> create mode 100644 tools/include/linux/slab.h > > > > Maybe somebody can change this to be able to include in-tree headers > > directly? > > If the above works, maybe the files in tools/include/* is not > necessary any more, just use the in-tree headers to compile > the user space app? > > Or I missed something here? I don't know if it works or not or why that decision had been made to copy'n'paste headers (Yes, I know they have some modifications). Somebody needs to check that and see what can be done in order to avoid copying entire include into tools/include. > > Besides above, had you tested this with `make O=...`? > > You are right, the generated/autoconf.h is in another directory > with `make O=...`. > > Any nice idea to fix the above problem? No idea. But I consider breakage of O= is a show stopper. -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote: > > 在 2021/7/4 下午5:49, Yongji Xie 写道: > > > > OK, I get you now. Since the VIRTIO specification says "Device > > > > configuration space is generally used for rarely-changing or > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG > > > > ioctl should not be called frequently. > > > The spec uses MUST and other terms to define the precise requirements. > > > Here the language (especially the word "generally") is weaker and means > > > there may be exceptions. > > > > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG > > > approach is reads that have side-effects. For example, imagine a field > > > containing an error code if the device encounters a problem unrelated to > > > a specific virtqueue request. Reading from this field resets the error > > > code to 0, saving the driver an extra configuration space write access > > > and possibly race conditions. It isn't possible to implement those > > > semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it > > > makes me think that the interface does not allow full VIRTIO semantics. > > > Note that though you're correct, my understanding is that config space is > not suitable for this kind of error propagating. And it would be very hard > to implement such kind of semantic in some transports. Virtqueue should be > much better. As Yong Ji quoted, the config space is used for > "rarely-changing or intialization-time parameters". > > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to > > handle the message failure, I'm going to add a return value to > > virtio_config_ops.get() and virtio_cread_* API so that the error can > > be propagated to the virtio device driver. Then the virtio-blk device > > driver can be modified to handle that. > > > > Jason and Stefan, what do you think of this way? Why does VDUSE_DEV_GET_CONFIG need to support an error return value? The VIRTIO spec provides no way for the device to report errors from config space accesses. The QEMU virtio-pci implementation returns -1 from invalid virtio_config_read*() and silently discards virtio_config_write*() accesses. VDUSE can take the same approach with VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG. > I'd like to stick to the current assumption thich get_config won't fail. > That is to say, > > 1) maintain a config in the kernel, make sure the config space read can > always succeed > 2) introduce an ioctl for the vduse usersapce to update the config space. > 3) we can synchronize with the vduse userspace during set_config > > Does this work? I noticed that caching is also allowed by the vhost-user protocol messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't know whether or not caching is in effect. The interface you outlined above requires caching. Is there a reason why the host kernel vDPA code needs to cache the configuration space? Here are the vhost-user protocol messages: Virtio device config space ^^ ++--+---+-+ | offset | size | flags | payload | ++--+---+-+ :offset: a 32-bit offset of virtio device's configuration space :size: a 32-bit configuration space access size in bytes :flags: a 32-bit value: - 0: Vhost master messages used for writeable fields - 1: Vhost master messages used for live migration :payload: Size bytes array holding the contents of the virtio device's configuration space ... ``VHOST_USER_GET_CONFIG`` :id: 24 :equivalent ioctl: N/A :master payload: virtio device config space :slave payload: virtio device config space When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is submitted by the vhost-user master to fetch the contents of the virtio device configuration space, vhost-user slave's payload size MUST match master's request, vhost-user slave uses zero length of payload to indicate an error to vhost-user master. The vhost-user master may cache the contents to avoid repeated ``VHOST_USER_GET_CONFIG`` calls. ``VHOST_USER_SET_CONFIG`` :id: 25 :equivalent ioctl: N/A :master payload: virtio device config space :slave payload: N/A When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is submitted by the vhost-user master when the Guest changes the virtio device configuration space and also can be used for live migration on the destination host. The vhost-user slave must check the flags field, and slaves MUST NOT accept SET_CONFIG for read-only configuration space fields unless the live migration bit is set. Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [MASSMAIL KLMS]Re: [RFC PATCH v2 0/6] Improve SOCK_SEQPACKET receive logic
On Mon, Jul 05, 2021 at 01:48:28PM +0300, Arseny Krasnov wrote: On 04.07.2021 12:54, Michael S. Tsirkin wrote: On Sun, Jul 04, 2021 at 12:23:03PM +0300, Arseny Krasnov wrote: On 04.07.2021 11:30, Michael S. Tsirkin wrote: On Sun, Jul 04, 2021 at 11:08:13AM +0300, Arseny Krasnov wrote: This patchset modifies receive logic for SOCK_SEQPACKET. Difference between current implementation and this version is that now reader is woken up when there is at least one RW packet in rx queue of socket and data is copied to user's buffer, while merged approach wake up user only when whole message is received and kept in queue. New implementation has several advantages: 1) There is no limit for message length. Merged approach requires that length must be smaller than 'peer_buf_alloc', otherwise transmission will stuck. 2) There is no need to keep whole message in queue, thus no 'kmalloc()' memory will be wasted until EOR is received. Also new approach has some feature: as fragments of message are copied until EOR is received, it is possible that part of message will be already in user's buffer, while rest of message still not received. And if user will be interrupted by signal or timeout with part of message in buffer, it will exit receive loop, leaving rest of message in queue. To solve this problem special callback was added to transport: it is called when user was forced to leave exit loop and tells transport to drop any packet until EOR met. Sorry about commenting late in the game. I'm a bit lost SOCK_SEQPACKET Provides sequenced, reliable, bidirectional, connection-mode transmission paths for records. A record can be sent using one or more output operations and received using one or more input operations, but a single operation never transfers part of more than one record. Record boundaries are visible to the receiver via the MSG_EOR flag. it's supposed to be reliable - how is it legal to drop packets? Sorry, seems i need to rephrase description. "Packet" here means fragment of record(message) at transport layer. As this is SEQPACKET mode, receiver could get only whole message or error, so if only several fragments of message was copied (if signal received for example) we can't return it to user - it breaks SEQPACKET sense. I think, in this case we can drop rest of record's fragments legally. Thank You Would not that violate the reliable property? IIUC it's only ok to return an error if socket gets closed. Just like e.g. TCP ... Sorry for late answer, yes You're right, seems this is unwanted drop... Lets wait for Stefano Garzarella feedback It was the same concern I had with the series that introduced SEQPACKET for vsock, which is why I suggested to wait until the message is complete, before copying it to the user's buffer. IIUC, with the current upstream implementation, we don't have this problem, right? I'm not sure how to fix this, other than by keeping all the fragments queued until we've successfully copied them to user space, which is what we should do without this series applied IIUC. Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size
在 2021/7/6 上午1:59, Michael S. Tsirkin 写道: On Mon, Jul 05, 2021 at 03:29:47PM +0800, Jason Wang wrote: 在 2021/7/5 下午3:26, Michael S. Tsirkin 写道: On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote: This patch switch to read virtqueue size from the capability instead of depending on the hardcoded value. This allows the per virtqueue size could be advertised. Signed-off-by: Jason Wang So let's add an ioctl for this? It's really a bug we don't.. As explained in patch 1. Qemu doesn't use VHOST_VDPA_GET_VRING_NUM actually. Instead it checks the result VHOST_VDPA_SET_VRING_NUM. So I change VHOST_VDPA_GET_VRING_NUM to return the minimal size of all the virtqueues. If you wish we can add a VHOST_VDPA_GET_VRING_NUM2, but I'm not sure it will have a user or not. Thanks Question is how do we know returning the minimal and not e.g. the max size is the right thing to do? For the new ioctl, it will return the max queue size per vq. It's probably too late to fix the old one, so it's only safe to return the minimal one. Actually, most of the vDPA parents should be fine except for the vp_vdpa. When running in a nested environment, Qemu only advertise cvq with 64 entries. Thanks --- drivers/vdpa/virtio_pci/vp_vdpa.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index 2926641fb586..198f7076e4d9 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -18,7 +18,6 @@ #include #include -#define VP_VDPA_QUEUE_MAX 256 #define VP_VDPA_DRIVER_NAME "vp_vdpa" #define VP_VDPA_NAME_SIZE 256 @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 status) static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid) { - return VP_VDPA_QUEUE_MAX; + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + struct virtio_pci_modern_device *mdev = _vdpa->mdev; + + return vp_modern_get_queue_size(mdev, qid); } static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid, -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v10] i2c: virtio: add a virtio i2c frontend driver
On 2021/7/5 20:18, Viresh Kumar wrote: On 29-06-21, 12:43, Wolfram Sang wrote: From the spec: The case when ``length of \field{write_buf}''=0, and at the same time, ``length of \field{read_buf}''=0 doesn't make any sense. I mentioned this in my first reply and to my understanding I did not get a reply that this has changed meanwhile. Also, this code as mentioned before: + if (!msgs[i].len) + break; I hope this can extended in the future to allow zero-length messages. If this is impossible we need to set an adapter quirk instead. Wolfram, I stumbled again upon this while working at the backend implementation. If you look at i2c_smbus_xfer_emulated(), the command is always sent via msgbuf0[0]. Even in the case of I2C_SMBUS_QUICK, where we set msg[0].len = 0, we still send the buf. This is really confusing :( Do I understand correctly that we always need to send msg[0].buf even when msg[0].len is 0 ? If so, it would be difficult to implement this with the current i2c virtio specification, as the msg.len isn't really passed from guest to host, rather it is inferred using the length of the buffer itself. And so we can't really pass a buffer if length is 0. Moreover, the driver uses i2c_get_dma_safe_msg_buf(), which also depends on the length parameter here to allocate the buffer and copy data to it. All in all, the latest version of the driver doesn't work with "i2cdetect -q ". To make it work, I had to add this: diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c index 731267d42292..5b8bd98ae38e 100644 --- a/drivers/i2c/busses/i2c-virtio.c +++ b/drivers/i2c/busses/i2c-virtio.c @@ -73,6 +73,9 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq, sg_init_one(_hdr, [i].out_hdr, sizeof(reqs[i].out_hdr)); sgs[outcnt++] = _hdr; + if (!msgs[i].len) + msgs[i].len = 1; + if (msgs[i].len) { reqs[i].buf = i2c_get_dma_safe_msg_buf([i], 1); if (!reqs[i].buf) which made it I2C_SMBUS_BYTE instead of I2C_SMBUS_QUICK. What should we do here Wolfram? Jie, while wolfram comes back and replies to this, I think you need to switch back to NOT supporting zero length transfer and set update virtio_i2c_func() to return: I2C_FUNC_I2C | (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK); Support for zero-length transfers and I2C_FUNC_SMBUS_QUICK can be added separately. Thanks. It's OK to me. Let's see what Wolfram says when he comes back. I will send the updated version then. Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
在 2021/7/5 下午8:49, Stefan Hajnoczi 写道: On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote: 在 2021/7/4 下午5:49, Yongji Xie 写道: OK, I get you now. Since the VIRTIO specification says "Device configuration space is generally used for rarely-changing or initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG ioctl should not be called frequently. The spec uses MUST and other terms to define the precise requirements. Here the language (especially the word "generally") is weaker and means there may be exceptions. Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG approach is reads that have side-effects. For example, imagine a field containing an error code if the device encounters a problem unrelated to a specific virtqueue request. Reading from this field resets the error code to 0, saving the driver an extra configuration space write access and possibly race conditions. It isn't possible to implement those semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it makes me think that the interface does not allow full VIRTIO semantics. Note that though you're correct, my understanding is that config space is not suitable for this kind of error propagating. And it would be very hard to implement such kind of semantic in some transports. Virtqueue should be much better. As Yong Ji quoted, the config space is used for "rarely-changing or intialization-time parameters". Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to handle the message failure, I'm going to add a return value to virtio_config_ops.get() and virtio_cread_* API so that the error can be propagated to the virtio device driver. Then the virtio-blk device driver can be modified to handle that. Jason and Stefan, what do you think of this way? Why does VDUSE_DEV_GET_CONFIG need to support an error return value? The VIRTIO spec provides no way for the device to report errors from config space accesses. The QEMU virtio-pci implementation returns -1 from invalid virtio_config_read*() and silently discards virtio_config_write*() accesses. VDUSE can take the same approach with VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG. I'd like to stick to the current assumption thich get_config won't fail. That is to say, 1) maintain a config in the kernel, make sure the config space read can always succeed 2) introduce an ioctl for the vduse usersapce to update the config space. 3) we can synchronize with the vduse userspace during set_config Does this work? I noticed that caching is also allowed by the vhost-user protocol messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't know whether or not caching is in effect. The interface you outlined above requires caching. Is there a reason why the host kernel vDPA code needs to cache the configuration space? Because: 1) Kernel can not wait forever in get_config(), this is the major difference with vhost-user. 2) Stick to the current assumption that virtio_cread() should always succeed. Thanks Here are the vhost-user protocol messages: Virtio device config space ^^ ++--+---+-+ | offset | size | flags | payload | ++--+---+-+ :offset: a 32-bit offset of virtio device's configuration space :size: a 32-bit configuration space access size in bytes :flags: a 32-bit value: - 0: Vhost master messages used for writeable fields - 1: Vhost master messages used for live migration :payload: Size bytes array holding the contents of the virtio device's configuration space ... ``VHOST_USER_GET_CONFIG`` :id: 24 :equivalent ioctl: N/A :master payload: virtio device config space :slave payload: virtio device config space When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is submitted by the vhost-user master to fetch the contents of the virtio device configuration space, vhost-user slave's payload size MUST match master's request, vhost-user slave uses zero length of payload to indicate an error to vhost-user master. The vhost-user master may cache the contents to avoid repeated ``VHOST_USER_GET_CONFIG`` calls. ``VHOST_USER_SET_CONFIG`` :id: 25 :equivalent ioctl: N/A :master payload: virtio device config space :slave payload: N/A When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is submitted by the vhost-user master when the Guest changes the virtio device configuration space and also can be used for live migration on the destination host. The vhost-user slave must check the flags field, and slaves MUST NOT accept SET_CONFIG for read-only configuration space fields unless the live migration bit is set. Stefan ___ Virtualization mailing list Virtualization@lists.linux-foundation.org
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote: > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote: > > On 2021/7/5 17:56, Andy Shevchenko wrote: > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > > >> tools/include/* have a lot of abstract layer for building > > >> kernel code from userspace, so reuse or add the abstract > > >> layer in tools/include/ to build the ptr_ring for ringtest > > >> testing. > > > > > > ... > > > > > >> create mode 100644 tools/include/asm/cache.h > > >> create mode 100644 tools/include/asm/processor.h > > >> create mode 100644 tools/include/generated/autoconf.h > > >> create mode 100644 tools/include/linux/align.h > > >> create mode 100644 tools/include/linux/cache.h > > >> create mode 100644 tools/include/linux/slab.h > > > > > > Maybe somebody can change this to be able to include in-tree headers > > > directly? > > > > If the above works, maybe the files in tools/include/* is not > > necessary any more, just use the in-tree headers to compile > > the user space app? > > > > Or I missed something here? > > why would it work? kernel headers outside of uapi are not > intended to be consumed by userspace. The problem here, that we are almost getting two copies of the headers, and tools are not in a good maintenance, so it's often desynchronized from the actual Linux headers. This will become more and more diverse if we keep same way of operation. So, I would rather NAK any new copies of the headers from include/ to tools/include. > > > Besides above, had you tested this with `make O=...`? > > > > You are right, the generated/autoconf.h is in another directory > > with `make O=...`. > > > > Any nice idea to fix the above problem? -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 05, 2021 at 10:05:30PM +0300, Andy Shevchenko wrote: > On Mon, Jul 5, 2021 at 9:45 PM Michael S. Tsirkin wrote: > > > > On Mon, Jul 05, 2021 at 09:36:26PM +0300, Andy Shevchenko wrote: > > > On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote: > > > > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote: > > > > > On 2021/7/5 17:56, Andy Shevchenko wrote: > > > > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > > > > > >> tools/include/* have a lot of abstract layer for building > > > > > >> kernel code from userspace, so reuse or add the abstract > > > > > >> layer in tools/include/ to build the ptr_ring for ringtest > > > > > >> testing. > > > > > > > > > > > > ... > > > > > > > > > > > >> create mode 100644 tools/include/asm/cache.h > > > > > >> create mode 100644 tools/include/asm/processor.h > > > > > >> create mode 100644 tools/include/generated/autoconf.h > > > > > >> create mode 100644 tools/include/linux/align.h > > > > > >> create mode 100644 tools/include/linux/cache.h > > > > > >> create mode 100644 tools/include/linux/slab.h > > > > > > > > > > > > Maybe somebody can change this to be able to include in-tree > > > > > > headers directly? > > > > > > > > > > If the above works, maybe the files in tools/include/* is not > > > > > necessary any more, just use the in-tree headers to compile > > > > > the user space app? > > > > > > > > > > Or I missed something here? > > > > > > > > why would it work? kernel headers outside of uapi are not > > > > intended to be consumed by userspace. > > > > > > The problem here, that we are almost getting two copies of the headers, > > > and > > > tools are not in a good maintenance, so it's often desynchronized from the > > > actual Linux headers. This will become more and more diverse if we keep > > > same > > > way of operation. So, I would rather NAK any new copies of the headers > > > from > > > include/ to tools/include. > > > > We already have the copies > > yes they are not maintained well ... what's the plan then? > > NAK won't help us improve the situation. > > I understand and the proposal is to leave only the files which are not > the same (can we do kinda wrappers or so in tools/include rather than > copying everything?). I have no idea how we'd do all this. When I did tools/virtio I already tried to minimize copying. Want to try to do better? > > I would say copies are kind of okay just make sure they are > > built with kconfig. Then any breakage will be > > detected. > > > > > > > > Besides above, had you tested this with `make O=...`? > > > > > > > > > > You are right, the generated/autoconf.h is in another directory > > > > > with `make O=...`. > > > > > > > > > > Any nice idea to fix the above problem? > > > -- > With Best Regards, > Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] virtio-blk: Add validation for block size in config space
On Tue, Jun 22, 2021 at 11:11:06AM +0100, Stefan Hajnoczi wrote: > On Thu, Jun 17, 2021 at 01:10:04PM +0800, Xie Yongji wrote: > > This ensures that we will not use an invalid block size > > in config space (might come from an untrusted device). > > > > Signed-off-by: Xie Yongji > > --- > > drivers/block/virtio_blk.c | 29 +++-- > > 1 file changed, 23 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > > index b9fa3ef5b57c..bbdae989f1ea 100644 > > --- a/drivers/block/virtio_blk.c > > +++ b/drivers/block/virtio_blk.c > > @@ -696,6 +696,28 @@ static const struct blk_mq_ops virtio_mq_ops = { > > static unsigned int virtblk_queue_depth; > > module_param_named(queue_depth, virtblk_queue_depth, uint, 0444); > > > > +static int virtblk_validate(struct virtio_device *vdev) > > +{ > > + u32 blk_size; > > + > > + if (!vdev->config->get) { > > + dev_err(>dev, "%s failure: config access disabled\n", > > + __func__); > > + return -EINVAL; > > + } > > + > > + if (!virtio_has_feature(vdev, VIRTIO_BLK_F_BLK_SIZE)) > > + return 0; > > + > > + blk_size = virtio_cread32(vdev, > > + offsetof(struct virtio_blk_config, blk_size)); > > + > > + if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE) > > + __virtio_clear_bit(vdev, VIRTIO_BLK_F_BLK_SIZE); > > + > > + return 0; > > +} > > I saw Michael asked for .validate() in v2. I would prefer to keep > everything in virtblk_probe() instead of adding .validate() because: > > - There is a race condition that an untrusted device can exploit since > virtblk_probe() fetches the value again. > > - It's more complex now that .validate() takes a first shot at blk_size > and then virtblk_probe() deals with it again later on. This is a valid concern. But, silently ignoring what hypervisor told us to do is also ungood. Let's save it somewhere then. And there are more examples like this, e.g. the virtio net mtu. So if we worry about this stuff, let's do something along the lines of (note: incomplete, won't build: you need to update all drivers). virtio: allow passing config data from validate callback To avoid time of check to time of use races on config changes, pass config data from validate callback to probe. Signed-off-by: Michael S. Tsirkin --- diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 4b15c00c0a0a..d3657a0127d7 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -211,6 +211,7 @@ static int virtio_dev_probe(struct device *_d) u64 device_features; u64 driver_features; u64 driver_features_legacy; + void *config = NULL; /* We have a driver! */ virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER); @@ -249,18 +250,20 @@ static int virtio_dev_probe(struct device *_d) __virtio_set_bit(dev, i); if (drv->validate) { - err = drv->validate(dev); - if (err) + config = drv->validate(dev); + if (IS_ERR(config)) { + err = PTR_ERR(config); goto err; + } } err = virtio_finalize_features(dev); if (err) goto err; - err = drv->probe(dev); + err = drv->probe(dev, config); if (err) - goto err; + goto probe; /* If probe didn't do it, mark device DRIVER_OK ourselves. */ if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK)) @@ -269,9 +272,12 @@ static int virtio_dev_probe(struct device *_d) if (drv->scan) drv->scan(dev); + kfree(config); virtio_config_enable(dev); return 0; +probe: + kfree(config); err: virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED); return err; diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b1894e0323fa..90750567c0cc 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -151,6 +151,8 @@ size_t virtio_max_dma_size(struct virtio_device *vdev); * @feature_table_size: number of entries in the feature table array. * @feature_table_legacy: same as feature_table but when working in legacy mode. * @feature_table_size_legacy: number of entries in feature table legacy array. + * @validate: the function to validate feature bits and config. + * Returns a valid config or NULL to be passed to probe or ERR_PTR(-errno). * @probe: the function to call when a device is found. Returns 0 or -errno. * @scan: optional function to call after successful probe; intended *for virtio-scsi to invoke a scan. @@ -167,8 +169,8 @@ struct virtio_driver { unsigned int feature_table_size; const unsigned int *feature_table_legacy; unsigned int feature_table_size_legacy; - int (*validate)(struct virtio_device *dev); -
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 5, 2021 at 9:45 PM Michael S. Tsirkin wrote: > > On Mon, Jul 05, 2021 at 09:36:26PM +0300, Andy Shevchenko wrote: > > On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote: > > > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote: > > > > On 2021/7/5 17:56, Andy Shevchenko wrote: > > > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > > > > >> tools/include/* have a lot of abstract layer for building > > > > >> kernel code from userspace, so reuse or add the abstract > > > > >> layer in tools/include/ to build the ptr_ring for ringtest > > > > >> testing. > > > > > > > > > > ... > > > > > > > > > >> create mode 100644 tools/include/asm/cache.h > > > > >> create mode 100644 tools/include/asm/processor.h > > > > >> create mode 100644 tools/include/generated/autoconf.h > > > > >> create mode 100644 tools/include/linux/align.h > > > > >> create mode 100644 tools/include/linux/cache.h > > > > >> create mode 100644 tools/include/linux/slab.h > > > > > > > > > > Maybe somebody can change this to be able to include in-tree headers > > > > > directly? > > > > > > > > If the above works, maybe the files in tools/include/* is not > > > > necessary any more, just use the in-tree headers to compile > > > > the user space app? > > > > > > > > Or I missed something here? > > > > > > why would it work? kernel headers outside of uapi are not > > > intended to be consumed by userspace. > > > > The problem here, that we are almost getting two copies of the headers, and > > tools are not in a good maintenance, so it's often desynchronized from the > > actual Linux headers. This will become more and more diverse if we keep same > > way of operation. So, I would rather NAK any new copies of the headers from > > include/ to tools/include. > > We already have the copies > yes they are not maintained well ... what's the plan then? > NAK won't help us improve the situation. I understand and the proposal is to leave only the files which are not the same (can we do kinda wrappers or so in tools/include rather than copying everything?). > I would say copies are kind of okay just make sure they are > built with kconfig. Then any breakage will be > detected. > > > > > > Besides above, had you tested this with `make O=...`? > > > > > > > > You are right, the generated/autoconf.h is in another directory > > > > with `make O=...`. > > > > > > > > Any nice idea to fix the above problem? -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size
On Mon, Jul 05, 2021 at 03:29:47PM +0800, Jason Wang wrote: > > 在 2021/7/5 下午3:26, Michael S. Tsirkin 写道: > > On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote: > > > This patch switch to read virtqueue size from the capability instead > > > of depending on the hardcoded value. This allows the per virtqueue > > > size could be advertised. > > > > > > Signed-off-by: Jason Wang > > So let's add an ioctl for this? It's really a bug we don't.. > > > As explained in patch 1. Qemu doesn't use VHOST_VDPA_GET_VRING_NUM actually. > Instead it checks the result VHOST_VDPA_SET_VRING_NUM. > > So I change VHOST_VDPA_GET_VRING_NUM to return the minimal size of all the > virtqueues. > > If you wish we can add a VHOST_VDPA_GET_VRING_NUM2, but I'm not sure it will > have a user or not. > > Thanks Question is how do we know returning the minimal and not e.g. the max size is the right thing to do? > > > > > > --- > > > drivers/vdpa/virtio_pci/vp_vdpa.c | 6 -- > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c > > > b/drivers/vdpa/virtio_pci/vp_vdpa.c > > > index 2926641fb586..198f7076e4d9 100644 > > > --- a/drivers/vdpa/virtio_pci/vp_vdpa.c > > > +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c > > > @@ -18,7 +18,6 @@ > > > #include > > > #include > > > -#define VP_VDPA_QUEUE_MAX 256 > > > #define VP_VDPA_DRIVER_NAME "vp_vdpa" > > > #define VP_VDPA_NAME_SIZE 256 > > > @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device > > > *vdpa, u8 status) > > > static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid) > > > { > > > - return VP_VDPA_QUEUE_MAX; > > > + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); > > > + struct virtio_pci_modern_device *mdev = _vdpa->mdev; > > > + > > > + return vp_modern_get_queue_size(mdev, qid); > > > } > > > static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid, > > > -- > > > 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote: > On 2021/7/5 17:56, Andy Shevchenko wrote: > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > >> tools/include/* have a lot of abstract layer for building > >> kernel code from userspace, so reuse or add the abstract > >> layer in tools/include/ to build the ptr_ring for ringtest > >> testing. > > > > ... > > > >> create mode 100644 tools/include/asm/cache.h > >> create mode 100644 tools/include/asm/processor.h > >> create mode 100644 tools/include/generated/autoconf.h > >> create mode 100644 tools/include/linux/align.h > >> create mode 100644 tools/include/linux/cache.h > >> create mode 100644 tools/include/linux/slab.h > > > > Maybe somebody can change this to be able to include in-tree headers > > directly? > > If the above works, maybe the files in tools/include/* is not > necessary any more, just use the in-tree headers to compile > the user space app? > > Or I missed something here? why would it work? kernel headers outside of uapi are not intended to be consumed by userspace. > > > > Besides above, had you tested this with `make O=...`? > > You are right, the generated/autoconf.h is in another directory > with `make O=...`. > > Any nice idea to fix the above problem? > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4] virtio-blk: Add validation for block size in config space
On Mon, Jul 05, 2021 at 06:00:06PM +0800, Xie Yongji wrote: > This ensures that we will not use an invalid block size > in config space (might come from an untrusted device). > > Signed-off-by: Xie Yongji I replied on v3. Silently ignoring what hypervisor said is not a good idea. > --- > drivers/block/virtio_blk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index e4bd3b1fc3c2..e9d7747c3cc0 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -819,7 +819,7 @@ static int virtblk_probe(struct virtio_device *vdev) > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE, > struct virtio_blk_config, blk_size, > _size); > - if (!err) > + if (!err && blk_size >= SECTOR_SIZE && blk_size <= PAGE_SIZE) > blk_queue_logical_block_size(q, blk_size); > else > blk_size = queue_logical_block_size(q); > -- > 2.11.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring
On Mon, Jul 05, 2021 at 09:36:26PM +0300, Andy Shevchenko wrote: > On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote: > > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote: > > > On 2021/7/5 17:56, Andy Shevchenko wrote: > > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote: > > > >> tools/include/* have a lot of abstract layer for building > > > >> kernel code from userspace, so reuse or add the abstract > > > >> layer in tools/include/ to build the ptr_ring for ringtest > > > >> testing. > > > > > > > > ... > > > > > > > >> create mode 100644 tools/include/asm/cache.h > > > >> create mode 100644 tools/include/asm/processor.h > > > >> create mode 100644 tools/include/generated/autoconf.h > > > >> create mode 100644 tools/include/linux/align.h > > > >> create mode 100644 tools/include/linux/cache.h > > > >> create mode 100644 tools/include/linux/slab.h > > > > > > > > Maybe somebody can change this to be able to include in-tree headers > > > > directly? > > > > > > If the above works, maybe the files in tools/include/* is not > > > necessary any more, just use the in-tree headers to compile > > > the user space app? > > > > > > Or I missed something here? > > > > why would it work? kernel headers outside of uapi are not > > intended to be consumed by userspace. > > The problem here, that we are almost getting two copies of the headers, and > tools are not in a good maintenance, so it's often desynchronized from the > actual Linux headers. This will become more and more diverse if we keep same > way of operation. So, I would rather NAK any new copies of the headers from > include/ to tools/include. We already have the copies yes they are not maintained well ... what's the plan then? NAK won't help us improve the situation. I would say copies are kind of okay just make sure they are built with kconfig. Then any breakage will be detected. > > > > Besides above, had you tested this with `make O=...`? > > > > > > You are right, the generated/autoconf.h is in another directory > > > with `make O=...`. > > > > > > Any nice idea to fix the above problem? > > -- > With Best Regards, > Andy Shevchenko > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 2/2] vDPA/ifcvf: implement management netlink framework for ifcvf
On Mon, Jul 05, 2021 at 10:13:33PM +0800, Zhu Lingshan wrote: > This commit implments the management netlink framework for ifcvf, implements > including register and add / remove a device > > It works with iprouter2: I am guessing iproute2? > [root@localhost lszhu]# vdpa mgmtdev show -jp > { > "mgmtdev": { > "pci/:01:00.5": { > "supported_classes": [ "net" ] > }, > "pci/:01:00.6": { > "supported_classes": [ "net" ] > } > } > } > > [root@localhost lszhu]# vdpa dev add mgmtdev pci/:01:00.5 name vdpa0 > [root@localhost lszhu]# vdpa dev add mgmtdev pci/:01:00.6 name vdpa1 > > Signed-off-by: Zhu Lingshan > --- > drivers/vdpa/ifcvf/ifcvf_base.h | 6 ++ > drivers/vdpa/ifcvf/ifcvf_main.c | 154 > 2 files changed, 124 insertions(+), 36 deletions(-) > > diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h > index ded1b1b5fb13..e5251fcbb200 100644 > --- a/drivers/vdpa/ifcvf/ifcvf_base.h > +++ b/drivers/vdpa/ifcvf/ifcvf_base.h > @@ -104,6 +104,12 @@ struct ifcvf_lm_cfg { > struct ifcvf_vring_lm_cfg vring_lm_cfg[IFCVF_MAX_QUEUE_PAIRS]; > }; > > +struct ifcvf_vdpa_mgmt_dev { > + struct vdpa_mgmt_dev mdev; > + struct ifcvf_adapter *adapter; > + struct pci_dev *pdev; > +}; > + > int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev); > int ifcvf_start_hw(struct ifcvf_hw *hw); > void ifcvf_stop_hw(struct ifcvf_hw *hw); > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c > index 5f70ab1283a0..c72d9b36e4a0 100644 > --- a/drivers/vdpa/ifcvf/ifcvf_main.c > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c > @@ -218,7 +218,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device > *vdpa_dev, u8 status) > int ret; > > vf = vdpa_to_vf(vdpa_dev); > - adapter = dev_get_drvdata(vdpa_dev->dev.parent); > + adapter = vdpa_to_adapter(vdpa_dev); > status_old = ifcvf_get_status(vf); > > if (status_old == status) > @@ -442,6 +442,16 @@ static const struct vdpa_config_ops ifc_vdpa_ops = { > .set_config_cb = ifcvf_vdpa_set_config_cb, > }; > > +static struct virtio_device_id id_table_net[] = { > + {VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID}, > + {0}, > +}; > + > +static struct virtio_device_id id_table_blk[] = { > + {VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID}, > + {0}, > +}; > + > static u32 get_dev_type(struct pci_dev *pdev) > { > u32 dev_type; > @@ -462,48 +472,30 @@ static u32 get_dev_type(struct pci_dev *pdev) > return dev_type; > } > > -static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) > +static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name) > { > - struct device *dev = >dev; > + struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev; > struct ifcvf_adapter *adapter; > + struct pci_dev *pdev; > struct ifcvf_hw *vf; > + struct device *dev; > int ret, i; > > - ret = pcim_enable_device(pdev); > - if (ret) { > - IFCVF_ERR(pdev, "Failed to enable device\n"); > - return ret; > - } > - > - ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4), > - IFCVF_DRIVER_NAME); > - if (ret) { > - IFCVF_ERR(pdev, "Failed to request MMIO region\n"); > - return ret; > - } > - > - ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)); > - if (ret) { > - IFCVF_ERR(pdev, "No usable DMA configuration\n"); > - return ret; > - } > - > - ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev); > - if (ret) { > - IFCVF_ERR(pdev, > - "Failed for adding devres for freeing irq vectors\n"); > - return ret; > - } > + ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev); > + if (ifcvf_mgmt_dev->adapter) > + return -EOPNOTSUPP; > > + pdev = ifcvf_mgmt_dev->pdev; > + dev = >dev; > adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, > - dev, _vdpa_ops, NULL); > - if (adapter == NULL) { > + dev, _vdpa_ops, name); > + if (!adapter) { > IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); > return -ENOMEM; > } > > - pci_set_master(pdev); > - pci_set_drvdata(pdev, adapter); > + ifcvf_mgmt_dev->adapter = adapter; > + pci_set_drvdata(pdev, ifcvf_mgmt_dev); > > vf = >vf; > vf->dev_type = get_dev_type(pdev); > @@ -523,9 +515,10 @@ static int ifcvf_probe(struct pci_dev *pdev, const > struct pci_device_id *id) > > vf->hw_features = ifcvf_get_hw_features(vf); > > - ret = vdpa_register_device(>vdpa, IFCVF_MAX_QUEUE_PAIRS * 2); > + adapter->vdpa.mdev = _mgmt_dev->mdev; > + ret = _vdpa_register_device(>vdpa, IFCVF_MAX_QUEUE_PAIRS * 2); >
Re: [PATCH net-next 1/2] tools: add missing infrastructure for building ptr_ring.h
On Mon, Jul 05, 2021 at 11:57:34AM +0800, Yunsheng Lin wrote: > In order to build ptr_ring.h in userspace, the cacheline > aligning, cpu_relax() and slab related infrastructure is > needed, so add them in this patch. > > As L1_CACHE_BYTES may be different for different arch, which > is mostly defined in include/generated/autoconf.h, so user may > need to do "make defconfig" before building a tool using the > API in linux/cache.h. > > Also "linux/lockdep.h" is not added in "tools/include" yet, > so remove it in "linux/spinlock.h", and the only place using > "linux/spinlock.h" is tools/testing/radix-tree, removing that > does not break radix-tree testing. > > Signed-off-by: Yunsheng Lin This is hard to review. Try to split this please. Functional changes separate from merely moving code around. > --- > tools/include/asm/cache.h | 56 > tools/include/asm/processor.h | 36 > tools/include/generated/autoconf.h | 1 + > tools/include/linux/align.h| 15 +++ > tools/include/linux/cache.h| 87 > ++ > tools/include/linux/gfp.h | 4 ++ > tools/include/linux/slab.h | 46 > tools/include/linux/spinlock.h | 2 - > 8 files changed, 245 insertions(+), 2 deletions(-) > create mode 100644 tools/include/asm/cache.h > create mode 100644 tools/include/asm/processor.h > create mode 100644 tools/include/generated/autoconf.h > create mode 100644 tools/include/linux/align.h > create mode 100644 tools/include/linux/cache.h > create mode 100644 tools/include/linux/slab.h > > diff --git a/tools/include/asm/cache.h b/tools/include/asm/cache.h > new file mode 100644 > index 000..071e310 > --- /dev/null > +++ b/tools/include/asm/cache.h > @@ -0,0 +1,56 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > + > +#ifndef __TOOLS_LINUX_ASM_CACHE_H > +#define __TOOLS_LINUX_ASM_CACHE_H > + > +#include > + > +#if defined(__i386__) || defined(__x86_64__) > +#define L1_CACHE_SHIFT (CONFIG_X86_L1_CACHE_SHIFT) > +#elif defined(__arm__) > +#define L1_CACHE_SHIFT (CONFIG_ARM_L1_CACHE_SHIFT) > +#elif defined(__aarch64__) > +#define L1_CACHE_SHIFT (6) > +#elif defined(__powerpc__) > + > +/* bytes per L1 cache line */ > +#if defined(CONFIG_PPC_8xx) > +#define L1_CACHE_SHIFT 4 > +#elif defined(CONFIG_PPC_E500MC) > +#define L1_CACHE_SHIFT 6 > +#elif defined(CONFIG_PPC32) > +#if defined(CONFIG_PPC_47x) > +#define L1_CACHE_SHIFT 7 > +#else > +#define L1_CACHE_SHIFT 5 > +#endif > +#else /* CONFIG_PPC64 */ > +#define L1_CACHE_SHIFT 7 > +#endif > + > +#elif defined(__sparc__) > +#define L1_CACHE_SHIFT 5 > +#elif defined(__alpha__) > + > +#if defined(CONFIG_ALPHA_GENERIC) || defined(CONFIG_ALPHA_EV6) > +#define L1_CACHE_SHIFT 6 > +#else > +/* Both EV4 and EV5 are write-through, read-allocate, > + direct-mapped, physical. > +*/ > +#define L1_CACHE_SHIFT 5 > +#endif > + > +#elif defined(__mips__) > +#define L1_CACHE_SHIFT CONFIG_MIPS_L1_CACHE_SHIFT > +#elif defined(__ia64__) > +#define L1_CACHE_SHIFT CONFIG_IA64_L1_CACHE_SHIFT > +#elif defined(__nds32__) > +#define L1_CACHE_SHIFT 5 > +#else > +#define L1_CACHE_SHIFT 5 > +#endif > + > +#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) > + > +#endif > diff --git a/tools/include/asm/processor.h b/tools/include/asm/processor.h > new file mode 100644 > index 000..3198ad6 > --- /dev/null > +++ b/tools/include/asm/processor.h > @@ -0,0 +1,36 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > + > +#ifndef __TOOLS_LINUX_ASM_PROCESSOR_H > +#define __TOOLS_LINUX_ASM_PROCESSOR_H > + > +#include > + > +#if defined(__i386__) || defined(__x86_64__) > +#include "../../arch/x86/include/asm/vdso/processor.h" > +#elif defined(__arm__) > +#include "../../arch/arm/include/asm/vdso/processor.h" > +#elif defined(__aarch64__) > +#include "../../arch/arm64/include/asm/vdso/processor.h" > +#elif defined(__powerpc__) > +#include "../../arch/powerpc/include/vdso/processor.h" > +#elif defined(__s390__) > +#include "../../arch/s390/include/vdso/processor.h" > +#elif defined(__sh__) > +#include "../../arch/sh/include/asm/processor.h" > +#elif defined(__sparc__) > +#include "../../arch/sparc/include/asm/processor.h" > +#elif defined(__alpha__) > +#include "../../arch/alpha/include/asm/processor.h" > +#elif defined(__mips__) > +#include "../../arch/mips/include/asm/vdso/processor.h" > +#elif defined(__ia64__) > +#include "../../arch/ia64/include/asm/processor.h" > +#elif defined(__xtensa__) > +#include "../../arch/xtensa/include/asm/processor.h" > +#elif defined(__nds32__) > +#include "../../arch/nds32/include/asm/processor.h" > +#else > +#define cpu_relax() sched_yield() Does this have a chance to work outside of kernel? > +#endif did you actually test or even test build all these arches? Not sure we need to bother with hacks like these. > + > +#endif > diff --git
Re: [PATCH V2 5/6] virtio: add one field into virtio_device for recording if device uses managed irq
On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote: > blk-mq needs to know if the device uses managed irq, so add one field > to virtio_device for recording if device uses managed irq. > > If the driver use managed irq, this flag has to be set so it can be > passed to blk-mq. I don't think all this boilerplate code make a whole lot of sense. I think we need to record this information deep down in the irq code by setting a flag in struct device only if pci_alloc_irq_vectors_affinity atually managed to allocate multiple vectors and the PCI_IRQ_AFFINITY flag was set. Then blk-mq can look at that flag, and also check that more than one queue is in used and work based on that. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization