Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Viresh Kumar
On 05-07-21, 14:21, Jie Deng wrote:
> 
> On 2021/7/5 10:43, Viresh Kumar wrote:
> > On 02-07-21, 12:58, Andy Shevchenko wrote:
> > > On Fri, Jul 02, 2021 at 04:46:47PM +0800, Jie Deng wrote:
> > > > +static int virtio_i2c_complete_reqs(struct virtqueue *vq,
> > > > +   struct virtio_i2c_req *reqs,
> > > > +   struct i2c_msg *msgs, int nr,
> > > > +   bool fail)
> > > > +{
> > > > +   struct virtio_i2c_req *req;
> > > > +   bool failed = fail;
> > Jie, you can actually get rid of this variable too. Jut rename fail to 
> > failed
> > and everything shall work as you want.
> 
> 
> Oh, You are not right. I just found we can't remove this variable. The
> "fail" and "failed" have different
> 
> meanings for this function. We need fail to return the result.

Ahh, yes. You are right. Maybe rename fail to timedout, it would make it more
readable, else fail and failed look very similar.
 
> > > > +   unsigned int len;
> > > > +   int i, j = 0;
> > > > +
> > > > +   for (i = 0; i < nr; i++) {
> > > > +   /* Detach the ith request from the vq */
> > > > +   req = virtqueue_get_buf(vq, );
> > > > +
> > > > +   /*
> > > > +* Condition (req && req == [i]) should always 
> > > > meet since
> > > > +* we have total nr requests in the vq.
> > > > +*/
> > > > +   if (!failed && (WARN_ON(!(req && req == [i])) ||
> > > > +   (req->in_hdr.status != VIRTIO_I2C_MSG_OK)))
> > > > +   failed = true;
> > > ...and after failed is true, we are continuing the loop, why?
> > Actually this function can be called with fail set to true. We proceed as we
> > need to call i2c_put_dma_safe_msg_buf() for all buffers we allocated 
> > earlier.
> > 
> > > > +   i2c_put_dma_safe_msg_buf(reqs[i].buf, [i], 
> > > > !failed);
> > > > +   if (!failed)
> > > > +   ++j;
> > > Besides better to read j++ the j itself can be renamed to something more
> > > verbose.
> > > 
> > > > +   }
> > > > +   return (fail ? -ETIMEDOUT : j);
> > > Redundant parentheses.
> > > 
> > > > +}

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 05/12] x86/sev: Use GHCB protocol version 2 if supported

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Check whether the hypervisor supports GHCB version 2 and use it if
available.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/sev.c | 10 --
 arch/x86/include/asm/sev.h |  4 ++--
 arch/x86/kernel/sev-shared.c   | 17 ++---
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 101e08c67296..7f8416f76be7 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -119,16 +119,22 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
 /* Include code for early handlers */
 #include "../../kernel/sev-shared.c"
 
+static unsigned int ghcb_protocol;
+
 static u64 sev_get_ghcb_proto_ver(void)
 {
-   return GHCB_PROTOCOL_MAX;
+   return ghcb_protocol;
 }
 
 static bool early_setup_sev_es(void)
 {
-   if (!sev_es_negotiate_protocol(NULL))
+   struct sev_ghcb_protocol_info info;
+
+   if (!sev_es_negotiate_protocol())
sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
 
+   ghcb_protocol = info.vm_proto;
+
if (set_page_decrypted((unsigned long)_ghcb_page))
return false;
 
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..134a7c9d91b6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -12,8 +12,8 @@
 #include 
 #include 
 
-#define GHCB_PROTO_OUR 0x0001UL
-#define GHCB_PROTOCOL_MAX  1ULL
+#define GHCB_PROTOCOL_MIN  1ULL
+#define GHCB_PROTOCOL_MAX  2ULL
 #define GHCB_DEFAULT_USAGE 0ULL
 
 #defineVMGEXIT()   { asm volatile("rep; 
vmmcall\n\r"); }
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 36eaac2773ed..40a1ca81bdb8 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -61,6 +61,7 @@ static void __noreturn sev_es_terminate(unsigned int reason)
 
 static bool sev_es_negotiate_protocol(struct sev_ghcb_protocol_info *info)
 {
+   unsigned int protocol;
u64 val;
 
/* Do the GHCB protocol version negotiation */
@@ -71,14 +72,24 @@ static bool sev_es_negotiate_protocol(struct 
sev_ghcb_protocol_info *info)
if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
return false;
 
-   if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTO_OUR ||
-   GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR)
+   /* Sanity check untrusted input */
+   if (GHCB_MSR_PROTO_MIN(val) > GHCB_MSR_PROTO_MAX(val))
+   return false;
+
+   /* Use maximum supported protocol version */
+   protocol = min_t(unsigned int, GHCB_MSR_PROTO_MAX(val), 
GHCB_PROTOCOL_MAX);
+
+   /*
+* Hypervisor does not support any protocol version required for this
+* kernel.
+*/
+   if (protocol < GHCB_MSR_PROTO_MIN(val))
return false;
 
if (info) {
info->hv_proto_min = GHCB_MSR_PROTO_MIN(val);
info->hv_proto_max = GHCB_MSR_PROTO_MAX(val);
-   info->vm_proto = GHCB_PROTO_OUR;
+   info->vm_proto = protocol;
}
 
return true;
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 10/12] x86/sev: Add MMIO handling support to boot/compressed/ code

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Move the code for MMIO handling in the #VC handler to sev-shared.c so
that it can be used in the decompressor code. The decompressor needs
to handle MMIO events for writing to the VGA framebuffer.

When the kernel is booted via UEFI the VGA console is not enabled that
early, but a kexec boot will enable it and the decompressor needs MMIO
support to write to the frame buffer.

This also requires to share some code from lib/insn-eval.c. Since
insn-eval.c can't be included into the decompressor code directly,
move the relevant parts into lib/insn-eval-shared.c and include that
file.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/sev.c  |  43 +-
 arch/x86/kernel/sev-shared.c| 282 +++
 arch/x86/kernel/sev.c   | 282 ---
 arch/x86/lib/insn-eval-shared.c | 805 
 arch/x86/lib/insn-eval.c| 802 +--
 5 files changed, 1116 insertions(+), 1098 deletions(-)
 create mode 100644 arch/x86/lib/insn-eval-shared.c

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 7f8416f76be7..3ffb3d873989 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -26,24 +26,8 @@
 struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
 struct ghcb *boot_ghcb;
 
-/*
- * Copy a version of this function here - insn-eval.c can't be used in
- * pre-decompression code.
- */
-static bool insn_has_rep_prefix(struct insn *insn)
-{
-   insn_byte_t p;
-   int i;
-
-   insn_get_prefixes(insn);
-
-   for_each_insn_prefix(insn, i, p) {
-   if (p == 0xf2 || p == 0xf3)
-   return true;
-   }
-
-   return false;
-}
+#undef WARN_ONCE
+#define WARN_ONCE(condition, format...)
 
 /*
  * Only a dummy for insn_get_seg_base() - Early boot-code is 64bit only and
@@ -54,6 +38,17 @@ static unsigned long insn_get_seg_base(struct pt_regs *regs, 
int seg_reg_idx)
return 0UL;
 }
 
+/* The decompressor only uses flat segments */
+static int get_seg_base_limit(struct insn *insn, struct pt_regs *regs,
+ int regoff, unsigned long *base,
+ unsigned long *limit)
+{
+   if (base)
+   *base  = 0L;
+   if (limit)
+   *limit = ~0L;
+}
+
 static inline u64 sev_es_rd_ghcb_msr(void)
 {
unsigned long low, high;
@@ -105,6 +100,14 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
return ES_OK;
 }
 
+static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct 
es_em_ctxt *ctxt,
+  unsigned long vaddr, phys_addr_t 
*paddr)
+{
+   *paddr = (phys_addr_t)vaddr;
+
+   return ES_OK;
+}
+
 #undef __init
 #undef __pa
 #define __init
@@ -115,6 +118,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
 /* Basic instruction decoding support needed */
 #include "../../lib/inat.c"
 #include "../../lib/insn.c"
+#include "../../lib/insn-eval-shared.c"
 
 /* Include code for early handlers */
 #include "../../kernel/sev-shared.c"
@@ -204,6 +208,9 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long 
exit_code)
case SVM_EXIT_CPUID:
result = vc_handle_cpuid(boot_ghcb, );
break;
+   case SVM_EXIT_NPF:
+   result = vc_handle_mmio(boot_ghcb, );
+   break;
default:
result = ES_UNSUPPORTED;
break;
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 40a1ca81bdb8..a7a0793c4f98 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -558,3 +558,285 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
 
return ES_OK;
 }
+
+static long *vc_insn_get_reg(struct es_em_ctxt *ctxt)
+{
+   long *reg_array;
+   int offset;
+
+   reg_array = (long *)ctxt->regs;
+   offset= insn_get_modrm_reg_off(>insn, ctxt->regs);
+
+   if (offset < 0)
+   return NULL;
+
+   offset /= sizeof(long);
+
+   return reg_array + offset;
+}
+
+static long *vc_insn_get_rm(struct es_em_ctxt *ctxt)
+{
+   long *reg_array;
+   int offset;
+
+   reg_array = (long *)ctxt->regs;
+   offset= insn_get_modrm_rm_off(>insn, ctxt->regs);
+
+   if (offset < 0)
+   return NULL;
+
+   offset /= sizeof(long);
+
+   return reg_array + offset;
+}
+static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+unsigned int bytes, bool read)
+{
+   u64 exit_code, exit_info_1, exit_info_2;
+   unsigned long ghcb_pa = __pa(ghcb);
+   enum es_result res;
+   phys_addr_t paddr;
+   void __user *ref;
+
+   ref = insn_get_addr_ref(>insn, ctxt->regs);
+   if (ref == (void __user *)-1L)
+   return ES_UNSUPPORTED;
+
+   exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE;
+
+   res 

[RFC PATCH 07/12] x86/sev: Setup code to park APs in the AP Jump Table

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

The AP Jump Table under SEV-ES contains the reset vector where non-boot
CPUs start executing when coming out of reset. This means that a CPU
coming out of the AP-reset-hold VMGEXIT also needs to start executing at
the reset vector stored in the AP Jump Table.

The problem is to find a safe place to put the real-mode code which
executes the VMGEXIT and jumps to the reset vector. The code can not be
in kernel memory, because after kexec that memory is owned by the new
kernel and the code might have been overwritten.

Fortunately the AP Jump Table itself is a safe place, because the
memory is not owned by the OS and will not be overwritten by a new
kernel started through kexec. The table is 4k in size and only the
first 4 bytes are used for the reset vector. This leaves enough space
for some 16-bit code to do the job and even a small stack.

Install 16-bit code into the AP Jump Table under SEV-ES after the APs
have been brought up. The code will do an AP-reset-hold VMGEXIT and jump
to the reset vector after being woken up.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/realmode.h |   2 +
 arch/x86/include/asm/sev-ap-jumptable.h |  25 +
 arch/x86/kernel/sev.c   | 105 +++
 arch/x86/realmode/Makefile  |   9 +-
 arch/x86/realmode/rmpiggy.S |   6 ++
 arch/x86/realmode/sev/Makefile  |  41 
 arch/x86/realmode/sev/ap_jump_table.S   | 130 
 arch/x86/realmode/sev/ap_jump_table.lds |  24 +
 8 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
 create mode 100644 arch/x86/realmode/sev/Makefile
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d083c873..29590a4ddf24 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -62,6 +62,8 @@ extern unsigned long initial_gs;
 extern unsigned long initial_stack;
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern unsigned long initial_vc_handler;
+extern unsigned char rm_ap_jump_table_blob[];
+extern unsigned char rm_ap_jump_table_blob_end[];
 #endif
 
 extern unsigned char real_mode_blob[];
diff --git a/arch/x86/include/asm/sev-ap-jumptable.h 
b/arch/x86/include/asm/sev-ap-jumptable.h
new file mode 100644
index ..1c8b2ce779e2
--- /dev/null
+++ b/arch/x86/include/asm/sev-ap-jumptable.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Joerg Roedel 
+ */
+#ifndef __ASM_SEV_AP_JUMPTABLE_H
+#define __ASM_SEV_AP_JUMPTABLE_H
+
+#defineSEV_APJT_CS16   0x8
+#defineSEV_APJT_DS16   0x10
+
+#define SEV_APJT_ENTRY 0x10
+
+#ifndef __ASSEMBLY__
+
+struct sev_ap_jump_table_header {
+   u16 reset_ip;
+   u16 reset_cs;
+   u16 gdt_offset;
+};
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_SEV_AP_JUMPTABLE_H */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index eedba56b6bac..0946ef732a62 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -45,6 +46,9 @@ static struct ghcb __initdata *boot_ghcb;
 /* Cached AP Jump Table Address */
 static phys_addr_t sev_es_jump_table_pa;
 
+/* Whether the AP Jump Table blob was successfully installed */
+static bool sev_ap_jumptable_blob_installed __ro_after_init;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -749,6 +753,107 @@ static void __init sev_es_setup_play_dead(void)
 static inline void sev_es_setup_play_dead(void) { }
 #endif
 
+/*
+ * This function make the necessary runtime changes to the AP Jump Table blob.
+ * For now this only sets up the GDT used while the code executes. The GDT 
needs
+ * to contain 16-bit code and data segments with a base that points to AP Jump
+ * Table page.
+ */
+static void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)
+{
+   struct sev_ap_jump_table_header *header;
+   struct desc_ptr *gdt_descr;
+   u64 *ap_jumptable_gdt;
+
+   header = base;
+
+   /*
+* Setup 16-bit protected mode code and data segments for AP Jumptable.
+* Set the segment limits to 0x to already be compatible with
+* real-mode.
+*/
+   ap_jumptable_gdt = (u64 *)(base + header->gdt_offset);
+   ap_jumptable_gdt[SEV_APJT_CS16 / 8] = GDT_ENTRY(0x9b, pa, 0x);
+   ap_jumptable_gdt[SEV_APJT_DS16 / 8] = GDT_ENTRY(0x93, pa, 0x);
+
+   /* Write correct GDT base address into GDT descriptor */
+   gdt_descr = (struct desc_ptr *)(base + header->gdt_offset);
+   gdt_descr->address += pa;
+}
+
+/*
+ * This function sets up the AP Jump Table blob which contains code which runs
+ * in 16-bit protected mode to 

[RFC PATCH 11/12] x86/sev: Handle CLFLUSH MMIO events

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Handle CLFLUSH instruction to MMIO memory in the #VC handler. The
instruction is ignored by the handler, as the Hypervisor is
responsible for cache management of emulated MMIO memory.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev-shared.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index a7a0793c4f98..682fa202444f 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -632,6 +632,15 @@ static enum es_result vc_handle_mmio_twobyte_ops(struct 
ghcb *ghcb,
long *reg_data;
 
switch (insn->opcode.bytes[1]) {
+   /* CLFLUSH */
+   case 0xae:
+   /*
+* Ignore CLFLUSHes - those go to emulated MMIO anyway and the
+* hypervisor is responsible for cache management.
+*/
+   ret = ES_OK;
+   break;
+
/* MMIO Read w/ zero-extension */
case 0xb6:
bytes = 1;
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 08/12] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

GHCB protocol version 2 adds the MSR-based AP-reset-hold VMGEXIT which
does not need a GHCB. Use that to park APs in 16-bit protected mode on
the AP Jump Table.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/realmode.h|  3 +
 arch/x86/kernel/sev.c  | 48 ++--
 arch/x86/realmode/rm/Makefile  | 11 ++--
 arch/x86/realmode/rm/header.S  |  3 +
 arch/x86/realmode/rm/sev_ap_park.S | 89 ++
 5 files changed, 144 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/realmode/rm/sev_ap_park.S

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 29590a4ddf24..668de0a8b1ae 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -23,6 +23,9 @@ struct real_mode_header {
u32 trampoline_header;
 #ifdef CONFIG_AMD_MEM_ENCRYPT
u32 sev_es_trampoline_start;
+   u32 sev_real_ap_park_asm;
+   u32 sev_real_ap_park_seg;
+   u32 sev_ap_park_gdt;
 #endif
 #ifdef CONFIG_X86_64
u32 trampoline_pgd;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 0946ef732a62..2147ebd0e919 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -695,6 +696,35 @@ static bool __init sev_es_setup_ghcb(void)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
+static void __noreturn sev_jumptable_ap_park(void)
+{
+   local_irq_disable();
+
+   write_cr3(real_mode_header->trampoline_pgd);
+
+   /* Exiting long mode will fail if CR4.PCIDE is set. */
+   if (boot_cpu_has(X86_FEATURE_PCID))
+   cr4_clear_bits(X86_CR4_PCIDE);
+
+   asm volatile("xorq  %%r15, %%r15\n"
+"xorq  %%r14, %%r14\n"
+"xorq  %%r13, %%r13\n"
+"xorq  %%r12, %%r12\n"
+"xorq  %%r11, %%r11\n"
+"xorq  %%r10, %%r10\n"
+"xorq  %%r9,  %%r9\n"
+"xorq  %%r8,  %%r8\n"
+"xorq  %%rsi, %%rsi\n"
+"xorq  %%rdi, %%rdi\n"
+"xorq  %%rsp, %%rsp\n"
+"xorq  %%rbp, %%rbp\n"
+"ljmpl *%0" : :
+"m" (real_mode_header->sev_real_ap_park_asm),
+"b" (sev_es_jump_table_pa >> 4));
+   unreachable();
+}
+STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park);
+
 static void sev_es_ap_hlt_loop(void)
 {
struct ghcb_state state;
@@ -731,8 +761,10 @@ static void sev_es_play_dead(void)
play_dead_common();
 
/* IRQs now disabled */
-
-   sev_es_ap_hlt_loop();
+   if (sev_ap_jumptable_blob_installed)
+   sev_jumptable_ap_park();
+   else
+   sev_es_ap_hlt_loop();
 
/*
 * If we get here, the VCPU was woken up again. Jump to CPU
@@ -762,8 +794,9 @@ static inline void sev_es_setup_play_dead(void) { }
 static void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)
 {
struct sev_ap_jump_table_header *header;
+   u64 *ap_jumptable_gdt, *sev_ap_park_gdt;
struct desc_ptr *gdt_descr;
-   u64 *ap_jumptable_gdt;
+   int idx;
 
header = base;
 
@@ -773,8 +806,13 @@ static void __init sev_es_setup_ap_jump_table_data(void 
*base, u32 pa)
 * real-mode.
 */
ap_jumptable_gdt = (u64 *)(base + header->gdt_offset);
-   ap_jumptable_gdt[SEV_APJT_CS16 / 8] = GDT_ENTRY(0x9b, pa, 0x);
-   ap_jumptable_gdt[SEV_APJT_DS16 / 8] = GDT_ENTRY(0x93, pa, 0x);
+   sev_ap_park_gdt  = __va(real_mode_header->sev_ap_park_gdt);
+
+   idx = SEV_APJT_CS16 / 8;
+   ap_jumptable_gdt[idx] = sev_ap_park_gdt[idx] = GDT_ENTRY(0x9b, pa, 
0x);
+
+   idx = SEV_APJT_DS16 / 8;
+   ap_jumptable_gdt[idx] = sev_ap_park_gdt[idx] = GDT_ENTRY(0x93, pa, 
0x);
 
/* Write correct GDT base address into GDT descriptor */
gdt_descr = (struct desc_ptr *)(base + header->gdt_offset);
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449..a7f84d43a0a3 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -27,11 +27,12 @@ wakeup-objs += video-vga.o
 wakeup-objs+= video-vesa.o
 wakeup-objs+= video-bios.o
 
-realmode-y += header.o
-realmode-y += trampoline_$(BITS).o
-realmode-y += stack.o
-realmode-y += reboot.o
-realmode-$(CONFIG_ACPI_SLEEP)  += $(wakeup-objs)
+realmode-y += header.o
+realmode-y += trampoline_$(BITS).o
+realmode-y += stack.o
+realmode-y += reboot.o
+realmode-$(CONFIG_ACPI_SLEEP)  += $(wakeup-objs)
+realmode-$(CONFIG_AMD_MEM_ENCRYPT) 

[RFC PATCH 04/12] x86/sev: Do not hardcode GHCB protocol version

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Introduce the sev_get_ghcb_proto_ver() which will return the negotiated
GHCB protocol version and use it to set the version field in the GHCB.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/sev.c | 5 +
 arch/x86/kernel/sev-shared.c   | 5 -
 arch/x86/kernel/sev.c  | 5 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 1a2e49730f8b..101e08c67296 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -119,6 +119,11 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
 /* Include code for early handlers */
 #include "../../kernel/sev-shared.c"
 
+static u64 sev_get_ghcb_proto_ver(void)
+{
+   return GHCB_PROTOCOL_MAX;
+}
+
 static bool early_setup_sev_es(void)
 {
if (!sev_es_negotiate_protocol(NULL))
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 73eeb5897d16..36eaac2773ed 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -28,6 +28,9 @@ struct sev_ghcb_protocol_info {
unsigned int vm_proto;
 };
 
+/* Returns the negotiated GHCB Protocol version */
+static u64 sev_get_ghcb_proto_ver(void);
+
 static bool __init sev_es_check_cpu_features(void)
 {
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -122,7 +125,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
enum es_result ret;
 
/* Fill in protocol and format specifiers */
-   ghcb->protocol_version = GHCB_PROTOCOL_MAX;
+   ghcb->protocol_version = sev_get_ghcb_proto_ver();
ghcb->ghcb_usage   = GHCB_DEFAULT_USAGE;
 
ghcb_set_sw_exit_code(ghcb, exit_code);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 8084bfd7cce1..5d3422e8b25e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -498,6 +498,11 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb 
*ghcb, struct es_em_ctxt
 /* Negotiated GHCB protocol version */
 static struct sev_ghcb_protocol_info ghcb_protocol_info __ro_after_init;
 
+static u64 sev_get_ghcb_proto_ver(void)
+{
+   return ghcb_protocol_info.vm_proto;
+}
+
 static noinstr void __sev_put_ghcb(struct ghcb_state *state)
 {
struct sev_es_runtime_data *data;
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 06/12] x86/sev: Cache AP Jump Table Address

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Store the physical address of the AP Jump Table in kernel memory so
that it does not need to be fetched from the Hypervisor again.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 5d3422e8b25e..eedba56b6bac 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -42,6 +42,9 @@ static struct ghcb boot_ghcb_page __bss_decrypted 
__aligned(PAGE_SIZE);
  */
 static struct ghcb __initdata *boot_ghcb;
 
+/* Cached AP Jump Table Address */
+static phys_addr_t sev_es_jump_table_pa;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -546,12 +549,14 @@ void noinstr __sev_es_nmi_complete(void)
__sev_put_ghcb();
 }
 
-static u64 get_jump_table_addr(void)
+static phys_addr_t get_jump_table_addr(void)
 {
struct ghcb_state state;
unsigned long flags;
struct ghcb *ghcb;
-   u64 ret = 0;
+
+   if (sev_es_jump_table_pa)
+   return sev_es_jump_table_pa;
 
local_irq_save(flags);
 
@@ -567,39 +572,36 @@ static u64 get_jump_table_addr(void)
 
if (ghcb_sw_exit_info_1_is_valid(ghcb) &&
ghcb_sw_exit_info_2_is_valid(ghcb))
-   ret = ghcb->save.sw_exit_info_2;
+   sev_es_jump_table_pa = (phys_addr_t)ghcb->save.sw_exit_info_2;
 
__sev_put_ghcb();
 
local_irq_restore(flags);
 
-   return ret;
+   return sev_es_jump_table_pa;
 }
 
 int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
 {
u16 startup_cs, startup_ip;
-   phys_addr_t jump_table_pa;
-   u64 jump_table_addr;
u16 __iomem *jump_table;
+   phys_addr_t pa;
 
-   jump_table_addr = get_jump_table_addr();
+   pa = get_jump_table_addr();
 
/* On UP guests there is no jump table so this is not a failure */
-   if (!jump_table_addr)
+   if (!pa)
return 0;
 
/* Check if AP Jump Table is page-aligned */
-   if (jump_table_addr & ~PAGE_MASK)
+   if (pa & ~PAGE_MASK)
return -EINVAL;
 
-   jump_table_pa = jump_table_addr & PAGE_MASK;
-
startup_cs = (u16)(rmh->trampoline_start >> 4);
startup_ip = (u16)(rmh->sev_es_trampoline_start -
   rmh->trampoline_start);
 
-   jump_table = ioremap_encrypted(jump_table_pa, PAGE_SIZE);
+   jump_table = ioremap_encrypted(pa, PAGE_SIZE);
if (!jump_table)
return -EIO;
 
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.

Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.

This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.

The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and for
jumping to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.

The AP Jump-Table is the best place to put the parking code, because
the memory is owned, but read-only by the firmware and writeable by
the OS. Only the first 4 bytes are used for the reset-vector, leaving
the rest of the page for code/data/stack to park a vCPU. The code
can't be in kernel memory because by the time the vCPU wakes up the
memory will be owned by the new kernel, which might have overwritten it
already.

The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXT,
which is a GHCB protocol version 2 feature.

The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.

Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).

The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.

There is also a video showing the code in action:

https://www.youtube.com/watch?v=j1AUJANP7Mk

Please review.

Thanks,

Joerg

Joerg Roedel (12):
  kexec: Allow architecture code to opt-out at runtime
  x86/kexec/64: Forbid kexec when running as an SEV-ES guest
  x86/sev: Save and print negotiated GHCB protocol version
  x86/sev: Do not hardcode GHCB protocol version
  x86/sev: Use GHCB protocol version 2 if supported
  x86/sev: Cache AP Jump Table Address
  x86/sev: Setup code to park APs in the AP Jump Table
  x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
  x86/sev: Use AP Jump Table blob to stop CPU
  x86/sev: Add MMIO handling support to boot/compressed/ code
  x86/sev: Handle CLFLUSH MMIO events
  x86/sev: Support kexec under SEV-ES with AP Jump Table blob

 arch/x86/boot/compressed/sev.c  |  56 +-
 arch/x86/include/asm/realmode.h |   5 +
 arch/x86/include/asm/sev-ap-jumptable.h |  25 +
 arch/x86/include/asm/sev.h  |  13 +-
 arch/x86/kernel/machine_kexec_64.c  |  12 +
 arch/x86/kernel/process.c   |   8 +
 arch/x86/kernel/sev-shared.c| 333 +-
 arch/x86/kernel/sev.c   | 494 ++-
 arch/x86/lib/insn-eval-shared.c | 805 
 arch/x86/lib/insn-eval.c| 802 +--
 arch/x86/realmode/Makefile  |   9 +-
 arch/x86/realmode/rm/Makefile   |  11 +-
 arch/x86/realmode/rm/header.S   |   3 +
 arch/x86/realmode/rm/sev_ap_park.S  |  89 +++
 arch/x86/realmode/rmpiggy.S |   6 +
 arch/x86/realmode/sev/Makefile  |  41 ++
 arch/x86/realmode/sev/ap_jump_table.S   | 130 
 arch/x86/realmode/sev/ap_jump_table.lds |  24 +
 include/linux/kexec.h   |   2 +
 kernel/kexec.c  |  14 +
 kernel/kexec_file.c |   9 +
 21 files changed, 1765 insertions(+), 1126 deletions(-)
 create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
 create mode 100644 arch/x86/lib/insn-eval-shared.c
 create mode 100644 arch/x86/realmode/rm/sev_ap_park.S
 create mode 100644 arch/x86/realmode/sev/Makefile
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds


base-commit: 8d9d46bbf3b6b7ff8edcac33603ab45c29e0e07f
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 12/12] x86/sev: Support kexec under SEV-ES with AP Jump Table blob

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

When the AP Jump Table blob is installed the kernel can hand over the
APs from the old to the new kernel. Enable kexec when the AP Jump
Table blob has been installed.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/kernel/machine_kexec_64.c |  6 +-
 arch/x86/kernel/sev.c  | 12 
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index cd14b6e10f12..61910caf2a0d 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -87,6 +87,7 @@ static __always_inline void sev_es_stop_this_cpu(void)
if (static_branch_unlikely(_es_enable_key))
__sev_es_stop_this_cpu();
 }
+bool sev_kexec_supported(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -94,6 +95,7 @@ static inline int sev_es_setup_ap_jump_table(struct 
real_mode_header *rmh) { ret
 static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
 static inline void sev_es_stop_this_cpu(void) { }
+static bool sev_kexec_supported(void) { return true; }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index f902cc9cc634..d08eae4f7694 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_ACPI
 /*
@@ -626,5 +627,8 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int 
pages)
  */
 bool arch_kexec_supported(void)
 {
-   return !sev_es_active();
+   if (!sev_kexec_supported())
+   return false;
+
+   return true;
 }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 113903ba3095..731bbf941a82 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -901,6 +901,18 @@ static int __init sev_es_setup_ap_jump_table_blob(void)
 }
 core_initcall(sev_es_setup_ap_jump_table_blob);
 
+bool sev_kexec_supported(void)
+{
+   /*
+* KEXEC with SEV-ES and more than one CPU is only supported
+* when the AP Jump Table is installed.
+*/
+   if (num_possible_cpus() > 1)
+   return !sev_es_active() || sev_ap_jumptable_blob_installed;
+   else
+   return true;
+}
+
 static void __init alloc_runtime_data(int cpu)
 {
struct sev_es_runtime_data *data;
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 02/12] x86/kexec/64: Forbid kexec when running as an SEV-ES guest

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

For now, kexec is not supported when running as an SEV-ES guest. Doing
so requires additional hypervisor support and special code to hand
over the CPUs to the new kernel in a safe way.

Until this is implemented, do not support kexec in SEV-ES guests.

Cc: sta...@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/machine_kexec_64.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index c078b0d3ab0e..f902cc9cc634 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -620,3 +620,11 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int 
pages)
 */
set_memory_encrypted((unsigned long)vaddr, pages);
 }
+
+/*
+ * Kexec is not supported in SEV-ES guests yet
+ */
+bool arch_kexec_supported(void)
+{
+   return !sev_es_active();
+}
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 09/12] x86/sev: Use AP Jump Table blob to stop CPU

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

To support kexec under SEV-ES the APs can't be parked with HLT. Upon
wakeup the AP needs to find its way to execute at the reset vector set
by the new kernel and in real-mode.

This is what the AP Jump Table blob provides, so stop the APs the
SEV-ES way by calling the AP-reset-hold VMGEXIT from the AP Jump
Table.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/sev.h |  7 +++
 arch/x86/kernel/process.c  |  8 
 arch/x86/kernel/sev.c  | 11 ++-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 134a7c9d91b6..cd14b6e10f12 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -81,12 +81,19 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+void __sev_es_stop_this_cpu(void);
+static __always_inline void sev_es_stop_this_cpu(void)
+{
+   if (static_branch_unlikely(_es_enable_key))
+   __sev_es_stop_this_cpu();
+}
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
 static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { 
return 0; }
 static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline void sev_es_stop_this_cpu(void) { }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5e1f38179f49..c5da02f9100f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "process.h"
 
@@ -736,6 +737,13 @@ void stop_this_cpu(void *dummy)
if (boot_cpu_has(X86_FEATURE_SME))
native_wbinvd();
for (;;) {
+   /*
+* SEV-ES guests need a special stop routine to support
+* kexec. Try this first, if it fails the function will
+* return and native_halt() is used.
+*/
+   sev_es_stop_this_cpu();
+
/*
 * Use native_halt() so that memory contents don't change
 * (stack usage and variables) after possibly issuing the
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 2147ebd0e919..5fdfa94c823c 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -695,7 +695,6 @@ static bool __init sev_es_setup_ghcb(void)
return true;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
 static void __noreturn sev_jumptable_ap_park(void)
 {
local_irq_disable();
@@ -725,6 +724,16 @@ static void __noreturn sev_jumptable_ap_park(void)
 }
 STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park);
 
+void __sev_es_stop_this_cpu(void)
+{
+   /* Only park in the AP Jump Table when the code has been installed */
+   if (!sev_ap_jumptable_blob_installed)
+   return;
+
+   sev_jumptable_ap_park();
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
 static void sev_es_ap_hlt_loop(void)
 {
struct ghcb_state state;
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 03/12] x86/sev: Save and print negotiated GHCB protocol version

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Save the results of the GHCB protocol negotiation into a data structure
and print information about versions supported and used to the kernel
log.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/sev.c |  2 +-
 arch/x86/kernel/sev-shared.c   | 22 +-
 arch/x86/kernel/sev.c  | 13 -
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 670e998fe930..1a2e49730f8b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -121,7 +121,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
 
 static bool early_setup_sev_es(void)
 {
-   if (!sev_es_negotiate_protocol())
+   if (!sev_es_negotiate_protocol(NULL))
sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
 
if (set_page_decrypted((unsigned long)_ghcb_page))
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 9f90f460a28c..73eeb5897d16 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,20 @@
 #define has_cpuflag(f) boot_cpu_has(f)
 #endif
 
+/*
+ * struct sev_ghcb_protocol_info - Used to return GHCB protocol
+ *negotiation details.
+ *
+ * @hv_proto_min:  Minimum GHCB protocol version supported by Hypervisor
+ * @hv_proto_max:  Maximum GHCB protocol version supported by Hypervisor
+ * @vm_proto:  Protocol version the VM (this kernel) will use
+ */
+struct sev_ghcb_protocol_info {
+   unsigned int hv_proto_min;
+   unsigned int hv_proto_max;
+   unsigned int vm_proto;
+};
+
 static bool __init sev_es_check_cpu_features(void)
 {
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -42,7 +56,7 @@ static void __noreturn sev_es_terminate(unsigned int reason)
asm volatile("hlt\n" : : : "memory");
 }
 
-static bool sev_es_negotiate_protocol(void)
+static bool sev_es_negotiate_protocol(struct sev_ghcb_protocol_info *info)
 {
u64 val;
 
@@ -58,6 +72,12 @@ static bool sev_es_negotiate_protocol(void)
GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR)
return false;
 
+   if (info) {
+   info->hv_proto_min = GHCB_MSR_PROTO_MIN(val);
+   info->hv_proto_max = GHCB_MSR_PROTO_MAX(val);
+   info->vm_proto = GHCB_PROTO_OUR;
+   }
+
return true;
 }
 
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a6895e440bc3..8084bfd7cce1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -495,6 +495,9 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb 
*ghcb, struct es_em_ctxt
 /* Include code shared with pre-decompression boot stage */
 #include "sev-shared.c"
 
+/* Negotiated GHCB protocol version */
+static struct sev_ghcb_protocol_info ghcb_protocol_info __ro_after_init;
+
 static noinstr void __sev_put_ghcb(struct ghcb_state *state)
 {
struct sev_es_runtime_data *data;
@@ -665,7 +668,7 @@ static enum es_result vc_handle_msr(struct ghcb *ghcb, 
struct es_em_ctxt *ctxt)
 static bool __init sev_es_setup_ghcb(void)
 {
/* First make sure the hypervisor talks a supported protocol. */
-   if (!sev_es_negotiate_protocol())
+   if (!sev_es_negotiate_protocol(_protocol_info))
return false;
 
/*
@@ -794,6 +797,14 @@ void __init sev_es_init_vc_handling(void)
 
/* Secondary CPUs use the runtime #VC handler */
initial_vc_handler = (unsigned long)kernel_exc_vmm_communication;
+
+   /*
+* Print information about supported and negotiated GHCB protocol
+* versions.
+*/
+   pr_info("Hypervisor GHCB protocol version support: min=%u max=%u\n",
+   ghcb_protocol_info.hv_proto_min, 
ghcb_protocol_info.hv_proto_max);
+   pr_info("Using GHCB protocol version %u\n", 
ghcb_protocol_info.vm_proto);
 }
 
 static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt)
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 01/12] kexec: Allow architecture code to opt-out at runtime

2021-07-05 Thread Joerg Roedel
From: Joerg Roedel 

Allow a runtime opt-out of kexec support for architecture code in case
the kernel is running in an environment where kexec is not properly
supported yet.

This will be used on x86 when the kernel is running as an SEV-ES
guest. SEV-ES guests need special handling for kexec to hand over all
CPUs to the new kernel. This requires special hypervisor support and
handling code in the guest which is not yet implemented.

Cc: sta...@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel 
---
 include/linux/kexec.h |  2 ++
 kernel/kexec.c| 14 ++
 kernel/kexec_file.c   |  9 +
 3 files changed, 25 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729..400aae677435 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -422,6 +422,8 @@ static inline int kexec_crash_loaded(void) { return 0; }
 #define kexec_in_progress false
 #endif /* CONFIG_KEXEC_CORE */
 
+bool arch_kexec_supported(void);
+
 #endif /* !defined(__ASSEBMLY__) */
 
 #endif /* LINUX_KEXEC_H */
diff --git a/kernel/kexec.c b/kernel/kexec.c
index c82c6c06f051..d03134160458 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -195,11 +195,25 @@ static int do_kexec_load(unsigned long entry, unsigned 
long nr_segments,
  * that to happen you need to do that yourself.
  */
 
+bool __weak arch_kexec_supported(void)
+{
+   return true;
+}
+
 static inline int kexec_load_check(unsigned long nr_segments,
   unsigned long flags)
 {
int result;
 
+   /*
+* The architecture may support kexec in general, but the kernel could
+* run in an environment where it is not (yet) possible to execute a new
+* kernel. Allow the architecture code to opt-out of kexec support when
+* it is running in such an environment.
+*/
+   if (!arch_kexec_supported())
+   return -ENOSYS;
+
/* We only trust the superuser with rebooting the system. */
if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
return -EPERM;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 33400ff051a8..96d08a512e9c 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -358,6 +358,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, 
initrd_fd,
int ret = 0, i;
struct kimage **dest_image, *image;
 
+   /*
+* The architecture may support kexec in general, but the kernel could
+* run in an environment where it is not (yet) possible to execute a new
+* kernel. Allow the architecture code to opt-out of kexec support when
+* it is running in such an environment.
+*/
+   if (!arch_kexec_supported())
+   return -ENOSYS;
+
/* We only trust the superuser with rebooting the system. */
if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
return -EPERM;
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Viresh Kumar
On 05-07-21, 14:22, Jie Deng wrote:
> On 2021/7/5 12:38, Viresh Kumar wrote:
> > On 05-07-21, 11:45, Jie Deng wrote:
> > > On 2021/7/5 10:40, Viresh Kumar wrote:
> > > > On 02-07-21, 16:46, Jie Deng wrote:
> > > > The right way of doing this is is making this function return - Error 
> > > > on failure
> > > > and 0 on success. There is no point returning number of successful 
> > > > additions
> > > > here.
> > > 
> > > We need the number for virtio_i2c_complete_reqs to do cleanup. We don't 
> > > have
> > > to
> > > 
> > > do cleanup "num" times every time. Just do it as needed.
> > If you do full cleanup here, then you won't required that at the caller 
> > site.
> > 
> > > > Moreover, on failures this needs to clean up (free the dmabufs) itself, 
> > > > just
> > > > like you did i2c_put_dma_safe_msg_buf() at the end. The caller 
> > > > shouldn't be
> > > > required to handle the error cases by freeing up resources.
> > > 
> > > This function will return the number of requests being successfully 
> > > prepared
> > > and make sure
> > > 
> > > resources of the failed request being freed. And virtio_i2c_complete_reqs
> > > will free the
> > > 
> > > resources of those successful request.
> > It just looks cleaner to give such responsibility to each and every 
> > function,
> > i.e. if they fail, they should clean stuff up instead of the caller. That's 
> > the
> > normal philosophy you will find across kernel in most of the cases.
> > > > > + /*
> > > > > +  * Condition (req && req == [i]) should always 
> > > > > meet since
> > > > > +  * we have total nr requests in the vq.
> > > > > +  */
> > > > > + if (!failed && (WARN_ON(!(req && req == [i])) ||
> > > > > + (req->in_hdr.status != VIRTIO_I2C_MSG_OK)))
> > > > What about writing this as:
> > > > 
> > > > if (!failed && (WARN_ON(req != [i]) ||
> > > > (req->in_hdr.status != VIRTIO_I2C_MSG_OK)))
> > > > 
> > > > We don't need to check req here since if req is NULL, we will not do 
> > > > req->in_hdr
> > > > at all.
> > > 
> > > It's right here just because the [i] will never be NULL in our case.
> > > But if you see
> > > 
> > > "virtio_i2c_complete_reqs" as an independent function, you need to check 
> > > the
> > > 
> > > req. From the perspective of the callee, you can't ask the caller always
> > > give you
> > > 
> > > the non-NULL parameters.
> > We need to keep this driver optimized in its current form. If you see your 
> > own
> > argument here, then why don't you test vq or msgs for a valid pointer ? And 
> > even
> > reqs.
> > 
> > If we know for certain that this will never happen, then it should be 
> > optimized.
> > But if you see a case where reqs[i] can be NULL here, then it would be fine.
> > ot the driver. And we don't need to take care of that.
> 
> 
> This is still not enough to convince me.  So I won't change them for now
> until I see it
> 
> is the consensus of the majority.

Do you see reqs[i] to ever be NULL here ? If not, then if (req) is like if
(true).

Why would you want to have something like that ?

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Viresh Kumar
On 05-07-21, 12:38, Andy Shevchenko wrote:
> Because we do not have "uapi" in the path in /usr/include on the real
> system where the linux-headers (or kernel-headers) package is
> installed.
> 
> It's still possible that our installation hooks will remove that
> "uapi" from the headers, but I think it makes things too complicated.

Ahh, right. Yeah, I completely missed that part.

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Jie Deng


On 2021/7/5 12:38, Viresh Kumar wrote:

On 05-07-21, 11:45, Jie Deng wrote:

On 2021/7/5 10:40, Viresh Kumar wrote:

On 02-07-21, 16:46, Jie Deng wrote:
The right way of doing this is is making this function return - Error on failure
and 0 on success. There is no point returning number of successful additions
here.


We need the number for virtio_i2c_complete_reqs to do cleanup. We don't have
to

do cleanup "num" times every time. Just do it as needed.

If you do full cleanup here, then you won't required that at the caller site.


Moreover, on failures this needs to clean up (free the dmabufs) itself, just
like you did i2c_put_dma_safe_msg_buf() at the end. The caller shouldn't be
required to handle the error cases by freeing up resources.


This function will return the number of requests being successfully prepared
and make sure

resources of the failed request being freed. And virtio_i2c_complete_reqs
will free the

resources of those successful request.

It just looks cleaner to give such responsibility to each and every function,
i.e. if they fail, they should clean stuff up instead of the caller. That's the
normal philosophy you will find across kernel in most of the cases.
  

+   /*
+* Condition (req && req == [i]) should always meet since
+* we have total nr requests in the vq.
+*/
+   if (!failed && (WARN_ON(!(req && req == [i])) ||
+   (req->in_hdr.status != VIRTIO_I2C_MSG_OK)))

What about writing this as:

if (!failed && (WARN_ON(req != [i]) ||
(req->in_hdr.status != VIRTIO_I2C_MSG_OK)))

We don't need to check req here since if req is NULL, we will not do req->in_hdr
at all.


It's right here just because the [i] will never be NULL in our case.
But if you see

"virtio_i2c_complete_reqs" as an independent function, you need to check the

req. From the perspective of the callee, you can't ask the caller always
give you

the non-NULL parameters.

We need to keep this driver optimized in its current form. If you see your own
argument here, then why don't you test vq or msgs for a valid pointer ? And even
reqs.

If we know for certain that this will never happen, then it should be optimized.
But if you see a case where reqs[i] can be NULL here, then it would be fine.
ot the driver. And we don't need to take care of that.



This is still not enough to convince me.  So I won't change them for now 
until I see it


is the consensus of the majority.

Thank you.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Jie Deng

On 2021/7/5 14:30, Viresh Kumar wrote:



This is still not enough to convince me.  So I won't change them for now
until I see it

is the consensus of the majority.

Do you see reqs[i] to ever be NULL here ? If not, then if (req) is like if
(true).

Why would you want to have something like that ?


No. Currently, virtio_i2c_complete_reqs is only called by 
virtio_i2c_xfer, thus we don't


have reqs[i] to be NULL. But I think "virtio_i2c_complete_reqs" as an 
independent function


should consider this from a callee perspective.

Anyway, if you really think it should be changed, it can be fixed 
incrementally as Wolfram said.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 1/2] vdpa: support per virtqueue max queue size

2021-07-05 Thread Jason Wang
Virtio spec allows the device to specify the per virtqueue max queue
size. vDPA needs to adapt to this flexibility. E.g Qemu advertise a
small control virtqueue for virtio-net.

So this patch adds a index parameter to get_vq_num_max bus operations
for the device to report its per virtqueue max queue size.

Both VHOST_VDPA_GET_VRING_NUM and VDPA_ATTR_DEV_MAX_VQ_SIZE assume a
global maximum size. So we iterate all the virtqueues to return the
minimal size in this case. Actually, the VHOST_VDPA_GET_VRING_NUM is
not a must for the userspace. Userspace may choose to check the
VHOST_SET_VRING_NUM for proving or validating the maximum virtqueue
size. Anyway, we can invent a per vq version of
VHOST_VDPA_GET_VRING_NUM in the future if it's necessary.

Signed-off-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |  2 +-
 drivers/vdpa/vdpa.c   | 22 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |  2 +-
 drivers/vdpa/virtio_pci/vp_vdpa.c |  2 +-
 drivers/vhost/vdpa.c  |  9 ++---
 drivers/virtio/virtio_vdpa.c  |  2 +-
 include/linux/vdpa.h  |  5 -
 8 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index ab0ab5cf0f6e..646b340db2af 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -254,7 +254,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device 
*vdpa_dev, u8 status)
ifcvf_set_status(vf, status);
 }
 
-static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
+static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev, u16 qid)
 {
return IFCVF_QUEUE_MAX;
 }
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index dda5dc6f7737..afd6114d07b0 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1584,7 +1584,7 @@ static void mlx5_vdpa_set_config_cb(struct vdpa_device 
*vdev, struct vdpa_callba
 }
 
 #define MLX5_VDPA_MAX_VQ_ENTRIES 256
-static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
+static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev, u16 idx)
 {
return MLX5_VDPA_MAX_VQ_ENTRIES;
 }
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bb3f1d1f0422..d77d59811389 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -239,6 +239,26 @@ void vdpa_unregister_driver(struct vdpa_driver *drv)
 }
 EXPORT_SYMBOL_GPL(vdpa_unregister_driver);
 
+/**
+ * vdpa_get_vq_num_max - get the maximum virtqueue size
+ * @vdev: vdpa device
+ */
+u16 vdpa_get_vq_num_max(struct vdpa_device *vdev)
+{
+   const struct vdpa_config_ops *ops = vdev->config;
+   u16 s, size = ops->get_vq_num_max(vdev, 0);
+   int i;
+
+   for (i = 1; i < vdev->nvqs; i++) {
+   s = ops->get_vq_num_max(vdev, i);
+   if (s && s < size)
+   size = s;
+   }
+
+   return size;
+}
+EXPORT_SYMBOL_GPL(vdpa_get_vq_num_max);
+
 /**
  * vdpa_mgmtdev_register - register a vdpa management device
  *
@@ -502,7 +522,7 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff 
*msg, u32 portid, u32 seq
 
device_id = vdev->config->get_device_id(vdev);
vendor_id = vdev->config->get_vendor_id(vdev);
-   max_vq_size = vdev->config->get_vq_num_max(vdev);
+   max_vq_size = vdpa_get_vq_num_max(vdev);
 
err = -EMSGSIZE;
if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(>dev)))
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 98f793bc9376..49e29056f164 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -422,7 +422,7 @@ static void vdpasim_set_config_cb(struct vdpa_device *vdpa,
/* We don't support config interrupt */
 }
 
-static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa)
+static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa, u16 idx)
 {
return VDPASIM_QUEUE_MAX;
 }
diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index c76ebb531212..2926641fb586 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -195,7 +195,7 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 
status)
vp_vdpa_free_irq(vp_vdpa);
 }
 
-static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa)
+static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid)
 {
return VP_VDPA_QUEUE_MAX;
 }
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index fb41db3da611..c9ec395b8c42 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -289,11 +289,14 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, 
u64 __user *featurep)
 
 static long vhost_vdpa_get_vring_num(struct vhost_vdpa *v, u16 __user *argp)
 {
-   struct vdpa_device *vdpa = v->vdpa;
-   const struct vdpa_config_ops *ops = vdpa->config;
u16 

[PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size

2021-07-05 Thread Jason Wang
This patch switch to read virtqueue size from the capability instead
of depending on the hardcoded value. This allows the per virtqueue
size could be advertised.

Signed-off-by: Jason Wang 
---
 drivers/vdpa/virtio_pci/vp_vdpa.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index 2926641fb586..198f7076e4d9 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 
-#define VP_VDPA_QUEUE_MAX 256
 #define VP_VDPA_DRIVER_NAME "vp_vdpa"
 #define VP_VDPA_NAME_SIZE 256
 
@@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, 
u8 status)
 
 static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid)
 {
-   return VP_VDPA_QUEUE_MAX;
+   struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa);
+   struct virtio_pci_modern_device *mdev = _vdpa->mdev;
+
+   return vp_modern_get_queue_size(mdev, qid);
 }
 
 static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid,
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size

2021-07-05 Thread Jason Wang


在 2021/7/5 下午3:26, Michael S. Tsirkin 写道:

On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote:

This patch switch to read virtqueue size from the capability instead
of depending on the hardcoded value. This allows the per virtqueue
size could be advertised.

Signed-off-by: Jason Wang 

So let's add an ioctl for this? It's really a bug we don't..



As explained in patch 1. Qemu doesn't use VHOST_VDPA_GET_VRING_NUM 
actually. Instead it checks the result VHOST_VDPA_SET_VRING_NUM.


So I change VHOST_VDPA_GET_VRING_NUM to return the minimal size of all 
the virtqueues.


If you wish we can add a VHOST_VDPA_GET_VRING_NUM2, but I'm not sure it 
will have a user or not.


Thanks





---
  drivers/vdpa/virtio_pci/vp_vdpa.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index 2926641fb586..198f7076e4d9 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -18,7 +18,6 @@
  #include 
  #include 
  
-#define VP_VDPA_QUEUE_MAX 256

  #define VP_VDPA_DRIVER_NAME "vp_vdpa"
  #define VP_VDPA_NAME_SIZE 256
  
@@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
  
  static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid)

  {
-   return VP_VDPA_QUEUE_MAX;
+   struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa);
+   struct virtio_pci_modern_device *mdev = _vdpa->mdev;
+
+   return vp_modern_get_queue_size(mdev, qid);
  }
  
  static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid,

--
2.25.1


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Andy Shevchenko
On Mon, Jul 5, 2021 at 11:56 AM Viresh Kumar  wrote:
> On 05-07-21, 11:45, Andy Shevchenko wrote:
> > On Mon, Jul 5, 2021 at 11:03 AM Viresh Kumar  
> > wrote:
> > > On 05-07-21, 14:53, Jie Deng wrote:
> >
> > > > +#include 
> > > > +#include 
> > >
> > > Both of these need to be the uapi headers as Andy said earlier
> >
> > They are already since this header _is_ UAPI,
>
> Ahh, there is some tricky header inclusion there :)
>
> > what you are suggesting is gonna not work,
>
> Why ?

Because we do not have "uapi" in the path in /usr/include on the real
system where the linux-headers (or kernel-headers) package is
installed.

It's still possible that our installation hooks will remove that
"uapi" from the headers, but I think it makes things too complicated.

> > although it's correct for in-kernel users of UAPI
> > headers.


-- 
With Best Regards,
Andy Shevchenko
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Andy Shevchenko
On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> tools/include/* have a lot of abstract layer for building
> kernel code from userspace, so reuse or add the abstract
> layer in tools/include/ to build the ptr_ring for ringtest
> testing.

...

>  create mode 100644 tools/include/asm/cache.h
>  create mode 100644 tools/include/asm/processor.h
>  create mode 100644 tools/include/generated/autoconf.h
>  create mode 100644 tools/include/linux/align.h
>  create mode 100644 tools/include/linux/cache.h
>  create mode 100644 tools/include/linux/slab.h

Maybe somebody can change this to be able to include in-tree headers directly?

Besides above, had you tested this with `make O=...`?

-- 
With Best Regards,
Andy Shevchenko


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote:
> This patch switch to read virtqueue size from the capability instead
> of depending on the hardcoded value. This allows the per virtqueue
> size could be advertised.
> 
> Signed-off-by: Jason Wang 

So let's add an ioctl for this? It's really a bug we don't..

> ---
>  drivers/vdpa/virtio_pci/vp_vdpa.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
> b/drivers/vdpa/virtio_pci/vp_vdpa.c
> index 2926641fb586..198f7076e4d9 100644
> --- a/drivers/vdpa/virtio_pci/vp_vdpa.c
> +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
> @@ -18,7 +18,6 @@
>  #include 
>  #include 
>  
> -#define VP_VDPA_QUEUE_MAX 256
>  #define VP_VDPA_DRIVER_NAME "vp_vdpa"
>  #define VP_VDPA_NAME_SIZE 256
>  
> @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, 
> u8 status)
>  
>  static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid)
>  {
> - return VP_VDPA_QUEUE_MAX;
> + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa);
> + struct virtio_pci_modern_device *mdev = _vdpa->mdev;
> +
> + return vp_modern_get_queue_size(mdev, qid);
>  }
>  
>  static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid,
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v13] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Jie Deng
Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

The device specification can be found on
https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html.

By following the specification, people may implement different
backend drivers to emulate different controllers according to
their needs.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
---
Changes v12 -> v13
- Use _BITUL() instead of BIT()
- Rename "virtio_i2c_send_reqs" to "virtio_i2c_prepare_reqs"
- Optimize the return value of "virtio_i2c_complete_reqs"

Changes v11 -> v12
- Do not sent msg_buf for zero-length request.
- Send requests to host only if all the number of transfers requested 
prepared successfully.
- Remove the line #include  in virtio_i2c.h

Changes v10 -> v11
- Remove vi->adap.class = I2C_CLASS_DEPRECATED.
- Use #ifdef CONFIG_PM_SLEEP to replace the "__maybe_unused".
- Remove "struct mutex lock" in "struct virtio_i2c".
- Support zero-length request.
- Remove unnecessary logs.
- Remove vi->adap.timeout = HZ / 10, just use the default value.
- Use BIT(0) to define VIRTIO_I2C_FLAGS_FAIL_NEXT.
- Add the virtio_device index to adapter's naming mechanism.

 drivers/i2c/busses/Kconfig  |  11 ++
 drivers/i2c/busses/Makefile |   3 +
 drivers/i2c/busses/i2c-virtio.c | 269 
 include/uapi/linux/virtio_i2c.h |  41 ++
 include/uapi/linux/virtio_ids.h |   1 +
 5 files changed, 325 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-virtio.c
 create mode 100644 include/uapi/linux/virtio_i2c.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 10acece..e47616a 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
 
+config I2C_VIRTIO
+   tristate "Virtio I2C Adapter"
+   select VIRTIO
+   help
+ If you say yes to this option, support will be included for the virtio
+ I2C adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
 config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 69e9963..9843756 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -147,4 +147,7 @@ obj-$(CONFIG_I2C_XGENE_SLIMPRO) += i2c-xgene-slimpro.o
 obj-$(CONFIG_SCx200_ACB)   += scx200_acb.o
 obj-$(CONFIG_I2C_FSI)  += i2c-fsi.o
 
+# VIRTIO I2C host controller driver
+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
 ccflags-$(CONFIG_I2C_DEBUG_BUS) := -DDEBUG
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..731267d
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * The Virtio I2C Specification:
+ * 
https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex
+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct virtqueue *vq;
+};
+
+/**
+ * struct virtio_i2c_req - the virtio I2C request structure
+ * @out_hdr: the OUT header of the virtio I2C message
+ * @buf: the buffer into which data is read, or from which it's written
+ * @in_hdr: the IN header of the virtio I2C message
+ */
+struct virtio_i2c_req {
+   struct virtio_i2c_out_hdr out_hdr   cacheline_aligned;
+   uint8_t *bufcacheline_aligned;
+   struct virtio_i2c_in_hdr in_hdr cacheline_aligned;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(>completion);
+}
+
+static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
+  struct virtio_i2c_req *reqs,
+  struct i2c_msg *msgs, int nr)
+{
+   struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
+   int i, outcnt, incnt, err = 0;
+
+  

Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Viresh Kumar
Hi Jie.

On 05-07-21, 14:53, Jie Deng wrote:
> diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
> +static int virtio_i2c_complete_reqs(struct virtqueue *vq,
> + struct virtio_i2c_req *reqs,
> + struct i2c_msg *msgs, int nr,
> + bool fail)
> +{
> + struct virtio_i2c_req *req;
> + bool failed = fail;
> + unsigned int len;
> + int i, j = 0;
> +
> + for (i = 0; i < nr; i++) {
> + /* Detach the ith request from the vq */
> + req = virtqueue_get_buf(vq, );

Calling this for cases where virtio_i2c_prepare_reqs() itself has failed will
always return NULL, should we even try to call this then ?

virtqueue_get_buf() returns the next used buffer only, i.e. returned by the host
?

> +
> + /*
> +  * Condition (req && req == [i]) should always meet since
> +  * we have total nr requests in the vq.
> +  */
> + if (!failed && (WARN_ON(!(req && req == [i])) ||

Mentioning again for completeness of the review, reqs[i] can never be NULL here
though req can be. And even in that case you never need to check req here.

Feel free to ignore it if you want, we can always send a fixup patch later and
discuss it further :)

> + (req->in_hdr.status != VIRTIO_I2C_MSG_OK)))
> + failed = true;
> +
> + i2c_put_dma_safe_msg_buf(reqs[i].buf, [i], !failed);
> + if (!failed)
> + j++;
> + }
> +
> + return fail ? 0 : j;

Since you don't return ETIMEDOUT anymore, you can simply return j now. And so we
can work with a single failed argument and don't need both fail and failed.

> +}
> +
> +static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, 
> int num)
> +{
> + struct virtio_i2c *vi = i2c_get_adapdata(adap);
> + struct virtqueue *vq = vi->vq;
> + struct virtio_i2c_req *reqs;
> + unsigned long time_left;
> + int ret;
> +
> + reqs = kcalloc(num, sizeof(*reqs), GFP_KERNEL);
> + if (!reqs)
> + return -ENOMEM;
> +
> + ret = virtio_i2c_prepare_reqs(vq, reqs, msgs, num);
> + if (ret != num) {
> + ret = virtio_i2c_complete_reqs(vq, reqs, msgs, ret, true);
> + goto err_free;
> + }
> +
> + reinit_completion(>completion);
> + virtqueue_kick(vq);
> + time_left = wait_for_completion_timeout(>completion, adap->timeout);
> + ret = virtio_i2c_complete_reqs(vq, reqs, msgs, num, !time_left);
> +
> + if (!time_left) {
> + ret = -ETIMEDOUT;
> + dev_err(>dev, "virtio i2c backend timeout.\n");
> + }
> +
> +err_free:
> + kfree(reqs);
> + return ret;
> +}

> diff --git a/include/uapi/linux/virtio_i2c.h b/include/uapi/linux/virtio_i2c.h
> new file mode 100644
> index 000..df936a2
> --- /dev/null
> +++ b/include/uapi/linux/virtio_i2c.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later WITH Linux-syscall-note */
> +/*
> + * Definitions for virtio I2C Adpter
> + *
> + * Copyright (c) 2021 Intel Corporation. All rights reserved.
> + */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_I2C_H
> +#define _UAPI_LINUX_VIRTIO_I2C_H
> +
> +#include 
> +#include 

Both of these need to be the uapi headers as Andy said earlier and they better
be in alphabetical order.

#include 
#include 

Though in your support, I do see a lot of files in uapi/linux using the standard
types.h, which looks wrong as that types.h is not a userspace ABI.

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Andy Shevchenko
On Mon, Jul 5, 2021 at 11:03 AM Viresh Kumar  wrote:
> On 05-07-21, 14:53, Jie Deng wrote:

> > +#include 
> > +#include 
>
> Both of these need to be the uapi headers as Andy said earlier

They are already since this header _is_ UAPI, what you are suggesting
is gonna not work, although it's correct for in-kernel users of UAPI
headers.

>  and they better
> be in alphabetical order.

I prefer that as well.

> #include 
> #include 

-- 
With Best Regards,
Andy Shevchenko
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v12] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Jie Deng



On 2021/7/5 10:43, Viresh Kumar wrote:

On 02-07-21, 12:58, Andy Shevchenko wrote:

On Fri, Jul 02, 2021 at 04:46:47PM +0800, Jie Deng wrote:

+static int virtio_i2c_complete_reqs(struct virtqueue *vq,
+   struct virtio_i2c_req *reqs,
+   struct i2c_msg *msgs, int nr,
+   bool fail)
+{
+   struct virtio_i2c_req *req;
+   bool failed = fail;

Jie, you can actually get rid of this variable too. Jut rename fail to failed
and everything shall work as you want.



Oh, You are not right. I just found we can't remove this variable. The 
"fail" and "failed" have different


meanings for this function. We need fail to return the result.



+   unsigned int len;
+   int i, j = 0;
+
+   for (i = 0; i < nr; i++) {
+   /* Detach the ith request from the vq */
+   req = virtqueue_get_buf(vq, );
+
+   /*
+* Condition (req && req == [i]) should always meet since
+* we have total nr requests in the vq.
+*/
+   if (!failed && (WARN_ON(!(req && req == [i])) ||
+   (req->in_hdr.status != VIRTIO_I2C_MSG_OK)))
+   failed = true;

...and after failed is true, we are continuing the loop, why?

Actually this function can be called with fail set to true. We proceed as we
need to call i2c_put_dma_safe_msg_buf() for all buffers we allocated earlier.


+   i2c_put_dma_safe_msg_buf(reqs[i].buf, [i], !failed);
+   if (!failed)
+   ++j;

Besides better to read j++ the j itself can be renamed to something more
verbose.


+   }
+   return (fail ? -ETIMEDOUT : j);

Redundant parentheses.


+}

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/3] vDPA/ifcvf: implement management netlink framework for ifcvf

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 01:04:11PM +0800, Jason Wang wrote:
> > +static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id 
> > *id)
> > +{
> > +   struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
> > +   struct device *dev = >dev;
> > +   struct ifcvf_adapter *adapter;
> 
> 
> adapter is not used.

It's used in error handling below. It's not *initialized*.

> 
> > +   u32 dev_type;
> > +   int ret;
> > +
> > +   ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), 
> > GFP_KERNEL);
> > +   if (!ifcvf_mgmt_dev) {
> > +   IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management 
> > device\n");
> > +   return -ENOMEM;
> > +   }
> > +
> > +   dev_type = get_dev_type(pdev);
> > +   switch (dev_type) {
> > +   case VIRTIO_ID_NET:
> > +   ifcvf_mgmt_dev->mdev.id_table = id_table_net;
> > +   break;
> > +   case VIRTIO_ID_BLOCK:
> > +   ifcvf_mgmt_dev->mdev.id_table = id_table_blk;
> > +   break;
> > +   default:
> > +   IFCVF_ERR(pdev, "VIRTIO ID %u not supported\n", dev_type);
> > +   ret = -EOPNOTSUPP;
> > +   goto err;
> > +   }
> > +
> > +   ifcvf_mgmt_dev->mdev.ops = _vdpa_mgmt_dev_ops;
> > +   ifcvf_mgmt_dev->mdev.device = dev;
> > +   ifcvf_mgmt_dev->pdev = pdev;
> > +
> > +   ret = pcim_enable_device(pdev);
> > +   if (ret) {
> > +   IFCVF_ERR(pdev, "Failed to enable device\n");
> > +   goto err;
> > +   }
> > +
> > +   ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4),
> > +IFCVF_DRIVER_NAME);
> > +   if (ret) {
> > +   IFCVF_ERR(pdev, "Failed to request MMIO region\n");
> > +   goto err;
> > +   }
> > +
> > +   ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > +   if (ret) {
> > +   IFCVF_ERR(pdev, "No usable DMA configuration\n");
> > +   goto err;
> > +   }
> > +
> > +   ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev);
> > +   if (ret) {
> > +   IFCVF_ERR(pdev,
> > + "Failed for adding devres for freeing irq vectors\n");
> > +   goto err;
> > +   }
> > +
> > +   pci_set_master(pdev);
> > +
> > +   ret = vdpa_mgmtdev_register(_mgmt_dev->mdev);
> > +   if (ret) {
> > +   IFCVF_ERR(pdev,
> > + "Failed to initialize the management interfaces\n");
> > goto err;
> > }
> > @@ -533,14 +610,21 @@ static int ifcvf_probe(struct pci_dev *pdev, const 
> > struct pci_device_id *id)
> >   err:
> > put_device(>vdpa.dev);
> > +   kfree(ifcvf_mgmt_dev);
> > return ret;
> >   }

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Viresh Kumar
On 05-07-21, 11:45, Andy Shevchenko wrote:
> On Mon, Jul 5, 2021 at 11:03 AM Viresh Kumar  wrote:
> > On 05-07-21, 14:53, Jie Deng wrote:
> 
> > > +#include 
> > > +#include 
> >
> > Both of these need to be the uapi headers as Andy said earlier
> 
> They are already since this header _is_ UAPI,

Ahh, there is some tricky header inclusion there :)

> what you are suggesting is gonna not work,

Why ?

> although it's correct for in-kernel users of UAPI
> headers.

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_vdpa: reject invalid vq indices

2021-07-05 Thread Stefano Garzarella

On Thu, Jul 01, 2021 at 01:46:52PM +0200, Vincent Whitchurch wrote:

Do not call vDPA drivers' callbacks with vq indicies larger than what
the drivers indicate that they support.  vDPA drivers do not bounds
check the indices.

Signed-off-by: Vincent Whitchurch 
---
drivers/virtio/virtio_vdpa.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index e28acf482e0c..e9b9dd03f44a 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -149,6 +149,9 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
if (!name)
return NULL;

+   if (index >= vdpa->nvqs)
+   return ERR_PTR(-ENOENT);
+
/* Queue shouldn't already be set up. */
if (ops->get_vq_ready(vdpa, index))
return ERR_PTR(-ENOENT);
--
2.28.0



Reviewed-by: Stefano Garzarella 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v10] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Viresh Kumar
On 29-06-21, 12:43, Wolfram Sang wrote:
> 
> > From the spec:
> > 
> > The case when ``length of \field{write_buf}''=0, and at the same time,
> > ``length of \field{read_buf}''=0 doesn't make any sense.
> > 
> > I mentioned this in my first reply and to my understanding I did not get
> > a reply that this has changed meanwhile.
> > 
> 
> Also, this code as mentioned before:
> 
> > + if (!msgs[i].len)
> > + break;
> 
> I hope this can extended in the future to allow zero-length messages. If
> this is impossible we need to set an adapter quirk instead.

Wolfram,

I stumbled again upon this while working at the backend implementation.

If you look at i2c_smbus_xfer_emulated(), the command is always sent via
msgbuf0[0]. Even in the case of I2C_SMBUS_QUICK, where we set msg[0].len = 0, we
still send the buf. This is really confusing :(

Do I understand correctly that we always need to send msg[0].buf even when
msg[0].len is 0 ?

If so, it would be difficult to implement this with the current i2c virtio
specification, as the msg.len isn't really passed from guest to host, rather it
is inferred using the length of the buffer itself. And so we can't really pass a
buffer if length is 0.

Moreover, the driver uses i2c_get_dma_safe_msg_buf(), which also depends on the
length parameter here to allocate the buffer and copy data to it.

All in all, the latest version of the driver doesn't work with "i2cdetect -q 
".

To make it work, I had to add this:

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index 731267d42292..5b8bd98ae38e 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -73,6 +73,9 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
sg_init_one(_hdr, [i].out_hdr, 
sizeof(reqs[i].out_hdr));
sgs[outcnt++] = _hdr;

+   if (!msgs[i].len)
+   msgs[i].len = 1;
+
if (msgs[i].len) {
reqs[i].buf = i2c_get_dma_safe_msg_buf([i], 1);
if (!reqs[i].buf)

which made it I2C_SMBUS_BYTE instead of I2C_SMBUS_QUICK.

What should we do here Wolfram?


Jie, while wolfram comes back and replies to this, I think you need to switch
back to NOT supporting zero length transfer and set update virtio_i2c_func() to
return:

I2C_FUNC_I2C | (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK);

Support for zero-length transfers and I2C_FUNC_SMBUS_QUICK can be added
separately.

Thanks.

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Andy Shevchenko
On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote:
> On 2021/7/5 17:56, Andy Shevchenko wrote:
> > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> >> tools/include/* have a lot of abstract layer for building
> >> kernel code from userspace, so reuse or add the abstract
> >> layer in tools/include/ to build the ptr_ring for ringtest
> >> testing.
> > 
> > ...
> > 
> >>  create mode 100644 tools/include/asm/cache.h
> >>  create mode 100644 tools/include/asm/processor.h
> >>  create mode 100644 tools/include/generated/autoconf.h
> >>  create mode 100644 tools/include/linux/align.h
> >>  create mode 100644 tools/include/linux/cache.h
> >>  create mode 100644 tools/include/linux/slab.h
> > 
> > Maybe somebody can change this to be able to include in-tree headers 
> > directly?
> 
> If the above works, maybe the files in tools/include/* is not
> necessary any more, just use the in-tree headers to compile
> the user space app?
> 
> Or I missed something here?

I don't know if it works or not or why that decision had been made to
copy'n'paste headers (Yes, I know they have some modifications).

Somebody needs to check that and see what can be done in order to avoid copying
entire include into tools/include.

> > Besides above, had you tested this with `make O=...`?
> 
> You are right, the generated/autoconf.h is in another directory
> with `make O=...`.
> 
> Any nice idea to fix the above problem?

No idea. But I consider breakage of O= is a show stopper.

-- 
With Best Regards,
Andy Shevchenko


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-05 Thread Stefan Hajnoczi
On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> 
> 在 2021/7/4 下午5:49, Yongji Xie 写道:
> > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > configuration space is generally used for rarely-changing or
> > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
> > > > ioctl should not be called frequently.
> > > The spec uses MUST and other terms to define the precise requirements.
> > > Here the language (especially the word "generally") is weaker and means
> > > there may be exceptions.
> > > 
> > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
> > > approach is reads that have side-effects. For example, imagine a field
> > > containing an error code if the device encounters a problem unrelated to
> > > a specific virtqueue request. Reading from this field resets the error
> > > code to 0, saving the driver an extra configuration space write access
> > > and possibly race conditions. It isn't possible to implement those
> > > semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it
> > > makes me think that the interface does not allow full VIRTIO semantics.
> 
> 
> Note that though you're correct, my understanding is that config space is
> not suitable for this kind of error propagating. And it would be very hard
> to implement such kind of semantic in some transports.  Virtqueue should be
> much better. As Yong Ji quoted, the config space is used for
> "rarely-changing or intialization-time parameters".
> 
> 
> > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > handle the message failure, I'm going to add a return value to
> > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > be propagated to the virtio device driver. Then the virtio-blk device
> > driver can be modified to handle that.
> > 
> > Jason and Stefan, what do you think of this way?

Why does VDUSE_DEV_GET_CONFIG need to support an error return value?

The VIRTIO spec provides no way for the device to report errors from
config space accesses.

The QEMU virtio-pci implementation returns -1 from invalid
virtio_config_read*() and silently discards virtio_config_write*()
accesses.

VDUSE can take the same approach with
VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.

> I'd like to stick to the current assumption thich get_config won't fail.
> That is to say,
> 
> 1) maintain a config in the kernel, make sure the config space read can
> always succeed
> 2) introduce an ioctl for the vduse usersapce to update the config space.
> 3) we can synchronize with the vduse userspace during set_config
> 
> Does this work?

I noticed that caching is also allowed by the vhost-user protocol
messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't
know whether or not caching is in effect. The interface you outlined
above requires caching.

Is there a reason why the host kernel vDPA code needs to cache the
configuration space?

Here are the vhost-user protocol messages:

  Virtio device config space
  ^^

  ++--+---+-+
  | offset | size | flags | payload |
  ++--+---+-+

  :offset: a 32-bit offset of virtio device's configuration space

  :size: a 32-bit configuration space access size in bytes

  :flags: a 32-bit value:
- 0: Vhost master messages used for writeable fields
- 1: Vhost master messages used for live migration

  :payload: Size bytes array holding the contents of the virtio
device's configuration space

  ...

  ``VHOST_USER_GET_CONFIG``
:id: 24
:equivalent ioctl: N/A
:master payload: virtio device config space
:slave payload: virtio device config space

When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
submitted by the vhost-user master to fetch the contents of the
virtio device configuration space, vhost-user slave's payload size
MUST match master's request, vhost-user slave uses zero length of
payload to indicate an error to vhost-user master. The vhost-user
master may cache the contents to avoid repeated
``VHOST_USER_GET_CONFIG`` calls.

  ``VHOST_USER_SET_CONFIG``
:id: 25
:equivalent ioctl: N/A
:master payload: virtio device config space
:slave payload: N/A

When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
submitted by the vhost-user master when the Guest changes the virtio
device configuration space and also can be used for live migration
on the destination host. The vhost-user slave must check the flags
field, and slaves MUST NOT accept SET_CONFIG for read-only
configuration space fields unless the live migration bit is set.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [MASSMAIL KLMS]Re: [RFC PATCH v2 0/6] Improve SOCK_SEQPACKET receive logic

2021-07-05 Thread Stefano Garzarella

On Mon, Jul 05, 2021 at 01:48:28PM +0300, Arseny Krasnov wrote:


On 04.07.2021 12:54, Michael S. Tsirkin wrote:

On Sun, Jul 04, 2021 at 12:23:03PM +0300, Arseny Krasnov wrote:

On 04.07.2021 11:30, Michael S. Tsirkin wrote:

On Sun, Jul 04, 2021 at 11:08:13AM +0300, Arseny Krasnov wrote:

This patchset modifies receive logic for SOCK_SEQPACKET.
Difference between current implementation and this version is that
now reader is woken up when there is at least one RW packet in rx
queue of socket and data is copied to user's buffer, while merged
approach wake up user only when whole message is received and kept
in queue. New implementation has several advantages:
 1) There is no limit for message length. Merged approach requires
that length must be smaller than 'peer_buf_alloc', otherwise
transmission will stuck.
 2) There is no need to keep whole message in queue, thus no
'kmalloc()' memory will be wasted until EOR is received.

Also new approach has some feature: as fragments of message
are copied until EOR is received, it is possible that part of
message will be already in user's buffer, while rest of message
still not received. And if user will be interrupted by signal or
timeout with part of message in buffer, it will exit receive loop,
leaving rest of message in queue. To solve this problem special
callback was added to transport: it is called when user was forced
to leave exit loop and tells transport to drop any packet until
EOR met.

Sorry about commenting late in the game.  I'm a bit lost


SOCK_SEQPACKET
Provides sequenced, reliable, bidirectional, connection-mode transmission paths 
for records. A record can be sent using one or more output operations and 
received using one or more input operations, but a single operation never 
transfers part of more than one record. Record boundaries are visible to the 
receiver via the MSG_EOR flag.

it's supposed to be reliable - how is it legal to drop packets?

Sorry, seems i need to rephrase description. "Packet" here means fragment of 
record(message) at transport

layer. As this is SEQPACKET mode, receiver could get only whole message or 
error, so if only several fragments

of message was copied (if signal received for example) we can't return it to 
user - it breaks SEQPACKET sense. I think,

in this case we can drop rest of record's fragments legally.


Thank You

Would not that violate the reliable property? IIUC it's only ok to
return an error if socket gets closed. Just like e.g. TCP ...


Sorry for late answer, yes You're right, seems this is unwanted drop...

Lets wait for Stefano Garzarella feedback


It was the same concern I had with the series that introduced SEQPACKET 
for vsock, which is why I suggested to wait until the message is 
complete, before copying it to the user's buffer.


IIUC, with the current upstream implementation, we don't have this 
problem, right?


I'm not sure how to fix this, other than by keeping all the fragments 
queued until we've successfully copied them to user space, which is what 
we should do without this series applied IIUC.


Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size

2021-07-05 Thread Jason Wang


在 2021/7/6 上午1:59, Michael S. Tsirkin 写道:

On Mon, Jul 05, 2021 at 03:29:47PM +0800, Jason Wang wrote:

在 2021/7/5 下午3:26, Michael S. Tsirkin 写道:

On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote:

This patch switch to read virtqueue size from the capability instead
of depending on the hardcoded value. This allows the per virtqueue
size could be advertised.

Signed-off-by: Jason Wang 

So let's add an ioctl for this? It's really a bug we don't..


As explained in patch 1. Qemu doesn't use VHOST_VDPA_GET_VRING_NUM actually.
Instead it checks the result VHOST_VDPA_SET_VRING_NUM.

So I change VHOST_VDPA_GET_VRING_NUM to return the minimal size of all the
virtqueues.

If you wish we can add a VHOST_VDPA_GET_VRING_NUM2, but I'm not sure it will
have a user or not.

Thanks

Question is how do we know returning the minimal and not e.g. the max
size is the right thing to do?



For the new ioctl, it will return the max queue size per vq.

It's probably too late to fix the old one, so it's only safe to return 
the minimal one.


Actually, most of the vDPA parents should be fine except for the 
vp_vdpa. When running in a nested environment, Qemu only advertise cvq 
with 64 entries.


Thanks






---
   drivers/vdpa/virtio_pci/vp_vdpa.c | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index 2926641fb586..198f7076e4d9 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -18,7 +18,6 @@
   #include 
   #include 
-#define VP_VDPA_QUEUE_MAX 256
   #define VP_VDPA_DRIVER_NAME "vp_vdpa"
   #define VP_VDPA_NAME_SIZE 256
@@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device *vdpa, 
u8 status)
   static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid)
   {
-   return VP_VDPA_QUEUE_MAX;
+   struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa);
+   struct virtio_pci_modern_device *mdev = _vdpa->mdev;
+
+   return vp_modern_get_queue_size(mdev, qid);
   }
   static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid,
--
2.25.1


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v10] i2c: virtio: add a virtio i2c frontend driver

2021-07-05 Thread Jie Deng



On 2021/7/5 20:18, Viresh Kumar wrote:

On 29-06-21, 12:43, Wolfram Sang wrote:

 From the spec:

The case when ``length of \field{write_buf}''=0, and at the same time,
``length of \field{read_buf}''=0 doesn't make any sense.

I mentioned this in my first reply and to my understanding I did not get
a reply that this has changed meanwhile.


Also, this code as mentioned before:


+ if (!msgs[i].len)
+ break;

I hope this can extended in the future to allow zero-length messages. If
this is impossible we need to set an adapter quirk instead.

Wolfram,

I stumbled again upon this while working at the backend implementation.

If you look at i2c_smbus_xfer_emulated(), the command is always sent via
msgbuf0[0]. Even in the case of I2C_SMBUS_QUICK, where we set msg[0].len = 0, we
still send the buf. This is really confusing :(

Do I understand correctly that we always need to send msg[0].buf even when
msg[0].len is 0 ?

If so, it would be difficult to implement this with the current i2c virtio
specification, as the msg.len isn't really passed from guest to host, rather it
is inferred using the length of the buffer itself. And so we can't really pass a
buffer if length is 0.

Moreover, the driver uses i2c_get_dma_safe_msg_buf(), which also depends on the
length parameter here to allocate the buffer and copy data to it.

All in all, the latest version of the driver doesn't work with "i2cdetect -q 
".

To make it work, I had to add this:

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index 731267d42292..5b8bd98ae38e 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -73,6 +73,9 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
 sg_init_one(_hdr, [i].out_hdr, 
sizeof(reqs[i].out_hdr));
 sgs[outcnt++] = _hdr;

+   if (!msgs[i].len)
+   msgs[i].len = 1;
+
 if (msgs[i].len) {
 reqs[i].buf = i2c_get_dma_safe_msg_buf([i], 1);
 if (!reqs[i].buf)

which made it I2C_SMBUS_BYTE instead of I2C_SMBUS_QUICK.

What should we do here Wolfram?


Jie, while wolfram comes back and replies to this, I think you need to switch
back to NOT supporting zero length transfer and set update virtio_i2c_func() to
return:

 I2C_FUNC_I2C | (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK);

Support for zero-length transfers and I2C_FUNC_SMBUS_QUICK can be added
separately.

Thanks.



It's OK to me. Let's see what Wolfram says when he comes back.

I will send the updated version then.

Thanks.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-05 Thread Jason Wang


在 2021/7/5 下午8:49, Stefan Hajnoczi 写道:

On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:

在 2021/7/4 下午5:49, Yongji Xie 写道:

OK, I get you now. Since the VIRTIO specification says "Device
configuration space is generally used for rarely-changing or
initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
ioctl should not be called frequently.

The spec uses MUST and other terms to define the precise requirements.
Here the language (especially the word "generally") is weaker and means
there may be exceptions.

Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
approach is reads that have side-effects. For example, imagine a field
containing an error code if the device encounters a problem unrelated to
a specific virtqueue request. Reading from this field resets the error
code to 0, saving the driver an extra configuration space write access
and possibly race conditions. It isn't possible to implement those
semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it
makes me think that the interface does not allow full VIRTIO semantics.


Note that though you're correct, my understanding is that config space is
not suitable for this kind of error propagating. And it would be very hard
to implement such kind of semantic in some transports.  Virtqueue should be
much better. As Yong Ji quoted, the config space is used for
"rarely-changing or intialization-time parameters".



Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
handle the message failure, I'm going to add a return value to
virtio_config_ops.get() and virtio_cread_* API so that the error can
be propagated to the virtio device driver. Then the virtio-blk device
driver can be modified to handle that.

Jason and Stefan, what do you think of this way?

Why does VDUSE_DEV_GET_CONFIG need to support an error return value?

The VIRTIO spec provides no way for the device to report errors from
config space accesses.

The QEMU virtio-pci implementation returns -1 from invalid
virtio_config_read*() and silently discards virtio_config_write*()
accesses.

VDUSE can take the same approach with
VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.


I'd like to stick to the current assumption thich get_config won't fail.
That is to say,

1) maintain a config in the kernel, make sure the config space read can
always succeed
2) introduce an ioctl for the vduse usersapce to update the config space.
3) we can synchronize with the vduse userspace during set_config

Does this work?

I noticed that caching is also allowed by the vhost-user protocol
messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't
know whether or not caching is in effect. The interface you outlined
above requires caching.

Is there a reason why the host kernel vDPA code needs to cache the
configuration space?



Because:

1) Kernel can not wait forever in get_config(), this is the major 
difference with vhost-user.
2) Stick to the current assumption that virtio_cread() should always 
succeed.


Thanks




Here are the vhost-user protocol messages:

   Virtio device config space
   ^^

   ++--+---+-+
   | offset | size | flags | payload |
   ++--+---+-+

   :offset: a 32-bit offset of virtio device's configuration space

   :size: a 32-bit configuration space access size in bytes

   :flags: a 32-bit value:
 - 0: Vhost master messages used for writeable fields
 - 1: Vhost master messages used for live migration

   :payload: Size bytes array holding the contents of the virtio
 device's configuration space

   ...

   ``VHOST_USER_GET_CONFIG``
 :id: 24
 :equivalent ioctl: N/A
 :master payload: virtio device config space
 :slave payload: virtio device config space

 When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
 submitted by the vhost-user master to fetch the contents of the
 virtio device configuration space, vhost-user slave's payload size
 MUST match master's request, vhost-user slave uses zero length of
 payload to indicate an error to vhost-user master. The vhost-user
 master may cache the contents to avoid repeated
 ``VHOST_USER_GET_CONFIG`` calls.

   ``VHOST_USER_SET_CONFIG``
 :id: 25
 :equivalent ioctl: N/A
 :master payload: virtio device config space
 :slave payload: N/A

 When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
 submitted by the vhost-user master when the Guest changes the virtio
 device configuration space and also can be used for live migration
 on the destination host. The vhost-user slave must check the flags
 field, and slaves MUST NOT accept SET_CONFIG for read-only
 configuration space fields unless the live migration bit is set.

Stefan


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Andy Shevchenko
On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote:
> > On 2021/7/5 17:56, Andy Shevchenko wrote:
> > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> > >> tools/include/* have a lot of abstract layer for building
> > >> kernel code from userspace, so reuse or add the abstract
> > >> layer in tools/include/ to build the ptr_ring for ringtest
> > >> testing.
> > > 
> > > ...
> > > 
> > >>  create mode 100644 tools/include/asm/cache.h
> > >>  create mode 100644 tools/include/asm/processor.h
> > >>  create mode 100644 tools/include/generated/autoconf.h
> > >>  create mode 100644 tools/include/linux/align.h
> > >>  create mode 100644 tools/include/linux/cache.h
> > >>  create mode 100644 tools/include/linux/slab.h
> > > 
> > > Maybe somebody can change this to be able to include in-tree headers 
> > > directly?
> > 
> > If the above works, maybe the files in tools/include/* is not
> > necessary any more, just use the in-tree headers to compile
> > the user space app?
> > 
> > Or I missed something here?
> 
> why would it work? kernel headers outside of uapi are not
> intended to be consumed by userspace.

The problem here, that we are almost getting two copies of the headers, and
tools are not in a good maintenance, so it's often desynchronized from the
actual Linux headers. This will become more and more diverse if we keep same
way of operation. So, I would rather NAK any new copies of the headers from
include/ to tools/include.

> > > Besides above, had you tested this with `make O=...`?
> > 
> > You are right, the generated/autoconf.h is in another directory
> > with `make O=...`.
> > 
> > Any nice idea to fix the above problem?

-- 
With Best Regards,
Andy Shevchenko


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 10:05:30PM +0300, Andy Shevchenko wrote:
> On Mon, Jul 5, 2021 at 9:45 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jul 05, 2021 at 09:36:26PM +0300, Andy Shevchenko wrote:
> > > On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote:
> > > > > On 2021/7/5 17:56, Andy Shevchenko wrote:
> > > > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> > > > > >> tools/include/* have a lot of abstract layer for building
> > > > > >> kernel code from userspace, so reuse or add the abstract
> > > > > >> layer in tools/include/ to build the ptr_ring for ringtest
> > > > > >> testing.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>  create mode 100644 tools/include/asm/cache.h
> > > > > >>  create mode 100644 tools/include/asm/processor.h
> > > > > >>  create mode 100644 tools/include/generated/autoconf.h
> > > > > >>  create mode 100644 tools/include/linux/align.h
> > > > > >>  create mode 100644 tools/include/linux/cache.h
> > > > > >>  create mode 100644 tools/include/linux/slab.h
> > > > > >
> > > > > > Maybe somebody can change this to be able to include in-tree 
> > > > > > headers directly?
> > > > >
> > > > > If the above works, maybe the files in tools/include/* is not
> > > > > necessary any more, just use the in-tree headers to compile
> > > > > the user space app?
> > > > >
> > > > > Or I missed something here?
> > > >
> > > > why would it work? kernel headers outside of uapi are not
> > > > intended to be consumed by userspace.
> > >
> > > The problem here, that we are almost getting two copies of the headers, 
> > > and
> > > tools are not in a good maintenance, so it's often desynchronized from the
> > > actual Linux headers. This will become more and more diverse if we keep 
> > > same
> > > way of operation. So, I would rather NAK any new copies of the headers 
> > > from
> > > include/ to tools/include.
> >
> > We already have the copies
> > yes they are not maintained well ... what's the plan then?
> > NAK won't help us improve the situation.
> 
> I understand and the proposal is to leave only the files which are not
> the same (can we do kinda wrappers or so in tools/include rather than
> copying everything?).

I have no idea how we'd do all this. When I did tools/virtio I already
tried to minimize copying. Want to try to do better?

> > I would say copies are kind of okay just make sure they are
> > built with kconfig. Then any breakage will be
> > detected.
> >
> > > > > > Besides above, had you tested this with `make O=...`?
> > > > >
> > > > > You are right, the generated/autoconf.h is in another directory
> > > > > with `make O=...`.
> > > > >
> > > > > Any nice idea to fix the above problem?
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] virtio-blk: Add validation for block size in config space

2021-07-05 Thread Michael S. Tsirkin
On Tue, Jun 22, 2021 at 11:11:06AM +0100, Stefan Hajnoczi wrote:
> On Thu, Jun 17, 2021 at 01:10:04PM +0800, Xie Yongji wrote:
> > This ensures that we will not use an invalid block size
> > in config space (might come from an untrusted device).
> > 
> > Signed-off-by: Xie Yongji 
> > ---
> >  drivers/block/virtio_blk.c | 29 +++--
> >  1 file changed, 23 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index b9fa3ef5b57c..bbdae989f1ea 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -696,6 +696,28 @@ static const struct blk_mq_ops virtio_mq_ops = {
> >  static unsigned int virtblk_queue_depth;
> >  module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
> >  
> > +static int virtblk_validate(struct virtio_device *vdev)
> > +{
> > +   u32 blk_size;
> > +
> > +   if (!vdev->config->get) {
> > +   dev_err(>dev, "%s failure: config access disabled\n",
> > +   __func__);
> > +   return -EINVAL;
> > +   }
> > +
> > +   if (!virtio_has_feature(vdev, VIRTIO_BLK_F_BLK_SIZE))
> > +   return 0;
> > +
> > +   blk_size = virtio_cread32(vdev,
> > +   offsetof(struct virtio_blk_config, blk_size));
> > +
> > +   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE)
> > +   __virtio_clear_bit(vdev, VIRTIO_BLK_F_BLK_SIZE);
> > +
> > +   return 0;
> > +}
> 
> I saw Michael asked for .validate() in v2. I would prefer to keep
> everything in virtblk_probe() instead of adding .validate() because:
> 
> - There is a race condition that an untrusted device can exploit since
>   virtblk_probe() fetches the value again.
> 
> - It's more complex now that .validate() takes a first shot at blk_size
>   and then virtblk_probe() deals with it again later on.

This is a valid concern.
But, silently ignoring what hypervisor told us to do is also ungood.
Let's save it somewhere then.
And there are more examples like this, e.g. the virtio net mtu.

So if we worry about this stuff, let's do something along the lines of

(note: incomplete, won't build: you need to update all drivers).



virtio: allow passing config data from validate callback

To avoid time of check to time of use races on config changes,
pass config data from validate callback to probe.

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 4b15c00c0a0a..d3657a0127d7 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -211,6 +211,7 @@ static int virtio_dev_probe(struct device *_d)
u64 device_features;
u64 driver_features;
u64 driver_features_legacy;
+   void *config = NULL;
 
/* We have a driver! */
virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
@@ -249,18 +250,20 @@ static int virtio_dev_probe(struct device *_d)
__virtio_set_bit(dev, i);
 
if (drv->validate) {
-   err = drv->validate(dev);
-   if (err)
+   config = drv->validate(dev);
+   if (IS_ERR(config)) {
+   err = PTR_ERR(config);
goto err;
+   }
}
 
err = virtio_finalize_features(dev);
if (err)
goto err;
 
-   err = drv->probe(dev);
+   err = drv->probe(dev, config);
if (err)
-   goto err;
+   goto probe;
 
/* If probe didn't do it, mark device DRIVER_OK ourselves. */
if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -269,9 +272,12 @@ static int virtio_dev_probe(struct device *_d)
if (drv->scan)
drv->scan(dev);
 
+   kfree(config);
virtio_config_enable(dev);
 
return 0;
+probe:
+   kfree(config);
 err:
virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
return err;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b1894e0323fa..90750567c0cc 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -151,6 +151,8 @@ size_t virtio_max_dma_size(struct virtio_device *vdev);
  * @feature_table_size: number of entries in the feature table array.
  * @feature_table_legacy: same as feature_table but when working in legacy 
mode.
  * @feature_table_size_legacy: number of entries in feature table legacy array.
+ * @validate: the function to validate feature bits and config.
+ *  Returns a valid config or NULL to be passed to probe or 
ERR_PTR(-errno).
  * @probe: the function to call when a device is found.  Returns 0 or -errno.
  * @scan: optional function to call after successful probe; intended
  *for virtio-scsi to invoke a scan.
@@ -167,8 +169,8 @@ struct virtio_driver {
unsigned int feature_table_size;
const unsigned int *feature_table_legacy;
unsigned int feature_table_size_legacy;
-   int (*validate)(struct virtio_device *dev);
-  

Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Andy Shevchenko
On Mon, Jul 5, 2021 at 9:45 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jul 05, 2021 at 09:36:26PM +0300, Andy Shevchenko wrote:
> > On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote:
> > > > On 2021/7/5 17:56, Andy Shevchenko wrote:
> > > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> > > > >> tools/include/* have a lot of abstract layer for building
> > > > >> kernel code from userspace, so reuse or add the abstract
> > > > >> layer in tools/include/ to build the ptr_ring for ringtest
> > > > >> testing.
> > > > >
> > > > > ...
> > > > >
> > > > >>  create mode 100644 tools/include/asm/cache.h
> > > > >>  create mode 100644 tools/include/asm/processor.h
> > > > >>  create mode 100644 tools/include/generated/autoconf.h
> > > > >>  create mode 100644 tools/include/linux/align.h
> > > > >>  create mode 100644 tools/include/linux/cache.h
> > > > >>  create mode 100644 tools/include/linux/slab.h
> > > > >
> > > > > Maybe somebody can change this to be able to include in-tree headers 
> > > > > directly?
> > > >
> > > > If the above works, maybe the files in tools/include/* is not
> > > > necessary any more, just use the in-tree headers to compile
> > > > the user space app?
> > > >
> > > > Or I missed something here?
> > >
> > > why would it work? kernel headers outside of uapi are not
> > > intended to be consumed by userspace.
> >
> > The problem here, that we are almost getting two copies of the headers, and
> > tools are not in a good maintenance, so it's often desynchronized from the
> > actual Linux headers. This will become more and more diverse if we keep same
> > way of operation. So, I would rather NAK any new copies of the headers from
> > include/ to tools/include.
>
> We already have the copies
> yes they are not maintained well ... what's the plan then?
> NAK won't help us improve the situation.

I understand and the proposal is to leave only the files which are not
the same (can we do kinda wrappers or so in tools/include rather than
copying everything?).

> I would say copies are kind of okay just make sure they are
> built with kconfig. Then any breakage will be
> detected.
>
> > > > > Besides above, had you tested this with `make O=...`?
> > > >
> > > > You are right, the generated/autoconf.h is in another directory
> > > > with `make O=...`.
> > > >
> > > > Any nice idea to fix the above problem?


-- 
With Best Regards,
Andy Shevchenko
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] vdpa: vp_vdpa: don't use hard-coded maximum virtqueue size

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 03:29:47PM +0800, Jason Wang wrote:
> 
> 在 2021/7/5 下午3:26, Michael S. Tsirkin 写道:
> > On Mon, Jul 05, 2021 at 03:19:10PM +0800, Jason Wang wrote:
> > > This patch switch to read virtqueue size from the capability instead
> > > of depending on the hardcoded value. This allows the per virtqueue
> > > size could be advertised.
> > > 
> > > Signed-off-by: Jason Wang 
> > So let's add an ioctl for this? It's really a bug we don't..
> 
> 
> As explained in patch 1. Qemu doesn't use VHOST_VDPA_GET_VRING_NUM actually.
> Instead it checks the result VHOST_VDPA_SET_VRING_NUM.
> 
> So I change VHOST_VDPA_GET_VRING_NUM to return the minimal size of all the
> virtqueues.
> 
> If you wish we can add a VHOST_VDPA_GET_VRING_NUM2, but I'm not sure it will
> have a user or not.
> 
> Thanks

Question is how do we know returning the minimal and not e.g. the max
size is the right thing to do?


> 
> > 
> > > ---
> > >   drivers/vdpa/virtio_pci/vp_vdpa.c | 6 --
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
> > > b/drivers/vdpa/virtio_pci/vp_vdpa.c
> > > index 2926641fb586..198f7076e4d9 100644
> > > --- a/drivers/vdpa/virtio_pci/vp_vdpa.c
> > > +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
> > > @@ -18,7 +18,6 @@
> > >   #include 
> > >   #include 
> > > -#define VP_VDPA_QUEUE_MAX 256
> > >   #define VP_VDPA_DRIVER_NAME "vp_vdpa"
> > >   #define VP_VDPA_NAME_SIZE 256
> > > @@ -197,7 +196,10 @@ static void vp_vdpa_set_status(struct vdpa_device 
> > > *vdpa, u8 status)
> > >   static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa, u16 qid)
> > >   {
> > > - return VP_VDPA_QUEUE_MAX;
> > > + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa);
> > > + struct virtio_pci_modern_device *mdev = _vdpa->mdev;
> > > +
> > > + return vp_modern_get_queue_size(mdev, qid);
> > >   }
> > >   static int vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid,
> > > -- 
> > > 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote:
> On 2021/7/5 17:56, Andy Shevchenko wrote:
> > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> >> tools/include/* have a lot of abstract layer for building
> >> kernel code from userspace, so reuse or add the abstract
> >> layer in tools/include/ to build the ptr_ring for ringtest
> >> testing.
> > 
> > ...
> > 
> >>  create mode 100644 tools/include/asm/cache.h
> >>  create mode 100644 tools/include/asm/processor.h
> >>  create mode 100644 tools/include/generated/autoconf.h
> >>  create mode 100644 tools/include/linux/align.h
> >>  create mode 100644 tools/include/linux/cache.h
> >>  create mode 100644 tools/include/linux/slab.h
> > 
> > Maybe somebody can change this to be able to include in-tree headers 
> > directly?
> 
> If the above works, maybe the files in tools/include/* is not
> necessary any more, just use the in-tree headers to compile
> the user space app?
> 
> Or I missed something here?

why would it work? kernel headers outside of uapi are not
intended to be consumed by userspace.


> > 
> > Besides above, had you tested this with `make O=...`?
> 
> You are right, the generated/autoconf.h is in another directory
> with `make O=...`.
> 
> Any nice idea to fix the above problem?
> 
> > 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4] virtio-blk: Add validation for block size in config space

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 06:00:06PM +0800, Xie Yongji wrote:
> This ensures that we will not use an invalid block size
> in config space (might come from an untrusted device).
> 
> Signed-off-by: Xie Yongji 

I replied on v3.
Silently ignoring what hypervisor said is not a good idea.


> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index e4bd3b1fc3c2..e9d7747c3cc0 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -819,7 +819,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>   err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
>  struct virtio_blk_config, blk_size,
>  _size);
> - if (!err)
> + if (!err && blk_size >= SECTOR_SIZE && blk_size <= PAGE_SIZE)
>   blk_queue_logical_block_size(q, blk_size);
>   else
>   blk_size = queue_logical_block_size(q);
> -- 
> 2.11.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 0/2] refactor the ringtest testing for ptr_ring

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 09:36:26PM +0300, Andy Shevchenko wrote:
> On Mon, Jul 05, 2021 at 02:26:32PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 05, 2021 at 08:06:50PM +0800, Yunsheng Lin wrote:
> > > On 2021/7/5 17:56, Andy Shevchenko wrote:
> > > > On Mon, Jul 05, 2021 at 11:57:33AM +0800, Yunsheng Lin wrote:
> > > >> tools/include/* have a lot of abstract layer for building
> > > >> kernel code from userspace, so reuse or add the abstract
> > > >> layer in tools/include/ to build the ptr_ring for ringtest
> > > >> testing.
> > > > 
> > > > ...
> > > > 
> > > >>  create mode 100644 tools/include/asm/cache.h
> > > >>  create mode 100644 tools/include/asm/processor.h
> > > >>  create mode 100644 tools/include/generated/autoconf.h
> > > >>  create mode 100644 tools/include/linux/align.h
> > > >>  create mode 100644 tools/include/linux/cache.h
> > > >>  create mode 100644 tools/include/linux/slab.h
> > > > 
> > > > Maybe somebody can change this to be able to include in-tree headers 
> > > > directly?
> > > 
> > > If the above works, maybe the files in tools/include/* is not
> > > necessary any more, just use the in-tree headers to compile
> > > the user space app?
> > > 
> > > Or I missed something here?
> > 
> > why would it work? kernel headers outside of uapi are not
> > intended to be consumed by userspace.
> 
> The problem here, that we are almost getting two copies of the headers, and
> tools are not in a good maintenance, so it's often desynchronized from the
> actual Linux headers. This will become more and more diverse if we keep same
> way of operation. So, I would rather NAK any new copies of the headers from
> include/ to tools/include.

We already have the copies
yes they are not maintained well ... what's the plan then?
NAK won't help us improve the situation.
I would say copies are kind of okay just make sure they are
built with kconfig. Then any breakage will be
detected.

> > > > Besides above, had you tested this with `make O=...`?
> > > 
> > > You are right, the generated/autoconf.h is in another directory
> > > with `make O=...`.
> > > 
> > > Any nice idea to fix the above problem?
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 2/2] vDPA/ifcvf: implement management netlink framework for ifcvf

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 10:13:33PM +0800, Zhu Lingshan wrote:
> This commit implments the management netlink framework for ifcvf,

implements

> including register and add / remove a device
> 
> It works with iprouter2:

I am guessing iproute2?

> [root@localhost lszhu]# vdpa mgmtdev show -jp
> {
> "mgmtdev": {
> "pci/:01:00.5": {
> "supported_classes": [ "net" ]
> },
> "pci/:01:00.6": {
> "supported_classes": [ "net" ]
> }
> }
> }
> 
> [root@localhost lszhu]# vdpa dev add mgmtdev pci/:01:00.5 name vdpa0
> [root@localhost lszhu]# vdpa dev add mgmtdev pci/:01:00.6 name vdpa1
> 
> Signed-off-by: Zhu Lingshan 
> ---
>  drivers/vdpa/ifcvf/ifcvf_base.h |   6 ++
>  drivers/vdpa/ifcvf/ifcvf_main.c | 154 
>  2 files changed, 124 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
> index ded1b1b5fb13..e5251fcbb200 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_base.h
> +++ b/drivers/vdpa/ifcvf/ifcvf_base.h
> @@ -104,6 +104,12 @@ struct ifcvf_lm_cfg {
>   struct ifcvf_vring_lm_cfg vring_lm_cfg[IFCVF_MAX_QUEUE_PAIRS];
>  };
>  
> +struct ifcvf_vdpa_mgmt_dev {
> + struct vdpa_mgmt_dev mdev;
> + struct ifcvf_adapter *adapter;
> + struct pci_dev *pdev;
> +};
> +
>  int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev);
>  int ifcvf_start_hw(struct ifcvf_hw *hw);
>  void ifcvf_stop_hw(struct ifcvf_hw *hw);
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index 5f70ab1283a0..c72d9b36e4a0 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -218,7 +218,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device 
> *vdpa_dev, u8 status)
>   int ret;
>  
>   vf  = vdpa_to_vf(vdpa_dev);
> - adapter = dev_get_drvdata(vdpa_dev->dev.parent);
> + adapter = vdpa_to_adapter(vdpa_dev);
>   status_old = ifcvf_get_status(vf);
>  
>   if (status_old == status)
> @@ -442,6 +442,16 @@ static const struct vdpa_config_ops ifc_vdpa_ops = {
>   .set_config_cb  = ifcvf_vdpa_set_config_cb,
>  };
>  
> +static struct virtio_device_id id_table_net[] = {
> + {VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID},
> + {0},
> +};
> +
> +static struct virtio_device_id id_table_blk[] = {
> + {VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID},
> + {0},
> +};
> +
>  static u32 get_dev_type(struct pci_dev *pdev)
>  {
>   u32 dev_type;
> @@ -462,48 +472,30 @@ static u32 get_dev_type(struct pci_dev *pdev)
>   return dev_type;
>  }
>  
> -static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name)
>  {
> - struct device *dev = >dev;
> + struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
>   struct ifcvf_adapter *adapter;
> + struct pci_dev *pdev;
>   struct ifcvf_hw *vf;
> + struct device *dev;
>   int ret, i;
>  
> - ret = pcim_enable_device(pdev);
> - if (ret) {
> - IFCVF_ERR(pdev, "Failed to enable device\n");
> - return ret;
> - }
> -
> - ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4),
> -  IFCVF_DRIVER_NAME);
> - if (ret) {
> - IFCVF_ERR(pdev, "Failed to request MMIO region\n");
> - return ret;
> - }
> -
> - ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> - if (ret) {
> - IFCVF_ERR(pdev, "No usable DMA configuration\n");
> - return ret;
> - }
> -
> - ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev);
> - if (ret) {
> - IFCVF_ERR(pdev,
> -   "Failed for adding devres for freeing irq vectors\n");
> - return ret;
> - }
> + ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev);
> + if (ifcvf_mgmt_dev->adapter)
> + return -EOPNOTSUPP;
>  
> + pdev = ifcvf_mgmt_dev->pdev;
> + dev = >dev;
>   adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
> - dev, _vdpa_ops, NULL);
> - if (adapter == NULL) {
> + dev, _vdpa_ops, name);
> + if (!adapter) {
>   IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
>   return -ENOMEM;
>   }
>  
> - pci_set_master(pdev);
> - pci_set_drvdata(pdev, adapter);
> + ifcvf_mgmt_dev->adapter = adapter;
> + pci_set_drvdata(pdev, ifcvf_mgmt_dev);
>  
>   vf = >vf;
>   vf->dev_type = get_dev_type(pdev);
> @@ -523,9 +515,10 @@ static int ifcvf_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>  
>   vf->hw_features = ifcvf_get_hw_features(vf);
>  
> - ret = vdpa_register_device(>vdpa, IFCVF_MAX_QUEUE_PAIRS * 2);
> + adapter->vdpa.mdev = _mgmt_dev->mdev;
> + ret = _vdpa_register_device(>vdpa, IFCVF_MAX_QUEUE_PAIRS * 2);
>  

Re: [PATCH net-next 1/2] tools: add missing infrastructure for building ptr_ring.h

2021-07-05 Thread Michael S. Tsirkin
On Mon, Jul 05, 2021 at 11:57:34AM +0800, Yunsheng Lin wrote:
> In order to build ptr_ring.h in userspace, the cacheline
> aligning, cpu_relax() and slab related infrastructure is
> needed, so add them in this patch.
> 
> As L1_CACHE_BYTES may be different for different arch, which
> is mostly defined in include/generated/autoconf.h, so user may
> need to do "make defconfig" before building a tool using the
> API in linux/cache.h.
> 
> Also "linux/lockdep.h" is not added in "tools/include" yet,
> so remove it in "linux/spinlock.h", and the only place using
> "linux/spinlock.h" is tools/testing/radix-tree, removing that
> does not break radix-tree testing.
> 
> Signed-off-by: Yunsheng Lin 

This is hard to review.
Try to split this please. Functional changes separate from
merely moving code around.


> ---
>  tools/include/asm/cache.h  | 56 
>  tools/include/asm/processor.h  | 36 
>  tools/include/generated/autoconf.h |  1 +
>  tools/include/linux/align.h| 15 +++
>  tools/include/linux/cache.h| 87 
> ++
>  tools/include/linux/gfp.h  |  4 ++
>  tools/include/linux/slab.h | 46 
>  tools/include/linux/spinlock.h |  2 -
>  8 files changed, 245 insertions(+), 2 deletions(-)
>  create mode 100644 tools/include/asm/cache.h
>  create mode 100644 tools/include/asm/processor.h
>  create mode 100644 tools/include/generated/autoconf.h
>  create mode 100644 tools/include/linux/align.h
>  create mode 100644 tools/include/linux/cache.h
>  create mode 100644 tools/include/linux/slab.h
> 
> diff --git a/tools/include/asm/cache.h b/tools/include/asm/cache.h
> new file mode 100644
> index 000..071e310
> --- /dev/null
> +++ b/tools/include/asm/cache.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __TOOLS_LINUX_ASM_CACHE_H
> +#define __TOOLS_LINUX_ASM_CACHE_H
> +
> +#include 
> +
> +#if defined(__i386__) || defined(__x86_64__)
> +#define L1_CACHE_SHIFT   (CONFIG_X86_L1_CACHE_SHIFT)
> +#elif defined(__arm__)
> +#define L1_CACHE_SHIFT   (CONFIG_ARM_L1_CACHE_SHIFT)
> +#elif defined(__aarch64__)
> +#define L1_CACHE_SHIFT   (6)
> +#elif defined(__powerpc__)
> +
> +/* bytes per L1 cache line */
> +#if defined(CONFIG_PPC_8xx)
> +#define L1_CACHE_SHIFT   4
> +#elif defined(CONFIG_PPC_E500MC)
> +#define L1_CACHE_SHIFT   6
> +#elif defined(CONFIG_PPC32)
> +#if defined(CONFIG_PPC_47x)
> +#define L1_CACHE_SHIFT   7
> +#else
> +#define L1_CACHE_SHIFT   5
> +#endif
> +#else /* CONFIG_PPC64 */
> +#define L1_CACHE_SHIFT   7
> +#endif
> +
> +#elif defined(__sparc__)
> +#define L1_CACHE_SHIFT 5
> +#elif defined(__alpha__)
> +
> +#if defined(CONFIG_ALPHA_GENERIC) || defined(CONFIG_ALPHA_EV6)
> +#define L1_CACHE_SHIFT   6
> +#else
> +/* Both EV4 and EV5 are write-through, read-allocate,
> +   direct-mapped, physical.
> +*/
> +#define L1_CACHE_SHIFT   5
> +#endif
> +
> +#elif defined(__mips__)
> +#define L1_CACHE_SHIFT   CONFIG_MIPS_L1_CACHE_SHIFT
> +#elif defined(__ia64__)
> +#define L1_CACHE_SHIFT   CONFIG_IA64_L1_CACHE_SHIFT
> +#elif defined(__nds32__)
> +#define L1_CACHE_SHIFT   5
> +#else
> +#define L1_CACHE_SHIFT   5
> +#endif
> +
> +#define L1_CACHE_BYTES   (1 << L1_CACHE_SHIFT)
> +
> +#endif
> diff --git a/tools/include/asm/processor.h b/tools/include/asm/processor.h
> new file mode 100644
> index 000..3198ad6
> --- /dev/null
> +++ b/tools/include/asm/processor.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __TOOLS_LINUX_ASM_PROCESSOR_H
> +#define __TOOLS_LINUX_ASM_PROCESSOR_H
> +
> +#include 
> +
> +#if defined(__i386__) || defined(__x86_64__)
> +#include "../../arch/x86/include/asm/vdso/processor.h"
> +#elif defined(__arm__)
> +#include "../../arch/arm/include/asm/vdso/processor.h"
> +#elif defined(__aarch64__)
> +#include "../../arch/arm64/include/asm/vdso/processor.h"
> +#elif defined(__powerpc__)
> +#include "../../arch/powerpc/include/vdso/processor.h"
> +#elif defined(__s390__)
> +#include "../../arch/s390/include/vdso/processor.h"
> +#elif defined(__sh__)
> +#include "../../arch/sh/include/asm/processor.h"
> +#elif defined(__sparc__)
> +#include "../../arch/sparc/include/asm/processor.h"
> +#elif defined(__alpha__)
> +#include "../../arch/alpha/include/asm/processor.h"
> +#elif defined(__mips__)
> +#include "../../arch/mips/include/asm/vdso/processor.h"
> +#elif defined(__ia64__)
> +#include "../../arch/ia64/include/asm/processor.h"
> +#elif defined(__xtensa__)
> +#include "../../arch/xtensa/include/asm/processor.h"
> +#elif defined(__nds32__)
> +#include "../../arch/nds32/include/asm/processor.h"
> +#else
> +#define cpu_relax()  sched_yield()

Does this have a chance to work outside of kernel?

> +#endif

did you actually test or even test build all these arches?
Not sure we need to bother with hacks like these.


> +
> +#endif
> diff --git 

Re: [PATCH V2 5/6] virtio: add one field into virtio_device for recording if device uses managed irq

2021-07-05 Thread Christoph Hellwig
On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote:
> blk-mq needs to know if the device uses managed irq, so add one field
> to virtio_device for recording if device uses managed irq.
> 
> If the driver use managed irq, this flag has to be set so it can be
> passed to blk-mq.

I don't think all this boilerplate code make a whole lot of sense.
I think we need to record this information deep down in the irq code by
setting a flag in struct device only if pci_alloc_irq_vectors_affinity
atually managed to allocate multiple vectors and the PCI_IRQ_AFFINITY
flag was set.  Then blk-mq can look at that flag, and also check that
more than one queue is in used and work based on that.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization