[PATCH] target/ppc: Do not clear MSR[ME] on MCE interrupts to supervisor
Hardware clears the MSR[ME] bit when delivering a machine check
interrupt, so that is what QEMU does.

The spapr environment runs in supervisor mode though, and receives
machine check interrupts after they are processed by the hypervisor,
and MSR[ME] must always be enabled in supervisor mode (otherwise it
could checkstop the system). So MSR[ME] must not be cleared when
delivering machine checks to the supervisor.

The fix to prevent supervisor mode from modifying MSR[ME] also
prevented it from re-enabling the incorrectly cleared MSR[ME] bit
when returning from handling the interrupt. Before that fix, the
problem was not very noticeable with well-behaved code. So the Fixes
tag is not strictly correct, but practically they go together.

Found by kvm-unit-tests machine check tests (not yet upstream).

Fixes: 678b6f1af75ef ("target/ppc: Prevent supervisor from modifying MSR[ME]")
Signed-off-by: Nicholas Piggin
---
 target/ppc/excp_helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 80f584f933..674c05a2ce 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1345,9 +1345,10 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
              * clear (e.g., see FWNMI in PAPR).
              */
             new_msr |= (target_ulong)MSR_HVB;
+
+            /* HV machine check exceptions don't have ME set */
+            new_msr &= ~((target_ulong)1 << MSR_ME);
         }
-        /* machine check exceptions don't have ME set */
-        new_msr &= ~((target_ulong)1 << MSR_ME);
 
         msr |= env->error_code;
         break;
-- 
2.42.0
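For readers following the logic rather than the diff, the delivery rule this patch aims for can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: mce_new_msr() and its deliver_to_hv flag are hypothetical names that do not exist in QEMU, and the bit positions (ME at bit 12, HV at bit 60) are assumed to match QEMU's MSR_ME and MSR_SHV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring QEMU's MSR_ME and MSR_SHV. */
#define MSR_ME  12
#define MSR_SHV 60
#define MSR_HVB (1ULL << MSR_SHV)

/*
 * Hypothetical helper: compute the MSR installed on machine check entry.
 * deliver_to_hv is true when the interrupt is taken in hypervisor mode.
 */
static uint64_t mce_new_msr(uint64_t new_msr, bool deliver_to_hv)
{
    if (deliver_to_hv) {
        new_msr |= MSR_HVB;
        /* Only HV machine checks are entered with ME clear. */
        new_msr &= ~(1ULL << MSR_ME);
    }
    /* Supervisor delivery (e.g. FWNMI under PAPR) leaves ME untouched. */
    return new_msr;
}
```

With ME set on input, the supervisor path returns it still set, which is the property the guest depends on because it can no longer re-enable ME by itself.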
Re: [PATCH v2] target/riscv: Fix the element agnostic function problem
On 2024/3/21 11:58, Huang Tao wrote:
In RVV and vcrypto instructions, the masked and tail elements are set
to 1s using the vext_set_elems_1s function if the vma/vta bit is set.
This is the element agnostic policy.

However, this function cannot handle the big-endian case. This patch
fixes the problem by adding handling for that case.

Signed-off-by: Huang Tao
Suggested-by: Richard Henderson
---
Changes in v2:
- Keep the api of vext_set_elems_1s
- Reduce the number of patches.
---
 target/riscv/vector_internals.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 12f5964fbb..3e45b9b4a7 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -30,6 +30,28 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
     if (tot - cnt == 0) {
         return ;
     }
+
+#if HOST_BIG_ENDIAN
+    /*
+     * Handle the case where the elements are inside
+     * a single uint64 block, including setting the
+     * masked-off element.
+     */
+    if (((tot - 1) ^ cnt) < 8) {
+        memset(base + H1(tot - 1), -1, tot - cnt);
+        return;
+    }
+    /*
+     * Otherwise, the range crosses at least two uint64_t blocks.
+     * Set the first unaligned block.
+     */
+    if (cnt % 8 != 0) {
+        uint32_t j = ROUND_UP(cnt, 8);
+        memset(base + H1(j - 1), -1, j - cnt);
+        cnt = j;
+    }
+    /* Set the other 64-bit aligned blocks */
+#endif

Reviewed-by: LIU Zhiwei

Zhiwei

     memset(base + cnt, -1, tot - cnt);
 }
Re: [PATCH v4 1/2] vhost: dirty log should be per backend type
On Thu, Mar 21, 2024 at 4:29 AM Si-Wei Liu wrote: > > > > On 3/19/2024 8:25 PM, Jason Wang wrote: > > On Tue, Mar 19, 2024 at 6:06 AM Si-Wei Liu wrote: > >> > >> > >> On 3/17/2024 8:20 PM, Jason Wang wrote: > >>> On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu wrote: > > On 3/14/2024 8:50 PM, Jason Wang wrote: > > On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu > > wrote: > >> There could be a mix of both vhost-user and vhost-kernel clients > >> in the same QEMU process, where separate vhost loggers for the > >> specific vhost type have to be used. Make the vhost logger per > >> backend type, and have them properly reference counted. > > It's better to describe what's the advantage of doing this. > Yes, I can add that to the log. Although it's a niche use case, it was > actually a long standing limitation / bug that vhost-user and > vhost-kernel loggers can't co-exist per QEMU process, but today it's > just silent failure that may be ended up with. This bug fix removes that > implicit limitation in the code. > >>> Ok. > >>> > >> Suggested-by: Michael S. Tsirkin > >> Signed-off-by: Si-Wei Liu > >> > >> --- > >> v3->v4: > >> - remove checking NULL return value from vhost_log_get > >> > >> v2->v3: > >> - remove non-effective assertion that never be reached > >> - do not return NULL from vhost_log_get() > >> - add neccessary assertions to vhost_log_get() > >> --- > >> hw/virtio/vhost.c | 45 > >> + > >> 1 file changed, 33 insertions(+), 12 deletions(-) > >> > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >> index 2c9ac79..612f4db 100644 > >> --- a/hw/virtio/vhost.c > >> +++ b/hw/virtio/vhost.c > >> @@ -43,8 +43,8 @@ > >> do { } while (0) > >> #endif > >> > >> -static struct vhost_log *vhost_log; > >> -static struct vhost_log *vhost_log_shm; > >> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX]; > >> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX]; > >> > >> /* Memslots used by backends that support private memslots > >> (without an fd). 
*/ > >> static unsigned int used_memslots; > >> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct > >> vhost_dev *dev, > >> r = -1; > >> } > >> > >> +if (r == 0) { > >> +assert(dev->vhost_ops->backend_type == backend_type); > >> +} > >> + > > Under which condition could we hit this? > Just in case some other function inadvertently corrupted this earlier, > we have to capture discrepancy in the first place... On the other hand, > it will be helpful for other vhost backend writers to diagnose day-one > bug in the code. I feel just code comment here will not be > sufficient/helpful. > >>> See below. > >>> > > It seems not good to assert a local logic. > It seems to me quite a few local asserts are in the same file already, > vhost_save_backend_state, > >>> For example it has assert for > >>> > >>> assert(!dev->started); > >>> > >>> which is not the logic of the function itself but require > >>> vhost_dev_start() not to be called before. > >>> > >>> But it looks like this patch you assert the code just a few lines > >>> above the assert itself? > >> Yes, that was the intent - for e.g. xxx_ops may contain corrupted > >> xxx_ops.backend_type already before coming to this > >> vhost_set_backend_type() function. And we may capture this corrupted > >> state by asserting the expected xxx_ops.backend_type (to be consistent > >> with the backend_type passed in), > > This can happen for all variables. Not sure why backend_ops is special. > The assert is just checking the backend_type field only. The other op > fields in backend_ops have similar assert within the op function itself > also. For e.g. vhost_user_requires_shm_log() and a lot of other > vhost_user ops have the following: > > assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); > > vhost_vdpa_vq_get_addr() and a lot of other vhost_vdpa ops have: > > assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA); > > vhost_kernel ops has similar assertions as well. 
> > The reason why it has to be checked against here is now the callers of > vhost_log_get(), would pass in dev->vhost_ops->backend_type to the API, > which are unable to verify the validity of the backend_type by > themselves. The vhost_log_get() has necessary asserts to make bound > check for the vhost_log[] or vhost_log_shm[] array, but specific assert > against the exact backend type in vhost_set_backend_type() will further > harden the implementation in vhost_log_get() and other backend ops. As discussed, those assertions are to make sure of the logic dependencies of other functions. (The assignment
[PATCH v2] target/riscv: Fix the element agnostic function problem
In RVV and vcrypto instructions, the masked and tail elements are set
to 1s using the vext_set_elems_1s function if the vma/vta bit is set.
This is the element agnostic policy.

However, this function cannot handle the big-endian case. This patch
fixes the problem by adding handling for that case.

Signed-off-by: Huang Tao
Suggested-by: Richard Henderson
---
Changes in v2:
- Keep the api of vext_set_elems_1s
- Reduce the number of patches.
---
 target/riscv/vector_internals.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 12f5964fbb..3e45b9b4a7 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -30,6 +30,28 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
     if (tot - cnt == 0) {
         return ;
     }
+
+#if HOST_BIG_ENDIAN
+    /*
+     * Handle the case where the elements are inside
+     * a single uint64 block, including setting the
+     * masked-off element.
+     */
+    if (((tot - 1) ^ cnt) < 8) {
+        memset(base + H1(tot - 1), -1, tot - cnt);
+        return;
+    }
+    /*
+     * Otherwise, the range crosses at least two uint64_t blocks.
+     * Set the first unaligned block.
+     */
+    if (cnt % 8 != 0) {
+        uint32_t j = ROUND_UP(cnt, 8);
+        memset(base + H1(j - 1), -1, j - cnt);
+        cnt = j;
+    }
+    /* Set the other 64-bit aligned blocks */
+#endif
     memset(base + cnt, -1, tot - cnt);
 }
-- 
2.41.0
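The index arithmetic in this hunk is easier to check outside QEMU. The following standalone sketch models the big-endian layout on any host; it is an illustration under assumptions, not the QEMU implementation. H1() is assumed to be the byte swizzle x ^ 7 that QEMU uses on big-endian hosts, set_elems_1s_be() and ROUND_UP8() are hypothetical names, and tot is assumed to be a multiple of 8 whenever the range spans more than one 64-bit block. The XOR is parenthesized explicitly, because in C the < operator binds tighter than ^.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumption: model QEMU's big-endian H1() on any host. Logical byte i of a
 * vector register lives at host offset i ^ 7 inside its 64-bit block.
 */
#define H1(x) ((x) ^ 7)
#define ROUND_UP8(n) (((n) + 7u) & ~7u)

/* Hypothetical stand-alone helper: set logical bytes [cnt, tot) to 0xff. */
static void set_elems_1s_be(uint8_t *base, uint32_t cnt, uint32_t tot)
{
    if (tot == cnt) {
        return;
    }
    /* Same 64-bit block: first and last index differ only in the low 3 bits. */
    if (((tot - 1) ^ cnt) < 8) {
        memset(base + H1(tot - 1), 0xff, tot - cnt);
        return;
    }
    /* Fill the tail of the first, partially covered block. */
    if (cnt % 8 != 0) {
        uint32_t j = ROUND_UP8(cnt);
        memset(base + H1(j - 1), 0xff, j - cnt);
        cnt = j;
    }
    /* Remaining blocks are whole (tot assumed 8-aligned here). */
    memset(base + cnt, 0xff, tot - cnt);
}
```

Reading the buffer back through H1() shows which logical bytes were set: after set_elems_1s_be(buf, 3, 16) on a zeroed 16-byte buffer, buf[H1(i)] is 0xff exactly for i in 3..15.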
Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration
On Thu, Mar 21, 2024 at 5:03 AM Si-Wei Liu wrote: > > > > On 3/19/2024 8:27 PM, Jason Wang wrote: > > On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu wrote: > >> > >> > >> On 3/17/2024 8:22 PM, Jason Wang wrote: > >>> On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu wrote: > > On 3/14/2024 9:03 PM, Jason Wang wrote: > > On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu > > wrote: > >> On setups with one or more virtio-net devices with vhost on, > >> dirty tracking iteration increases cost the bigger the number > >> amount of queues are set up e.g. on idle guests migration the > >> following is observed with virtio-net with vhost=on: > >> > >> 48 queues -> 78.11% [.] vhost_dev_sync_region.isra.13 > >> 8 queues -> 40.50% [.] vhost_dev_sync_region.isra.13 > >> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13 > >> 2 devices, 1 queue -> 18.60% [.] vhost_dev_sync_region.isra.14 > >> > >> With high memory rates the symptom is lack of convergence as soon > >> as it has a vhost device with a sufficiently high number of queues, > >> the sufficient number of vhost devices. > >> > >> On every migration iteration (every 100msecs) it will redundantly > >> query the *shared log* the number of queues configured with vhost > >> that exist in the guest. For the virtqueue data, this is necessary, > >> but not for the memory sections which are the same. So essentially > >> we end up scanning the dirty log too often. > >> > >> To fix that, select a vhost device responsible for scanning the > >> log with regards to memory sections dirty tracking. It is selected > >> when we enable the logger (during migration) and cleared when we > >> disable the logger. If the vhost logger device goes away for some > >> reason, the logger will be re-selected from the rest of vhost > >> devices. 
> >> > >> After making mem-section logger a singleton instance, constant cost > >> of 7%-9% (like the 1 queue report) will be seen, no matter how many > >> queues or how many vhost devices are configured: > >> > >> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13 > >> 2 devices, 8 queues -> 7.97% [.] vhost_dev_sync_region.isra.14 > >> > >> Co-developed-by: Joao Martins > >> Signed-off-by: Joao Martins > >> Signed-off-by: Si-Wei Liu > >> > >> --- > >> v3 -> v4: > >> - add comment to clarify effect on cache locality and > >>performance > >> > >> v2 -> v3: > >> - add after-fix benchmark to commit log > >> - rename vhost_log_dev_enabled to vhost_dev_should_log > >> - remove unneeded comparisons for backend_type > >> - use QLIST array instead of single flat list to store vhost > >>logger devices > >> - simplify logger election logic > >> --- > >> hw/virtio/vhost.c | 67 > >> ++- > >> include/hw/virtio/vhost.h | 1 + > >> 2 files changed, 62 insertions(+), 6 deletions(-) > >> > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >> index 612f4db..58522f1 100644 > >> --- a/hw/virtio/vhost.c > >> +++ b/hw/virtio/vhost.c > >> @@ -45,6 +45,7 @@ > >> > >> static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX]; > >> static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX]; > >> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX]; > >> > >> /* Memslots used by backends that support private memslots > >> (without an fd). 
*/ > >> static unsigned int used_memslots; > >> @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev) > >> } > >> } > >> > >> +static inline bool vhost_dev_should_log(struct vhost_dev *dev) > >> +{ > >> +assert(dev->vhost_ops); > >> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE); > >> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX); > >> + > >> +return dev == > >> QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]); > > A dumb question, why not simply check > > > > dev->log == vhost_log_shm[dev->vhost_ops->backend_type] > Because we are not sure if the logger comes from vhost_log_shm[] or > vhost_log[]. Don't want to complicate the check here by calling into > vhost_dev_log_is_shared() every time the .log_sync() is called. > >>> It has very low overhead, doesn't it? > >> Whether this has low overhead will have to depend on the specific > >> backend's implementation for .vhost_requires_shm_log(), which the common > >> vhost layer should not assume upon or rely on the current implementation. > >> > >>> static bool vhost_dev_log_is_shared(struct vhost_dev *dev) > >>> { > >>> return dev->vhost_ops->vhost_requires_shm_log && > >>>
Re: [PATCH v1] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'
On 2024/3/21 10:50 AM, Richard Henderson wrote:
On 3/20/24 16:11, Song Gao wrote:
qemu-system-loongarch64 fails an assertion when run with the option
'-d int': helper_idle() raises the exception EXCP_HLT, but no name is
defined for that exception.

Signed-off-by: Song Gao
---
 target/loongarch/cpu.c | 75 ++
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..17a923de02 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,46 @@ const char * const fregnames[32] = {
     "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
 };
 
-static const char * const excp_names[] = {
-    [EXCCODE_INT] = "Interrupt",
-    [EXCCODE_PIL] = "Page invalid exception for load",
-    [EXCCODE_PIS] = "Page invalid exception for store",
-    [EXCCODE_PIF] = "Page invalid exception for fetch",
-    [EXCCODE_PME] = "Page modified exception",
-    [EXCCODE_PNR] = "Page Not Readable exception",
-    [EXCCODE_PNX] = "Page Not Executable exception",
-    [EXCCODE_PPI] = "Page Privilege error",
-    [EXCCODE_ADEF] = "Address error for instruction fetch",
-    [EXCCODE_ADEM] = "Address error for Memory access",
-    [EXCCODE_SYS] = "Syscall",
-    [EXCCODE_BRK] = "Break",
-    [EXCCODE_INE] = "Instruction Non-Existent",
-    [EXCCODE_IPE] = "Instruction privilege error",
-    [EXCCODE_FPD] = "Floating Point Disabled",
-    [EXCCODE_FPE] = "Floating Point Exception",
-    [EXCCODE_DBP] = "Debug breakpoint",
-    [EXCCODE_BCE] = "Bound Check Exception",
-    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-    [EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+    int32_t exccode;
+    const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+    {EXCCODE_INT, "Interrupt"},
+    {EXCCODE_PIL, "Page invalid exception for load"},
+    {EXCCODE_PIS, "Page invalid exception for store"},
+    {EXCCODE_PIF, "Page invalid exception for fetch"},
+    {EXCCODE_PME, "Page modified exception"},
+    {EXCCODE_PNR, "Page Not Readable exception"},
+    {EXCCODE_PNX, "Page Not Executable exception"},
+    {EXCCODE_PPI, "Page Privilege error"},
+    {EXCCODE_ADEF, "Address error for instruction fetch"},
+    {EXCCODE_ADEM, "Address error for Memory access"},
+    {EXCCODE_SYS, "Syscall"},
+    {EXCCODE_BRK, "Break"},
+    {EXCCODE_INE, "Instruction Non-Existent"},
+    {EXCCODE_IPE, "Instruction privilege error"},
+    {EXCCODE_FPD, "Floating Point Disabled"},
+    {EXCCODE_FPE, "Floating Point Exception"},
+    {EXCCODE_DBP, "Debug breakpoint"},
+    {EXCCODE_BCE, "Bound Check Exception"},
+    {EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+    {EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
 };
 
 const char *loongarch_exception_name(int32_t exception)
 {
-    assert(excp_names[exception]);
-    return excp_names[exception];
+    int i;
+    const char *name = "unknown";
+
+    for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+        if (excp_names[i].exccode == exception) {
+            name = excp_names[i].name;
+            break;
+        }
+    }
+    return name;
 }

I think you should return null for unknown, and then...

 void G_NORETURN do_raise_exception(CPULoongArchState *env,
@@ -79,11 +92,17 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
                                    uintptr_t pc)
 {
     CPUState *cs = env_cpu(env);
+    const char *name;
 
+    if (exception == EXCP_HLT) {
+        name = "EXCP_HLT";
+    } else {
+        name = loongarch_exception_name(exception);
+    }
     qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
                   __func__,
                   exception,
-                  loongarch_exception_name(exception));
+                  name);

... use two different printfs, one of which prints the exception number.

Why would you special case HLT here instead of putting it in the table?

Hmm, putting HLT in the table is no problem. I will correct it.
I did not consider HLT a real exception in the LoongArch architecture,
so I didn't put it in the table.

Thanks.
Song Gao

r~
Re: [PATCH v1] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'
On 3/20/24 16:11, Song Gao wrote:
qemu-system-loongarch64 fails an assertion when run with the option
'-d int': helper_idle() raises the exception EXCP_HLT, but no name is
defined for that exception.

Signed-off-by: Song Gao
---
 target/loongarch/cpu.c | 75 ++
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..17a923de02 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,46 @@ const char * const fregnames[32] = {
     "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
 };
 
-static const char * const excp_names[] = {
-    [EXCCODE_INT] = "Interrupt",
-    [EXCCODE_PIL] = "Page invalid exception for load",
-    [EXCCODE_PIS] = "Page invalid exception for store",
-    [EXCCODE_PIF] = "Page invalid exception for fetch",
-    [EXCCODE_PME] = "Page modified exception",
-    [EXCCODE_PNR] = "Page Not Readable exception",
-    [EXCCODE_PNX] = "Page Not Executable exception",
-    [EXCCODE_PPI] = "Page Privilege error",
-    [EXCCODE_ADEF] = "Address error for instruction fetch",
-    [EXCCODE_ADEM] = "Address error for Memory access",
-    [EXCCODE_SYS] = "Syscall",
-    [EXCCODE_BRK] = "Break",
-    [EXCCODE_INE] = "Instruction Non-Existent",
-    [EXCCODE_IPE] = "Instruction privilege error",
-    [EXCCODE_FPD] = "Floating Point Disabled",
-    [EXCCODE_FPE] = "Floating Point Exception",
-    [EXCCODE_DBP] = "Debug breakpoint",
-    [EXCCODE_BCE] = "Bound Check Exception",
-    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-    [EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+    int32_t exccode;
+    const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+    {EXCCODE_INT, "Interrupt"},
+    {EXCCODE_PIL, "Page invalid exception for load"},
+    {EXCCODE_PIS, "Page invalid exception for store"},
+    {EXCCODE_PIF, "Page invalid exception for fetch"},
+    {EXCCODE_PME, "Page modified exception"},
+    {EXCCODE_PNR, "Page Not Readable exception"},
+    {EXCCODE_PNX, "Page Not Executable exception"},
+    {EXCCODE_PPI, "Page Privilege error"},
+    {EXCCODE_ADEF, "Address error for instruction fetch"},
+    {EXCCODE_ADEM, "Address error for Memory access"},
+    {EXCCODE_SYS, "Syscall"},
+    {EXCCODE_BRK, "Break"},
+    {EXCCODE_INE, "Instruction Non-Existent"},
+    {EXCCODE_IPE, "Instruction privilege error"},
+    {EXCCODE_FPD, "Floating Point Disabled"},
+    {EXCCODE_FPE, "Floating Point Exception"},
+    {EXCCODE_DBP, "Debug breakpoint"},
+    {EXCCODE_BCE, "Bound Check Exception"},
+    {EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+    {EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
 };
 
 const char *loongarch_exception_name(int32_t exception)
 {
-    assert(excp_names[exception]);
-    return excp_names[exception];
+    int i;
+    const char *name = "unknown";
+
+    for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+        if (excp_names[i].exccode == exception) {
+            name = excp_names[i].name;
+            break;
+        }
+    }
+    return name;
 }

I think you should return null for unknown, and then...

 void G_NORETURN do_raise_exception(CPULoongArchState *env,
@@ -79,11 +92,17 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
                                    uintptr_t pc)
 {
     CPUState *cs = env_cpu(env);
+    const char *name;
 
+    if (exception == EXCP_HLT) {
+        name = "EXCP_HLT";
+    } else {
+        name = loongarch_exception_name(exception);
+    }
     qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
                   __func__,
                   exception,
-                  loongarch_exception_name(exception));
+                  name);

... use two different printfs, one of which prints the exception number.

Why would you special case HLT here instead of putting it in the table?

r~
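The review suggestion (return NULL for an unknown code, keep EXCP_HLT in the table, and pick between two format strings) could look roughly like the standalone sketch below. It is not the follow-up patch: the exception codes are illustrative values only, exception_name() and format_exception() are hypothetical names, and snprintf() into a buffer stands in for qemu_log_mask() so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative codes only; the real values come from the target headers. */
enum { EXCCODE_INT = 0, EXCCODE_SYS = 11, EXCP_HLT = 0x10001 };

struct TypeExcp {
    int32_t exccode;
    const char *name;
};

static const struct TypeExcp excp_names[] = {
    {EXCCODE_INT, "Interrupt"},
    {EXCCODE_SYS, "Syscall"},
    {EXCP_HLT, "EXCP_HLT"},   /* listed in the table, no special case */
};

/* Return the name for a known code, or NULL so the caller can tell. */
static const char *exception_name(int32_t exception)
{
    for (size_t i = 0; i < sizeof(excp_names) / sizeof(excp_names[0]); i++) {
        if (excp_names[i].exccode == exception) {
            return excp_names[i].name;
        }
    }
    return NULL;
}

/* Hypothetical stand-in for the qemu_log_mask() call site. */
static void format_exception(char *buf, size_t len, int32_t exception)
{
    const char *name = exception_name(exception);

    if (name) {
        snprintf(buf, len, "do_raise_exception: %d (%s)", (int)exception, name);
    } else {
        snprintf(buf, len, "do_raise_exception: unknown exception %d",
                 (int)exception);
    }
}
```

A linear search also avoids indexing the table with a value such as EXCP_HLT that lies far outside the architectural exception codes, which is what the old designated-initializer array could not handle.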
Re: [PATCH v3 1/1] target/i386: Enable page walking from MMIO memory
Paolo, ping!

On 3/13/24 09:30, Richard Henderson wrote:
On 3/7/24 05:53, Jonathan Cameron wrote:
From: Gregory Price

CXL emulation of interleave requires read and write hooks due to the
requirement for subpage granularity. The Linux kernel stack now enables
using this memory as conventional memory in a separate NUMA node. If a
process is deliberately forced to run from that node

$ numactl --membind=1 ls

the page table walk on i386 fails.

Useful part of backtrace:

(cpu=cpu@entry=0x56fd9000, fmt=fmt@entry=0x55fe3378 "cpu_io_recompile: could not find TB for pc=%p") at ../../cpu-target.c:359
(retaddr=0, addr=19595792376, attrs=..., xlat=, cpu=0x56fd9000, out_offset=) at ../../accel/tcg/cputlb.c:1339
(cpu=0x56fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
(cpu=cpu@entry=0x56fd9000, p=p@entry=0x756fddc0, mmu_idx=, type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
(cpu=cpu@entry=0x56fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
at ../../accel/tcg/ldst_common.c.inc:301
at ../../target/i386/tcg/sysemu/excp_helper.c:173
(err=0x756fdf80, out=0x756fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x56fdb7c0) at ../../target/i386/tcg/sysemu/excp_helper.c:578
(cs=0x56fd9000, addr=18446744072116178925, size=, access_type=MMU_INST_FETCH, mmu_idx=0, probe=, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604

Avoid this by plumbing the address all the way down from
x86_cpu_tlb_fill(), where it is available as retaddr, to the actual
accessors, which provide it to probe_access_full(), which already
handles MMIO accesses.

Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Suggested-by: Peter Maydell
Signed-off-by: Gregory Price
Signed-off-by: Jonathan Cameron
---
v3: No change.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2180
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2220

r~
Re: [PATCH-for-9.1 22/27] target/s390x: Convert to TCGCPUOps::get_cpu_state()
On 3/19/24 21:09, Philippe Mathieu-Daudé wrote:
On 19/3/24 22:05, Richard Henderson wrote:
On 3/19/24 05:42, Philippe Mathieu-Daudé wrote:
Convert cpu_get_tb_cpu_state() to TCGCPUOps::get_cpu_state().
Note, now s390x_get_cpu_state() is restricted to TCG.

Signed-off-by: Philippe Mathieu-Daudé
---
 target/s390x/cpu.h | 30 --
 target/s390x/s390x-internal.h | 2 ++
 target/s390x/cpu.c | 1 +
 target/s390x/tcg/mem_helper.c | 2 +-
 target/s390x/tcg/translate.c | 23 +++
 5 files changed, 27 insertions(+), 31 deletions(-)

Why is the function in translate.c, not cpu.c (with or without ifdefs)?

My understanding is target/foo/tcg/ is better for TCG-specific handlers,
less #ifdef'ry and stubs. Then bar_helper.c are meant for TCG helpers
(including "exec/helper-proto.h"). Can you think of a better file (new
name?) in tcg/ or would you rather keep it in the main cpu.c?

Given that all other targets to this point used cpu.c, I would prefer
s390x and sparc to not be the only exceptions.

r~
[PATCH v1] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'
qemu-system-loongarch64 fails an assertion when run with the option
'-d int': helper_idle() raises the exception EXCP_HLT, but no name is
defined for that exception.

Signed-off-by: Song Gao
---
 target/loongarch/cpu.c | 75 ++
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..17a923de02 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,46 @@ const char * const fregnames[32] = {
     "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
 };
 
-static const char * const excp_names[] = {
-    [EXCCODE_INT] = "Interrupt",
-    [EXCCODE_PIL] = "Page invalid exception for load",
-    [EXCCODE_PIS] = "Page invalid exception for store",
-    [EXCCODE_PIF] = "Page invalid exception for fetch",
-    [EXCCODE_PME] = "Page modified exception",
-    [EXCCODE_PNR] = "Page Not Readable exception",
-    [EXCCODE_PNX] = "Page Not Executable exception",
-    [EXCCODE_PPI] = "Page Privilege error",
-    [EXCCODE_ADEF] = "Address error for instruction fetch",
-    [EXCCODE_ADEM] = "Address error for Memory access",
-    [EXCCODE_SYS] = "Syscall",
-    [EXCCODE_BRK] = "Break",
-    [EXCCODE_INE] = "Instruction Non-Existent",
-    [EXCCODE_IPE] = "Instruction privilege error",
-    [EXCCODE_FPD] = "Floating Point Disabled",
-    [EXCCODE_FPE] = "Floating Point Exception",
-    [EXCCODE_DBP] = "Debug breakpoint",
-    [EXCCODE_BCE] = "Bound Check Exception",
-    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-    [EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+    int32_t exccode;
+    const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+    {EXCCODE_INT, "Interrupt"},
+    {EXCCODE_PIL, "Page invalid exception for load"},
+    {EXCCODE_PIS, "Page invalid exception for store"},
+    {EXCCODE_PIF, "Page invalid exception for fetch"},
+    {EXCCODE_PME, "Page modified exception"},
+    {EXCCODE_PNR, "Page Not Readable exception"},
+    {EXCCODE_PNX, "Page Not Executable exception"},
+    {EXCCODE_PPI, "Page Privilege error"},
+    {EXCCODE_ADEF, "Address error for instruction fetch"},
+    {EXCCODE_ADEM, "Address error for Memory access"},
+    {EXCCODE_SYS, "Syscall"},
+    {EXCCODE_BRK, "Break"},
+    {EXCCODE_INE, "Instruction Non-Existent"},
+    {EXCCODE_IPE, "Instruction privilege error"},
+    {EXCCODE_FPD, "Floating Point Disabled"},
+    {EXCCODE_FPE, "Floating Point Exception"},
+    {EXCCODE_DBP, "Debug breakpoint"},
+    {EXCCODE_BCE, "Bound Check Exception"},
+    {EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+    {EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
 };
 
 const char *loongarch_exception_name(int32_t exception)
 {
-    assert(excp_names[exception]);
-    return excp_names[exception];
+    int i;
+    const char *name = "unknown";
+
+    for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+        if (excp_names[i].exccode == exception) {
+            name = excp_names[i].name;
+            break;
+        }
+    }
+    return name;
 }
 
 void G_NORETURN do_raise_exception(CPULoongArchState *env,
@@ -79,11 +92,17 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
                                    uintptr_t pc)
 {
     CPUState *cs = env_cpu(env);
+    const char *name;
 
+    if (exception == EXCP_HLT) {
+        name = "EXCP_HLT";
+    } else {
+        name = loongarch_exception_name(exception);
+    }
     qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
                   __func__,
                   exception,
-                  loongarch_exception_name(exception));
+                  name);
     cs->exception_index = exception;
 
     cpu_loop_exit_restore(cs, pc);
@@ -159,13 +178,11 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
     uint32_t vec_size = FIELD_EX64(env->CSR_ECFG, CSR_ECFG, VS);
 
     if (cs->exception_index != EXCCODE_INT) {
-        if (cs->exception_index < 0 ||
-            cs->exception_index >= ARRAY_SIZE(excp_names)) {
-            name = "unknown";
+        if (cs->exception_index == EXCP_HLT) {
+            name = "EXCP_HLT";
         } else {
-            name = excp_names[cs->exception_index];
+            name = loongarch_exception_name(cs->exception_index);
         }
-
         qemu_log_mask(CPU_LOG_INT,
                       "%s enter: pc " TARGET_FMT_lx " ERA " TARGET_FMT_lx
                       " TLBRERA " TARGET_FMT_lx " %s exception\n", __func__,
-- 
2.25.1
[PATCH v4] ui/gtk: flush display pipeline before saving vmstate when blob=true
From: Dongwon Kim

The current scanout frame must be completed before the guest's run
state transitions to save. Otherwise the guest may, upon resume, keep
waiting for the response to the resource flush of the old scanout frame.

v2: Give the fence some time to be signaled before flushing the pipeline

v3: Prevent redundant calls of gd_hw_gl_flushed by checking dmabuf and
    fence_fd >= 0 in it (e.g. during and after eglClientWaitSync in
    gd_change_runstate)

v4: Rewrote the commit msg

    Create fence_fd in the same function where sync is created, to
    handle the case where a valid sync is created but creating
    fence_fd fails.

    0 is a valid fd, so any fence_fd > -1 for the fence in the draw
    functions in gtk-egl.c and gtk-gl-area.c is considered valid

    egl_sync and the fence_fd for it are created in the same function

Cc: Marc-André Lureau
Cc: Vivek Kasireddy
Signed-off-by: Dongwon Kim
---
 ui/egl-helpers.c | 16 ++--
 ui/gtk-egl.c | 10 ++
 ui/gtk-gl-area.c | 9 ++---
 ui/gtk.c | 31 +++
 4 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 3d19dbe382..b6a8169ffc 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -376,20 +376,16 @@ void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
                                   EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
         if (sync != EGL_NO_SYNC_KHR) {
             dmabuf->sync = sync;
+            dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
+                                                          dmabuf->sync);
+            if (dmabuf->fence_fd < 0) {
+                eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
+                dmabuf->sync = NULL;
+            }
         }
     }
 }
 
-void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
-{
-    if (dmabuf->sync) {
-        dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
-                                                      dmabuf->sync);
-        eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
-        dmabuf->sync = NULL;
-    }
-}
-
 #endif /* CONFIG_GBM */
 
 /* -- */
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 3af5ac5bcf..683a87c6b3 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -98,8 +98,8 @@ void gd_egl_draw(VirtualConsole *vc)
         glFlush();
 #ifdef CONFIG_GBM
         if (dmabuf) {
-            egl_dmabuf_create_fence(dmabuf);
-            if (dmabuf->fence_fd > 0) {
+            egl_dmabuf_create_sync(dmabuf);
+            if (dmabuf->fence_fd > -1) {
                 qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
                 return;
             }
@@ -348,12 +348,6 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
         egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb, !vc->gfx.y0_top);
     }
 
-#ifdef CONFIG_GBM
-    if (vc->gfx.guest_fb.dmabuf) {
-        egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
-    }
-#endif
-
     eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 52dcac161e..7791498646 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -77,16 +77,11 @@ void gd_gl_area_draw(VirtualConsole *vc)
         glBlitFramebuffer(0, y1, vc->gfx.w, y2,
                           0, 0, ww, wh,
                           GL_COLOR_BUFFER_BIT, GL_NEAREST);
-#ifdef CONFIG_GBM
-        if (dmabuf) {
-            egl_dmabuf_create_sync(dmabuf);
-        }
-#endif
         glFlush();
 #ifdef CONFIG_GBM
         if (dmabuf) {
-            egl_dmabuf_create_fence(dmabuf);
-            if (dmabuf->fence_fd > 0) {
+            egl_dmabuf_create_sync(dmabuf);
+            if (dmabuf->fence_fd > -1) {
                 qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
                 return;
             }
diff --git a/ui/gtk.c b/ui/gtk.c
index 810d7fc796..bbe05a0baf 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -597,10 +597,14 @@ void gd_hw_gl_flushed(void *vcon)
     VirtualConsole *vc = vcon;
     QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
 
-    qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
-    close(dmabuf->fence_fd);
-    dmabuf->fence_fd = -1;
-    graphic_hw_gl_block(vc->gfx.dcl.con, false);
+    if (dmabuf && dmabuf->fence_fd > -1) {
+        qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
+        close(dmabuf->fence_fd);
+        dmabuf->fence_fd = -1;
+        eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
+        dmabuf->sync = NULL;
+        graphic_hw_gl_block(vc->gfx.dcl.con, false);
+    }
 }
 
 /** DisplayState Callbacks (opengl version) **/
@@ -678,6 +682,25 @@ static const DisplayGLCtxOps egl_ctx_ops = {
 static void gd_change_runstate(void *opaque, bool running, RunState state)
 {
     GtkDisplayState *s = opaque;
+    int i;
+
+    if (state == RUN_STATE_SAVE_VM) {
+
RE: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression
> -----Original Message-----
> From: Peter Xu
> Sent: Thursday, March 21, 2024 4:32 AM
> To: Liu, Yuan1
> Cc: Daniel P. Berrangé ; faro...@suse.de; qemu-
> de...@nongnu.org; hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou,
> Nanhai
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
>
> On Wed, Mar 20, 2024 at 04:23:01PM +0000, Liu, Yuan1 wrote:
> > let me explain here, during the decompression operation of IAA, the
> > decompressed data can be directly output to the virtual address of the
> > guest memory by IAA hardware. It can avoid copying the decompressed data
> > to guest memory by CPU.
>
> I see.
>
> > Without -mem-prealloc, all the guest memory is not populated, and IAA
> > hardware needs to trigger I/O page fault first and then output the
> > decompressed data to the guest memory region. Besides that, CPU page
> > faults will also trigger IOTLB flush operation when IAA devices use SVM.
>
> Oh so the IAA hardware already can use CPU pgtables?  Nice..
>
> Why IOTLB flush is needed?  AFAIU we're only installing new pages, the
> request can either come from a CPU access or a DMA.  In all cases there
> should have no tearing down of an old page.  Isn't an iotlb flush only
> needed if a tear down happens?

As far as I know, IAA hardware uses SVM technology to use the CPU's page
table for address translation (IOMMU scalable mode directly accesses the
CPU page table). Therefore, when the CPU page table changes, the device's
invalidation operation needs to be triggered to update the IOMMU and the
device's cache.

My current kernel version is mainline 6.2. The issue I see is as follows:

--handle_mm_fault
      |
       -- wp_page_copy
              |
               -- mmu_notifier_invalidate_range
                      |
                       -- intel_invalidate_range
                              |
                               -- qi_flush_piotlb
                               -- qi_flush_dev_iotlb_pasid

> > Due to the inability to quickly resolve a large number of IO page faults
> > and IOTLB flushes, the decompression throughput of the IAA device will
> > decrease significantly.
>
> --
> Peter Xu
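The slowdown discussed in this thread comes from guest memory that has not been populated before the device writes to it. The sketch below illustrates the general idea of faulting pages in ahead of time. It is plain C for illustration only: `prefault_pages` is a hypothetical helper, not QEMU's -mem-prealloc implementation, and it assumes the buffer is page-aligned and that touching one byte per page is enough for the kernel to back that page.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a QEMU API): touch one byte in every page of
 * [buf, buf + len) so that each page is faulted in by the CPU up front
 * rather than on first access by a device.  Assumes buf is page-aligned.
 * Returns the number of pages touched.
 */
size_t prefault_pages(volatile unsigned char *buf, size_t len,
                      size_t page_size)
{
    size_t touched = 0;

    assert(page_size > 0);
    for (size_t off = 0; off < len; off += page_size) {
        /* Read-modify-write keeps the contents but forces a write fault. */
        buf[off] = buf[off];
        touched++;
    }
    return touched;
}
```

Whether this removes the IOTLB flushes seen in the call trace depends on the kernel paths involved; the sketch only shows the population step.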
Re: [External] Re: [PATCH v3 1/2] memory tier: dax/kmem: create CPUless memory tiers after obtaining HMAT info
On Wed, Mar 20, 2024 at 12:15 AM Huang, Ying wrote: > > "Ho-Ren (Jack) Chuang" writes: > > > The current implementation treats emulated memory devices, such as > > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory > > (E820_TYPE_RAM). However, these emulated devices have different > > characteristics than traditional DRAM, making it important to > > distinguish them. Thus, we modify the tiered memory initialization process > > to introduce a delay specifically for CPUless NUMA nodes. This delay > > ensures that the memory tier initialization for these nodes is deferred > > until HMAT information is obtained during the boot process. Finally, > > demotion tables are recalculated at the end. > > > > More details: > > You have done several stuff in one patch. So you need "more details". > You may separate them into multiple patches. One for echo "*" below. > But I have no strong opinion on that. > > > * late_initcall(memory_tier_late_init); > > Some device drivers may have initialized memory tiers between > > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing > > online memory nodes and configuring memory tiers. They should be excluded > > in the late init. > > > > * Abstract common functions into `mt_find_alloc_memory_type()` > > Since different memory devices require finding or allocating a memory type, > > these common steps are abstracted into a single function, > > `mt_find_alloc_memory_type()`, enhancing code scalability and conciseness. > > > > * Handle cases where there is no HMAT when creating memory tiers > > There is a scenario where a CPUless node does not provide HMAT information. > > If no HMAT is specified, it falls back to using the default DRAM tier. > > > > * Change adist calculation code to use another new lock, `mt_perf_lock`. > > In the current implementation, iterating through CPUlist nodes requires > > holding the `memory_tier_lock`. 
However, `mt_calc_adistance()` will end up > > trying to acquire the same lock, leading to a potential deadlock. > > Therefore, we propose introducing a standalone `mt_perf_lock` to protect > > `default_dram_perf`. This approach not only avoids deadlock but also > > prevents holding a large lock simultaneously. > > > > * Upgrade `set_node_memory_tier` to support additional cases, including > > default DRAM, late CPUless, and hot-plugged initializations. > > To cover hot-plugged memory nodes, `mt_calc_adistance()` and > > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to > > handle cases where memtype is not initialized and where HMAT information is > > available. > > > > * Introduce `default_memory_types` for those memory types that are not > > initialized by device drivers. > > Because late initialized memory and default DRAM memory need to be managed, > > a default memory type is created for storing all memory types that are > > not initialized by device drivers and as a fallback. 
> > > > Signed-off-by: Ho-Ren (Jack) Chuang > > Signed-off-by: Hao Xiang > > --- > > drivers/dax/kmem.c | 13 + > > include/linux/memory-tiers.h | 7 +++ > > mm/memory-tiers.c| 94 +--- > > 3 files changed, 95 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > > index 42ee360cf4e3..de1333aa7b3e 100644 > > --- a/drivers/dax/kmem.c > > +++ b/drivers/dax/kmem.c > > @@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types); > > > > static struct memory_dev_type *kmem_find_alloc_memory_type(int adist) > > { > > - bool found = false; > > struct memory_dev_type *mtype; > > > > mutex_lock(_memory_type_lock); > > - list_for_each_entry(mtype, _memory_types, list) { > > - if (mtype->adistance == adist) { > > - found = true; > > - break; > > - } > > - } > > - if (!found) { > > - mtype = alloc_memory_type(adist); > > - if (!IS_ERR(mtype)) > > - list_add(>list, _memory_types); > > - } > > + mtype = mt_find_alloc_memory_type(adist, _memory_types); > > mutex_unlock(_memory_type_lock); > > > > return mtype; > > It seems that there's some miscommunication about my previous comments > about this. What I suggested is to create one separate patch, which > moves mt_find_alloc_memory_type() and mt_put_memory_types() into > memory-tiers.c. And make this patch the first one of the series. > I will make mt_find_alloc/mt_put_memory_type changes as a separate patch and the first of my patch series. Thanks. > > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h > > index 69e781900082..b2135334ac18 100644 > > --- a/include/linux/memory-tiers.h > > +++ b/include/linux/memory-tiers.h > > @@ -48,6 +48,8 @@ int mt_calc_adistance(int node, int *adist); > > int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, > >
[PATCH v2 2/2] Implement QEMU GA commands for Windows
From: Aidan Leuck Signed-off-by: Aidan Leuck --- qga/commands-posix-ssh.c | 47 +--- qga/commands-ssh-core.c| 57 + qga/commands-ssh-core.h| 15 + qga/commands-windows-ssh.c | 64 -- qga/commands-windows-ssh.h | 15 - qga/meson.build| 5 +++ 6 files changed, 86 insertions(+), 117 deletions(-) create mode 100644 qga/commands-ssh-core.c create mode 100644 qga/commands-ssh-core.h diff --git a/qga/commands-posix-ssh.c b/qga/commands-posix-ssh.c index 236f80de44..9a71b109f9 100644 --- a/qga/commands-posix-ssh.c +++ b/qga/commands-posix-ssh.c @@ -9,6 +9,7 @@ #include #include +#include "commands-ssh-core.h" #include "qapi/error.h" #include "qga-qapi-commands.h" @@ -80,37 +81,6 @@ mkdir_for_user(const char *path, const struct passwd *p, return true; } -static bool -check_openssh_pub_key(const char *key, Error **errp) -{ -/* simple sanity-check, we may want more? */ -if (!key || key[0] == '#' || strchr(key, '\n')) { -error_setg(errp, "invalid OpenSSH public key: '%s'", key); -return false; -} - -return true; -} - -static bool -check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp) -{ -size_t n = 0; -strList *k; - -for (k = keys; k != NULL; k = k->next) { -if (!check_openssh_pub_key(k->value, errp)) { -return false; -} -n++; -} - -if (nkeys) { -*nkeys = n; -} -return true; -} - static bool write_authkeys(const char *path, const GStrv keys, const struct passwd *p, Error **errp) @@ -139,21 +109,6 @@ write_authkeys(const char *path, const GStrv keys, return true; } -static GStrv -read_authkeys(const char *path, Error **errp) -{ -g_autoptr(GError) err = NULL; -g_autofree char *contents = NULL; - -if (!g_file_get_contents(path, , NULL, )) { -error_setg(errp, "failed to read '%s': %s", path, err->message); -return NULL; -} - -return g_strsplit(contents, "\n", -1); - -} - void qmp_guest_ssh_add_authorized_keys(const char *username, strList *keys, bool has_reset, bool reset, diff --git a/qga/commands-ssh-core.c b/qga/commands-ssh-core.c new file mode 100644 index 
00..c77cee8a11 --- /dev/null +++ b/qga/commands-ssh-core.c @@ -0,0 +1,57 @@ + /* + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include +#include +#include "qapi/error.h" +#include "commands-ssh-core.h" + +GStrv read_authkeys(const char *path, Error **errp) +{ +g_autoptr(GError) err = NULL; +g_autofree char *contents = NULL; + +if (!g_file_get_contents(path, , NULL, )) +{ +error_setg(errp, "failed to read '%s': %s", path, err->message); +return NULL; +} + +return g_strsplit(contents, "\n", -1); +} + +bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp) +{ +size_t n = 0; +strList *k; + +for (k = keys; k != NULL; k = k->next) +{ +if (!check_openssh_pub_key(k->value, errp)) +{ +return false; +} +n++; +} + +if (nkeys) +{ +*nkeys = n; +} +return true; +} + +bool check_openssh_pub_key(const char *key, Error **errp) +{ +/* simple sanity-check, we may want more? */ +if (!key || key[0] == '#' || strchr(key, '\n')) +{ +error_setg(errp, "invalid OpenSSH public key: '%s'", key); +return false; +} + +return true; +} diff --git a/qga/commands-ssh-core.h b/qga/commands-ssh-core.h new file mode 100644 index 00..9c9e992c62 --- /dev/null +++ b/qga/commands-ssh-core.h @@ -0,0 +1,15 @@ +/* + * Header file for commands-ssh-core.c + * + * Copyright IBM Corp. 2024 + * + * Authors: + * Aidan Leuck + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +GStrv read_authkeys(const char *path, Error **errp); +bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp); +bool check_openssh_pub_key(const char *key, Error **errp); \ No newline at end of file diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c index e9faae90fc..0739d694ed 100644 --- a/qga/commands-windows-ssh.c +++ b/qga/commands-windows-ssh.c @@ -23,7 +23,6 @@ #include "lmapibuf.h" #include "lmerr.h" #include "qapi/error.h" - #include "qga-qapi-commands.h" #include "sddl.h" #include "shlobj.h" @@ -35,69 +34,6 @@ #define ADMIN_SID "S-1-5-32-544" #define WORLD_SID "S-1-1-0" -/* - * Reads the authorized_keys file and returns an array of strings for each entry - * parameters: - * path -> Path to the authorized_keys file - * errp -> Error structure that will contain errors upon failure. - * returns: Array of strings, where
[PATCH v2 1/2] Implement QEMU GA commands for Windows
From: Aidan Leuck Signed-off-by: Aidan Leuck --- qga/commands-windows-ssh.c | 823 + qga/commands-windows-ssh.h | 26 ++ qga/meson.build| 9 +- qga/qapi-schema.json | 22 +- 4 files changed, 867 insertions(+), 13 deletions(-) create mode 100644 qga/commands-windows-ssh.c create mode 100644 qga/commands-windows-ssh.h diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c new file mode 100644 index 00..e9faae90fc --- /dev/null +++ b/qga/commands-windows-ssh.c @@ -0,0 +1,823 @@ +/* + * QEMU Guest Agent win32-specific command implementations for SSH keys. + * The implementation is opinionated and expects the SSH implementation to + * be OpenSSH. + * + * Copyright Schweitzer Engineering Laboratories. 2024 + * + * Authors: + * Aidan Leuck + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include +#include + +#include "commands-windows-ssh.h" +#include "guest-agent-core.h" +#include "limits.h" +#include "lmaccess.h" +#include "lmapibuf.h" +#include "lmerr.h" +#include "qapi/error.h" + +#include "qga-qapi-commands.h" +#include "sddl.h" +#include "shlobj.h" +#include "userenv.h" + +#define AUTHORIZED_KEY_FILE "authorized_keys" +#define AUTHORIZED_KEY_FILE_ADMIN "administrators_authorized_keys" +#define LOCAL_SYSTEM_SID "S-1-5-18" +#define ADMIN_SID "S-1-5-32-544" +#define WORLD_SID "S-1-1-0" + +/* + * Reads the authorized_keys file and returns an array of strings for each entry + * parameters: + * path -> Path to the authorized_keys file + * errp -> Error structure that will contain errors upon failure. + * returns: Array of strings, where each entry is an authorized key. 
+ */ +static GStrv read_authkeys(const char *path, Error **errp) +{ + g_autoptr(GError) err = NULL; + g_autofree char *contents = NULL; + + if (!g_file_get_contents(path, , NULL, )) { +error_setg(errp, "failed to read '%s': %s", path, err->message); +return NULL; + } + + return g_strsplit(contents, "\n", -1); +} + +/* + * Checks if a OpenSSH key is valid + * parameters: + * key* Key to check for validity + * errp -> Error structure that will contain errors upon failure. + * returns: true if key is valid, false otherwise + */ +static bool check_openssh_pub_key(const char *key, Error **errp) +{ + /* simple sanity-check, we may want more? */ + if (!key || key[0] == '#' || strchr(key, '\n')) { +error_setg(errp, "invalid OpenSSH public key: '%s'", key); +return false; + } + + return true; +} + +/* + * Checks if all openssh keys in the array are valid + * parameters: + * keys -> Array of keys to check + * errp -> Error structure that will contain errors upon failure. + * returns: true if all keys are valid, false otherwise + */ +static bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp) +{ + size_t n = 0; + strList *k; + + for (k = keys; k != NULL; k = k->next) { +if (!check_openssh_pub_key(k->value, errp)) { + return false; +} +n++; + } + + if (nkeys) { +*nkeys = n; + } + return true; +} + +/* + * Frees userInfo structure. This implements the g_auto cleanup + * for the structure. + */ +void free_userInfo(PWindowsUserInfo info) +{ + g_free(info->sshDirectory); + g_free(info->authorizedKeyFile); + LocalFree(info->SSID); + g_free(info->username); + g_free(info); +} + +/* + * Gets the admin SSH folder for OpenSSH. OpenSSH does not store + * the authorized_key file in the users home directory for security reasons and + * instead stores it at %PROGRAMDATA%/ssh. 
This function returns the path to + * that directory on the users machine parameters: errp -> error structure to + * set when an error occurs returns: The path to the ssh folder in %PROGRAMDATA% + * or NULl if an error occurred. + */ +static char *get_admin_ssh_folder(Error **errp) +{ + // Allocate memory for the program data path + g_autofree char *programDataPath = NULL; + char *authkeys_path = NULL; + PWSTR pgDataW; + GError *gerr = NULL; + + // Get the KnownFolderPath on the machine. + HRESULT folderResult = + SHGetKnownFolderPath(_ProgramData, 0, NULL, ); + if (folderResult != S_OK) { +error_setg(errp, "Failed to retrieve ProgramData folder"); +goto error; + } + + // Convert from a wide string back to a standard character string. + programDataPath = g_utf16_to_utf8(pgDataW, -1, NULL, NULL, ); + if (!programDataPath) { +goto error; + } + + // Build the path to the file. + authkeys_path = g_build_filename(programDataPath, "ssh", NULL); + CoTaskMemFree(pgDataW); + return authkeys_path; + +error: + CoTaskMemFree(pgDataW); + + if (gerr) { +error_setg(errp,"Failed to convert program data path from wide string to standard utf 8 string. %s", gerr->message); +g_error_free(gerr); + } + + return NULL; +} + +/* + * Gets the path to the SSH folder for the specified user. If
Re: [PATCH-for-9.0] monitor/hmp-cmds-target.c: append a space in error message in gpa2hva()
* Philippe Mathieu-Daudé (phi...@linaro.org) wrote:
> On 19/3/24 03:16, Shiyang Ruan via wrote:
> > From: Yao Xingtao
> >
> > In qemu monitor mode, when we use gpa2hva command to print the host
> > virtual address corresponding to a guest physical address, if the gpa is
> > not in RAM, the error message is below:
> >
> > (qemu) gpa2hva 0x75000
> > Memory at address 0x75000is not RAM
> >
> > a space is missed between '0x75000' and 'is'.
> >
> > Signed-off-by: Yao Xingtao
> > ---
> >  monitor/hmp-cmds-target.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
> > index 9338ae8440..ff01cf9d8d 100644
> > --- a/monitor/hmp-cmds-target.c
> > +++ b/monitor/hmp-cmds-target.c
> > @@ -261,7 +261,7 @@ void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp)
> >      }
> >      if (!memory_region_is_ram(mrs.mr) && !memory_region_is_romd(mrs.mr)) {
> > -        error_setg(errp, "Memory at address 0x%" HWADDR_PRIx "is not RAM", addr);
> > +        error_setg(errp, "Memory at address 0x%" HWADDR_PRIx " is not RAM", addr);
> >          memory_region_unref(mrs.mr);
> >          return NULL;
> >      }
>
> Fixes: e9628441df ("hmp: gpa2hva and gpa2hpa hostaddr command")
> Reviewed-by: Philippe Mathieu-Daudé

Thanks,

Reviewed-by: Dr. David Alan Gilbert

Cc'ing in Trivial.

Dave

>
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH v3 40/49] hw/i386/sev: Add function to get SEV metadata from OVMF header
On Wed, Mar 20, 2024 at 10:55:35AM -0700, Isaku Yamahata wrote:
> On Wed, Mar 20, 2024 at 03:39:36AM -0500,
> Michael Roth wrote:
>
> > From: Brijesh Singh
> >
> > A recent version of OVMF expanded the reset vector GUID list to add
> > SEV-specific metadata GUID. The SEV metadata describes the reserved
> > memory regions such as the secrets and CPUID page used during the SEV-SNP
> > guest launch.
> >
> > The pc_system_get_ovmf_sev_metadata_ptr() is used to retrieve the SEV
> > metadata pointer from the OVMF GUID list.
> >
> > Signed-off-by: Brijesh Singh
> > Signed-off-by: Michael Roth
> > ---
> >  hw/i386/pc_sysfw_ovmf.c | 33 +++++++++++++++++++++++++++++++++
> >  include/hw/i386/pc.h    | 26 ++++++++++++++++++++++++++
> >  2 files changed, 59 insertions(+)
> >
> > diff --git a/hw/i386/pc_sysfw_ovmf.c b/hw/i386/pc_sysfw_ovmf.c
> > index 07a4c267fa..32efa34614 100644
> > --- a/hw/i386/pc_sysfw_ovmf.c
> > +++ b/hw/i386/pc_sysfw_ovmf.c
> > @@ -35,6 +35,31 @@ static const int bytes_after_table_footer = 32;
> >  static bool ovmf_flash_parsed;
> >  static uint8_t *ovmf_table;
> >  static int ovmf_table_len;
> > +static OvmfSevMetadata *ovmf_sev_metadata_table;
> > +
> > +#define OVMF_SEV_META_DATA_GUID "dc886566-984a-4798-A75e-5585a7bf67cc"
> > +typedef struct __attribute__((__packed__)) OvmfSevMetadataOffset {
> > +    uint32_t offset;
> > +} OvmfSevMetadataOffset;
> > +
> > +static void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size)
> > +{
> > +    OvmfSevMetadata *metadata;
> > +    OvmfSevMetadataOffset *data;
> > +
> > +    if (!pc_system_ovmf_table_find(OVMF_SEV_META_DATA_GUID, (uint8_t **)&data,
> > +                                   NULL)) {
> > +        return;
> > +    }
> > +
> > +    metadata = (OvmfSevMetadata *)(flash_ptr + flash_size - data->offset);
> > +    if (memcmp(metadata->signature, "ASEV", 4) != 0) {
> > +        return;
> > +    }
> > +
> > +    ovmf_sev_metadata_table = g_malloc(metadata->len);
> > +    memcpy(ovmf_sev_metadata_table, metadata, metadata->len);
> > +}
> >
> >  void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size)
> >  {
> > @@ -90,6 +115,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size)
> >       */
> >      memcpy(ovmf_table, ptr - tot_len, tot_len);
> >      ovmf_table += tot_len;
> > +
> > +    /* Copy the SEV metadata table (if exist) */
> > +    pc_system_parse_sev_metadata(flash_ptr, flash_size);
> >  }
>
> Can we move this call to x86_firmware_configure() @ pc_sysfw.c, and move sev
> specific bits to somewhere to sev specific file?  We don't have to parse sev
> metadata for non-SEV case, right?
>
> We don't have to touch common ovmf file.  It also will be consistent with tdx
> case.  TDX patch series adds tdx_parse_tdvf() to x86_firmware_configure().

Yep, makes sense to handle it similarly for SNP.

Thanks,

Mike

>
> thanks,
>
> >
> >  /**
> > @@ -159,3 +187,8 @@ bool pc_system_ovmf_table_find(const char *entry, uint8_t **data,
> >      }
> >      return false;
> >  }
> > +
> > +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void)
> > +{
> > +    return ovmf_sev_metadata_table;
> > +}
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index fb1d4106e5..df9a61540d 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -163,6 +163,32 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
> >  #define PCI_HOST_ABOVE_4G_MEM_SIZE     "above-4g-mem-size"
> >  #define PCI_HOST_PROP_SMM_RANGES       "smm-ranges"
> >
> > +typedef enum {
> > +    SEV_DESC_TYPE_UNDEF,
> > +    /* The section contains the region that must be validated by the VMM. */
> > +    SEV_DESC_TYPE_SNP_SEC_MEM,
> > +    /* The section contains the SNP secrets page */
> > +    SEV_DESC_TYPE_SNP_SECRETS,
> > +    /* The section contains address that can be used as a CPUID page */
> > +    SEV_DESC_TYPE_CPUID,
> > +
> > +} ovmf_sev_metadata_desc_type;
> > +
> > +typedef struct __attribute__((__packed__)) OvmfSevMetadataDesc {
> > +    uint32_t base;
> > +    uint32_t len;
> > +    ovmf_sev_metadata_desc_type type;
> > +} OvmfSevMetadataDesc;
> > +
> > +typedef struct __attribute__((__packed__)) OvmfSevMetadata {
> > +    uint8_t signature[4];
> > +    uint32_t len;
> > +    uint32_t version;
> > +    uint32_t num_desc;
> > +    OvmfSevMetadataDesc descs[];
> > +} OvmfSevMetadata;
> > +
> > +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void);
> >
> >  void pc_pci_as_mapping_init(MemoryRegion *system_memory,
> >                              MemoryRegion *pci_address_space);
> > --
> > 2.25.1
> >
> >
>
> --
> Isaku Yamahata
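The quoted patch locates the metadata block a fixed offset before the end of the flash image and accepts it if the signature is "ASEV". The sketch below shows that lookup in isolation. It is not the code from the series: `MetaHeader` and `find_sev_metadata` are hypothetical names, the header is reduced to its first two fields, and the range checks are an assumption added here, since the quoted hunk does not show any.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified header mirroring the first two fields of OvmfSevMetadata. */
typedef struct {
    uint8_t signature[4];
    uint32_t len;
} MetaHeader;

/*
 * Hypothetical helper: find a metadata block that starts 'offset' bytes
 * before the end of the flash image and return a pointer to it, or NULL
 * if the block is missing or malformed.
 */
const uint8_t *find_sev_metadata(const uint8_t *flash, size_t flash_size,
                                 uint32_t offset)
{
    MetaHeader hdr;
    const uint8_t *p;

    assert(flash != NULL);
    if (offset > flash_size || offset < sizeof(hdr)) {
        return NULL;                 /* header would not fit in the image */
    }
    p = flash + flash_size - offset;
    memcpy(&hdr, p, sizeof(hdr));    /* copy out to avoid unaligned access */
    if (memcmp(hdr.signature, "ASEV", 4) != 0) {
        return NULL;                 /* not an SEV metadata block */
    }
    if (hdr.len < sizeof(hdr) || hdr.len > offset) {
        return NULL;                 /* claimed length runs past the image */
    }
    return p;
}
```

A caller in a SEV-specific file could run a lookup of this shape only for SEV guests, which is the direction the review suggests.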
Re: [PATCH v3 37/49] i386/sev: Add the SNP launch start context
On Wed, Mar 20, 2024 at 10:58:30AM +0100, Paolo Bonzini wrote:
> On 3/20/24 09:39, Michael Roth wrote:
> > From: Brijesh Singh
> >
> > The SNP_LAUNCH_START is called first to create a cryptographic launch
> > context within the firmware.
> >
> > Signed-off-by: Brijesh Singh
> > Signed-off-by: Michael Roth
> > ---
> >  target/i386/sev.c        | 42 +++++++++++++++++++++++++++++++++++++++++-
> >  target/i386/trace-events |  1 +
> >  2 files changed, 42 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 3b4dbc63b1..9f63a41f08 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -39,6 +39,7 @@
> >  #include "confidential-guest.h"
> >  #include "hw/i386/pc.h"
> >  #include "exec/address-spaces.h"
> > +#include "qemu/queue.h"
> >  OBJECT_DECLARE_SIMPLE_TYPE(SevCommonState, SEV_COMMON)
> >  OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
> > @@ -106,6 +107,16 @@ struct SevSnpGuestState {
> >  #define DEFAULT_SEV_DEVICE      "/dev/sev"
> >  #define DEFAULT_SEV_SNP_POLICY  0x3
> > +typedef struct SevLaunchUpdateData {
> > +    QTAILQ_ENTRY(SevLaunchUpdateData) next;
> > +    hwaddr gpa;
> > +    void *hva;
> > +    uint64_t len;
> > +    int type;
> > +} SevLaunchUpdateData;
> > +
> > +static QTAILQ_HEAD(, SevLaunchUpdateData) launch_update;
> > +
> >  #define SEV_INFO_BLOCK_GUID     "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
> >  typedef struct __attribute__((__packed__)) SevInfoBlock {
> >      /* SEV-ES Reset Vector Address */
> > @@ -668,6 +679,30 @@ sev_read_file_base64(const char *filename, guchar **data, gsize *len)
> >      return 0;
> >  }
> > +static int
> > +sev_snp_launch_start(SevSnpGuestState *sev_snp_guest)
> > +{
> > +    int fw_error, rc;
> > +    SevCommonState *sev_common = SEV_COMMON(sev_snp_guest);
> > +    struct kvm_sev_snp_launch_start *start = &sev_snp_guest->kvm_start_conf;
> > +
> > +    trace_kvm_sev_snp_launch_start(start->policy, sev_snp_guest->guest_visible_workarounds);
> > +
> > +    rc = sev_ioctl(sev_common->sev_fd, KVM_SEV_SNP_LAUNCH_START,
> > +                   start, &fw_error);
> > +    if (rc < 0) {
> > +        error_report("%s: SNP_LAUNCH_START ret=%d fw_error=%d '%s'",
> > +                __func__, rc, fw_error, fw_error_to_str(fw_error));
> > +        return 1;
> > +    }
> > +
> > +    QTAILQ_INIT(&launch_update);
> > +
> > +    sev_set_guest_state(sev_common, SEV_STATE_LAUNCH_UPDATE);
> > +
> > +    return 0;
> > +}
> > +
> >  static int
> >  sev_launch_start(SevGuestState *sev_guest)
> >  {
> > @@ -1007,7 +1042,12 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> >          goto err;
> >      }
> > -    ret = sev_launch_start(SEV_GUEST(sev_common));
> > +    if (sev_snp_enabled()) {
> > +        ret = sev_snp_launch_start(SEV_SNP_GUEST(sev_common));
> > +    } else {
> > +        ret = sev_launch_start(SEV_GUEST(sev_common));
> > +    }
>
> Instead of an "if", this should be a method in sev-common.  Likewise for
> launch_finish in the next patch.

Makes sense.

>
> Also, patch 47 should introduce an "int (*launch_update_data)(hwaddr gpa,
> uint8_t *ptr, uint64_t len)" method whose implementation is either the
> existing sev_launch_update_data() for sev-guest, or a wrapper around
> snp_launch_update_data() (to add KVM_SEV_SNP_PAGE_TYPE_NORMAL) for
> sev-snp-guest.

I suppose if we end up introducing an unused 'gpa' parameter in the case
of sev_launch_update_data() that's still worth the change? Seems
reasonable to me.

>
> In general, the only uses of sev_snp_enabled() should be in
> sev_add_kernel_loader_hashes() and kvm_handle_vmgexit_ext_req().  I would
> not be that strict for the QMP and HMP functions, but if you want to make
> those methods of sev-common I wouldn't complain.

There's a good bit of duplication in those cases which is a little
awkward to break out into a common helper. Will consider these as well
though.

Thanks,

Mike

>
> Paolo
>
> >      if (ret) {
> >          error_setg(errp, "%s: failed to create encryption context", __func__);
> >          goto err;
> > diff --git a/target/i386/trace-events b/target/i386/trace-events
> > index 2cd8726eeb..cb26d8a925 100644
> > --- a/target/i386/trace-events
> > +++ b/target/i386/trace-events
> > @@ -11,3 +11,4 @@ kvm_sev_launch_measurement(const char *value) "data %s"
> >  kvm_sev_launch_finish(void) ""
> >  kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) "hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
> >  kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data %s"
> > +kvm_sev_snp_launch_start(uint64_t policy, char *gosvw) "policy 0x%" PRIx64 " gosvw %s"
>
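The review above asks for a per-guest-type method in place of an `if (sev_snp_enabled())` branch, with the SNP variant being a thin wrapper that supplies the page type. A rough sketch of that shape in plain C follows. Everything in it is hypothetical: `SevOps`, `sev_update`, `snp_update` and `PAGE_TYPE_NORMAL` are illustration-only names (the last standing in for KVM_SEV_SNP_PAGE_TYPE_NORMAL), and QEMU's real QOM class machinery is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-guest-type method table; not QEMU's actual class. */
typedef struct SevOps {
    int (*launch_update_data)(uint64_t gpa, uint8_t *ptr, uint64_t len);
} SevOps;

enum { PAGE_TYPE_NORMAL = 1 };   /* stand-in for the SNP page type */

/* Records what the SNP backend saw, so the wrapper's effect is visible. */
int last_page_type = -1;

/* Stand-in for the SNP primitive, which needs a gpa and a page type. */
int snp_update(uint64_t gpa, uint8_t *ptr, uint64_t len, int type)
{
    (void)gpa; (void)ptr; (void)len;
    last_page_type = type;
    return 0;
}

/* Legacy SEV variant: accepts 'gpa' only to match the shared signature. */
int sev_update(uint64_t gpa, uint8_t *ptr, uint64_t len)
{
    (void)gpa;
    return (ptr != NULL && len > 0) ? 0 : -1;
}

/* SNP variant: thin wrapper that supplies the page type. */
int snp_update_normal(uint64_t gpa, uint8_t *ptr, uint64_t len)
{
    return snp_update(gpa, ptr, len, PAGE_TYPE_NORMAL);
}

const SevOps sev_ops = { .launch_update_data = sev_update };
const SevOps snp_ops = { .launch_update_data = snp_update_normal };

/* Common code calls through the table and never tests the guest type. */
int launch_update(const SevOps *ops, uint64_t gpa, uint8_t *ptr, uint64_t len)
{
    assert(ops != NULL && ops->launch_update_data != NULL);
    return ops->launch_update_data(gpa, ptr, len);
}
```

The unused `gpa` in the legacy variant is the cost Mike mentions; in exchange, the caller needs no knowledge of which guest type it is driving.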
Re: [PATCH v3 31/49] i386/sev: Update query-sev QAPI format to handle SEV-SNP
On Wed, Mar 20, 2024 at 12:10:04PM +, Daniel P. Berrangé wrote: > On Wed, Mar 20, 2024 at 03:39:27AM -0500, Michael Roth wrote: > > Most of the current 'query-sev' command is relevant to both legacy > > SEV/SEV-ES guests and SEV-SNP guests, with 2 exceptions: > > > > - 'policy' is a 64-bit field for SEV-SNP, not 32-bit, and > > the meaning of the bit positions has changed > > - 'handle' is not relevant to SEV-SNP > > > > To address this, this patch adds a new 'sev-type' field that can be > > used as a discriminator to select between SEV and SEV-SNP-specific > > fields/formats without breaking compatibility for existing management > > tools (so long as management tools that add support for launching > > SEV-SNP guest update their handling of query-sev appropriately). > > > > The corresponding HMP command has also been fixed up similarly. > > > > Signed-off-by: Michael Roth > > --- > > qapi/misc-target.json | 71 ++- > > target/i386/sev.c | 50 -- > > target/i386/sev.h | 3 ++ > > 3 files changed, 94 insertions(+), 30 deletions(-) > > > > diff --git a/qapi/misc-target.json b/qapi/misc-target.json > > index 4e0a6492a9..daceb85d95 100644 > > --- a/qapi/misc-target.json > > +++ b/qapi/misc-target.json > > @@ -47,6 +47,49 @@ > > 'send-update', 'receive-update' ], > >'if': 'TARGET_I386' } > > > > +## > > +# @SevGuestType: > > +# > > +# An enumeration indicating the type of SEV guest being run. > > +# > > +# @sev: The guest is a legacy SEV or SEV-ES guest. > > +# @sev-snp: The guest is an SEV-SNP guest. > > +# > > +# Since: 6.2 > > Now 9.1 at the earliest. > > > +## > > +{ 'enum': 'SevGuestType', > > + 'data': [ 'sev', 'sev-snp' ], > > + 'if': 'TARGET_I386' } > > + > > +## > > +# @SevGuestInfo: > > +# > > +# Information specific to legacy SEV/SEV-ES guests. 
> > +# > > +# @policy: SEV policy value > > +# > > +# @handle: SEV firmware handle > > +# > > +# Since: 2.12 > > +## > > +{ 'struct': 'SevGuestInfo', > > + 'data': { 'policy': 'uint32', > > +'handle': 'uint32' }, > > + 'if': 'TARGET_I386' } > > + > > +## > > +# @SevSnpGuestInfo: > > +# > > +# Information specific to SEV-SNP guests. > > +# > > +# @snp-policy: SEV-SNP policy value > > +# > > +# Since: 6.2 > > +## > > +{ 'struct': 'SevSnpGuestInfo', > > + 'data': { 'snp-policy': 'uint64' }, > > + 'if': 'TARGET_I386' } > > IMHO it can just be called 'policy' still, since > it is implicitly within a 'Snp' specific type. > > > > + > > ## > > # @SevInfo: > > # > > @@ -60,25 +103,25 @@ > > # > > # @build-id: SEV FW build id > > # > > -# @policy: SEV policy value > > -# > > # @state: SEV guest state > > # > > -# @handle: SEV firmware handle > > +# @sev-type: Type of SEV guest being run > > # > > # Since: 2.12 > > ## > > -{ 'struct': 'SevInfo', > > -'data': { 'enabled': 'bool', > > - 'api-major': 'uint8', > > - 'api-minor' : 'uint8', > > - 'build-id' : 'uint8', > > - 'policy' : 'uint32', > > - 'state' : 'SevState', > > - 'handle' : 'uint32' > > -}, > > - 'if': 'TARGET_I386' > > -} > > +{ 'union': 'SevInfo', > > + 'base': { 'enabled': 'bool', > > +'api-major': 'uint8', > > +'api-minor' : 'uint8', > > +'build-id' : 'uint8', > > +'state' : 'SevState', > > +'sev-type' : 'SevGuestType' }, > > + 'discriminator': 'sev-type', > > + 'data': { > > + 'sev': 'SevGuestInfo', > > + 'sev-snp': 'SevSnpGuestInfo' }, > > + 'if': 'TARGET_I386' } > > + > > > > ## > > # @query-sev: > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > index 43e6c0172f..b03d70a3d1 100644 > > --- a/target/i386/sev.c > > +++ b/target/i386/sev.c > > @@ -353,25 +353,27 @@ static SevInfo *sev_get_info(void) > > { > > SevInfo *info; > > SevCommonState *sev_common = > > SEV_COMMON(MACHINE(qdev_get_machine())->cgs); > > -SevGuestState *sev_guest = > > -(SevGuestState *)object_dynamic_cast(OBJECT(sev_common), > > - 
TYPE_SEV_GUEST); > > > > info = g_new0(SevInfo, 1); > > info->enabled = sev_enabled(); > > > > if (info->enabled) { > > -if (sev_guest) { > > -info->handle = sev_guest->handle; > > -} > > info->api_major = sev_common->api_major; > > info->api_minor = sev_common->api_minor; > > info->build_id = sev_common->build_id; > > info->state = sev_common->state; > > -/* we only report the lower 32-bits of policy for SNP, ok for > > now... */ > > -info->policy = > > -(uint32_t)object_property_get_uint(OBJECT(sev_common), > > - "policy", NULL); > > + > > +if (sev_snp_enabled()) { > > +info->sev_type = SEV_GUEST_TYPE_SEV_SNP; > > +
Re: [PATCH v3 25/49] i386/sev: Skip RAMBlock notifiers for SNP
On Wed, Mar 20, 2024 at 10:46:29AM +0100, Paolo Bonzini wrote:
> On 3/20/24 09:39, Michael Roth wrote:
> > SEV uses these notifiers to register/pin pages prior to guest use, since
> > they could potentially be used for private memory where page migration
> > is not supported. But SNP only uses guest_memfd-provided pages for
> > private memory, which has its own kernel-internal mechanisms for
> > registering/pinning memory.
> >
> > Signed-off-by: Michael Roth
> > ---
> >  target/i386/sev.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 61af312a11..774262d834 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -982,7 +982,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> >          goto err;
> >      }
> > -    ram_block_notifier_add(&sev_ram_notifier);
> > +    if (!sev_snp_enabled()) {
> > +        /*
> > +         * SEV uses these notifiers to register/pin pages prior to guest use,
> > +         * but SNP relies on guest_memfd for private pages, which has its
> > +         * own internal mechanisms for registering/pinning private memory.
> > +         */
> > +        ram_block_notifier_add(&sev_ram_notifier);
> > +    }
> > +
> >      qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
> >      qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
>
> These three lines can be done in any order, so I suggest removing
> ram_block_notifier_add + qemu_add_machine_init_done_notifier from the
> sev-common implementation of kvm_init (let's call it sev_common_kvm_init);
> and add an override in sev-guest that calls them if sev_common_kvm_init()
> succeeds.
>
> (treat this as a review for 25/26/29).

Makes sense. Will split out the common bits of sev_kvm_init() and use
class methods for initialization specific to sev-guest and
sev-snp-guest.

-Mike

>
> Paolo
>
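The restructuring agreed on here is "run the common init, and only if it succeeds do the type-specific part". A small sketch of that control flow, under the assumption that counters can stand in for the real notifier registrations; all function names are hypothetical and do not correspond to functions in target/i386/sev.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counters standing in for side effects such as registering notifiers. */
int ram_notifier_count;
int state_handler_count;

/* Hypothetical common init: work shared by every guest type. */
int common_kvm_init(bool fail)
{
    if (fail) {
        return -1;
    }
    state_handler_count++;       /* e.g. the VM state-change handler */
    return 0;
}

/* Legacy SEV override: run the common part, then add the RAM notifier. */
int sev_guest_kvm_init(bool fail)
{
    int ret = common_kvm_init(fail);

    if (ret < 0) {
        return ret;              /* nothing SEV-specific on failure */
    }
    ram_notifier_count++;
    return 0;
}

/* SNP override: the common part is all it needs. */
int snp_guest_kvm_init(bool fail)
{
    return common_kvm_init(fail);
}

typedef int (*KvmInitFn)(bool fail);

/* The caller picks the override once and needs no guest-type check. */
int run_kvm_init(KvmInitFn init, bool fail)
{
    assert(init != NULL);
    return init(fail);
}
```

With this split, the "skip the RAMBlock notifier for SNP" decision lives in which override is installed rather than in a runtime test.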
Re: [PATCH v3 23/49] i386/sev: Add a sev_snp_enabled() helper
On Wed, Mar 20, 2024 at 12:35:09PM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:19AM -0500, Michael Roth wrote:
> > Add a simple helper to check if the current guest type is SNP. Also have
> > SNP-enabled imply that SEV-ES is enabled as well, and fix up any places
> > where the sev_es_enabled() check is expecting a pure/non-SNP guest.
> >
> > Signed-off-by: Michael Roth
> > ---
> > target/i386/sev.c | 13 -
> > target/i386/sev.h | 2 ++
> > 2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 7e6dab642a..2eb13ba639 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> >
> > @@ -933,7 +942,9 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs,
> > Error **errp)
> > __func__);
> > goto err;
> > }
> > +}
> >
> > +if (sev_es_enabled() && !sev_snp_enabled()) {
> > if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
> > error_report("%s: guest policy requires SEV-ES, but "
> > "host SEV-ES support unavailable",
>
> Oops, pre-existing bug here - this method has an 'Error **errp'
> parameter, so should not be using 'error_report'.
>
> There are several more examples of this in this method that
> predate your patch series. Can you put a patch at the start
> of this series that fixes them before introducing SNP.

Sure, will add a pre-patch to fix up all the pre-existing issues you've
noted.

-Mike

>
>
> With regards,
> Daniel
> --
> |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>
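For readers unfamiliar with the convention discussed in this reply: a function that takes an 'Error **errp' argument is expected to hand its failure back through that argument and let the caller decide how to report it, rather than printing the message itself. The following is only a minimal, standalone sketch of that idea. The Error struct, sketch_error_setg() and sketch_check_es_support() are hypothetical stand-ins written for illustration; they are not QEMU's real Error type or error_setg() implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error type. */
typedef struct Error {
    char msg[160];
} Error;

/*
 * Hypothetical stand-in for an error_setg()-style helper: format the
 * message into a freshly allocated Error and store it in *errp.
 * A NULL errp means the caller does not want the details.
 */
static void sketch_error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;
    Error *err;

    assert(fmt != NULL);
    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/*
 * Init-style function with an errp parameter: on failure it fills in
 * *errp and returns -1. It prints nothing on its own.
 */
static int sketch_check_es_support(int host_has_es, Error **errp)
{
    if (!host_has_es) {
        sketch_error_setg(errp, "%s: guest policy requires SEV-ES, but "
                          "host SEV-ES support unavailable", __func__);
        return -1;
    }
    return 0;
}
```

A caller would pass the address of a NULL-initialized 'Error *', check the return value, and then print or propagate the stored message before freeing it. Mixing direct printing into such a function means the caller can no longer suppress or redirect the message.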
Re: [PATCH v3 22/49] i386/sev: Introduce 'sev-snp-guest' object
On Wed, Mar 20, 2024 at 11:58:57AM +, Daniel P. Berrangé wrote: > On Wed, Mar 20, 2024 at 03:39:18AM -0500, Michael Roth wrote: > > From: Brijesh Singh > > > > SEV-SNP support relies on a different set of properties/state than the > > existing 'sev-guest' object. This patch introduces the 'sev-snp-guest' > > object, which can be used to configure an SEV-SNP guest. For example, > > a default-configured SEV-SNP guest with no additional information > > passed in for use with attestation: > > > > -object sev-snp-guest,id=sev0 > > > > or a fully-specified SEV-SNP guest where all spec-defined binary > > blobs are passed in as base64-encoded strings: > > > > -object sev-snp-guest,id=sev0, \ > > policy=0x3, \ > > init-flags=0, \ > > id-block=YWFhYWFhYWFhYWFhYWFhCg==, \ > > id-auth=CxHK/OKLkXGn/KpAC7Wl1FSiisWDbGTEKz..., \ > > auth-key-enabled=on, \ > > host-data=LNkCWBRC5CcdGXirbNUV1OrsR28s..., \ > > guest-visible-workarounds=AA==, \ > > > > See the QAPI schema updates included in this patch for more usage > > details. > > > > In some cases these blobs may be up to 4096 characters, but this is > > generally well below the default limit for linux hosts where > > command-line sizes are defined by the sysconf-configurable ARG_MAX > > value, which defaults to 2097152 characters for Ubuntu hosts, for > > example. > > > > Signed-off-by: Brijesh Singh > > Co-developed-by: Michael Roth > > Acked-by: Markus Armbruster (for QAPI schema) > > Signed-off-by: Michael Roth > > --- > > docs/system/i386/amd-memory-encryption.rst | 78 ++- > > qapi/qom.json | 51 + > > target/i386/sev.c | 241 + > > target/i386/sev.h | 1 + > > 4 files changed, 369 insertions(+), 2 deletions(-) > > > > > +## > > +# @SevSnpGuestProperties: > > +# > > +# Properties for sev-snp-guest objects. 
Most of these are direct arguments > > +# for the KVM_SNP_* interfaces documented in the linux kernel source > > +# under Documentation/virt/kvm/amd-memory-encryption.rst, which are in > > +# turn closely coupled with the SNP_INIT/SNP_LAUNCH_* firmware commands > > +# documented in the SEV-SNP Firmware ABI Specification (Rev 0.9). > > +# > > +# More usage information is also available in the QEMU source tree under > > +# docs/amd-memory-encryption. > > +# > > +# @policy: the 'POLICY' parameter to the SNP_LAUNCH_START command, as > > +# defined in the SEV-SNP firmware ABI (default: 0x3) > > +# > > +# @guest-visible-workarounds: 16-byte, base64-encoded blob to report > > +# hypervisor-defined workarounds, corresponding > > +# to the 'GOSVW' parameter of the > > +# SNP_LAUNCH_START command defined in the > > +# SEV-SNP firmware ABI (default: all-zero) > > +# > > +# @id-block: 96-byte, base64-encoded blob to provide the 'ID Block' > > +#structure for the SNP_LAUNCH_FINISH command defined in the > > +#SEV-SNP firmware ABI (default: all-zero) > > +# > > +# @id-auth: 4096-byte, base64-encoded blob to provide the 'ID > > Authentication > > +# Information Structure' for the SNP_LAUNCH_FINISH command > > defined > > +# in the SEV-SNP firmware ABI (default: all-zero) > > +# > > +# @auth-key-enabled: true if 'id-auth' blob contains the 'AUTHOR_KEY' field > > +#defined SEV-SNP firmware ABI (default: false) > > +# > > +# @host-data: 32-byte, base64-encoded, user-defined blob to provide to the > > +# guest, as documented for the 'HOST_DATA' parameter of the > > +# SNP_LAUNCH_FINISH command in the SEV-SNP firmware ABI > > +# (default: all-zero) > > +# > > +# Since: 7.2 > > This will be 9.1 at the earliest now. 
Amazing how good I am at remembering these once I see a reply to a schema patch I'd already hit 'send' on :) > > > +## > > +{ 'struct': 'SevSnpGuestProperties', > > + 'base': 'SevCommonProperties', > > + 'data': { > > +'*policy': 'uint64', > > +'*guest-visible-workarounds': 'str', > > +'*id-block': 'str', > > +'*id-auth': 'str', > > +'*auth-key-enabled': 'bool', > > +'*host-data': 'str' } } > > + > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > index 63a220de5e..7e6dab642a 100644 > > --- a/target/i386/sev.c > > +++ b/target/i386/sev.c > > @@ -42,6 +42,7 @@ > > > > OBJECT_DECLARE_SIMPLE_TYPE(SevCommonState, SEV_COMMON) > > OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST) > > +OBJECT_DECLARE_SIMPLE_TYPE(SevSnpGuestState, SEV_SNP_GUEST) > > > > struct SevCommonState { > > X86ConfidentialGuest parent_obj; > > @@ -87,8 +88,22 @@ struct SevGuestState { > > bool kernel_hashes; > > }; > > > > +struct SevSnpGuestState { > > +SevCommonState sev_common; > > + > > +/*
Re: [PATCH v3 21/49] i386/sev: Introduce "sev-common" type to encapsulate common SEV state
On Wed, Mar 20, 2024 at 11:47:28AM +, Daniel P. Berrangé wrote: > On Wed, Mar 20, 2024 at 03:39:17AM -0500, Michael Roth wrote: > > Currently all SEV/SEV-ES functionality is managed through a single > > 'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this > > same approach won't work well since some of the properties/state > > managed by 'sev-guest' is not applicable to SEV-SNP, which will instead > > rely on a new QOM type with its own set of properties/state. > > > > To prepare for this, this patch moves common state into an abstract > > 'sev-common' parent type to encapsulate properties/state that are > > common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific > > properties/state in the current 'sev-guest' type. This should not > > affect current behavior or command-line options. > > > > As part of this patch, some related changes are also made: > > > > - a static 'sev_guest' variable is currently used to keep track of > > the 'sev-guest' instance. SEV-SNP would similarly introduce an > > 'sev_snp_guest' static variable. But these instances are now > > available via qdev_get_machine()->cgs, so switch to using that > > instead and drop the static variable. > > > > - 'sev_guest' is currently used as the name for the static variable > > holding a pointer to the 'sev-guest' instance. Re-purpose the name > > as a local variable referring the 'sev-guest' instance, and use > > that consistently throughout the code so it can be easily > > distinguished from sev-common/sev-snp-guest instances. > > > > - 'sev' is generally used as the name for local variables holding a > > pointer to the 'sev-guest' instance. 
In cases where that now points > > to common state, use the name 'sev_common'; in cases where that now > > points to state specific to 'sev-guest' instance, use the name > > 'sev_guest' > > > > Signed-off-by: Michael Roth > > --- > > qapi/qom.json | 32 ++-- > > target/i386/sev.c | 457 ++ > > target/i386/sev.h | 3 + > > 3 files changed, 281 insertions(+), 211 deletions(-) > > > > > static SevInfo *sev_get_info(void) > > { > > SevInfo *info; > > +SevCommonState *sev_common = > > SEV_COMMON(MACHINE(qdev_get_machine())->cgs); > > +SevGuestState *sev_guest = > > +(SevGuestState *)object_dynamic_cast(OBJECT(sev_common), > > + TYPE_SEV_GUEST); > > > > info = g_new0(SevInfo, 1); > > info->enabled = sev_enabled(); > > > > if (info->enabled) { > > -info->api_major = sev_guest->api_major; > > -info->api_minor = sev_guest->api_minor; > > -info->build_id = sev_guest->build_id; > > -info->policy = sev_guest->policy; > > -info->state = sev_guest->state; > > -info->handle = sev_guest->handle; > > +if (sev_guest) { > > +info->handle = sev_guest->handle; > > +} > > If we're not going to provide a value for 'handle', then > we should update the QAPI for this to mark the property > as optional, which would then require doing > > info->has_handle = true; > > inside this 'if' block. I think this is another temporarily-awkward case that gets resolved with: i386/sev: Update query-sev QAPI format to handle SEV-SNP With that patch 'handle' is always available for SEV guests, and never available for SNP, and that's managed through a discriminated union type. I think that info->handle should be treated the same as the other fields as part of this patch and any changes in how they are reported should be kept in the above-mentioned patch. This might be another artifact from v2's handling. Will get this fixed up. 
-Mike > > +} > > > +info->api_major = sev_common->api_major; > > +info->api_minor = sev_common->api_minor; > > +info->build_id = sev_common->build_id; > > +info->state = sev_common->state; > > +/* we only report the lower 32-bits of policy for SNP, ok for > > now... */ > > +info->policy = > > +(uint32_t)object_property_get_uint(OBJECT(sev_common), > > + "policy", NULL); > > } > > With regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| >
Re: [PATCH v3 21/49] i386/sev: Introduce "sev-common" type to encapsulate common SEV state
On Wed, Mar 20, 2024 at 11:44:13AM +, Daniel P. Berrangé wrote: > On Wed, Mar 20, 2024 at 03:39:17AM -0500, Michael Roth wrote: > > Currently all SEV/SEV-ES functionality is managed through a single > > 'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this > > same approach won't work well since some of the properties/state > > managed by 'sev-guest' is not applicable to SEV-SNP, which will instead > > rely on a new QOM type with its own set of properties/state. > > > > To prepare for this, this patch moves common state into an abstract > > 'sev-common' parent type to encapsulate properties/state that are > > common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific > > properties/state in the current 'sev-guest' type. This should not > > affect current behavior or command-line options. > > > > As part of this patch, some related changes are also made: > > > > - a static 'sev_guest' variable is currently used to keep track of > > the 'sev-guest' instance. SEV-SNP would similarly introduce an > > 'sev_snp_guest' static variable. But these instances are now > > available via qdev_get_machine()->cgs, so switch to using that > > instead and drop the static variable. > > > > - 'sev_guest' is currently used as the name for the static variable > > holding a pointer to the 'sev-guest' instance. Re-purpose the name > > as a local variable referring the 'sev-guest' instance, and use > > that consistently throughout the code so it can be easily > > distinguished from sev-common/sev-snp-guest instances. > > > > - 'sev' is generally used as the name for local variables holding a > > pointer to the 'sev-guest' instance. 
In cases where that now points > > to common state, use the name 'sev_common'; in cases where that now > > points to state specific to 'sev-guest' instance, use the name > > 'sev_guest' > > > > Signed-off-by: Michael Roth > > --- > > qapi/qom.json | 32 ++-- > > target/i386/sev.c | 457 ++ > > target/i386/sev.h | 3 + > > 3 files changed, 281 insertions(+), 211 deletions(-) > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > index baae3a183f..66b5781ca6 100644 > > --- a/qapi/qom.json > > +++ b/qapi/qom.json > > @@ -875,12 +875,29 @@ > >'data': { '*filename': 'str' } } > > > > ## > > -# @SevGuestProperties: > > +# @SevCommonProperties: > > # > > -# Properties for sev-guest objects. > > +# Properties common to objects that are derivatives of sev-common. > > # > > # @sev-device: SEV device to use (default: "/dev/sev") > > # > > +# @cbitpos: C-bit location in page table entry (default: 0) > > +# > > +# @reduced-phys-bits: number of bits in physical addresses that become > > +# unavailable when SEV is enabled > > +# > > +# Since: 2.12 > > Not quite sure what we've done in this scenario before. > It feels wierd to use '2.12' for the new base type, even > though in effect the properties all existed since 2.12 in > the sub-class. > > Perhaps 'Since: 9.1' for the type, but 'Since: 2.12' for the > properties, along with an explanatory comment about stuff > moving into the new base type ? > > Markus, opinions ? My thinking is that the internal details are less important than what's actually exposed to users in the form of command-line options/etc. So in that context the "Since: 2.12" sort of becomes the "default" for when those properties were first made available to users, and then anything we add after would then get special treatment with the per-property versioning. But no issue with taking a different approach if that's preferred. 
> > > +## > > +{ 'struct': 'SevCommonProperties', > > + 'data': { '*sev-device': 'str', > > +'*cbitpos': 'uint32', > > +'reduced-phys-bits': 'uint32' } } > > + > > +## > > +# @SevGuestProperties: > > +# > > +# Properties for sev-guest objects. > > +# > > # @dh-cert-file: guest owners DH certificate (encoded with base64) > > # > > # @session-file: guest owners session parameters (encoded with base64) > > @@ -889,11 +906,6 @@ > > # > > # @handle: SEV firmware handle (default: 0) > > # > > -# @cbitpos: C-bit location in page table entry (default: 0) > > -# > > -# @reduced-phys-bits: number of bits in physical addresses that become > > -# unavailable when SEV is enabled > > -# > > # @kernel-hashes: if true, add hashes of kernel/initrd/cmdline to a > > # designated guest firmware page for measured boot with -kernel > > # (default: false) (since 6.2) > > @@ -901,13 +913,11 @@ > > # Since: 2.12 > > ## > > { 'struct': 'SevGuestProperties', > > - 'data': { '*sev-device': 'str', > > -'*dh-cert-file': 'str', > > + 'base': 'SevCommonProperties', > > + 'data': { '*dh-cert-file': 'str', > > '*session-file': 'str', > > '*policy': 'uint32', > > '*handle': 'uint32', > > -'*cbitpos': 'uint32', > > -
[PATCH v2 0/2] Implement QEMU GA commands for Windows
From: Aidan Leuck * Fixed styling errors * Moved from wcstombs to g_utf functions * Removed unnecessary if checks on calls to free * Fixed copyright headers * Refactored create_acl functions into base function, admin function and user function * Removed unused user count function * Split up refactor into a separate patch Aidan Leuck (2): Implement QEMU GA commands for Windows Factored out common functions between POSIX and Windows implementation qga/commands-posix-ssh.c | 47 +-- qga/commands-ssh-core.c| 57 +++ qga/commands-ssh-core.h| 15 + qga/commands-windows-ssh.c | 759 + qga/commands-windows-ssh.h | 27 ++ qga/meson.build| 12 +- qga/qapi-schema.json | 22 +- 7 files changed, 881 insertions(+), 58 deletions(-) create mode 100644 qga/commands-ssh-core.c create mode 100644 qga/commands-ssh-core.h create mode 100644 qga/commands-windows-ssh.c create mode 100644 qga/commands-windows-ssh.h -- 2.34.1 >From b77264e7ba7390adeaaa9d5df707c60693d78c16 Mon Sep 17 00:00:00 2001 From: Aidan Leuck Date: Wed, 20 Mar 2024 15:36:18 -0600 Subject: [PATCH v2 1/2] Implement QEMU GA commands for Windows Signed-off-by: Aidan Leuck --- qga/commands-windows-ssh.c | 823 + qga/commands-windows-ssh.h | 26 ++ qga/meson.build| 9 +- qga/qapi-schema.json | 22 +- 4 files changed, 867 insertions(+), 13 deletions(-) create mode 100644 qga/commands-windows-ssh.c create mode 100644 qga/commands-windows-ssh.h diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c new file mode 100644 index 00..e9faae90fc --- /dev/null +++ b/qga/commands-windows-ssh.c @@ -0,0 +1,823 @@ +/* + * QEMU Guest Agent win32-specific command implementations for SSH keys. + * The implementation is opinionated and expects the SSH implementation to + * be OpenSSH. + * + * Copyright Schweitzer Engineering Laboratories. 2024 + * + * Authors: + * Aidan Leuck + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include +#include + +#include "commands-windows-ssh.h" +#include "guest-agent-core.h" +#include "limits.h" +#include "lmaccess.h" +#include "lmapibuf.h" +#include "lmerr.h" +#include "qapi/error.h" + +#include "qga-qapi-commands.h" +#include "sddl.h" +#include "shlobj.h" +#include "userenv.h" + +#define AUTHORIZED_KEY_FILE "authorized_keys" +#define AUTHORIZED_KEY_FILE_ADMIN "administrators_authorized_keys" +#define LOCAL_SYSTEM_SID "S-1-5-18" +#define ADMIN_SID "S-1-5-32-544" +#define WORLD_SID "S-1-1-0" + +/* + * Reads the authorized_keys file and returns an array of strings for each entry + * parameters: + * path -> Path to the authorized_keys file + * errp -> Error structure that will contain errors upon failure. + * returns: Array of strings, where each entry is an authorized key. + */ +static GStrv read_authkeys(const char *path, Error **errp) +{ + g_autoptr(GError) err = NULL; + g_autofree char *contents = NULL; + + if (!g_file_get_contents(path, , NULL, )) { +error_setg(errp, "failed to read '%s': %s", path, err->message); +return NULL; + } + + return g_strsplit(contents, "\n", -1); +} + +/* + * Checks if a OpenSSH key is valid + * parameters: + * key* Key to check for validity + * errp -> Error structure that will contain errors upon failure. + * returns: true if key is valid, false otherwise + */ +static bool check_openssh_pub_key(const char *key, Error **errp) +{ + /* simple sanity-check, we may want more? */ + if (!key || key[0] == '#' || strchr(key, '\n')) { +error_setg(errp, "invalid OpenSSH public key: '%s'", key); +return false; + } + + return true; +} + +/* + * Checks if all openssh keys in the array are valid + * parameters: + * keys -> Array of keys to check + * errp -> Error structure that will contain errors upon failure. 
+ * returns: true if all keys are valid, false otherwise + */ +static bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp) +{ + size_t n = 0; + strList *k; + + for (k = keys; k != NULL; k = k->next) { +if (!check_openssh_pub_key(k->value, errp)) { + return false; +} +n++; + } + + if (nkeys) { +*nkeys = n; + } + return true; +} + +/* + * Frees userInfo structure. This implements the g_auto cleanup + * for the structure. + */ +void free_userInfo(PWindowsUserInfo info) +{ + g_free(info->sshDirectory); + g_free(info->authorizedKeyFile); + LocalFree(info->SSID); + g_free(info->username); + g_free(info); +} + +/* + * Gets the admin SSH folder for OpenSSH. OpenSSH does not store + * the authorized_key file in the users home directory for security reasons and + * instead stores it at %PROGRAMDATA%/ssh. This function returns the path to + * that directory on the users machine parameters: errp -> error structure to + * set when an error occurs returns: The path to
Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On Wed, 20 Mar 2024 at 17:06, Philippe Mathieu-Daudé > wrote: > > > > +Alex/Daniel > > > > On 20/3/24 17:53, Peter Maydell wrote: > > > On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé > > > wrote: > > >> > > >> 'info tlb' and 'info mem' commands don't scale in heterogeneous > > >> emulation. They will be reworked after the next release, hidden > > >> behind the 'info mmu' command. It is not too late to deprecate > > >> commands, so add the 'info mmu' command as wrapper to the other > > >> ones, but already deprecate them. > > >> > > >> Philippe Mathieu-Daudé (2): > > >>target/monitor: Introduce 'info mmu' command > > >>target/monitor: Deprecate 'info tlb' and 'info mem' commands > > > > > > This seems to replace "info tlb" and "info mem" with "info mmu -t" > > > and "info mmu -m", but it doesn't really say anything about: > > > * what the difference is between these two things > > > > I really don't know; I'm only trying to keep the monitor interface > > identical. > > You don't, though: you change it from "info tlb" to "info mmu -t" etc. > > > > * which targets implement which and why > > > > This one is easy to answer: > > > > #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) > > || \ > > defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K) > > { > > .name = "tlb", > > > > #if defined(TARGET_I386) || defined(TARGET_RISCV) > > { > > .name = "mem", > > > > > * what the plan is for the future > > > > My problem is with linking a single QEMU binary, as these two symbols > > (hmp_info_mem and hmp_info_tlb) clash. > > Yes, but they both (implicitly) operate on the current HMP CPU, > so the problem with linking into a single binary is that they're > not indirected through a method on the CPU object, not the syntax > used in the monitor to invoke them, presumably. > > > I'm indeed only postponing the problem, without looking at what > > this code does. 
I did it adding hmp_info_mmu_tlb/mem hooks in > > TCGCPUOps ("hw/core/tcg-cpu-ops.h"), so the command can be > > dispatched per target vcpu as target-agnostic code in > > monitor/hmp-cmds.c: > > > > +#include "hw/core/tcg-cpu-ops.h" > > + > > +static void hmp_info_mmu_tlb(Monitor *mon, CPUState *cpu) > > +{ > > +const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops; > > + > > +if (tcg_ops->hmp_info_mmu_tlb) { > > +tcg_ops->hmp_info_mmu_tlb(mon, cpu_env(cpu)); > > +} else { > > +monitor_puts(mon, "No per-CPU information available on this > > target\n"); > > +} > > +} > > These aren't TCG specific though, so why TCGCPUOps ? > > > > I am definitely not a fan of either of these commands, because > > > (as we currently implement them) they effectively require each > > > target architecture to implement a second copy of the page table > > > walking code. But before we can deprecate them we need to be > > > pretty sure that "info mmu" is what we want to replace them with. > > > > An alternative is to just deprecate them, without adding "info mmu" :) > > > > It is OK to un-deprecate stuff if we realize its usefulness. > > The commands are there because some users find them useful. > I just dislike them because I think they're a bit niche and > annoying to implement and not consistent across target > architectures and not very well documented... > > By the way, we have no obligation to follow the deprecate-and-drop > process for HMP commands; unlike QMP, we give ourselves the > license to vary it when we feel like it, because the users are > humans, not programs or scripts. Right, so no rush to get the deprecation in; change it when you agree what you'd like a replacement to look like. Dave > -- PMM -- -Open up your eyes, open up your mind, open up your code --- / Dr. David Alan Gilbert| Running GNU/Linux | Happy \ \dave @ treblig.org | | In Hex / \ _|_ http://www.treblig.org |___/
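The indirection being debated in this thread (dispatch through a per-CPU hook with a fallback message, so that two targets can be linked into one binary without symbol clashes) can be sketched in isolation as follows. This is a hedged illustration only: SketchCPU, SketchCPUOps and sketch_info_tlb() are invented names, and the sketch deliberately uses a generic ops table instead of TCGCPUOps, in line with the objection that the commands are not TCG-specific.

```c
#include <assert.h>
#include <stdio.h>

typedef struct SketchCPU SketchCPU;

/* Hypothetical per-CPU ops table; field names are illustrative only. */
typedef struct SketchCPUOps {
    void (*dump_tlb)(const SketchCPU *cpu, FILE *out); /* may be NULL */
} SketchCPUOps;

struct SketchCPU {
    const char *name;
    const SketchCPUOps *ops;
};

/* One target's implementation of the hook. */
static void x86_dump_tlb(const SketchCPU *cpu, FILE *out)
{
    fprintf(out, "%s: TLB dump would go here\n", cpu->name);
}

static const SketchCPUOps x86_ops = { .dump_tlb = x86_dump_tlb };
static const SketchCPUOps no_mmu_ops = { .dump_tlb = NULL };

/*
 * Target-agnostic dispatcher: call the hook of the given CPU if it has
 * one, otherwise print a fallback. Returns 1 if a hook ran, 0 otherwise.
 */
static int sketch_info_tlb(const SketchCPU *cpu, FILE *out)
{
    assert(cpu && cpu->ops);
    if (cpu->ops->dump_tlb) {
        cpu->ops->dump_tlb(cpu, out);
        return 1;
    }
    fputs("No per-CPU information available on this target\n", out);
    return 0;
}
```

Because the dispatcher only touches the ops table of the CPU it is given, each target can keep its own static implementation and no global symbol such as hmp_info_tlb needs to exist more than once.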
[PATCH] migration/postcopy: Fix high frequency sync
From: Peter Xu

On the current code base I can observe an extremely high sync count during
precopy, as long as one enables postcopy-ram=on before switchover to
postcopy.

To provide some context of when we decide to do a full sync: we check
must_precopy (which implies "data must be sent during precopy phase"), and
as long as it is lower than the threshold size we calculated (out of
bandwidth and expected downtime) we will kick off the slow sync.

However, when postcopy is enabled (even if still during precopy phase), RAM
only reports all pages as can_postcopy, and reports must_precopy==0. Then
"must_precopy <= threshold_size" almost always triggers and enforces a slow
sync for every call to migration_iteration_run() when postcopy is enabled,
even if not used. That is insane.

It turns out it was a regression introduced in the previous refactoring in
QEMU 8.0 in late 2022. Fix this by checking the whole RAM size rather than
must_precopy, like before. Not copying stable yet as many things changed,
and even if this should be a major performance regression, no functional
change has been observed (and that's also probably why nobody found it). I
only noticed this when looking for another bug reported by Nina.

While at it, clean up the surrounding lines a little.

Cc: Nina Schoetterl-Glausch
Fixes: c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")
Signed-off-by: Peter Xu
---

Nina: I copied you only because this might still be relevant, as this issue
also mysteriously points back to c8df4a7aef.. However I don't think it
should be a fix for your problem; at most it can change the likelihood of
reproducing it.

This is not a regression for this release, but I still want to have it for
9.0. Fabiano, any opinions / objections?
---
 migration/migration.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 047b6b49cf..9fe8fd2afd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3199,17 +3199,16 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t must_precopy, can_postcopy;
+    uint64_t must_precopy, can_postcopy, pending_size;
     Error *local_err = NULL;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
     bool can_switchover = migration_can_switchover(s);

     qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-    uint64_t pending_size = must_precopy + can_postcopy;
-
+    pending_size = must_precopy + can_postcopy;
     trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);

-    if (must_precopy <= s->threshold_size) {
+    if (pending_size < s->threshold_size) {
         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
         pending_size = must_precopy + can_postcopy;
         trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
--
2.44.0
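To make the effect of the changed condition concrete, here is a small standalone sketch that compares the two checks on plain numbers. It is not QEMU code: sketch_sync_old(), sketch_sync_new() and the sizes used in sketch_selfcheck() are made up for illustration, and only the two comparisons themselves are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check before the patch: only must_precopy is compared to the threshold. */
static bool sketch_sync_old(uint64_t must_precopy, uint64_t can_postcopy,
                            uint64_t threshold_size)
{
    (void)can_postcopy; /* ignored by the old check */
    return must_precopy <= threshold_size;
}

/* Check after the patch: the whole pending size is compared. */
static bool sketch_sync_new(uint64_t must_precopy, uint64_t can_postcopy,
                            uint64_t threshold_size)
{
    uint64_t pending_size = must_precopy + can_postcopy;

    return pending_size < threshold_size;
}

/* Illustrative numbers only; they do not come from a real migration. */
static void sketch_selfcheck(void)
{
    const uint64_t MiB = 1024 * 1024;

    /*
     * Postcopy enabled, 8 GiB still pending and all of it reported as
     * can_postcopy: the old check asks for the expensive exact sync on
     * every iteration, the new one does not.
     */
    assert(sketch_sync_old(0, 8192 * MiB, 300 * MiB));
    assert(!sketch_sync_new(0, 8192 * MiB, 300 * MiB));

    /* Close to convergence both checks agree that an exact sync is due. */
    assert(sketch_sync_old(0, 100 * MiB, 300 * MiB));
    assert(sketch_sync_new(0, 100 * MiB, 300 * MiB));
}
```

With must_precopy stuck at zero, the old comparison is true for any threshold, which matches the "sync on every iteration" behaviour described in the commit message.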
Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration
On 3/19/2024 8:27 PM, Jason Wang wrote: On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu wrote: On 3/17/2024 8:22 PM, Jason Wang wrote: On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu wrote: On 3/14/2024 9:03 PM, Jason Wang wrote: On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu wrote: On setups with one or more virtio-net devices with vhost on, dirty tracking iteration increases cost the bigger the number amount of queues are set up e.g. on idle guests migration the following is observed with virtio-net with vhost=on: 48 queues -> 78.11% [.] vhost_dev_sync_region.isra.13 8 queues -> 40.50% [.] vhost_dev_sync_region.isra.13 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13 2 devices, 1 queue -> 18.60% [.] vhost_dev_sync_region.isra.14 With high memory rates the symptom is lack of convergence as soon as it has a vhost device with a sufficiently high number of queues, the sufficient number of vhost devices. On every migration iteration (every 100msecs) it will redundantly query the *shared log* the number of queues configured with vhost that exist in the guest. For the virtqueue data, this is necessary, but not for the memory sections which are the same. So essentially we end up scanning the dirty log too often. To fix that, select a vhost device responsible for scanning the log with regards to memory sections dirty tracking. It is selected when we enable the logger (during migration) and cleared when we disable the logger. If the vhost logger device goes away for some reason, the logger will be re-selected from the rest of vhost devices. After making mem-section logger a singleton instance, constant cost of 7%-9% (like the 1 queue report) will be seen, no matter how many queues or how many vhost devices are configured: 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13 2 devices, 8 queues -> 7.97% [.] 
vhost_dev_sync_region.isra.14 Co-developed-by: Joao Martins Signed-off-by: Joao Martins Signed-off-by: Si-Wei Liu --- v3 -> v4: - add comment to clarify effect on cache locality and performance v2 -> v3: - add after-fix benchmark to commit log - rename vhost_log_dev_enabled to vhost_dev_should_log - remove unneeded comparisons for backend_type - use QLIST array instead of single flat list to store vhost logger devices - simplify logger election logic --- hw/virtio/vhost.c | 67 ++- include/hw/virtio/vhost.h | 1 + 2 files changed, 62 insertions(+), 6 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 612f4db..58522f1 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -45,6 +45,7 @@ static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX]; static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX]; +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX]; /* Memslots used by backends that support private memslots (without an fd). */ static unsigned int used_memslots; @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev) } } +static inline bool vhost_dev_should_log(struct vhost_dev *dev) +{ +assert(dev->vhost_ops); +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE); +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX); + +return dev == QLIST_FIRST(_log_devs[dev->vhost_ops->backend_type]); A dumb question, why not simple check dev->log == vhost_log_shm[dev->vhost_ops->backend_type] Because we are not sure if the logger comes from vhost_log_shm[] or vhost_log[]. Don't want to complicate the check here by calling into vhost_dev_log_is_shared() everytime when the .log_sync() is called. It has very low overhead, isn't it? Whether this has low overhead will have to depend on the specific backend's implementation for .vhost_requires_shm_log(), which the common vhost layer should not assume upon or rely on the current implementation. 
static bool vhost_dev_log_is_shared(struct vhost_dev *dev) { return dev->vhost_ops->vhost_requires_shm_log && dev->vhost_ops->vhost_requires_shm_log(dev); } For example, if I understand the code correctly, the log type won't be changed during runtime, so we can endup with a boolean to record that instead of a query ops? Right now the log type won't change during runtime, but I am not sure if this may prohibit future revisit to allow change at the runtime, then there'll be complex code involvled to maintain the state. Other than this, I think it's insufficient to just check the shm log v.s. normal log. The logger device requires to identify a leading logger device that gets elected in vhost_dev_elect_mem_logger(), as all the dev->log points to the same logger that is refenerce counted, that we have to add extra field and complex logic to maintain the election status. I thought that Eugenio's previous suggestion tried to simplify the logic in vhost_dev_elect_mem_logger(), as the QLIST_FIRST
Re: [PATCH v2 1/2] target/riscv/csr.c: Add functional of hvictl CSR
Hi,

This patch doesn't apply in master or alistair/riscv-to-apply.next. Can you please re-send?

Thanks,

Daniel

On 3/20/24 13:42, Irina Ryapolova wrote:
> CSR hvictl (Hypervisor Virtual Interrupt Control) provides further flexibility for injecting interrupts into VS level in situations not fully supported by the facilities described thus far, but only with more active involvement of the hypervisor.
>
> A hypervisor must use hvictl for any of the following:
> • asserting for VS level a major interrupt not supported by hvien and hvip;
> • implementing configurability of priorities at VS level for major interrupts beyond those supported by hviprio1 and hviprio2; or
> • emulating an external interrupt controller for a virtual hart without the use of an IMSIC’s guest interrupt file, while also supporting configurable priorities both for external interrupts and for major interrupts to the virtual hart.
>
> All hvictl fields together can affect the value of CSR vstopi (Virtual Supervisor Top Interrupt) and therefore the interrupt identity reported in vscause when an interrupt traps to VS-mode. When hvictl.VTI = 1, the absence of an interrupt for VS level can be indicated only by setting hvictl.IID = 9. Software might want to use the pair IID = 9, IPRIO = 0 generally to represent no interrupt in hvictl.
>
> (See riscv-interrupts-1.0: Interrupts at VS level)
>
> Signed-off-by: Irina Ryapolova
> ---
> Changes for v2:
>   -added more information in commit message
> ---
>  target/riscv/csr.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 674ea075a4..0c21145eaf 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3585,6 +3585,21 @@ static int read_hvictl(CPURISCVState *env, int csrno, target_ulong *val)
>  static int write_hvictl(CPURISCVState *env, int csrno, target_ulong val)
>  {
>      env->hvictl = val & HVICTL_VALID_MASK;
> +    if (env->hvictl & HVICTL_VTI)
> +    {
> +        uint32_t hviid = get_field(env->hvictl, HVICTL_IID);
> +        uint32_t hviprio = get_field(env->hvictl, HVICTL_IPRIO);
> +        /* the pair IID = 9, IPRIO = 0 generally to represent no interrupt in hvictl. */
> +        if (!(hviid == IRQ_S_EXT && hviprio == 0)) {
> +            uint64_t new_val = BIT(hviid) ;
> +             if (new_val & S_MODE_INTERRUPTS) {
> +                rmw_hvip64(env, csrno, NULL, new_val << 1, new_val << 1);
> +            } else if (new_val & LOCAL_INTERRUPTS) {
> +                rmw_hvip64(env, csrno, NULL, new_val, new_val);
> +            }
> +        }
> +    }
> +
>      return RISCV_EXCP_NONE;
>  }
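
For readers following the condition in write_hvictl(), here is a minimal standalone sketch of the "IID = 9, IPRIO = 0 means no interrupt" test described in the commit message. The field positions (VTI at bit 30, IID at bits 27:16, IPRIO at bits 7:0) and all `demo_`/`DEMO_` names are assumptions made for illustration; they are not taken from the patch and should be checked against the AIA specification and QEMU's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed hvictl layout (illustration only, not from the patch). */
#define DEMO_HVICTL_VTI        (UINT32_C(1) << 30)
#define DEMO_HVICTL_IID_SHIFT  16
#define DEMO_HVICTL_IID_MASK   UINT32_C(0xFFF)
#define DEMO_HVICTL_IPRIO_MASK UINT32_C(0xFF)
#define DEMO_IID_S_EXT         9   /* supervisor external interrupt */

/* Build an hvictl value from its three fields (hypothetical helper). */
static uint32_t demo_hvictl_pack(bool vti, uint32_t iid, uint32_t iprio)
{
    uint32_t v = vti ? DEMO_HVICTL_VTI : 0;

    v |= (iid & DEMO_HVICTL_IID_MASK) << DEMO_HVICTL_IID_SHIFT;
    v |= iprio & DEMO_HVICTL_IPRIO_MASK;
    return v;
}

/*
 * Mirror of the patch's condition: an interrupt is requested only when
 * VTI is set and the (IID, IPRIO) pair is not the (9, 0) placeholder.
 */
static bool demo_hvictl_requests_irq(uint32_t hvictl)
{
    uint32_t iid = (hvictl >> DEMO_HVICTL_IID_SHIFT) & DEMO_HVICTL_IID_MASK;
    uint32_t iprio = hvictl & DEMO_HVICTL_IPRIO_MASK;

    if (!(hvictl & DEMO_HVICTL_VTI)) {
        return false;
    }
    return !(iid == DEMO_IID_S_EXT && iprio == 0);
}
```

With this sketch, `demo_hvictl_pack(true, 9, 0)` is treated as "no interrupt", while the same IID with a non-zero priority is not.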
Re: [PATCH RFC v3 00/49] Add AMD Secure Nested Paging (SEV-SNP) support
On 3/21/2024 1:08 AM, Paolo Bonzini wrote:
> On Wed, Mar 20, 2024 at 10:59 AM Paolo Bonzini wrote:
> > I will now focus on reviewing patches 6-20. This way we can prepare a
> > common tree for SEV_INIT2/SNP/TDX, for both vendors to build upon.
>
> Ok, the attachment is the delta that I have. The only major change is
> requiring discard (thus effectively blocking VFIO support for
> SEV-SNP/TDX, at least for now).
>
> I will push it shortly to the same sevinit2 branch, and will post the
> patches sometime soon.
>
> Xiaoyao, you can use that branch too (it's on
> https://gitlab.com/bonzini/qemu) as the basis for your TDX work.

Sure, that's really good news for us.

BTW, there are some minor comments on the guest_memfd patches of my v5 post[*]. Could you please resolve them in your branch?

[*] https://lore.kernel.org/qemu-devel/20240229063726.610065-1-xiaoyao...@intel.com/

> Paolo
[PATCH 1/3] hw/virtio: initialize QemuDmaBuf using the function from ui/console
From: Dongwon Kim

QemuDmaBuf is an abstraction of dmabuf specifically for ui/console usage. To enhance safety and maintainability, its creation and initialization need to be centralized within ui/console using the newly introduced methods.

Cc: Philippe Mathieu-Daudé
Cc: Marc-André Lureau
Cc: Vivek Kasireddy
Signed-off-by: Dongwon Kim
---
 hw/display/virtio-gpu-udmabuf.c | 27 +++
 include/hw/virtio/virtio-gpu.h  |  2 +-
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index d51184d658..dde6c8e9d9 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -162,7 +162,7 @@ static void virtio_gpu_free_dmabuf(VirtIOGPU *g, VGPUDMABuf *dmabuf)
     struct virtio_gpu_scanout *scanout;

     scanout = &g->parent_obj.scanout[dmabuf->scanout_id];
-    dpy_gl_release_dmabuf(scanout->con, &dmabuf->buf);
+    dpy_gl_release_dmabuf(scanout->con, dmabuf->buf);
     QTAILQ_REMOVE(&g->dmabuf.bufs, dmabuf, next);
     g_free(dmabuf);
 }
@@ -181,17 +181,10 @@ static VGPUDMABuf
     }

     dmabuf = g_new0(VGPUDMABuf, 1);
-    dmabuf->buf.width = r->width;
-    dmabuf->buf.height = r->height;
-    dmabuf->buf.stride = fb->stride;
-    dmabuf->buf.x = r->x;
-    dmabuf->buf.y = r->y;
-    dmabuf->buf.backing_width = fb->width;
-    dmabuf->buf.backing_height = fb->height;
-    dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
-    dmabuf->buf.fd = res->dmabuf_fd;
-    dmabuf->buf.allow_fences = true;
-    dmabuf->buf.draw_submitted = false;
+    dmabuf->buf = dpy_gl_create_dmabuf(r->width, r->height, fb->stride,
+                                       r->x, r->y, fb->width, fb->height,
+                                       qemu_pixman_to_drm_format(fb->format),
+                                       0, res->dmabuf_fd, false);
     dmabuf->scanout_id = scanout_id;
     QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);

@@ -206,21 +199,23 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
 {
     struct virtio_gpu_scanout *scanout = &g->parent_obj.scanout[scanout_id];
     VGPUDMABuf *new_primary, *old_primary = NULL;
+    uint32_t width, height;

     new_primary = virtio_gpu_create_dmabuf(g, scanout_id, res, fb, r);
     if (!new_primary) {
         return -EINVAL;
     }

+    width = dpy_gl_dmabuf_get_width(new_primary->buf);
+    height = dpy_gl_dmabuf_get_height(new_primary->buf);
+
     if (g->dmabuf.primary[scanout_id]) {
         old_primary = g->dmabuf.primary[scanout_id];
     }

     g->dmabuf.primary[scanout_id] = new_primary;
-    qemu_console_resize(scanout->con,
-                        new_primary->buf.width,
-                        new_primary->buf.height);
-    dpy_gl_scanout_dmabuf(scanout->con, &new_primary->buf);
+    qemu_console_resize(scanout->con, width, height);
+    dpy_gl_scanout_dmabuf(scanout->con, new_primary->buf);

     if (old_primary) {
         virtio_gpu_free_dmabuf(g, old_primary);
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index ed44cdad6b..010083e8e3 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -169,7 +169,7 @@ struct VirtIOGPUBaseClass {
     DEFINE_PROP_UINT32("yres", _state, _conf.yres, 800)

 typedef struct VGPUDMABuf {
-    QemuDmaBuf buf;
+    QemuDmaBuf *buf;
     uint32_t scanout_id;
     QTAILQ_ENTRY(VGPUDMABuf) next;
 } VGPUDMABuf;
--
2.34.1
[PATCH 2/3] hw/vfio: initialize QemuDmaBuf using the function from ui/console
From: Dongwon Kim

QemuDmaBuf is an abstraction of dmabuf specifically for ui/console usage. To enhance safety and maintainability, its creation and initialization need to be centralized within ui/console using the newly introduced methods.

Cc: Philippe Mathieu-Daudé
Cc: Marc-André Lureau
Cc: Vivek Kasireddy
Signed-off-by: Dongwon Kim
---
 hw/vfio/display.c             | 35 ---
 include/hw/vfio/vfio-common.h |  2 +-
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 1aa440c663..a3bdb01789 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -241,14 +241,10 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice *vdev,

     dmabuf = g_new0(VFIODMABuf, 1);
     dmabuf->dmabuf_id = plane.dmabuf_id;
-    dmabuf->buf.width = plane.width;
-    dmabuf->buf.height = plane.height;
-    dmabuf->buf.backing_width = plane.width;
-    dmabuf->buf.backing_height = plane.height;
-    dmabuf->buf.stride = plane.stride;
-    dmabuf->buf.fourcc = plane.drm_format;
-    dmabuf->buf.modifier = plane.drm_format_mod;
-    dmabuf->buf.fd = fd;
+    dmabuf->buf = dpy_gl_create_dmabuf(plane.width, plane.height, plane.stride,
+                                       0, 0, plane.width, plane.height,
+                                       plane.drm_format, plane.drm_format_mod,
+                                       fd, false);
     if (plane_type == DRM_PLANE_TYPE_CURSOR) {
         vfio_display_update_cursor(dmabuf, &plane);
     }
@@ -259,9 +255,15 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice *vdev,

 static void vfio_display_free_one_dmabuf(VFIODisplay *dpy, VFIODMABuf *dmabuf)
 {
+    int fd;
+
     QTAILQ_REMOVE(&dpy->dmabuf.bufs, dmabuf, next);
-    dpy_gl_release_dmabuf(dpy->con, &dmabuf->buf);
-    close(dmabuf->buf.fd);
+    fd = dpy_gl_dmabuf_get_fd(dmabuf->buf);
+    if (fd > -1) {
+        close(fd);
+    }
+
+    dpy_gl_release_dmabuf(dpy->con, dmabuf->buf);
     g_free(dmabuf);
 }

@@ -286,6 +288,7 @@ static void vfio_display_dmabuf_update(void *opaque)
     VFIOPCIDevice *vdev = opaque;
     VFIODisplay *dpy = vdev->dpy;
     VFIODMABuf *primary, *cursor;
+    uint32_t width, height;
     bool free_bufs = false, new_cursor = false;

     primary = vfio_display_get_dmabuf(vdev, DRM_PLANE_TYPE_PRIMARY);
@@ -296,11 +299,13 @@ static void vfio_display_dmabuf_update(void *opaque)
         return;
     }

+    width = dpy_gl_dmabuf_get_width(primary->buf);
+    height = dpy_gl_dmabuf_get_height(primary->buf);
+
     if (dpy->dmabuf.primary != primary) {
         dpy->dmabuf.primary = primary;
-        qemu_console_resize(dpy->con,
-                            primary->buf.width, primary->buf.height);
-        dpy_gl_scanout_dmabuf(dpy->con, &primary->buf);
+        qemu_console_resize(dpy->con, width, height);
+        dpy_gl_scanout_dmabuf(dpy->con, primary->buf);
         free_bufs = true;
     }

@@ -314,7 +319,7 @@ static void vfio_display_dmabuf_update(void *opaque)
     if (cursor && (new_cursor || cursor->hot_updates)) {
         bool have_hot = (cursor->hot_x != 0x &&
                          cursor->hot_y != 0x);
-        dpy_gl_cursor_dmabuf(dpy->con, &cursor->buf, have_hot,
+        dpy_gl_cursor_dmabuf(dpy->con, cursor->buf, have_hot,
                              cursor->hot_x, cursor->hot_y);
         cursor->hot_updates = 0;
     } else if (!cursor && new_cursor) {
@@ -328,7 +333,7 @@ static void vfio_display_dmabuf_update(void *opaque)
         cursor->pos_updates = 0;
     }

-    dpy_gl_update(dpy->con, 0, 0, primary->buf.width, primary->buf.height);
+    dpy_gl_update(dpy->con, 0, 0, width, height);

     if (free_bufs) {
         vfio_display_free_dmabufs(vdev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..d66e27db02 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -148,7 +148,7 @@ typedef struct VFIOGroup {
 } VFIOGroup;

 typedef struct VFIODMABuf {
-    QemuDmaBuf buf;
+    QemuDmaBuf *buf;
     uint32_t pos_x, pos_y, pos_updates;
     uint32_t hot_x, hot_y, hot_updates;
     int dmabuf_id;
--
2.34.1
[PATCH 3/3] ui/console: add methods for allocating, initializing and accessing QemuDmaBuf
From: Dongwon Kim

This commit introduces new methods within ui/console to handle the allocation, initialization, and field retrieval of QemuDmaBuf. By isolating these operations within ui/console, it enhances safety and encapsulation of the struct.

Cc: Philippe Mathieu-Daudé
Cc: Marc-André Lureau
Cc: Vivek Kasireddy
Signed-off-by: Dongwon Kim
---
 include/ui/console.h | 10
 ui/console.c         | 55
 2 files changed, 65 insertions(+)

diff --git a/include/ui/console.h b/include/ui/console.h
index 0bc7a00ac0..70903f1b0d 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -279,6 +279,7 @@ typedef struct DisplayChangeListenerOps {
     /* optional */
     void (*dpy_gl_cursor_position)(DisplayChangeListener *dcl,
                                    uint32_t pos_x, uint32_t pos_y);
+
     /* optional */
     void (*dpy_gl_release_dmabuf)(DisplayChangeListener *dcl,
                                   QemuDmaBuf *dmabuf);
@@ -358,6 +359,15 @@ void dpy_gl_cursor_dmabuf(QemuConsole *con, QemuDmaBuf *dmabuf,
                           bool have_hot, uint32_t hot_x, uint32_t hot_y);
 void dpy_gl_cursor_position(QemuConsole *con,
                             uint32_t pos_x, uint32_t pos_y);
+QemuDmaBuf *dpy_gl_create_dmabuf(uint32_t width, uint32_t height,
+                                 uint32_t stride, uint32_t x,
+                                 uint32_t y, uint32_t backing_width,
+                                 uint32_t backing_height, uint32_t fourcc,
+                                 uint32_t modifier, uint32_t dmabuf_fd,
+                                 bool allow_fences);
+uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf);
+uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf);
+int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf);
 void dpy_gl_release_dmabuf(QemuConsole *con,
                            QemuDmaBuf *dmabuf);
 void dpy_gl_update(QemuConsole *con,
diff --git a/ui/console.c b/ui/console.c
index 43226c5c14..bac24756f0 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1132,6 +1132,60 @@ void dpy_gl_cursor_position(QemuConsole *con,
     }
 }

+QemuDmaBuf *dpy_gl_create_dmabuf(uint32_t width, uint32_t height,
+                                 uint32_t stride, uint32_t x,
+                                 uint32_t y, uint32_t backing_width,
+                                 uint32_t backing_height, uint32_t fourcc,
+                                 uint32_t modifier, uint32_t dmabuf_fd,
+                                 bool allow_fences)
+{
+    QemuDmaBuf *dmabuf;
+
+    dmabuf = g_new0(QemuDmaBuf, 1);
+
+    dmabuf->width = width;
+    dmabuf->height = height;
+    dmabuf->stride = stride;
+    dmabuf->x = x;
+    dmabuf->y = y;
+    dmabuf->backing_width = backing_width;
+    dmabuf->backing_height = backing_height;
+    dmabuf->fourcc = fourcc;
+    dmabuf->modifier = modifier;
+    dmabuf->fd = dmabuf_fd;
+    dmabuf->allow_fences = allow_fences;
+    dmabuf->fence_fd = -1;
+
+    return dmabuf;
+}
+
+uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf)
+{
+    if (dmabuf) {
+        return dmabuf->width;
+    }
+
+    return 0;
+}
+
+uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf)
+{
+    if (dmabuf) {
+        return dmabuf->height;
+    }
+
+    return 0;
+}
+
+int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf)
+{
+    if (dmabuf) {
+        return dmabuf->fd;
+    }
+
+    return -1;
+}
+
 void dpy_gl_release_dmabuf(QemuConsole *con,
                            QemuDmaBuf *dmabuf)
 {
@@ -1145,6 +1199,7 @@ void dpy_gl_release_dmabuf(QemuConsole *con,
         if (dcl->ops->dpy_gl_release_dmabuf) {
             dcl->ops->dpy_gl_release_dmabuf(dcl, dmabuf);
         }
+        g_free(dmabuf);
     }
 }

--
2.34.1
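
The constructor-plus-accessor pattern this patch introduces can be sketched outside QEMU as follows. This is only an illustration under stated assumptions: `DemoBuf` and the `demo_buf_*` functions are hypothetical stand-ins for QemuDmaBuf and its helpers, plain `calloc`/`free` replace GLib, and only three fields are modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for QemuDmaBuf; only a few fields are modelled. */
typedef struct DemoBuf {
    uint32_t width;
    uint32_t height;
    int32_t fd;
} DemoBuf;

/* Allocate and initialize in one place so callers never touch fields. */
static DemoBuf *demo_buf_new(uint32_t width, uint32_t height, int32_t fd)
{
    DemoBuf *buf = calloc(1, sizeof(*buf));

    if (!buf) {
        abort(); /* comparable to g_new0(), which aborts on allocation failure */
    }
    buf->width = width;
    buf->height = height;
    buf->fd = fd;
    return buf;
}

/* NULL-tolerant getters, following the style of the accessors in the patch. */
static uint32_t demo_buf_get_width(const DemoBuf *buf)
{
    return buf ? buf->width : 0;
}

static uint32_t demo_buf_get_height(const DemoBuf *buf)
{
    return buf ? buf->height : 0;
}

static int32_t demo_buf_get_fd(const DemoBuf *buf)
{
    return buf ? buf->fd : -1;
}

/* Free exactly once and clear the caller's pointer to avoid reuse. */
static void demo_buf_free(DemoBuf **bufp)
{
    if (bufp && *bufp) {
        free(*bufp);
        *bufp = NULL;
    }
}
```

Taking a pointer-to-pointer in the free helper is a design choice of this sketch, not of the patch; it makes a second release of the same buffer a harmless no-op.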
[PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console
From: Dongwon Kim

The QemuDmaBuf struct is defined and primarily used by ui/console/gl, so it is better to handle its creation, initialization and access within ui/console rather than within hw modules such as hw/display/virtio-gpu and hw/vfio/display.

To achieve this, new methods for allocating and initializing the struct, and for accessing certain fields needed by hardware modules, have been introduced in ui/console.c. (3rd patch)

Furthermore, modifications have been made to hw/display/virtio-gpu and hw/vfio/display to utilize these new methods instead of setting up the struct independently. (1st and 2nd patches)

Dongwon Kim (3):
  hw/virtio: initialize QemuDmaBuf using the function from ui/console
  hw/vfio: initialize QemuDmaBuf using the function from ui/console
  ui/console: add methods for allocating, initializing and accessing QemuDmaBuf

 hw/display/virtio-gpu-udmabuf.c | 27 +++-
 hw/vfio/display.c               | 35 -
 include/hw/vfio/vfio-common.h   |  2 +-
 include/hw/virtio/virtio-gpu.h  |  2 +-
 include/ui/console.h            | 10 ++
 ui/console.c                    | 55 +
 6 files changed, 98 insertions(+), 33 deletions(-)

--
2.34.1
Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
On Wed, Mar 20, 2024 at 03:46:44PM -0400, Peter Xu wrote:
> On Wed, Mar 20, 2024 at 08:21:30PM +0100, Nina Schoetterl-Glausch wrote:
> > On Wed, 2024-03-20 at 14:57 -0400, Peter Xu wrote:
> > > On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote:
> > > > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> > > > > From: Peter Xu
> > > > >
> > > > > When the migration frameworks fetches the exact pending sizes, it means
> > > > > this check:
> > > > >
> > > > >     remaining_size < s->threshold_size
> > > > >
> > > > > Must have been done already, actually at migration_iteration_run():
> > > > >
> > > > >     if (must_precopy <= s->threshold_size) {
> > > > >         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> > > > >
> > > > > That should be after one round of ram_state_pending_estimate(). It makes
> > > > > the 2nd check meaningless and can be dropped.
> > > > >
> > > > > To say it in another way, when reaching ->state_pending_exact(), we
> > > > > unconditionally sync dirty bits for precopy.
> > > > >
> > > > > Then we can drop migrate_get_current() there too.
> > > > >
> > > > > Signed-off-by: Peter Xu
> > > >
> > > > Hi Peter,
> > >
> > > Hi, Nina,
> > >
> > > >
> > > > could you have a look at this issue:
> > > > https://gitlab.com/qemu-project/qemu/-/issues/1565
> > > >
> > > > which I reopened. Previous thread here:
> > > >
> > > > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/
> > > >
> > > > I'm seeing migration failures with s390x TCG again, which look the same to me
> > > > as those a while back.
> > >
> > > I'm still quite confused how that could be caused of this.
> > >
> > > What you described in the previous bug report seems to imply some page was
> > > leftover in migration so some page got corrupted after migrated.
> > >
> > > However what this patch mostly does is it can sync more than before even if
> > > I overlooked the condition check there (I still think the check is
> > > redundant, there's one outlier when remaining_size == threshold_size, but I
> > > don't think it should matter here as of now). It'll make more sense if
> > > this patch made the sync less, but that's not the case but vice versa.
> >
> > [...]
> >
> > > In the previous discussion, you mentioned that you bisected to the commit
> > > and also verified the fix. Now you also mentioned in the bz that you can't
> > > reproduce this bug manually.
> > >
> > > Is it still possible to be reproduced with some scripts? Do you also mean
> > > that it's harder to reproduce comparing to before? In all cases, some way
> > > to reproduce it would definitely be helpful.
> >
> > I tried running the kvm-unit-test a bunch of times in a loop and couldn't
> > trigger a failure. I just tried again on a different system and managed just
> > fine, yay. No idea why it wouldn't on the first system tho.
>
> There's probably still a bug somewhere. If reproduction rate changed, it's
> also a sign that it might not be directly relevant to this change, as
> otherwise it should reproduce the same as before.
>
> >
> > > Even if we want to revert this change, we'll need to know whether this will
> > > fix your case so we need something to verify it before a revert. I'll
> > > consider that the last though as I had a feeling this is papering over
> > > something else.
> >
> > I can check if I can reproduce the issue before & after b0504edd ("migration:
> > Drop unnecessary check in ram's pending_exact()").
> > I can also check if I can reproduce it on x86, that worked last time.
> > Anything else? Ideas on how to pinpoint where the corruption happens?
>
> I don't have a solid clue yet, but more information of the single case
> where it reproduced could help.
>
> I saw from the bug link that the cmdline is pretty simple. However still
> not sure of something that can be relevant. E.g., did you use postcopy
> (including when postcopy-ram enabled but precopy completed)? Is there any
> special device, like s390's CMMA (would that simplest cmdline include such
> a device; apologies, I have zero knowledge there before today)?
>
> I _think_ when reading the code I already found something quite unusual,
> but only when postcopy is selected: I notice postcopy will frequently sync
> dirty bitmap while it doesn't really necessarily need to, because
> ram_state_pending_estimate() will report all ram as "can_postcopy"; it
> means it's highly likely that this check will 99.999% always be true simply
> because must_precopy can in most cases be zero:
>
>     if (must_precopy <= s->threshold_size) {    <-- here
>         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
>         pending_size = must_precopy + can_postcopy;
>         trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
>     }
>
> I need to think more of this, but this doesn't
Re: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression
On Wed, Mar 20, 2024 at 04:23:01PM +, Liu, Yuan1 wrote:
> let me explain here, during the decompression operation of IAA, the
> decompressed data can be directly output to the virtual address of the
> guest memory by IAA hardware. It can avoid copying the decompressed data
> to guest memory by CPU.

I see.

> Without -mem-prealloc, all the guest memory is not populated, and IAA
> hardware needs to trigger I/O page fault first and then output the
> decompressed data to the guest memory region. Besides that, CPU page
> faults will also trigger IOTLB flush operation when IAA devices use SVM.

Oh so the IAA hardware already can use CPU pgtables? Nice..

Why IOTLB flush is needed? AFAIU we're only installing new pages, the request can either come from a CPU access or a DMA. In all cases there should have no tearing down of an old page. Isn't an iotlb flush only needed if a tear down happens?

>
> Due to the inability to quickly resolve a large number of IO page faults
> and IOTLB flushes, the decompression throughput of the IAA device will
> decrease significantly.

--
Peter Xu
Re: [PATCH v4 1/2] vhost: dirty log should be per backend type
On 3/19/2024 8:25 PM, Jason Wang wrote: On Tue, Mar 19, 2024 at 6:06 AM Si-Wei Liu wrote: On 3/17/2024 8:20 PM, Jason Wang wrote: On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu wrote: On 3/14/2024 8:50 PM, Jason Wang wrote: On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu wrote: There could be a mix of both vhost-user and vhost-kernel clients in the same QEMU process, where separate vhost loggers for the specific vhost type have to be used. Make the vhost logger per backend type, and have them properly reference counted. It's better to describe what's the advantage of doing this. Yes, I can add that to the log. Although it's a niche use case, it was actually a long standing limitation / bug that vhost-user and vhost-kernel loggers can't co-exist per QEMU process, but today it's just silent failure that may be ended up with. This bug fix removes that implicit limitation in the code. Ok. Suggested-by: Michael S. Tsirkin Signed-off-by: Si-Wei Liu --- v3->v4: - remove checking NULL return value from vhost_log_get v2->v3: - remove non-effective assertion that never be reached - do not return NULL from vhost_log_get() - add neccessary assertions to vhost_log_get() --- hw/virtio/vhost.c | 45 + 1 file changed, 33 insertions(+), 12 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 2c9ac79..612f4db 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -43,8 +43,8 @@ do { } while (0) #endif -static struct vhost_log *vhost_log; -static struct vhost_log *vhost_log_shm; +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX]; +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX]; /* Memslots used by backends that support private memslots (without an fd). */ static unsigned int used_memslots; @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev, r = -1; } +if (r == 0) { +assert(dev->vhost_ops->backend_type == backend_type); +} + Under which condition could we hit this? 
Just in case some other function inadvertently corrupted this earlier, we have to capture discrepancy in the first place... On the other hand, it will be helpful for other vhost backend writers to diagnose day-one bug in the code. I feel just code comment here will not be sufficient/helpful. See below. It seems not good to assert a local logic. It seems to me quite a few local asserts are in the same file already, vhost_save_backend_state, For example it has assert for assert(!dev->started); which is not the logic of the function itself but require vhost_dev_start() not to be called before. But it looks like this patch you assert the code just a few lines above the assert itself? Yes, that was the intent - for e.g. xxx_ops may contain corrupted xxx_ops.backend_type already before coming to this vhost_set_backend_type() function. And we may capture this corrupted state by asserting the expected xxx_ops.backend_type (to be consistent with the backend_type passed in), This can happen for all variables. Not sure why backend_ops is special. The assert is just checking the backend_type field only. The other op fields in backend_ops have similar assert within the op function itself also. For e.g. vhost_user_requires_shm_log() and a lot of other vhost_user ops have the following: assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); vhost_vdpa_vq_get_addr() and a lot of other vhost_vdpa ops have: assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA); vhost_kernel ops has similar assertions as well. The reason why it has to be checked against here is now the callers of vhost_log_get(), would pass in dev->vhost_ops->backend_type to the API, which are unable to verify the validity of the backend_type by themselves. 
The vhost_log_get() has necessary asserts to make bound check for the vhost_log[] or vhost_log_shm[] array, but specific assert against the exact backend type in vhost_set_backend_type() will further harden the implementation in vhost_log_get() and other backend ops. which needs be done in the first place when this discrepancy is detected. In practice I think there should be no harm to add this assert, but this will add warranted guarantee to the current code. For example, such corruption can happen after the assert() so a TOCTOU issue. Sure, it's best effort only. As pointed out earlier, I think together with this, there are other similar asserts already in various backend ops, which could be helpful to nail down the earliest point or a specific range where things may go wrong in the first place. Thanks, -Siwei Thanks Regards, -Siwei dev->vhost_ops = _ops; ... assert(dev->vhost_ops->backend_type == backend_type) ? Thanks vhost_load_backend_state, vhost_virtqueue_mask, vhost_config_mask, just to name a few. Why local assert a problem? Thanks, -Siwei Thanks
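
The idea debated in this thread (one reference-counted logger per backend type, with an assert as a bounds check on the backend type) can be sketched in standalone C. All `Demo`/`demo_` names below are hypothetical; this is not the code from hw/virtio/vhost.c, and the real vhost_log_get() also handles log sizing and shared memory, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backend kinds; the real enum lives in QEMU's vhost headers. */
typedef enum {
    DEMO_BACKEND_NONE = 0,
    DEMO_BACKEND_KERNEL,
    DEMO_BACKEND_USER,
    DEMO_BACKEND_MAX
} DemoBackendType;

typedef struct DemoLog {
    int refcnt;
} DemoLog;

/* One slot per backend type, separately for normal and shared logs. */
static DemoLog *demo_log[DEMO_BACKEND_MAX];
static DemoLog *demo_log_shm[DEMO_BACKEND_MAX];

static DemoLog **demo_log_slot(DemoBackendType type, bool share)
{
    /* Bounds check: a corrupted backend type must not index the arrays. */
    assert(type > DEMO_BACKEND_NONE && type < DEMO_BACKEND_MAX);
    return share ? &demo_log_shm[type] : &demo_log[type];
}

/* Return the logger for this backend type, creating it on first use. */
static DemoLog *demo_log_get(DemoBackendType type, bool share)
{
    DemoLog **slot = demo_log_slot(type, share);

    if (!*slot) {
        *slot = calloc(1, sizeof(**slot));
        if (!*slot) {
            abort();
        }
    }
    (*slot)->refcnt++;
    return *slot;
}

/* Drop one reference; free the logger and clear its slot on the last put. */
static void demo_log_put(DemoBackendType type, bool share)
{
    DemoLog **slot = demo_log_slot(type, share);

    if (*slot && --(*slot)->refcnt == 0) {
        free(*slot);
        *slot = NULL;
    }
}
```

In this sketch a kernel-type and a user-type device never share a logger, while two devices of the same type do.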
[PATCH v2 0/2] Add support for STM32G0 SoC family
Hi all,

These two patches add support for STM32G0 family and nucleo-g071rb board. Patches have been tested with minimal embedded rust examples.

Changes since v1:

- Patch 1:
  - Convert tabs to spaces (checkpatch.pl)
  - Correct lines longer than 80 characters (checkpatch.pl)
  - Correct num-prio-bits (Samuel Tardieu)
  - Correct num-irqs (Found reviewing RM0444)

- Patch 2:
  - Convert tabs to spaces (checkpatch.pl)

Felipe Balbi (2):
  hw/arm: Add support for stm32g000 SoC family
  hw/arm: Add nucleo-g071rb board

 MAINTAINERS                    |  13 ++
 hw/arm/Kconfig                 |  12 ++
 hw/arm/meson.build             |   2 +
 hw/arm/nucleo-g071rb.c         |  70 +
 hw/arm/stm32g000_soc.c         | 253 +
 include/hw/arm/stm32g000_soc.h |  62
 6 files changed, 412 insertions(+)
 create mode 100644 hw/arm/nucleo-g071rb.c
 create mode 100644 hw/arm/stm32g000_soc.c
 create mode 100644 include/hw/arm/stm32g000_soc.h

--
2.44.0
[PATCH v2 1/2] hw/arm: Add support for stm32g000 SoC family
Minimal support with USARTs and SPIs working. This SoC will be used to create the nucleo-g071rb board.

Signed-off-by: Felipe Balbi
---

Changes since v1:

- Convert tabs to spaces (checkpatch.pl)
- Correct lines longer than 80 characters (checkpatch.pl)
- Correct num-prio-bits (Samuel Tardieu)
- Correct num-irqs (Found reviewing RM0444)

 MAINTAINERS                    |   7 +
 hw/arm/Kconfig                 |   6 +
 hw/arm/meson.build             |   1 +
 hw/arm/stm32g000_soc.c         | 253 +
 include/hw/arm/stm32g000_soc.h |  62
 5 files changed, 329 insertions(+)
 create mode 100644 hw/arm/stm32g000_soc.c
 create mode 100644 include/hw/arm/stm32g000_soc.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 409d7db4d457..bce2eb3ad70b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1134,6 +1134,13 @@ F: hw/misc/stm32l4x5_rcc.c
 F: hw/gpio/stm32l4x5_gpio.c
 F: include/hw/*/stm32l4x5_*.h

+STM32G000 SoC Family
+M: Felipe Balbi
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/arm/stm32g000_soc.c
+F: include/hw/*/stm32g000_*.h
+
 B-L475E-IOT01A IoT Node
 M: Arnaud Minier
 M: Inès Varhol
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 893a7bff66b9..28a46d2b1ad3 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -463,6 +463,12 @@ config STM32F405_SOC
     select STM32F4XX_SYSCFG
     select STM32F4XX_EXTI

+config STM32G000_SOC
+    bool
+    select ARM_V7M
+    select STM32F2XX_USART
+    select STM32F2XX_SPI
+
 config B_L475E_IOT01A
     bool
     default y
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 6808135c1f79..9c4137a988e1 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -34,6 +34,7 @@ arm_ss.add(when: ['CONFIG_RASPI', 'TARGET_AARCH64'], if_true: files('bcm2838.c',
 arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c'))
+arm_ss.add(when: 'CONFIG_STM32G000_SOC', if_true: files('stm32g000_soc.c'))
 arm_ss.add(when: 'CONFIG_B_L475E_IOT01A', if_true: files('b-l475e-iot01a.c'))
 arm_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_soc.c'))
 arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 'xlnx-zcu102.c'))
diff --git a/hw/arm/stm32g000_soc.c b/hw/arm/stm32g000_soc.c
new file mode 100644
index ..48531d41fcc7
--- /dev/null
+++ b/hw/arm/stm32g000_soc.c
@@ -0,0 +1,253 @@
+/*
+ * STM32G000 SoC
+ *
+ * Copyright (c) 2024 Felipe Balbi
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "hw/arm/boot.h"
+#include "exec/address-spaces.h"
+#include "hw/arm/stm32g000_soc.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "hw/misc/unimp.h"
+#include "sysemu/sysemu.h"
+
+/* stm32g000_soc implementation is derived from stm32f100_soc */
+
+struct stm32g0_ip_config {
+    const char *name;
+    uint32_t addr;
+    uint32_t irq;
+};
+
+#define STM32G0_DEFINE_IP(n, a, i)    \
+    {                                 \
+        .name = (n),                  \
+        .addr = (a),                  \
+        .irq = (i),                   \
+    }
+
+static const struct stm32g0_ip_config usart_config[STM_NUM_USARTS] = {
+    STM32G0_DEFINE_IP("USART1", 0x40013800, 27),
+    STM32G0_DEFINE_IP("USART2", 0x40004000, 28),
+    STM32G0_DEFINE_IP("USART3", 0x40004400, 29),
+    STM32G0_DEFINE_IP("USART4", 0x40004800, 29),
+    STM32G0_DEFINE_IP("USART5", 0x40004c00, 29),
+    STM32G0_DEFINE_IP("USART6", 0x40005000, 29),
+    STM32G0_DEFINE_IP("LPUSART1", 0x40008000, 29),
+    STM32G0_DEFINE_IP("LPUSART2", 0x40008400, 28),
+};
+
+static const struct stm32g0_ip_config spi_config[STM_NUM_SPIS] = {
+    STM32G0_DEFINE_IP("SPI1", 0x40013000, 25),
+    STM32G0_DEFINE_IP("SPI2", 0x40003800, 26),
+
[PATCH v2 2/2] hw/arm: Add nucleo-g071rb board
This board is based around STM32G071RB SoC, a Cortex-M0 based device. More information can be found at: https://www.st.com/en/product/nucleo-g071rb.html Signed-off-by: Felipe Balbi --- Changes since v1: - Convert tabs to spaces (checkpatch.pl) MAINTAINERS| 6 hw/arm/Kconfig | 6 hw/arm/meson.build | 1 + hw/arm/nucleo-g071rb.c | 70 ++ 4 files changed, 83 insertions(+) create mode 100644 hw/arm/nucleo-g071rb.c diff --git a/MAINTAINERS b/MAINTAINERS index bce2eb3ad70b..052ce4dcfb97 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1116,6 +1116,12 @@ L: qemu-...@nongnu.org S: Maintained F: hw/arm/netduinoplus2.c +Nucleo G071RB +M: Felipe Balbi +L: qemu-...@nongnu.org +S: Maintained +F: hw/arm/nucleo-g071rb.c + Olimex STM32 H405 M: Felipe Balbi L: qemu-...@nongnu.org diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 28a46d2b1ad3..5938bb8208a1 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -310,6 +310,12 @@ config STM32VLDISCOVERY depends on TCG && ARM select STM32F100_SOC +config NUCLEO_G071RB +bool +default y +depends on TCG && ARM +select STM32G000_SOC + config STRONGARM bool select PXA2XX diff --git a/hw/arm/meson.build b/hw/arm/meson.build index 9c4137a988e1..580c2d55fc3f 100644 --- a/hw/arm/meson.build +++ b/hw/arm/meson.build @@ -18,6 +18,7 @@ arm_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview.c')) arm_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa-ref.c')) arm_ss.add(when: 'CONFIG_STELLARIS', if_true: files('stellaris.c')) arm_ss.add(when: 'CONFIG_STM32VLDISCOVERY', if_true: files('stm32vldiscovery.c')) +arm_ss.add(when: 'CONFIG_NUCLEO_G071RB', if_true: files('nucleo-g071rb.c')) arm_ss.add(when: 'CONFIG_ZYNQ', if_true: files('xilinx_zynq.c')) arm_ss.add(when: 'CONFIG_SABRELITE', if_true: files('sabrelite.c')) diff --git a/hw/arm/nucleo-g071rb.c b/hw/arm/nucleo-g071rb.c new file mode 100644 index ..580b52bacf2c --- /dev/null +++ b/hw/arm/nucleo-g071rb.c @@ -0,0 +1,70 @@ +/* + * ST Nucleo G071RB + * + * Copyright (c) 2024 Felipe Balbi + * + * 
Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "qemu/error-report.h"
+#include "hw/arm/stm32g000_soc.h"
+#include "hw/arm/boot.h"
+
+/* nucleo_g071rb implementation is derived from olimex-stm32-h405.c */
+
+/* Main SYSCLK frequency in Hz (48MHz) */
+#define SYSCLK_FRQ 48000000ULL
+
+static void nucleo_g071rb_init(MachineState *machine)
+{
+    DeviceState *dev;
+    Clock *sysclk;
+
+    /* This clock doesn't need migration because it is fixed-frequency */
+    sysclk = clock_new(OBJECT(machine), "SYSCLK");
+    clock_set_hz(sysclk, SYSCLK_FRQ);
+
+    dev = qdev_new(TYPE_STM32G000_SOC);
+    object_property_add_child(OBJECT(machine), "soc", OBJECT(dev));
+    qdev_connect_clock_in(dev, "sysclk", sysclk);
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
+    armv7m_load_kernel(ARM_CPU(first_cpu),
+                       machine->kernel_filename,
+                       0, FLASH_SIZE);
+}
+
+static void nucleo_g071rb_machine_init(MachineClass *mc)
+{
+    static const char * const valid_cpu_types[] = {
+        ARM_CPU_TYPE_NAME("cortex-m0"),
+        NULL
+    };
+
+    mc->desc = "ST Nucleo-G071RB (Cortex-M0)";
+    mc->init = nucleo_g071rb_init;
+    mc->valid_cpu_types = valid_cpu_types;
+}
+
+DEFINE_MACHINE("nucleo-g071rb", nucleo_g071rb_machine_init)
--
2.44.0
Re: [PATCH] target/riscv: Fix mode in riscv_tlb_fill
On 3/20/24 14:28, Irina Ryapolova wrote:
> Need to convert mmu_idx to privilege mode for PMP function.

Please add:

Fixes: b297129ae1 ("target/riscv: propagate PMP permission to TLB page")

> Signed-off-by: Irina Ryapolova
> ---

Reviewed-by: Daniel Henrique Barboza

>  target/riscv/cpu_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index ce7322011d..fc090d729a 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -1315,7 +1315,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
>      bool two_stage_lookup = mmuidx_2stage(mmu_idx);
>      bool two_stage_indirect_error = false;
>      int ret = TRANSLATE_FAIL;
> -    int mode = mmu_idx;
> +    int mode = mmuidx_priv(mmu_idx);
>      /* default TLB page size */
>      target_ulong tlb_size = TARGET_PAGE_SIZE;
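For readers following along, here is a minimal standalone sketch of why the raw mmu_idx cannot be used as a privilege mode. The constants are a simplified, assumed encoding chosen for illustration (all `SKETCH_`/`sketch_` names are hypothetical and not copied from the QEMU headers). The only point is that an MMU index can carry extra state, such as a two-stage flag or a SUM variant, which has to be stripped before the value is compared against privilege levels.

```c
#include <assert.h>

/* Privilege level encodings (U = 0, S = 1, M = 3). */
enum { SKETCH_PRV_U = 0, SKETCH_PRV_S = 1, SKETCH_PRV_M = 3 };

/* Assumed MMU index encoding, for illustration only. */
enum {
    SKETCH_IDX_U = 0,
    SKETCH_IDX_S = 1,
    SKETCH_IDX_S_SUM = 2,          /* S-mode with SUM set */
    SKETCH_IDX_M = 3,
    SKETCH_IDX_2STAGE = 1 << 2,    /* two-stage translation flag */
};

/* Map an MMU index back to the privilege mode it was derived from. */
static int sketch_mmuidx_priv(int mmu_idx)
{
    assert(mmu_idx >= 0);
    int priv = mmu_idx & 3;        /* drop the two-stage flag */
    if (priv == SKETCH_IDX_S_SUM) {
        priv = SKETCH_PRV_S;       /* the SUM variant is still S-mode */
    }
    return priv;
}
```

Under this assumed encoding, `SKETCH_IDX_S | SKETCH_IDX_2STAGE` has the value 5, which is not a privilege level at all, while the helper maps it back to S-mode.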
Re: [PATCH] target/riscv: rvv: Remove redudant SEW checking for vector fp narrow/widen instructions
On 3/20/24 04:25, Max Chou wrote: If the checking functions check both the single and double width operators at the same time, then the single width operator checking functions (require_rvf[min]) will check whether the SEW is 8. Signed-off-by: Max Chou --- Reviewed-by: Daniel Henrique Barboza target/riscv/insn_trans/trans_rvv.c.inc | 16 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index 19059fea5f..08c22f48cb 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -2333,7 +2333,6 @@ static bool opfvv_widen_check(DisasContext *s, arg_rmrr *a) return require_rvv(s) && require_rvf(s) && require_scale_rvf(s) && - (s->sew != MO_8) && vext_check_isa_ill(s) && vext_check_dss(s, a->rd, a->rs1, a->rs2, a->vm); } @@ -2373,7 +2372,6 @@ static bool opfvf_widen_check(DisasContext *s, arg_rmrr *a) return require_rvv(s) && require_rvf(s) && require_scale_rvf(s) && - (s->sew != MO_8) && vext_check_isa_ill(s) && vext_check_ds(s, a->rd, a->rs2, a->vm); } @@ -2406,7 +2404,6 @@ static bool opfwv_widen_check(DisasContext *s, arg_rmrr *a) return require_rvv(s) && require_rvf(s) && require_scale_rvf(s) && - (s->sew != MO_8) && vext_check_isa_ill(s) && vext_check_dds(s, a->rd, a->rs1, a->rs2, a->vm); } @@ -2446,7 +2443,6 @@ static bool opfwf_widen_check(DisasContext *s, arg_rmrr *a) return require_rvv(s) && require_rvf(s) && require_scale_rvf(s) && - (s->sew != MO_8) && vext_check_isa_ill(s) && vext_check_dd(s, a->rd, a->rs2, a->vm); } @@ -2704,8 +2700,7 @@ static bool opffv_widen_check(DisasContext *s, arg_rmr *a) { return opfv_widen_check(s, a) && require_rvfmin(s) && - require_scale_rvfmin(s) && - (s->sew != MO_8); + require_scale_rvfmin(s); } #define GEN_OPFV_WIDEN_TRANS(NAME, CHECK, HELPER, FRM) \ @@ -2810,16 +2805,14 @@ static bool opffv_narrow_check(DisasContext *s, arg_rmr *a) { return opfv_narrow_check(s, a) && require_rvfmin(s) && - 
require_scale_rvfmin(s) && - (s->sew != MO_8); + require_scale_rvfmin(s); } static bool opffv_rod_narrow_check(DisasContext *s, arg_rmr *a) { return opfv_narrow_check(s, a) && require_rvf(s) && - require_scale_rvf(s) && - (s->sew != MO_8); + require_scale_rvf(s); } #define GEN_OPFV_NARROW_TRANS(NAME, CHECK, HELPER, FRM)\ @@ -2947,8 +2940,7 @@ static bool freduction_widen_check(DisasContext *s, arg_rmrr *a) { return reduction_widen_check(s, a) && require_rvf(s) && - require_scale_rvf(s) && - (s->sew != MO_8); + require_scale_rvf(s); } GEN_OPFVV_WIDEN_TRANS(vfwredusum_vs, freduction_widen_check)
Re: [PATCH] target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w
On 3/20/24 04:25, Max Chou wrote: The opfv_narrow_check needs to check the single width float operator by require_rvf. Signed-off-by: Max Chou --- Reviewed-by: Daniel Henrique Barboza target/riscv/insn_trans/trans_rvv.c.inc | 1 + 1 file changed, 1 insertion(+) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index 6cb9bc9fde..19059fea5f 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -2817,6 +2817,7 @@ static bool opffv_narrow_check(DisasContext *s, arg_rmr *a) static bool opffv_rod_narrow_check(DisasContext *s, arg_rmr *a) { return opfv_narrow_check(s, a) && + require_rvf(s) && require_scale_rvf(s) && (s->sew != MO_8); }
Re: [PATCH] target/riscv: rvv: Check single width operator for vector fp widen instructions
On 3/20/24 04:25, Max Chou wrote: The require_scale_rvf function only checks the double width operator for the vector floating point widen instructions, so most of the widen checking functions need to add require_rvf for single width operator. The vfwcvt.f.x.v and vfwcvt.f.xu.v instructions convert single width integer to double width float, so the opfxv_widen_check function doesn’t need require_rvf for the single width operator(integer). Signed-off-by: Max Chou --- Reviewed-by: Daniel Henrique Barboza target/riscv/insn_trans/trans_rvv.c.inc | 5 + 1 file changed, 5 insertions(+) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index ef568e263d..6cb9bc9fde 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -2331,6 +2331,7 @@ GEN_OPFVF_TRANS(vfrsub_vf, opfvf_check) static bool opfvv_widen_check(DisasContext *s, arg_rmrr *a) { return require_rvv(s) && + require_rvf(s) && require_scale_rvf(s) && (s->sew != MO_8) && vext_check_isa_ill(s) && @@ -2370,6 +2371,7 @@ GEN_OPFVV_WIDEN_TRANS(vfwsub_vv, opfvv_widen_check) static bool opfvf_widen_check(DisasContext *s, arg_rmrr *a) { return require_rvv(s) && + require_rvf(s) && require_scale_rvf(s) && (s->sew != MO_8) && vext_check_isa_ill(s) && @@ -2402,6 +2404,7 @@ GEN_OPFVF_WIDEN_TRANS(vfwsub_vf) static bool opfwv_widen_check(DisasContext *s, arg_rmrr *a) { return require_rvv(s) && + require_rvf(s) && require_scale_rvf(s) && (s->sew != MO_8) && vext_check_isa_ill(s) && @@ -2441,6 +2444,7 @@ GEN_OPFWV_WIDEN_TRANS(vfwsub_wv) static bool opfwf_widen_check(DisasContext *s, arg_rmrr *a) { return require_rvv(s) && + require_rvf(s) && require_scale_rvf(s) && (s->sew != MO_8) && vext_check_isa_ill(s) && @@ -2941,6 +2945,7 @@ GEN_OPFVV_TRANS(vfredmin_vs, freduction_check) static bool freduction_widen_check(DisasContext *s, arg_rmrr *a) { return reduction_widen_check(s, a) && + require_rvf(s) && require_scale_rvf(s) && (s->sew != MO_8); }
Re: [PATCH] target/riscv: rvv: Fix Zvfhmin checking for vfwcvt.f.f.v and vfncvt.f.f.w instructions
On 3/20/24 04:25, Max Chou wrote: According v spec 18.4, only the vfwcvt.f.f.v and vfncvt.f.f.w instructions will be affected by Zvfhmin extension. And the vfwcvt.f.f.v and vfncvt.f.f.w instructions only support the conversions of * From 1*SEW(16/32) to 2*SEW(32/64) * From 2*SEW(32/64) to 1*SEW(16/32) Signed-off-by: Max Chou --- Reviewed-by: Daniel Henrique Barboza target/riscv/insn_trans/trans_rvv.c.inc | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index 7d84e7d812..ef568e263d 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -50,6 +50,22 @@ static bool require_rvf(DisasContext *s) } } +static bool require_rvfmin(DisasContext *s) +{ +if (s->mstatus_fs == EXT_STATUS_DISABLED) { +return false; +} + +switch (s->sew) { +case MO_16: +return s->cfg_ptr->ext_zvfhmin; +case MO_32: +return s->cfg_ptr->ext_zve32f; +default: +return false; +} +} + static bool require_scale_rvf(DisasContext *s) { if (s->mstatus_fs == EXT_STATUS_DISABLED) { @@ -75,8 +91,6 @@ static bool require_scale_rvfmin(DisasContext *s) } switch (s->sew) { -case MO_8: -return s->cfg_ptr->ext_zvfhmin; case MO_16: return s->cfg_ptr->ext_zve32f; case MO_32: @@ -2685,6 +2699,7 @@ static bool opxfv_widen_check(DisasContext *s, arg_rmr *a) static bool opffv_widen_check(DisasContext *s, arg_rmr *a) { return opfv_widen_check(s, a) && + require_rvfmin(s) && require_scale_rvfmin(s) && (s->sew != MO_8); } @@ -2790,6 +2805,7 @@ static bool opfxv_narrow_check(DisasContext *s, arg_rmr *a) static bool opffv_narrow_check(DisasContext *s, arg_rmr *a) { return opfv_narrow_check(s, a) && + require_rvfmin(s) && require_scale_rvfmin(s) && (s->sew != MO_8); }
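As a reading aid, the new single-width check can be modelled outside QEMU roughly as follows. This is a sketch under stated assumptions: `sketch_cfg`, `sketch_sew` and `sketch_require_rvfmin` are hypothetical stand-ins for the DisasContext fields used in the patch, and only the SEW-to-extension mapping shown in the diff is reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the selected element width (MO_8..MO_64). */
enum sketch_sew { SKETCH_SEW_8, SKETCH_SEW_16, SKETCH_SEW_32, SKETCH_SEW_64 };

/* Hypothetical stand-in for the bits of CPU state the check looks at. */
struct sketch_cfg {
    bool fp_enabled;   /* models mstatus.FS not being disabled */
    bool zvfhmin;
    bool zve32f;
};

/* Single-width operand check: SEW=16 needs Zvfhmin, SEW=32 needs Zve32f. */
static bool sketch_require_rvfmin(const struct sketch_cfg *cfg,
                                  enum sketch_sew sew)
{
    assert(cfg != NULL);
    if (!cfg->fp_enabled) {
        return false;
    }
    switch (sew) {
    case SKETCH_SEW_16:
        return cfg->zvfhmin;
    case SKETCH_SEW_32:
        return cfg->zve32f;
    default:
        return false;  /* SEW=8 and SEW=64 are rejected here */
    }
}
```

Because SEW=8 falls through to the default case, a caller that already uses this check does not gain anything from a separate `sew != MO_8` test, which is what the follow-up cleanup in this series relies on.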
Re: [PATCH v3 11/49] physmem: Introduce ram_block_discard_guest_memfd_range()
On 20.03.24 18:38, Michael Roth wrote:
> On Wed, Mar 20, 2024 at 10:37:14AM +0100, David Hildenbrand wrote:
> > On 20.03.24 09:39, Michael Roth wrote:
> > > From: Xiaoyao Li
> > >
> > > When memory page is converted from private to shared, the original
> > > private memory is backed by guest_memfd. Introduce
> > > ram_block_discard_guest_memfd_range() for discarding memory in
> > > guest_memfd.
> > >
> > > Originally-from: Isaku Yamahata
> > > Codeveloped-by: Xiaoyao Li
> >
> > "Co-developed-by"
> >
> > > Signed-off-by: Xiaoyao Li
> > > Reviewed-by: David Hildenbrand
> >
> > Your SOB should go here.
> >
> > > ---
> > > Changes in v5:
> > > - Collect Reviewed-by from David;
> > >
> > > Changes in v4:
> > > - Drop ram_block_convert_range() and open code its implementation in the
> > >   next Patch.
> > >
> > > Signed-off-by: Michael Roth
> >
> > I only received 3 patches from this series, and now I am confused: changelog
> > talks about v5 and this is "PATCH v3"
> >
> > Please make sure to send at least the cover letter along (I might not need
> > the other 46 patches :D ).
>
> Sorry for the confusion, you got auto-Cc'd by git, which is good, but
> not sure there's a good way to make sure everyone gets a copy of the
> cover letter. I could see how it would be useful to potential
> reviewers though. I'll try to come up with a script for it and take that
> approach in the future.

A script shared with me in the past to achieve that in most cases:

  $ cat cc-cmd.sh
  #!/bin/bash

  if [[ $1 == *gitsendemail.msg* || $1 == *cover-letter* ]]; then
      grep ': .* <.*@.*>' -h *.patch | sed 's/^.*: //' | sort | uniq
  fi

And attach to "git send-email ... *.patch": --cc-cmd=./cc-cmd.sh

--
Cheers,

David / dhildenb
Re: [PATCH v3 19/49] kvm: Make kvm_convert_memory() obey ram_block_discard_is_enabled()
On Wed, Mar 20, 2024 at 05:26:00PM +0100, Paolo Bonzini wrote: > On 3/20/24 09:39, Michael Roth wrote: > > Some subsystems like VFIO might disable ram block discard for > > uncoordinated cases. Since kvm_convert_memory()/guest_memfd don't > > implement a RamDiscardManager handler to convey discard operations to > > various listeners like VFIO. > Because of this, sequences like the > > following can result due to stale IOMMU mappings: > > Alternatively, should guest-memfd memory regions call > ram_block_discard_require(true)? This will prevent VFIO from operating, but > it will avoid consuming twice the memory. > > If desirable, guest-memfd support can be changed to implement an extension > of RamDiscardManager that notifies about private/shared memory changes, and > then guest-memfd would be able to support coordinated discard. But I wonder In an earlier/internal version of the SNP+gmem patches (when there was still a dedicated hostmem-memfd-private backend for restrictedmem/gmem), we had a rough implementation of RamDiscardManager that did this: https://github.com/AMDESE/qemu/blob/snp-latest-gmem-v12/backends/hostmem-memfd-private.c#L75 Now that gmem handling is mostly done transparently to the HostMem backend in use I'm not sure what the right place would be to implement something similar, but maybe it can be done in a more generic way. There were some notable downsides to that approach though that I'm a little hazy on now, but I think they were both kernel limitations: - VFIO seemed to have some limitation where it expects that the DMA mapping for a particular iova will be unmapped/mapped with the same granularity, but for an SNP guest there's no guarantee that if you flip a 2MB page from shared->private, that it won't later be flipped private->shared again but this time with a 4K granularity/sub-range. I think the current code still treats this as an -EINVAL case. 
  So we end up needing to do everything with 4K granularity, which I
  *think* results in 4K IOMMU page table mappings, but I'd need to confirm.

- VFIO doesn't seem to be optimized for this sort of use case and
  generally expects a much larger granularity and defaults to 64K max DMA
  entries, so for a 16GB guest you need to configure VFIO with something
  like:

    vfio_iommu_type1.dma_entry_limit=4194304

  I didn't see any reason to suggest that's problematic but it makes me
  wonder if there's other stuff we might run into.

> if that's doable at all - how common are shared<->private flips, and is it
> feasible to change the IOMMU page tables every time?

- For OVMF+guest kernel that don't do lazy-acceptance:

  I think the bulk of the flipping is during boot where most of shared GPA
  ranges get converted to private memory, and then later on the guest
  kernel switches memory back to shared for stuff like SWIOTLB, and after
  that I think DMA mappings would be fairly stable.

- For OVMF+guest kernel that support lazy-acceptance:

  The first 4GB get converted to private, and the rest remains shared
  until guest kernel needs to allocate memory from it. I'm not sure if
  SWIOTLB allocation is optimized to avoid unnecessary flipping if it's
  allocated from that pool of still-shared memory, but normal/private
  allocations will result in a steady stream of DMA unmap operations as
  the guest faults in its working set.

>
> If the real solution is SEV-TIO (which means essentially guest_memfd support
> for VFIO), calling ram_block_discard_require(true) may be the simplest
> stopgap solution.

Hard to guess how cloud vendors will feel about waiting for trusted I/O.
It does make sense in the context of CoCo to expect them to wait, but
would be nice to have a stop-gap to offer like disabling discard, since it
has minimal requirements on the QEMU/VFIO side and might be enough to get
early adopters up and running at least.
All that said, if you think something based around RamDiscardManager seems tenable given all above then we can re-visit that approach as well. -Mike > > Paolo > > >- convert page shared->private > >- discard shared page > >- convert page private->shared > >- new page is allocated > >- issue DMA operations against that shared page > > > > Address this by taking ram_block_discard_is_enabled() into account when > > deciding whether or not to discard pages. > > > > Signed-off-by: Michael Roth > > --- > > accel/kvm/kvm-all.c | 8 ++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > index 53ce4f091e..6ae03c880f 100644 > > --- a/accel/kvm/kvm-all.c > > +++ b/accel/kvm/kvm-all.c > > @@ -2962,10 +2962,14 @@ static int kvm_convert_memory(hwaddr start, hwaddr > > size, bool to_private) > > */ > > return 0; > > } else { > > -ret = ram_block_discard_range(rb, offset, size); > > +ret =
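The dma_entry_limit figure quoted in the reply above follows from simple arithmetic, sketched here. This assumes the worst case in which every 4K page ends up as its own DMA mapping; `sketch_dma_entries` is a hypothetical helper, not a VFIO or QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case number of DMA mappings if every granule gets its own entry. */
static uint64_t sketch_dma_entries(uint64_t guest_bytes, uint64_t granule_bytes)
{
    assert(granule_bytes != 0);
    return (guest_bytes + granule_bytes - 1) / granule_bytes;  /* round up */
}
```

For a 16 GiB guest, `sketch_dma_entries(16ULL << 30, 4096)` gives 4194304, matching the value in the mail; at 2 MiB granularity the same guest would need only 8192 entries, well under the 64K default mentioned there.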
Re: [PATCH v3 11/49] physmem: Introduce ram_block_discard_guest_memfd_range()
On Wed, Mar 20, 2024 at 10:37:14AM +0100, David Hildenbrand wrote:
> On 20.03.24 09:39, Michael Roth wrote:
> > From: Xiaoyao Li
> >
> > When memory page is converted from private to shared, the original
> > private memory is backed by guest_memfd. Introduce
> > ram_block_discard_guest_memfd_range() for discarding memory in
> > guest_memfd.
> >
> > Originally-from: Isaku Yamahata
> > Codeveloped-by: Xiaoyao Li
>
> "Co-developed-by"
>
> > Signed-off-by: Xiaoyao Li
> > Reviewed-by: David Hildenbrand
>
> Your SOB should go here.
>
> > ---
> > Changes in v5:
> > - Collect Reviewed-by from David;
> >
> > Changes in v4:
> > - Drop ram_block_convert_range() and open code its implementation in the
> >   next Patch.
> >
> > Signed-off-by: Michael Roth
>
> I only received 3 patches from this series, and now I am confused: changelog
> talks about v5 and this is "PATCH v3"
>
> Please make sure to send at least the cover letter along (I might not need
> the other 46 patches :D ).

Sorry for the confusion, you got auto-Cc'd by git, which is good, but
not sure there's a good way to make sure everyone gets a copy of the
cover letter. I could see how it would be useful to potential
reviewers though. I'll try to come up with a script for it and take that
approach in the future.

-Mike

>
> --
> Cheers,
>
> David / dhildenb
>
Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
On Wed, Mar 20, 2024 at 08:21:30PM +0100, Nina Schoetterl-Glausch wrote: > On Wed, 2024-03-20 at 14:57 -0400, Peter Xu wrote: > > On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote: > > > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote: > > > > From: Peter Xu > > > > > > > > When the migration frameworks fetches the exact pending sizes, it means > > > > this check: > > > > > > > > remaining_size < s->threshold_size > > > > > > > > Must have been done already, actually at migration_iteration_run(): > > > > > > > > if (must_precopy <= s->threshold_size) { > > > > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy); > > > > > > > > That should be after one round of ram_state_pending_estimate(). It > > > > makes > > > > the 2nd check meaningless and can be dropped. > > > > > > > > To say it in another way, when reaching ->state_pending_exact(), we > > > > unconditionally sync dirty bits for precopy. > > > > > > > > Then we can drop migrate_get_current() there too. > > > > > > > > Signed-off-by: Peter Xu > > > > > > Hi Peter, > > > > Hi, Nina, > > > > > > > > could you have a look at this issue: > > > https://gitlab.com/qemu-project/qemu/-/issues/1565 > > > > > > which I reopened. Previous thread here: > > > > > > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/ > > > > > > I'm seeing migration failures with s390x TCG again, which look the same > > > to me > > > as those a while back. > > > > I'm still quite confused how that could be caused of this. > > > > What you described in the previous bug report seems to imply some page was > > leftover in migration so some page got corrupted after migrated. > > > > However what this patch mostly does is it can sync more than before even if > > I overlooked the condition check there (I still think the check is > > redundant, there's one outlier when remaining_size == threshold_size, but I > > don't think it should matter here as of now). 
It'll make more sense if > > this patch made the sync less, but that's not the case but vice versa. > > [...] > > > In the previous discussion, you mentioned that you bisected to the commit > > and also verified the fix. Now you also mentioned in the bz that you can't > > reporduce this bug manually. > > > > Is it still possible to be reproduced with some scripts? Do you also mean > > that it's harder to reproduce comparing to before? In all cases, some way > > to reproduce it would definitely be helpful. > > I tried running the kvm-unit-test a bunch of times in a loop and couldn't > trigger a failure. I just tried again on a different system and managed just > fine, yay. No idea why it wouldn't on the first system tho. There's probably still a bug somewhere. If reproduction rate changed, it's also a sign that it might not be directly relevant to this change, as otherwise it should reproduce the same as before. > > > > Even if we want to revert this change, we'll need to know whether this will > > fix your case so we need something to verify it before a revert. I'll > > consider that the last though as I had a feeling this is papering over > > something else. > > I can check if I can reproduce the issue before & after b0504edd ("migration: > Drop unnecessary check in ram's pending_exact()"). > I can also check if I can reproduce it on x86, that worked last time. > Anything else? Ideas on how to pinpoint where the corruption happens? I don't have a solid clue yet, but more information of the single case where it reproduced could help. I saw from the bug link that the cmdline is pretty simple. However still not sure of something that can be relevant. E.g., did you use postcopy (including when postcopy-ram enabled but precopy completed)? Is there any special device, like s390's CMMA (would that simplest cmdline include such a device; apologies, I have zero knowledge there before today)? 
I _think_ when reading the code I already found something quite unusual, but
only when postcopy is selected: I notice postcopy will frequently sync the
dirty bitmap when it doesn't really need to, because
ram_state_pending_estimate() will report all ram as "can_postcopy"; it means
it's highly likely that this check will 99.999% always be true simply because
must_precopy can in most cases be zero:

    if (must_precopy <= s->threshold_size) {        <--- here
        qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
        pending_size = must_precopy + can_postcopy;
        trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
    }

I need to think more about this, but this doesn't sound right at all. There's
no such issue with precopy-only, and I'm surprised it has been like that for
years.

--
Peter Xu
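The observation above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: it considers RAM alone (other devices may report their own must_precopy), and `sketch_pending`, `sketch_estimate` and `sketch_wants_exact` are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_pending {
    uint64_t must_precopy;
    uint64_t can_postcopy;
};

/*
 * Estimate step: with postcopy-ram enabled, all dirty RAM is reported as
 * postcopiable, so must_precopy stays at zero.
 */
static struct sketch_pending sketch_estimate(uint64_t dirty_bytes,
                                             bool postcopy_ram)
{
    struct sketch_pending p = { 0, 0 };
    if (postcopy_ram) {
        p.can_postcopy = dirty_bytes;
    } else {
        p.must_precopy = dirty_bytes;
    }
    return p;
}

/* The check that decides whether the exact (bitmap-syncing) query runs. */
static bool sketch_wants_exact(struct sketch_pending p, uint64_t threshold)
{
    return p.must_precopy <= threshold;
}

/* Two illustrative cases: 1 GiB dirty, 1 MiB threshold. */
void sketch_selfcheck(void)
{
    const uint64_t dirty = 1ULL << 30, threshold = 1ULL << 20;
    /* postcopy: the exact query is requested despite 1 GiB being dirty */
    assert(sketch_wants_exact(sketch_estimate(dirty, true), threshold));
    /* precopy-only: the exact query is skipped until dirty RAM shrinks */
    assert(!sketch_wants_exact(sketch_estimate(dirty, false), threshold));
}
```

In this model the postcopy case passes the check on every iteration regardless of how much RAM is dirty, which matches the "99.999% always true" remark.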
Re: [PATCH 3/4] hw/nmi: Remove @cpu_index argument from NMIClass::nmi_handler()
On Wed, 20 Mar 2024 at 19:05, Markus Armbruster wrote: > > Philippe Mathieu-Daudé writes: > > > On 20/3/24 14:23, Peter Maydell wrote: > >> On Tue, 20 Feb 2024 at 15:09, Philippe Mathieu-Daudé > >> wrote: > >>> > >>> Only s390x was using the 'cpu_index' argument, but since the > >>> previous commit it isn't anymore (it use the first cpu). > >>> Since this argument is now completely unused, remove it. Have > >>> the callback return a boolean indicating failure. > >>> > >>> Signed-off-by: Philippe Mathieu-Daudé > >>> --- > >>> include/hw/nmi.h | 11 ++- > >>> hw/core/nmi.c | 3 +-- > >>> hw/hppa/machine.c | 8 +--- > >>> hw/i386/x86.c | 7 --- > >>> hw/intc/m68k_irqc.c| 6 -- > >>> hw/m68k/q800-glue.c| 6 -- > >>> hw/misc/macio/gpio.c | 6 -- > >>> hw/ppc/pnv.c | 6 -- > >>> hw/ppc/spapr.c | 6 -- > >>> hw/s390x/s390-virtio-ccw.c | 6 -- > >>> 10 files changed, 44 insertions(+), 21 deletions(-) > >>> > >>> diff --git a/include/hw/nmi.h b/include/hw/nmi.h > >>> index fff41bebc6..c70db941c9 100644 > >>> --- a/include/hw/nmi.h > >>> +++ b/include/hw/nmi.h > >>> @@ -37,7 +37,16 @@ typedef struct NMIState NMIState; > >>> struct NMIClass { > >>> InterfaceClass parent_class; > >>> > >>> -void (*nmi_monitor_handler)(NMIState *n, int cpu_index, Error > >>> **errp); > >>> +/** > >>> + * nmi_handler: Callback to handle NMI notifications. > >>> + * > >>> + * @n: Class #NMIState state > >>> + * @errp: pointer to error object > >>> + * > >>> + * On success, return %true. > >>> + * On failure, store an error through @errp and return %false. > >>> + */ > >>> +bool (*nmi_handler)(NMIState *n, Error **errp); > >> Any particular reason to change the method name here? > >> Do we really need to indicate failure both through the bool return > >> and the Error** ? 
> > No, but this is the style *recommended* by the Error API since
> > commit e3fe3988d7 ("error: Document Error API usage rules"):
> >
> > error: Document Error API usage rules
> >
> > This merely codifies existing practice, with one exception: the rule
> > advising against returning void, where existing practice is mixed.
> >
> > When the Error API was created, we adopted the (unwritten) rule to
> > return void when the function returns no useful value on success,
> > unlike GError, which recommends to return true on success and false
> > on error then.
> >
> > [...]
> >
> > Make the rule advising against returning void official by putting it
> > in writing. This will hopefully reduce confusion.
> >
> >  * - Whenever practical, also return a value that indicates success /
> >  *   failure. This can make the error checking more concise, and can
> >  *   avoid useless error object creation and destruction. Note that
>
> It's the difference between
>
>     if (!frobnicate(arg, errp)) {
>         return;
>     }
>
> and
>
>     frobnicate(arg, &err);
>     if (err) {
>         error_propagate(errp, err);
>         return;
>     }
>
> Readability dies by a thousand cuts.
>
> GError got this right. We deviated from it for Error, until we
> understood why it's right.
>
> Another win: &error_abort gives you a backtrace into frobnicate() with
> the former, and into error_propagate() with the latter.

Fair enough. (When I made the comment I was vaguely wondering if we
wanted to keep the return value available to distinguish "this hook has
handled the NMI, don't keep iterating" from "no error, but you should keep
iterating through other handlers". But I think in the end my feeling is
we should always stop after the first NMI handler we find regardless.)

-- PMM
Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
On Wed, 2024-03-20 at 14:57 -0400, Peter Xu wrote: > On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote: > > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote: > > > From: Peter Xu > > > > > > When the migration frameworks fetches the exact pending sizes, it means > > > this check: > > > > > > remaining_size < s->threshold_size > > > > > > Must have been done already, actually at migration_iteration_run(): > > > > > > if (must_precopy <= s->threshold_size) { > > > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy); > > > > > > That should be after one round of ram_state_pending_estimate(). It makes > > > the 2nd check meaningless and can be dropped. > > > > > > To say it in another way, when reaching ->state_pending_exact(), we > > > unconditionally sync dirty bits for precopy. > > > > > > Then we can drop migrate_get_current() there too. > > > > > > Signed-off-by: Peter Xu > > > > Hi Peter, > > Hi, Nina, > > > > > could you have a look at this issue: > > https://gitlab.com/qemu-project/qemu/-/issues/1565 > > > > which I reopened. Previous thread here: > > > > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/ > > > > I'm seeing migration failures with s390x TCG again, which look the same to > > me > > as those a while back. > > I'm still quite confused how that could be caused of this. > > What you described in the previous bug report seems to imply some page was > leftover in migration so some page got corrupted after migrated. > > However what this patch mostly does is it can sync more than before even if > I overlooked the condition check there (I still think the check is > redundant, there's one outlier when remaining_size == threshold_size, but I > don't think it should matter here as of now). It'll make more sense if > this patch made the sync less, but that's not the case but vice versa. [...] > In the previous discussion, you mentioned that you bisected to the commit > and also verified the fix. 
Now you also mentioned in the bz that you can't > reporduce this bug manually. > > Is it still possible to be reproduced with some scripts? Do you also mean > that it's harder to reproduce comparing to before? In all cases, some way > to reproduce it would definitely be helpful. I tried running the kvm-unit-test a bunch of times in a loop and couldn't trigger a failure. I just tried again on a different system and managed just fine, yay. No idea why it wouldn't on the first system tho. > > Even if we want to revert this change, we'll need to know whether this will > fix your case so we need something to verify it before a revert. I'll > consider that the last though as I had a feeling this is papering over > something else. I can check if I can reproduce the issue before & after b0504edd ("migration: Drop unnecessary check in ram's pending_exact()"). I can also check if I can reproduce it on x86, that worked last time. Anything else? Ideas on how to pinpoint where the corruption happens? > > Thanks, >
Re: [PATCH 1/2] hw/arm: Add support for stm32g000 SoC family
Felipe Balbi writes:

> +    qdev_prop_set_uint8(armv7m, "num-prio-bits", 4);

Hi Felipe.

This should be 2, not 4. From RM0454 section 11.1 on page 250:
"4 programmable priority levels (2 bits of interrupt priority are used)".

  Sam
--
Samuel Tardieu
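To make the relationship between priority bits and priority levels concrete, a short sketch follows. The helper names are hypothetical; the layout assumed is the usual Arm M-profile one, where the implemented priority bits are the most significant bits of each 8-bit priority field.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct priority levels for a given number of implemented bits. */
static unsigned sketch_prio_levels(unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    return 1u << bits;
}

/* Mask of the implemented (most significant) bits in an 8-bit priority field. */
static uint8_t sketch_prio_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    return (uint8_t)(0xFFu << (8 - bits));
}
```

With 2 bits this gives 4 levels and a mask of 0xC0, which agrees with the "4 programmable priority levels" wording quoted from the reference manual; 4 bits would instead give 16 levels.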
Re: [PATCH 3/4] hw/nmi: Remove @cpu_index argument from NMIClass::nmi_handler()
Philippe Mathieu-Daudé writes: > On 20/3/24 14:23, Peter Maydell wrote: >> On Tue, 20 Feb 2024 at 15:09, Philippe Mathieu-Daudé >> wrote: >>> >>> Only s390x was using the 'cpu_index' argument, but since the >>> previous commit it isn't anymore (it use the first cpu). >>> Since this argument is now completely unused, remove it. Have >>> the callback return a boolean indicating failure. >>> >>> Signed-off-by: Philippe Mathieu-Daudé >>> --- >>> include/hw/nmi.h | 11 ++- >>> hw/core/nmi.c | 3 +-- >>> hw/hppa/machine.c | 8 +--- >>> hw/i386/x86.c | 7 --- >>> hw/intc/m68k_irqc.c| 6 -- >>> hw/m68k/q800-glue.c| 6 -- >>> hw/misc/macio/gpio.c | 6 -- >>> hw/ppc/pnv.c | 6 -- >>> hw/ppc/spapr.c | 6 -- >>> hw/s390x/s390-virtio-ccw.c | 6 -- >>> 10 files changed, 44 insertions(+), 21 deletions(-) >>> >>> diff --git a/include/hw/nmi.h b/include/hw/nmi.h >>> index fff41bebc6..c70db941c9 100644 >>> --- a/include/hw/nmi.h >>> +++ b/include/hw/nmi.h >>> @@ -37,7 +37,16 @@ typedef struct NMIState NMIState; >>> struct NMIClass { >>> InterfaceClass parent_class; >>> >>> -void (*nmi_monitor_handler)(NMIState *n, int cpu_index, Error **errp); >>> +/** >>> + * nmi_handler: Callback to handle NMI notifications. >>> + * >>> + * @n: Class #NMIState state >>> + * @errp: pointer to error object >>> + * >>> + * On success, return %true. >>> + * On failure, store an error through @errp and return %false. >>> + */ >>> +bool (*nmi_handler)(NMIState *n, Error **errp); >> Any particular reason to change the method name here? >> Do we really need to indicate failure both through the bool return >> and the Error** ? > > No, but this is the style *recommended* by the Error API since > commit e3fe3988d7 ("error: Document Error API usage rules"): > > error: Document Error API usage rules > > This merely codifies existing practice, with one exception: the rule > advising against returning void, where existing practice is mixed. 
>
> When the Error API was created, we adopted the (unwritten) rule to
> return void when the function returns no useful value on success,
> unlike GError, which recommends to return true on success and false
> on error then.
>
> [...]
>
> Make the rule advising against returning void official by putting it
> in writing. This will hopefully reduce confusion.
>
>  * - Whenever practical, also return a value that indicates success /
>  *   failure. This can make the error checking more concise, and can
>  *   avoid useless error object creation and destruction. Note that

It's the difference between

    if (!frobnicate(arg, errp)) {
        return;
    }

and

    frobnicate(arg, &err);
    if (err) {
        error_propagate(errp, err);
        return;
    }

Readability dies by a thousand cuts.

GError got this right. We deviated from it for Error, until we
understood why it's right.

Another win: &error_abort gives you a backtrace into frobnicate() with
the former, and into error_propagate() with the latter.

>  *   we still have many functions returning void. We recommend
>  *   • bool-valued functions return true on success / false on failure,
>  *   • pointer-valued functions return non-null / null pointer, and
>  *   • integer-valued functions return non-negative / negative.
>
> Anyway I'll respin removing @cpu_index as a single change :)
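The bool-returning convention discussed in this mail can be tried out in isolation with a toy model. This is a sketch, not QEMU's Error API: `SketchError`, `sketch_frobnicate` and `sketch_caller` are hypothetical, and error objects are static here so that nothing needs to be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy error object; the real Error type carries much more. */
typedef struct SketchError {
    const char *msg;
} SketchError;

/* Returns true on success; on failure stores an error and returns false. */
static bool sketch_frobnicate(int arg, SketchError **errp)
{
    static SketchError negative = { "arg must be non-negative" };

    /* Callers must not pass in an errp that already holds an error. */
    assert(errp == NULL || *errp == NULL);

    if (arg < 0) {
        if (errp) {
            *errp = &negative;
        }
        return false;
    }
    return true;
}

/* The caller needs no local error variable and no propagation step. */
static bool sketch_caller(int arg, SketchError **errp)
{
    if (!sketch_frobnicate(arg, errp)) {
        return false;
    }
    /* ... further work would go here ... */
    return true;
}
```

The caller checks only the return value, so it works unchanged whether errp is a real pointer or NULL; with a void-returning callee it would need a local error variable to detect failure at all.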
Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote: > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote: > > From: Peter Xu > > > > When the migration frameworks fetches the exact pending sizes, it means > > this check: > > > > remaining_size < s->threshold_size > > > > Must have been done already, actually at migration_iteration_run(): > > > > if (must_precopy <= s->threshold_size) { > > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy); > > > > That should be after one round of ram_state_pending_estimate(). It makes > > the 2nd check meaningless and can be dropped. > > > > To say it in another way, when reaching ->state_pending_exact(), we > > unconditionally sync dirty bits for precopy. > > > > Then we can drop migrate_get_current() there too. > > > > Signed-off-by: Peter Xu > > Hi Peter, Hi, Nina, > > could you have a look at this issue: > https://gitlab.com/qemu-project/qemu/-/issues/1565 > > which I reopened. Previous thread here: > > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/ > > I'm seeing migration failures with s390x TCG again, which look the same to me > as those a while back. I'm still quite confused about how that could be caused by this. What you described in the previous bug report seems to imply that some page was left over during migration, so some page was corrupted after being migrated. However, what this patch mostly does is sync more than before, even if I overlooked the condition check there (I still think the check is redundant; there's one outlier when remaining_size == threshold_size, but I don't think it should matter here as of now). It would make more sense if this patch made us sync less, but it is the other way around. 
> > > --- > > migration/ram.c | 9 - > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > diff --git a/migration/ram.c b/migration/ram.c > > index c0cdcccb75..d5b7cd5ac2 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void > > *opaque, uint64_t *must_precopy, > > static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy, > > uint64_t *can_postcopy) > > { > > -MigrationState *s = migrate_get_current(); > > RAMState **temp = opaque; > > RAMState *rs = *temp; > > +uint64_t remaining_size; > > > > -uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > > - > > -if (!migration_in_postcopy() && remaining_size < s->threshold_size) { > > +if (!migration_in_postcopy()) { > > bql_lock(); > > WITH_RCU_READ_LOCK_GUARD() { > > migration_bitmap_sync_precopy(rs, false); > > } > > bql_unlock(); > > -remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > > } > > > > +remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > > + > > if (migrate_postcopy_ram()) { > > /* We can do postcopy, and all the data is postcopiable */ > > *can_postcopy += remaining_size; > > This basically reverts 28ef5339c3 ("migration: fix > ram_state_pending_exact()"), which originally > made the issue disappear. > > Any thoughts on the matter appreciated. In the previous discussion, you mentioned that you bisected to the commit and also verified the fix. Now you also mentioned in the bz that you can't reproduce this bug manually. Can it still be reproduced with some scripts? Do you also mean that it's harder to reproduce compared to before? In all cases, some way to reproduce it would definitely be helpful. Even if we want to revert this change, we'll need to know whether this will fix your case, so we need something to verify it before a revert. I'll consider that a last resort though, as I have a feeling this is papering over something else. Thanks, -- Peter Xu
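To make the "outlier" discussed in this thread concrete, here is a rough model of the three conditions involved. It is a simplification under stated assumptions: the function names are invented for this sketch, and it treats the caller's must_precopy as if it were exactly RAM's remaining size, which real QEMU does not guarantee (the caller's value comes from the estimate pass and covers all devices).

```c
#include <stdbool.h>
#include <stdint.h>

/* Caller-side gate, modelled on the check quoted from migration_iteration_run(). */
static bool caller_requests_exact(uint64_t must_precopy, uint64_t threshold)
{
    return must_precopy <= threshold;
}

/* Condition before the patch: sync only when strictly below the threshold. */
static bool old_exact_syncs(bool in_postcopy, uint64_t remaining,
                            uint64_t threshold)
{
    return !in_postcopy && remaining < threshold;
}

/* Condition after the patch: always sync while still in precopy. */
static bool new_exact_syncs(bool in_postcopy)
{
    return !in_postcopy;
}
```

With remaining == threshold the caller asks for exact numbers, the old condition skips the dirty bitmap sync, and the new one performs it. In this model the new condition never syncs in a case where the old one did not, which matches the argument that the patch can only sync more.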
[PATCH 1/2] hw/arm: Add support for stm32g000 SoC family
From: Felipe Balbi Minimal support with USARTs and SPIs working. This SoC will be used to create and nucleo-g071rb board. Signed-off-by: Felipe Balbi --- hw/arm/Kconfig | 6 + hw/arm/meson.build | 1 + hw/arm/stm32g000_soc.c | 246 + include/hw/arm/stm32g000_soc.h | 62 + 4 files changed, 315 insertions(+) create mode 100644 hw/arm/stm32g000_soc.c create mode 100644 include/hw/arm/stm32g000_soc.h diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 893a7bff66b9..28a46d2b1ad3 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -463,6 +463,12 @@ config STM32F405_SOC select STM32F4XX_SYSCFG select STM32F4XX_EXTI +config STM32G000_SOC +bool +select ARM_V7M +select STM32F2XX_USART +select STM32F2XX_SPI + config B_L475E_IOT01A bool default y diff --git a/hw/arm/meson.build b/hw/arm/meson.build index 6808135c1f79..9c4137a988e1 100644 --- a/hw/arm/meson.build +++ b/hw/arm/meson.build @@ -34,6 +34,7 @@ arm_ss.add(when: ['CONFIG_RASPI', 'TARGET_AARCH64'], if_true: files('bcm2838.c', arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c')) arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c')) arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c')) +arm_ss.add(when: 'CONFIG_STM32G000_SOC', if_true: files('stm32g000_soc.c')) arm_ss.add(when: 'CONFIG_B_L475E_IOT01A', if_true: files('b-l475e-iot01a.c')) arm_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_soc.c')) arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 'xlnx-zcu102.c')) diff --git a/hw/arm/stm32g000_soc.c b/hw/arm/stm32g000_soc.c new file mode 100644 index ..8f97d8c89ad9 --- /dev/null +++ b/hw/arm/stm32g000_soc.c @@ -0,0 +1,246 @@ +/* + * STM32G000 SoC + * + * Copyright (c) 2024 Felipe Balbi + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation 
the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qemu/module.h" +#include "hw/arm/boot.h" +#include "exec/address-spaces.h" +#include "hw/arm/stm32g000_soc.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-clock.h" +#include "hw/misc/unimp.h" +#include "sysemu/sysemu.h" + +/* stm32g000_soc implementation is derived from stm32f100_soc */ + +struct stm32g0_ip_config { +const char *name; +uint32_t addr; +uint32_t irq; +}; + +#define STM32G0_DEFINE_IP(n, a, i)\ +{ \ +.name = (n), \ +.addr = (a), \ +.irq = (i), \ +} + +static const struct stm32g0_ip_config usart_config[STM_NUM_USARTS] = { +STM32G0_DEFINE_IP("USART1", 0x40013800, 27), +STM32G0_DEFINE_IP("USART2", 0x40004000, 28), +STM32G0_DEFINE_IP("USART3", 0x40004400, 29), +STM32G0_DEFINE_IP("USART4", 0x40004800, 29), +STM32G0_DEFINE_IP("USART5", 0x40004c00, 29), +STM32G0_DEFINE_IP("USART6", 0x40005000, 29), +STM32G0_DEFINE_IP("LPUSART1", 0x40008000, 29), +STM32G0_DEFINE_IP("LPUSART2", 0x40008400, 28), +}; + +static const struct stm32g0_ip_config spi_config[STM_NUM_SPIS] = { +STM32G0_DEFINE_IP("SPI1", 0x40013000, 25), +STM32G0_DEFINE_IP("SPI2", 0x40003800, 
26), +/* STM32G0_DEFINE_IP("SPI3", 0x4003c000, 26), only on STM32G0B1xx and STM32G0C1xx */ +}; + +static void stm32g000_soc_initfn(Object *obj) +{ +STM32G000State *s = STM32G000_SOC(obj); +int i; + +object_initialize_child(obj, "armv7m", >armv7m, TYPE_ARMV7M); + +for (i = 0; i < STM_NUM_USARTS; i++) { +object_initialize_child(obj, "usart[*]", >usart[i], +TYPE_STM32F2XX_USART); +} + +for (i = 0; i < STM_NUM_SPIS; i++) { +object_initialize_child(obj, "spi[*]", >spi[i], TYPE_STM32F2XX_SPI); +} + +s->sysclk = qdev_init_clock_in(DEVICE(s), "sysclk", NULL, NULL, 0); +s->refclk =
[PATCH 2/2] hw/arm: Add nucleo-g071rb board
From: Felipe Balbi This board is based around STM32G071RB SoC, a Cortex-M0 based device. More information can be found at: https://www.st.com/en/product/nucleo-g071rb.html Signed-off-by: Felipe Balbi --- hw/arm/Kconfig | 6 hw/arm/meson.build | 1 + hw/arm/nucleo-g071rb.c | 70 ++ 3 files changed, 77 insertions(+) create mode 100644 hw/arm/nucleo-g071rb.c diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 28a46d2b1ad3..5938bb8208a1 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -310,6 +310,12 @@ config STM32VLDISCOVERY depends on TCG && ARM select STM32F100_SOC +config NUCLEO_G071RB +bool +default y +depends on TCG && ARM +select STM32G000_SOC + config STRONGARM bool select PXA2XX diff --git a/hw/arm/meson.build b/hw/arm/meson.build index 9c4137a988e1..580c2d55fc3f 100644 --- a/hw/arm/meson.build +++ b/hw/arm/meson.build @@ -18,6 +18,7 @@ arm_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview.c')) arm_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa-ref.c')) arm_ss.add(when: 'CONFIG_STELLARIS', if_true: files('stellaris.c')) arm_ss.add(when: 'CONFIG_STM32VLDISCOVERY', if_true: files('stm32vldiscovery.c')) +arm_ss.add(when: 'CONFIG_NUCLEO_G071RB', if_true: files('nucleo-g071rb.c')) arm_ss.add(when: 'CONFIG_ZYNQ', if_true: files('xilinx_zynq.c')) arm_ss.add(when: 'CONFIG_SABRELITE', if_true: files('sabrelite.c')) diff --git a/hw/arm/nucleo-g071rb.c b/hw/arm/nucleo-g071rb.c new file mode 100644 index ..580b52bacf2c --- /dev/null +++ b/hw/arm/nucleo-g071rb.c @@ -0,0 +1,70 @@ +/* + * ST Nucleo G071RB + * + * Copyright (c) 2024 Felipe Balbi + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, 
subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "hw/boards.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-clock.h" +#include "qemu/error-report.h" +#include "hw/arm/stm32g000_soc.h" +#include "hw/arm/boot.h" + +/* nucleo_g071rb implementation is derived from olimex-stm32-h405.c */ + +/* Main SYSCLK frequency in Hz (48MHz) */ +#define SYSCLK_FRQ 48000000ULL + +static void nucleo_g071rb_init(MachineState *machine) +{ +DeviceState *dev; +Clock *sysclk; + +/* This clock doesn't need migration because it is fixed-frequency */ +sysclk = clock_new(OBJECT(machine), "SYSCLK"); +clock_set_hz(sysclk, SYSCLK_FRQ); + +dev = qdev_new(TYPE_STM32G000_SOC); +object_property_add_child(OBJECT(machine), "soc", OBJECT(dev)); +qdev_connect_clock_in(dev, "sysclk", sysclk); +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal); + +armv7m_load_kernel(ARM_CPU(first_cpu), + machine->kernel_filename, + 0, FLASH_SIZE); +} + +static void nucleo_g071rb_machine_init(MachineClass *mc) +{ +static const char * const valid_cpu_types[] = { +ARM_CPU_TYPE_NAME("cortex-m0"), +NULL +}; + +mc->desc = "ST Nucleo-G071RB (Cortex-M0)"; +mc->init = nucleo_g071rb_init; +mc->valid_cpu_types = valid_cpu_types; +} + +DEFINE_MACHINE("nucleo-g071rb", nucleo_g071rb_machine_init) -- 2.44.0
[PATCH 0/2] Add support for STM32G0 SoC family
From: Felipe Balbi Hi all, These two patches add support for the STM32G0 SoC family and the nucleo-g071rb board. The patches have been tested with minimal embedded Rust examples. Felipe Balbi (2): hw/arm: Add support for stm32g000 SoC family hw/arm: Add nucleo-g071rb board hw/arm/Kconfig | 12 ++ hw/arm/meson.build | 2 + hw/arm/nucleo-g071rb.c | 70 ++ hw/arm/stm32g000_soc.c | 246 + include/hw/arm/stm32g000_soc.h | 62 + 5 files changed, 392 insertions(+) create mode 100644 hw/arm/nucleo-g071rb.c create mode 100644 hw/arm/stm32g000_soc.c create mode 100644 include/hw/arm/stm32g000_soc.h -- 2.44.0
[PATCH] coroutine: reserve 5,000 mappings
Daniel P. Berrangé pointed out that the coroutine pool size heuristic is very conservative. Instead of halving max_map_count, he suggested reserving 5,000 mappings for non-coroutine users based on observations of guests he has access to. Fixes: 86a637e48104 ("coroutine: cap per-thread local pool size") Signed-off-by: Stefan Hajnoczi --- util/qemu-coroutine.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c index 2790959eaf..eb4eebefdf 100644 --- a/util/qemu-coroutine.c +++ b/util/qemu-coroutine.c @@ -377,12 +377,17 @@ static unsigned int get_global_pool_hard_max_size(void) NULL) && qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) { /* - * This is a conservative upper bound that avoids exceeding - * max_map_count. Leave half for non-coroutine users like library - * dependencies, vhost-user, etc. Each coroutine takes up 2 VMAs so - * halve the amount again. + * This is an upper bound that avoids exceeding max_map_count. Leave a + * fixed amount for non-coroutine users like library dependencies, + * vhost-user, etc. Each coroutine takes up 2 VMAs so halve the + * remaining amount. */ -return max_map_count / 4; +if (max_map_count > 5000) { +return (max_map_count - 5000) / 2; +} else { +/* Disable the global pool but threads still have local pools */ +return 0; +} } #endif -- 2.44.0
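To see how much the heuristic changes, the old and new bounds can be compared side by side. This is a small sketch; the function names and the RESERVED_MAPPINGS macro are made up for illustration, and 65530 is used only as an example value often seen for vm.max_map_count.

```c
/* Mappings set aside for non-coroutine users, as in the patch. */
#define RESERVED_MAPPINGS 5000

/* Old bound: keep half for other users, then two VMAs per coroutine. */
static unsigned int pool_hard_max_old(int max_map_count)
{
    return max_map_count / 4;
}

/* New bound: keep a fixed reserve, then two VMAs per coroutine. */
static unsigned int pool_hard_max_new(int max_map_count)
{
    if (max_map_count > RESERVED_MAPPINGS) {
        return (max_map_count - RESERVED_MAPPINGS) / 2;
    }
    /* Too few mappings: no global pool at all. */
    return 0;
}
```

For max_map_count = 65530 the old formula allows 16382 pooled coroutines and the new one 30265, so the global pool can grow to almost twice its previous size while 5,000 mappings stay reserved.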
Re: [PATCH v3 48/49] hw/i386/sev: Use guest_memfd for legacy ROMs
On Wed, Mar 20, 2024 at 03:39:44AM -0500, Michael Roth wrote: > TODO: make this SNP-specific if TDX disables legacy ROMs in general TDX disables pc.rom, not disable isa-bios. IIRC, TDX doesn't need pc pflash. Xiaoyao can chime in. Thanks, > > Current SNP guest kernels will attempt to access these regions with > with C-bit set, so guest_memfd is needed to handle that. Otherwise, > kvm_convert_memory() will fail when the guest kernel tries to access it > and QEMU attempts to call KVM_SET_MEMORY_ATTRIBUTES to set these ranges > to private. > > Whether guests should actually try to access ROM regions in this way (or > need to deal with legacy ROM regions at all), is a separate issue to be > addressed on kernel side, but current SNP guest kernels will exhibit > this behavior and so this handling is needed to allow QEMU to continue > running existing SNP guest kernels. > > Signed-off-by: Michael Roth > --- > hw/i386/pc.c | 13 + > hw/i386/pc_sysfw.c | 13 ++--- > 2 files changed, 19 insertions(+), 7 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index feb7a93083..5feaeb43ee 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -1011,10 +1011,15 @@ void pc_memory_init(PCMachineState *pcms, > pc_system_firmware_init(pcms, rom_memory); > > option_rom_mr = g_malloc(sizeof(*option_rom_mr)); > -memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE, > - _fatal); > -if (pcmc->pci_enabled) { > -memory_region_set_readonly(option_rom_mr, true); > +if (machine_require_guest_memfd(machine)) { > +memory_region_init_ram_guest_memfd(option_rom_mr, NULL, "pc.rom", > + PC_ROM_SIZE, _fatal); > +} else { > +memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE, > + _fatal); > +if (pcmc->pci_enabled) { > +memory_region_set_readonly(option_rom_mr, true); > +} > } > memory_region_add_subregion_overlap(rom_memory, > PC_ROM_MIN_VGA, > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c > index 9dbb3f7337..850f86edd4 100644 > --- a/hw/i386/pc_sysfw.c > +++ 
b/hw/i386/pc_sysfw.c > @@ -54,8 +54,13 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, > /* map the last 128KB of the BIOS in ISA space */ > isa_bios_size = MIN(flash_size, 128 * KiB); > isa_bios = g_malloc(sizeof(*isa_bios)); > -memory_region_init_ram(isa_bios, NULL, "isa-bios", isa_bios_size, > - _fatal); > +if (machine_require_guest_memfd(current_machine)) { > +memory_region_init_ram_guest_memfd(isa_bios, NULL, "isa-bios", > + isa_bios_size, _fatal); > +} else { > +memory_region_init_ram(isa_bios, NULL, "isa-bios", isa_bios_size, > + _fatal); > +} > memory_region_add_subregion_overlap(rom_memory, > 0x10 - isa_bios_size, > isa_bios, > @@ -68,7 +73,9 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, > ((uint8_t*)flash_ptr) + (flash_size - isa_bios_size), > isa_bios_size); > > -memory_region_set_readonly(isa_bios, true); > +if (!machine_require_guest_memfd(current_machine)) { > +memory_region_set_readonly(isa_bios, true); > +} > } > > static PFlashCFI01 *pc_pflash_create(PCMachineState *pcms, > -- > 2.25.1 > > -- Isaku Yamahata
Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
I cc'ed Juan, but it looks like he is no longer with Red Hat.
Re: [PATCH v3 40/49] hw/i386/sev: Add function to get SEV metadata from OVMF header
On Wed, Mar 20, 2024 at 03:39:36AM -0500, Michael Roth wrote: > From: Brijesh Singh > > A recent version of OVMF expanded the reset vector GUID list to add > SEV-specific metadata GUID. The SEV metadata describes the reserved > memory regions such as the secrets and CPUID page used during the SEV-SNP > guest launch. > > The pc_system_get_ovmf_sev_metadata_ptr() is used to retieve the SEV > metadata pointer from the OVMF GUID list. > > Signed-off-by: Brijesh Singh > Signed-off-by: Michael Roth > --- > hw/i386/pc_sysfw_ovmf.c | 33 + > include/hw/i386/pc.h| 26 ++ > 2 files changed, 59 insertions(+) > > diff --git a/hw/i386/pc_sysfw_ovmf.c b/hw/i386/pc_sysfw_ovmf.c > index 07a4c267fa..32efa34614 100644 > --- a/hw/i386/pc_sysfw_ovmf.c > +++ b/hw/i386/pc_sysfw_ovmf.c > @@ -35,6 +35,31 @@ static const int bytes_after_table_footer = 32; > static bool ovmf_flash_parsed; > static uint8_t *ovmf_table; > static int ovmf_table_len; > +static OvmfSevMetadata *ovmf_sev_metadata_table; > + > +#define OVMF_SEV_META_DATA_GUID "dc886566-984a-4798-A75e-5585a7bf67cc" > +typedef struct __attribute__((__packed__)) OvmfSevMetadataOffset { > +uint32_t offset; > +} OvmfSevMetadataOffset; > + > +static void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t > flash_size) > +{ > +OvmfSevMetadata *metadata; > +OvmfSevMetadataOffset *data; > + > +if (!pc_system_ovmf_table_find(OVMF_SEV_META_DATA_GUID, (uint8_t > **), > + NULL)) { > +return; > +} > + > +metadata = (OvmfSevMetadata *)(flash_ptr + flash_size - data->offset); > +if (memcmp(metadata->signature, "ASEV", 4) != 0) { > +return; > +} > + > +ovmf_sev_metadata_table = g_malloc(metadata->len); > +memcpy(ovmf_sev_metadata_table, metadata, metadata->len); > +} > > void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size) > { > @@ -90,6 +115,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t > flash_size) > */ > memcpy(ovmf_table, ptr - tot_len, tot_len); > ovmf_table += tot_len; > + > +/* Copy the SEV metadata 
table (if exist) */ > +pc_system_parse_sev_metadata(flash_ptr, flash_size); > } Can we move this call to x86_firmware_configure() @ pc_sysfw.c, and move sev specific bits to somewhere to sev specific file? We don't have to parse sev metadata for non-SEV case, right? We don't have to touch common ovmf file. It also will be consistent with tdx case. TDX patch series adds tdx_parse_tdvf() to x86_firmware_configure(). thanks, > > /** > @@ -159,3 +187,8 @@ bool pc_system_ovmf_table_find(const char *entry, uint8_t > **data, > } > return false; > } > + > +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void) > +{ > +return ovmf_sev_metadata_table; > +} > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h > index fb1d4106e5..df9a61540d 100644 > --- a/include/hw/i386/pc.h > +++ b/include/hw/i386/pc.h > @@ -163,6 +163,32 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int > level); > #define PCI_HOST_ABOVE_4G_MEM_SIZE "above-4g-mem-size" > #define PCI_HOST_PROP_SMM_RANGES "smm-ranges" > > +typedef enum { > +SEV_DESC_TYPE_UNDEF, > +/* The section contains the region that must be validated by the VMM. */ > +SEV_DESC_TYPE_SNP_SEC_MEM, > +/* The section contains the SNP secrets page */ > +SEV_DESC_TYPE_SNP_SECRETS, > +/* The section contains address that can be used as a CPUID page */ > +SEV_DESC_TYPE_CPUID, > + > +} ovmf_sev_metadata_desc_type; > + > +typedef struct __attribute__((__packed__)) OvmfSevMetadataDesc { > +uint32_t base; > +uint32_t len; > +ovmf_sev_metadata_desc_type type; > +} OvmfSevMetadataDesc; > + > +typedef struct __attribute__((__packed__)) OvmfSevMetadata { > +uint8_t signature[4]; > +uint32_t len; > +uint32_t version; > +uint32_t num_desc; > +OvmfSevMetadataDesc descs[]; > +} OvmfSevMetadata; > + > +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void); > > void pc_pci_as_mapping_init(MemoryRegion *system_memory, > MemoryRegion *pci_address_space); > -- > 2.25.1 > > -- Isaku Yamahata
Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote: > From: Peter Xu > > When the migration frameworks fetches the exact pending sizes, it means > this check: > > remaining_size < s->threshold_size > > Must have been done already, actually at migration_iteration_run(): > > if (must_precopy <= s->threshold_size) { > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy); > > That should be after one round of ram_state_pending_estimate(). It makes > the 2nd check meaningless and can be dropped. > > To say it in another way, when reaching ->state_pending_exact(), we > unconditionally sync dirty bits for precopy. > > Then we can drop migrate_get_current() there too. > > Signed-off-by: Peter Xu Hi Peter, could you have a look at this issue: https://gitlab.com/qemu-project/qemu/-/issues/1565 which I reopened. Previous thread here: https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/ I'm seeing migration failures with s390x TCG again, which look the same to me as those a while back. 
> --- > migration/ram.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c > index c0cdcccb75..d5b7cd5ac2 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, > uint64_t *must_precopy, > static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy, > uint64_t *can_postcopy) > { > -MigrationState *s = migrate_get_current(); > RAMState **temp = opaque; > RAMState *rs = *temp; > +uint64_t remaining_size; > > -uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > - > -if (!migration_in_postcopy() && remaining_size < s->threshold_size) { > +if (!migration_in_postcopy()) { > bql_lock(); > WITH_RCU_READ_LOCK_GUARD() { > migration_bitmap_sync_precopy(rs, false); > } > bql_unlock(); > -remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > } > > +remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > + > if (migrate_postcopy_ram()) { > /* We can do postcopy, and all the data is postcopiable */ > *can_postcopy += remaining_size; This basically reverts 28ef5339c3 ("migration: fix ram_state_pending_exact()"), which originally made the issue disappear. Any thoughts on the matter appreciated. Thanks, Nina
Re: [PATCH] libqos/virtio.c: Correct 'flags' reading in qvirtqueue_kick
On Wed, 20 Mar 2024 at 09:10, Zheyu Ma wrote: > > In qvirtqueue_kick(), the 'flags' were previously being incorrectly read from > vq->avail instead of the correct vq->used location. This update ensures > 'flags' > are read from the correct location as per the virtio standard. > > Signed-off-by: Zheyu Ma > --- > tests/qtest/libqos/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Stefan Hajnoczi > diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c > index 82a6e122bf..a21b6eee9c 100644 > --- a/tests/qtest/libqos/virtio.c > +++ b/tests/qtest/libqos/virtio.c > @@ -394,7 +394,7 @@ void qvirtqueue_kick(QTestState *qts, QVirtioDevice *d, > QVirtQueue *vq, > qvirtio_writew(d, qts, vq->avail + 2, idx + 1); > > /* Must read after idx is updated */ > -flags = qvirtio_readw(d, qts, vq->avail); > +flags = qvirtio_readw(d, qts, vq->used); > avail_event = qvirtio_readw(d, qts, vq->used + 4 + > sizeof(struct vring_used_elem) * vq->size); > > -- > 2.34.1 > >
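The reason the flags must come from the used ring follows from the split virtqueue layout: the used ring is written by the device and carries the "do not notify" hint, while the available ring's flags are written by the driver itself. A minimal sketch of the offsets involved follows; the helper names and the used_elem struct are invented here, and it deliberately ignores the event-index case that the real qvirtqueue_kick() also handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit the device sets in used->flags to ask the driver not to kick. */
#define USED_F_NO_NOTIFY 1u

/* One used-ring entry: two 32-bit fields, 8 bytes in total. */
struct used_elem {
    uint32_t id;
    uint32_t len;
};

/* The 16-bit flags word is the first field of the used ring. */
static uint64_t used_flags_addr(uint64_t used_base)
{
    return used_base;
}

/* avail_event follows flags (2 bytes), idx (2 bytes) and the ring entries. */
static uint64_t avail_event_addr(uint64_t used_base, uint32_t queue_size)
{
    return used_base + 4 + sizeof(struct used_elem) * queue_size;
}

/* Without event-index, kick unless the device asked for silence. */
static bool driver_should_kick(uint16_t used_flags)
{
    return (used_flags & USED_F_NO_NOTIFY) == 0;
}
```

For a queue of 256 entries whose used ring starts at 0x1000, the flags are read at 0x1000 and avail_event at 0x1804, the same `used + 4 + sizeof(elem) * size` arithmetic that appears in the diff.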
Re: [PATCH for-9.1 v5 09/14] memory: Add Error** argument to .log_global_start() handler
On Wed, Mar 20, 2024 at 05:15:06PM +0100, Cédric Le Goater wrote: > Sure, or I will in a v6. Markus had a comment on 8/14. Yeah, I can handle both if they're the only ones. Thanks, -- Peter Xu
[PATCH] target/riscv: Fix mode in riscv_tlb_fill
The PMP function expects a privilege mode, so convert mmu_idx to one with mmuidx_priv() rather than passing the raw MMU index. Signed-off-by: Irina Ryapolova --- target/riscv/cpu_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c index ce7322011d..fc090d729a 100644 --- a/target/riscv/cpu_helper.c +++ b/target/riscv/cpu_helper.c @@ -1315,7 +1315,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size, bool two_stage_lookup = mmuidx_2stage(mmu_idx); bool two_stage_indirect_error = false; int ret = TRANSLATE_FAIL; -int mode = mmu_idx; +int mode = mmuidx_priv(mmu_idx); /* default TLB page size */ target_ulong tlb_size = TARGET_PAGE_SIZE; -- 2.25.1
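Why the raw index is not a usable mode can be illustrated with a small model. The encoding below is an assumption modelled on QEMU's RISC-V MMU indexes (check target/riscv/internals.h for the authoritative values), and mmu_idx_to_priv() is a stand-in written for this sketch rather than the real mmuidx_priv().

```c
/* Assumed MMU index encoding (illustrative, not copied from QEMU). */
enum {
    IDX_U = 0,
    IDX_S = 1,
    IDX_S_SUM = 2,           /* S-mode with SUM set: an index, not a privilege */
    IDX_M = 3,
    IDX_2STAGE_BIT = 1 << 2  /* marks two-stage translation */
};

/* Privilege levels as numbered by the RISC-V privileged spec. */
enum {
    PRIV_U = 0,
    PRIV_S = 1,
    PRIV_M = 3
};

/* Stand-in for the conversion: strip extra bits, fold S+SUM back to S. */
static int mmu_idx_to_priv(int mmu_idx)
{
    int priv = mmu_idx & 3;

    if (priv == IDX_S_SUM) {
        priv = PRIV_S;
    }
    return priv;
}
```

Under this encoding, an S-mode access with SUM set or with the two-stage bit would hand the PMP check a value that is not a privilege level at all, which is the kind of mismatch the one-line fix avoids.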
[PATCH v8 6/6] target/riscv: Enable updates for pointer masking variables and thus enable pointer masking extension
From: Alexey Baturo Signed-off-by: Alexey Baturo Reviewed-by: Alistair Francis --- target/riscv/cpu.c | 8 1 file changed, 8 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index 73c69f3d0a..9e3bf6c5c5 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -190,6 +190,9 @@ const RISCVIsaExtData isa_edata_arr[] = { ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval), ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot), ISA_EXT_DATA_ENTRY(svpbmt, PRIV_VERSION_1_12_0, ext_svpbmt), +ISA_EXT_DATA_ENTRY(ssnpm, PRIV_VERSION_1_12_0, ext_ssnpm), +ISA_EXT_DATA_ENTRY(smnpm, PRIV_VERSION_1_12_0, ext_smnpm), +ISA_EXT_DATA_ENTRY(smmpm, PRIV_VERSION_1_12_0, ext_smmpm), ISA_EXT_DATA_ENTRY(xtheadba, PRIV_VERSION_1_11_0, ext_xtheadba), ISA_EXT_DATA_ENTRY(xtheadbb, PRIV_VERSION_1_11_0, ext_xtheadbb), ISA_EXT_DATA_ENTRY(xtheadbs, PRIV_VERSION_1_11_0, ext_xtheadbs), @@ -1561,6 +1564,11 @@ const RISCVCPUMultiExtConfig riscv_cpu_vendor_exts[] = { /* These are experimental so mark with 'x-' */ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = { +/* Zjpm v0.8 extensions */ +MULTI_EXT_CFG_BOOL("x-ssnpm", ext_ssnpm, false), +MULTI_EXT_CFG_BOOL("x-smnpm", ext_smnpm, false), +MULTI_EXT_CFG_BOOL("x-smmpm", ext_smmpm, false), + DEFINE_PROP_END_OF_LIST(), }; -- 2.34.1
Re: [PATCH v4 1/3] qio: add support for SO_PEERCRED for socket channel
On Mon, Mar 18, 2024 at 04:12:14PM +0100, Anthony Harivel wrote: > The function qio_channel_get_peercred() returns a pointer to the > credentials of the peer process connected to this socket. > > This credentials structure is defined in as follows: > > struct ucred { > pid_t pid;/* Process ID of the sending process */ > uid_t uid;/* User ID of the sending process */ > gid_t gid;/* Group ID of the sending process */ > }; > > The use of this function is possible only for connected AF_UNIX stream > sockets and for AF_UNIX stream and datagram socket pairs. > > On platform other than Linux, the function return 0. > > Signed-off-by: Anthony Harivel > --- > include/io/channel.h | 21 + > io/channel-socket.c | 24 > io/channel.c | 12 > 3 files changed, 57 insertions(+) > > diff --git a/include/io/channel.h b/include/io/channel.h > index 7986c49c713a..01ad7bd7e430 100644 > --- a/include/io/channel.h > +++ b/include/io/channel.h > @@ -160,6 +160,9 @@ struct QIOChannelClass { >void *opaque); > int (*io_flush)(QIOChannel *ioc, > Error **errp); > +void (*io_peerpid)(QIOChannel *ioc, > + unsigned int *pid, > + Error **errp); > }; > > /* General I/O handling functions */ > @@ -981,4 +984,22 @@ int coroutine_mixed_fn > qio_channel_writev_full_all(QIOChannel *ioc, > int qio_channel_flush(QIOChannel *ioc, >Error **errp); > > +/** > + * qio_channel_get_peercred: > + * @ioc: the channel object > + * @pid: pointer to pid > + * @errp: pointer to a NULL-initialized error object > + * > + * Returns the pid of the peer process connected to this socket. > + * > + * The use of this function is possible only for connected > + * AF_UNIX stream sockets and for AF_UNIX stream and datagram > + * socket pairs on Linux. > + * Return an error with pid -1 for the non-Linux OS. 
> + * > + */ > +void qio_channel_get_peerpid(QIOChannel *ioc, > + unsigned int *pid, > + Error **errp); > + > #endif /* QIO_CHANNEL_H */ > diff --git a/io/channel-socket.c b/io/channel-socket.c > index 3a899b060858..fcff92ecc151 100644 > --- a/io/channel-socket.c > +++ b/io/channel-socket.c > @@ -841,6 +841,29 @@ qio_channel_socket_set_cork(QIOChannel *ioc, > socket_set_cork(sioc->fd, v); > } > > +static void > +qio_channel_socket_get_peerpid(QIOChannel *ioc, > + unsigned int *pid, > + Error **errp) > +{ > +#ifdef CONFIG_LINUX > +QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); > +Error *err = NULL; > +socklen_t len = sizeof(struct ucred); > + > +struct ucred cred; > +if (getsockopt(sioc->fd, > + SOL_SOCKET, SO_PEERCRED, > + &cred, &len) == -1) { Set '*pid = -1' > +error_setg_errno(&err, errno, "Unable to get peer credentials"); > +error_propagate(errp, err); and 'return;' here, since accessing 'cred.pid' below is undefined behaviour if getsockopt failed. > +} > +*pid = (unsigned int)cred.pid; > +#else > +error_setg(errp, "Unsupported feature"); > +*pid = -1; > +#endif > +} > > static int > qio_channel_socket_close(QIOChannel *ioc, > @@ -938,6 +961,7 @@ static void qio_channel_socket_class_init(ObjectClass > *klass, > #ifdef QEMU_MSG_ZEROCOPY > ioc_klass->io_flush = qio_channel_socket_flush; > #endif > +ioc_klass->io_peerpid = qio_channel_socket_get_peerpid; > } > > static const TypeInfo qio_channel_socket_info = { > diff --git a/io/channel.c b/io/channel.c > index a1f12f8e9096..777989bc9a81 100644 > --- a/io/channel.c > +++ b/io/channel.c > @@ -548,6 +548,18 @@ void qio_channel_set_cork(QIOChannel *ioc, > } > } > > +void qio_channel_get_peerpid(QIOChannel *ioc, > + unsigned int *pid, > + Error **errp) > +{ > +QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc); > + > +if (!klass->io_peerpid) { > +error_setg(errp, "Channel does not support peer pid"); > +return; > +} > +klass->io_peerpid(ioc, pid, errp); > +} > > off_t qio_channel_io_seek(QIOChannel *ioc, >off_t offset, > -- > 
2.44.0 > With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
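The review point about returning early can be shown with a plain Linux sketch outside of QEMU. get_peer_pid() is a hypothetical helper written for this illustration, not part of the QIOChannel API; it assumes glibc on Linux, where struct ucred requires _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* struct ucred is a GNU extension in glibc */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 0 and stores the peer's pid on success. On failure returns -1 and
 * stores (pid_t)-1, so the uninitialised cred struct is never read.
 */
static int get_peer_pid(int fd, pid_t *pid)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    assert(pid != NULL);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) == -1) {
        *pid = (pid_t)-1;
        return -1;           /* bail out before touching cred.pid */
    }
    *pid = cred.pid;
    return 0;
}
```

On a socketpair(AF_UNIX, SOCK_STREAM, ...) created by the calling process, the reported pid is the caller's own, which makes the helper easy to try without a second process.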
[PATCH v8 4/6] target/riscv: Add pointer masking tb flags
From: Alexey Baturo Signed-off-by: Alexey Baturo Reviewed-by: Richard Henderson Reviewed-by: Alistair Francis --- target/riscv/cpu.h| 3 +++ target/riscv/cpu_helper.c | 3 +++ target/riscv/translate.c | 5 + 3 files changed, 11 insertions(+) diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index 0112b568a0..404f6ec50d 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -566,6 +566,9 @@ FIELD(TB_FLAGS, ITRIGGER, 20, 1) FIELD(TB_FLAGS, VIRT_ENABLED, 21, 1) FIELD(TB_FLAGS, PRIV, 22, 2) FIELD(TB_FLAGS, AXL, 24, 2) +/* If pointer masking should be applied and address sign extended */ +FIELD(TB_FLAGS, PM_PMM, 26, 2) +FIELD(TB_FLAGS, PM_SIGNEXTEND, 28, 1) #ifdef TARGET_RISCV32 #define riscv_cpu_mxl(env) ((void)(env), MXL_RV32) diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c index a563451c48..4dea564fd8 100644 --- a/target/riscv/cpu_helper.c +++ b/target/riscv/cpu_helper.c @@ -68,6 +68,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, RISCVCPU *cpu = env_archcpu(env); RISCVExtStatus fs, vs; uint32_t flags = 0; +bool pm_signext = riscv_cpu_virt_mem_enabled(env); *pc = env->xl == MXL_RV32 ? 
env->pc & UINT32_MAX : env->pc; *cs_base = 0; @@ -138,6 +139,8 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, flags = FIELD_DP32(flags, TB_FLAGS, VS, vs); flags = FIELD_DP32(flags, TB_FLAGS, XL, env->xl); flags = FIELD_DP32(flags, TB_FLAGS, AXL, cpu_address_xl(env)); +flags = FIELD_DP32(flags, TB_FLAGS, PM_PMM, riscv_pm_get_pmm(env)); +flags = FIELD_DP32(flags, TB_FLAGS, PM_SIGNEXTEND, pm_signext); *pflags = flags; } diff --git a/target/riscv/translate.c b/target/riscv/translate.c index 3382eb0a5f..a85a2abf2e 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -103,6 +103,9 @@ typedef struct DisasContext { bool vl_eq_vlmax; CPUState *cs; TCGv zero; +/* actual address width */ +uint8_t addr_width; +bool addr_signed; /* Ztso */ bool ztso; /* Use icount trigger for native debug */ @@ -1180,6 +1183,8 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) ctx->xl = FIELD_EX32(tb_flags, TB_FLAGS, XL); ctx->address_xl = FIELD_EX32(tb_flags, TB_FLAGS, AXL); ctx->cs = cs; +ctx->addr_width = 0; +ctx->addr_signed = false; ctx->ztso = cpu->cfg.ext_ztso; ctx->itrigger = FIELD_EX32(tb_flags, TB_FLAGS, ITRIGGER); ctx->zero = tcg_constant_tl(0); -- 2.34.1
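To make the flag layout above concrete, here is a small stand-alone sketch of how the two new fields could be packed into and read back from a 32-bit flags word. The bit positions (PM_PMM at bits 26..27, PM_SIGNEXTEND at bit 28) are taken from the FIELD() lines in the patch; field_deposit32() and field_extract32() are stand-ins written for this example and only assumed to behave like QEMU's FIELD_DP32/FIELD_EX32.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a bitfield deposit: write 'val' into word[shift +: len]. */
static uint32_t field_deposit32(uint32_t word, unsigned shift, unsigned len,
                                uint32_t val)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    uint32_t mask = ((1u << len) - 1u) << shift;
    return (word & ~mask) | ((val << shift) & mask);
}

/* Stand-in for a bitfield extract: read word[shift +: len]. */
static uint32_t field_extract32(uint32_t word, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    return (word >> shift) & ((1u << len) - 1u);
}

/* Bit positions copied from the FIELD(TB_FLAGS, ...) lines in the patch. */
enum {
    DEMO_PM_PMM_SHIFT = 26, DEMO_PM_PMM_LEN = 2,
    DEMO_PM_SIGNEXTEND_SHIFT = 28, DEMO_PM_SIGNEXTEND_LEN = 1,
};

/* Hypothetical helper: add the pointer masking state to existing flags. */
static uint32_t demo_pack_pm_flags(uint32_t flags, unsigned pmm, int signext)
{
    flags = field_deposit32(flags, DEMO_PM_PMM_SHIFT, DEMO_PM_PMM_LEN, pmm);
    flags = field_deposit32(flags, DEMO_PM_SIGNEXTEND_SHIFT,
                            DEMO_PM_SIGNEXTEND_LEN, signext ? 1u : 0u);
    return flags;
}
```

Because the masking state lives in the flags word, two translation blocks compiled under different PMM settings get different flags and are never confused with each other.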
[PATCH v8 5/6] target/riscv: Update address modify functions to take into account pointer masking
From: Alexey Baturo Signed-off-by: Alexey Baturo Reviewed-by: Richard Henderson Reviewed-by: Alistair Francis --- target/riscv/translate.c | 22 -- target/riscv/vector_helper.c | 13 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/target/riscv/translate.c b/target/riscv/translate.c index a85a2abf2e..99c5c6a530 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -581,8 +581,10 @@ static TCGv get_address(DisasContext *ctx, int rs1, int imm) TCGv src1 = get_gpr(ctx, rs1, EXT_NONE); tcg_gen_addi_tl(addr, src1, imm); -if (get_address_xl(ctx) == MXL_RV32) { -tcg_gen_ext32u_tl(addr, addr); +if (ctx->addr_signed) { +tcg_gen_sextract_tl(addr, addr, 0, ctx->addr_width); +} else { +tcg_gen_extract_tl(addr, addr, 0, ctx->addr_width); } return addr; @@ -595,8 +597,10 @@ static TCGv get_address_indexed(DisasContext *ctx, int rs1, TCGv offs) TCGv src1 = get_gpr(ctx, rs1, EXT_NONE); tcg_gen_add_tl(addr, src1, offs); -if (get_xl(ctx) == MXL_RV32) { -tcg_gen_ext32u_tl(addr, addr); +if (ctx->addr_signed) { +tcg_gen_sextract_tl(addr, addr, 0, ctx->addr_width); +} else { +tcg_gen_extract_tl(addr, addr, 0, ctx->addr_width); } return addr; } @@ -1183,8 +1187,14 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) ctx->xl = FIELD_EX32(tb_flags, TB_FLAGS, XL); ctx->address_xl = FIELD_EX32(tb_flags, TB_FLAGS, AXL); ctx->cs = cs; -ctx->addr_width = 0; -ctx->addr_signed = false; +if (get_xl(ctx) == MXL_RV32) { +ctx->addr_width = 32; +ctx->addr_signed = false; +} else { +int pm_pmm = FIELD_EX32(tb_flags, TB_FLAGS, PM_PMM); +ctx->addr_width = 64 - riscv_pm_get_pmlen(pm_pmm); +ctx->addr_signed = FIELD_EX32(tb_flags, TB_FLAGS, PM_SIGNEXTEND); +} ctx->ztso = cpu->cfg.ext_ztso; ctx->itrigger = FIELD_EX32(tb_flags, TB_FLAGS, ITRIGGER); ctx->zero = tcg_constant_tl(0); diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index 4934b43722..c77fbd8929 100644 --- a/target/riscv/vector_helper.c +++ 
b/target/riscv/vector_helper.c @@ -104,6 +104,19 @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz) static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr) { +RISCVPmPmm pmm = riscv_pm_get_pmm(env); +if (pmm == PMM_FIELD_DISABLED) { +return addr; +} +int pmlen = riscv_pm_get_pmlen(pmm); +bool signext = riscv_cpu_virt_mem_enabled(env); +addr = addr << pmlen; +/* sign/zero extend masked address by N-1 bit */ +if (signext) { +addr = (target_long)addr >> pmlen; +} else { +addr = addr >> pmlen; +} return addr; } -- 2.34.1
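The shift pair in adjust_addr() can be tried in isolation. The following is a minimal sketch of the same idea on plain 64-bit integers; pm_adjust_addr() is a hypothetical stand-alone function, not the QEMU helper, and it assumes (as the patch does) that a right shift of a negative signed value is an arithmetic shift, which holds for GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Drop the top 'pmlen' bits of a 64-bit address, then either
 * sign-extend from the highest remaining bit or zero-extend.
 */
static uint64_t pm_adjust_addr(uint64_t addr, unsigned pmlen, bool signext)
{
    assert(pmlen < 64);
    if (pmlen == 0) {
        return addr;                 /* masking disabled */
    }
    addr <<= pmlen;                  /* discard the masked (tag) bits */
    if (signext) {
        /* arithmetic shift: copies bit (63 - pmlen) into the top bits */
        return (uint64_t)((int64_t)addr >> pmlen);
    }
    return addr >> pmlen;            /* logical shift: top bits become 0 */
}
```

For example, with PMLEN = 7 the address 0xAB00000000001234 becomes 0x0100000000001234 when zero-extended and 0xFF00000000001234 when sign-extended, because bit 56 of the original address is set.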
[PATCH v8 3/6] target/riscv: Add helper functions to calculate current number of masked bits for pointer masking
From: Alexey Baturo Signed-off-by: Alexey Baturo Reviewed-by: Alistair Francis --- target/riscv/cpu.h| 4 +++ target/riscv/cpu_helper.c | 58 +++ 2 files changed, 62 insertions(+) diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index b694cc62bf..0112b568a0 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -700,6 +700,10 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, bool riscv_cpu_is_32bit(RISCVCPU *cpu); +bool riscv_cpu_virt_mem_enabled(CPURISCVState *env); +RISCVPmPmm riscv_pm_get_pmm(CPURISCVState *env); +int riscv_pm_get_pmlen(RISCVPmPmm pmm); + RISCVException riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value, target_ulong new_value, target_ulong write_mask); diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c index d20bffdd5a..a563451c48 100644 --- a/target/riscv/cpu_helper.c +++ b/target/riscv/cpu_helper.c @@ -142,6 +142,64 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, *pflags = flags; } +RISCVPmPmm riscv_pm_get_pmm(CPURISCVState *env) +{ +int pmm = 0; +#ifndef CONFIG_USER_ONLY +int priv_mode = cpu_address_mode(env); +/* Get current PMM field */ +switch (priv_mode) { +case PRV_M: +pmm = riscv_cpu_cfg(env)->ext_smmpm ? + get_field(env->mseccfg, MSECCFG_PMM) : PMM_FIELD_DISABLED; +break; +case PRV_S: +pmm = riscv_cpu_cfg(env)->ext_smnpm ? + get_field(env->menvcfg, MENVCFG_PMM) : PMM_FIELD_DISABLED; +break; +case PRV_U: +pmm = riscv_cpu_cfg(env)->ext_ssnpm ? 
+ get_field(env->senvcfg, SENVCFG_PMM) : PMM_FIELD_DISABLED; +break; +default: +g_assert_not_reached(); +} +#endif +return pmm; +} + +bool riscv_cpu_virt_mem_enabled(CPURISCVState *env) +{ +bool virt_mem_en = false; +#ifndef CONFIG_USER_ONLY +int satp_mode = 0; +int priv_mode = cpu_address_mode(env); +/* Get current PMM field */ +if (riscv_cpu_mxl(env) == MXL_RV32) { +satp_mode = get_field(env->satp, SATP32_MODE); +} else { +satp_mode = get_field(env->satp, SATP64_MODE); +} +virt_mem_en = ((satp_mode != VM_1_10_MBARE) && (priv_mode != PRV_M)); +#endif +return virt_mem_en; +} + +int riscv_pm_get_pmlen(RISCVPmPmm pmm) +{ +switch (pmm) { +case PMM_FIELD_DISABLED: +return 0; +case PMM_FIELD_PMLEN7: +return 7; +case PMM_FIELD_PMLEN16: +return 16; +default: +g_assert_not_reached(); +} +return -1; +} + #ifndef CONFIG_USER_ONLY /* -- 2.34.1
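As a rough illustration of how the helpers above fit together, the sketch below maps a PMM field value to PMLEN and to the resulting effective address width (64 - PMLEN), and decides whether sign extension applies. All names are hypothetical; the numeric values for the bare translation mode (0) and for M-mode (3) are assumptions made for this example. Unlike the patch, the sketch returns -1 for the reserved encoding instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>

/* Same encodings as the RISCVPmPmm enum added earlier in the series. */
enum demo_pmm {
    DEMO_PMM_DISABLED = 0,
    DEMO_PMM_RESERVED = 1,
    DEMO_PMM_PMLEN7   = 2,
    DEMO_PMM_PMLEN16  = 3,
};

enum { DEMO_SATP_BARE = 0, DEMO_PRV_M = 3 };   /* assumed values */

/* Number of masked top bits, or -1 for the reserved encoding. */
static int demo_pm_get_pmlen(enum demo_pmm pmm)
{
    switch (pmm) {
    case DEMO_PMM_DISABLED: return 0;
    case DEMO_PMM_PMLEN7:   return 7;
    case DEMO_PMM_PMLEN16:  return 16;
    default:                return -1;
    }
}

/* Effective address width on RV64, or -1 for the reserved encoding. */
static int demo_addr_width(enum demo_pmm pmm)
{
    int pmlen = demo_pm_get_pmlen(pmm);
    return pmlen < 0 ? -1 : 64 - pmlen;
}

/* Sign extension is used only when address translation is active. */
static bool demo_virt_mem_enabled(unsigned satp_mode, unsigned priv)
{
    return satp_mode != DEMO_SATP_BARE && priv != DEMO_PRV_M;
}
```

So PMLEN7 leaves a 57-bit address and PMLEN16 a 48-bit one; whether the remaining bits are sign- or zero-extended depends only on the translation mode and privilege level.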
[PATCH v8 2/6] target/riscv: Add new CSR fields for S{sn, mn, m}pm extensions as part of Zjpm v0.8
From: Alexey Baturo Signed-off-by: Alexey Baturo Reviewed-by: Alistair Francis --- target/riscv/cpu.h | 8 target/riscv/cpu_bits.h | 3 +++ target/riscv/cpu_cfg.h | 3 +++ target/riscv/csr.c | 11 +++ target/riscv/machine.c | 10 +++--- target/riscv/pmp.c | 13 ++--- target/riscv/pmp.h | 11 ++- 7 files changed, 48 insertions(+), 11 deletions(-) diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index cfad5281a1..b694cc62bf 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -123,6 +123,14 @@ typedef enum { EXT_STATUS_DIRTY, } RISCVExtStatus; +/* Enum holds PMM field values for Zjpm v0.8 extension */ +typedef enum { +PMM_FIELD_DISABLED = 0, +PMM_FIELD_RESERVED = 1, +PMM_FIELD_PMLEN7 = 2, +PMM_FIELD_PMLEN16 = 3, +} RISCVPmPmm; + #define MMU_USER_IDX 3 #define MAX_RISCV_PMPS (16) diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h index 5098d2d613..e9e6e1f952 100644 --- a/target/riscv/cpu_bits.h +++ b/target/riscv/cpu_bits.h @@ -708,6 +708,7 @@ typedef enum RISCVException { #define MENVCFG_CBIE (3UL << 4) #define MENVCFG_CBCFE BIT(6) #define MENVCFG_CBZE BIT(7) +#define MENVCFG_PMM (3ULL << 32) #define MENVCFG_ADUE (1ULL << 61) #define MENVCFG_PBMTE (1ULL << 62) #define MENVCFG_STCE (1ULL << 63) @@ -721,11 +722,13 @@ typedef enum RISCVException { #define SENVCFG_CBIE MENVCFG_CBIE #define SENVCFG_CBCFE MENVCFG_CBCFE #define SENVCFG_CBZE MENVCFG_CBZE +#define SENVCFG_PMM MENVCFG_PMM #define HENVCFG_FIOM MENVCFG_FIOM #define HENVCFG_CBIE MENVCFG_CBIE #define HENVCFG_CBCFE MENVCFG_CBCFE #define HENVCFG_CBZE MENVCFG_CBZE +#define HENVCFG_PMM MENVCFG_PMM #define HENVCFG_ADUE MENVCFG_ADUE #define HENVCFG_PBMTE MENVCFG_PBMTE #define HENVCFG_STCE MENVCFG_STCE diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h index 2040b90da0..963de724c2 100644 --- a/target/riscv/cpu_cfg.h +++ b/target/riscv/cpu_cfg.h @@ -118,6 +118,9 @@ struct RISCVCPUConfig { bool ext_ssaia; bool ext_sscofpmf; bool ext_smepmp; +bool ext_ssnpm; +bool ext_smnpm; +bool ext_smmpm; bool
rvv_ta_all_1s; bool rvv_ma_all_1s; diff --git a/target/riscv/csr.c b/target/riscv/csr.c index ffb5a1102e..69c0279c12 100644 --- a/target/riscv/csr.c +++ b/target/riscv/csr.c @@ -530,6 +530,9 @@ static RISCVException have_mseccfg(CPURISCVState *env, int csrno) if (riscv_cpu_cfg(env)->ext_zkr) { return RISCV_EXCP_NONE; } +if (riscv_cpu_cfg(env)->ext_smmpm) { +return RISCV_EXCP_NONE; +} return RISCV_EXCP_ILLEGAL_INST; } @@ -2080,6 +2083,10 @@ static RISCVException write_menvcfg(CPURISCVState *env, int csrno, (cfg->ext_sstc ? MENVCFG_STCE : 0) | (cfg->ext_svadu ? MENVCFG_ADUE : 0); } +/* Update PMM field only if the value is valid according to Zjpm v0.8 */ +if (((val & MENVCFG_PMM) >> 32) != PMM_FIELD_RESERVED) { +mask |= MENVCFG_PMM; +} env->menvcfg = (env->menvcfg & ~mask) | (val & mask); return RISCV_EXCP_NONE; @@ -2124,6 +2131,10 @@ static RISCVException write_senvcfg(CPURISCVState *env, int csrno, target_ulong val) { uint64_t mask = SENVCFG_FIOM | SENVCFG_CBIE | SENVCFG_CBCFE | SENVCFG_CBZE; +/* Update PMM field only if the value is valid according to Zjpm v0.8 */ +if (((val & SENVCFG_PMM) >> 32) != PMM_FIELD_RESERVED) { +mask |= SENVCFG_PMM; +} RISCVException ret; ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG); diff --git a/target/riscv/machine.c b/target/riscv/machine.c index 64ab66e332..28f373 100644 --- a/target/riscv/machine.c +++ b/target/riscv/machine.c @@ -152,15 +152,19 @@ static const VMStateDescription vmstate_vector = { static bool pointermasking_needed(void *opaque) { -return false; +RISCVCPU *cpu = opaque; +return cpu->cfg.ext_ssnpm || cpu->cfg.ext_smnpm || cpu->cfg.ext_smmpm; } static const VMStateDescription vmstate_pointermasking = { .name = "cpu/pointer_masking", -.version_id = 1, -.minimum_version_id = 1, +.version_id = 2, +.minimum_version_id = 2, .needed = pointermasking_needed, .fields = (const VMStateField[]) { +VMSTATE_UINTTL(env.mseccfg, RISCVCPU), +VMSTATE_UINTTL(env.senvcfg, RISCVCPU), +VMSTATE_UINTTL(env.menvcfg, RISCVCPU), 
VMSTATE_END_OF_LIST() } }; diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c index 2a76b611a0..7ddb9dbf0b 100644 --- a/target/riscv/pmp.c
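The write_menvcfg()/write_senvcfg() hunks in this patch use a write mask to drop writes of the reserved PMM encoding. A minimal sketch of just that rule, on a bare 64-bit value, could look as follows; demo_write_envcfg_pmm() is a hypothetical function for illustration and ignores every other field of the real CSRs.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PMM_SHIFT    32
#define DEMO_PMM_MASK     (3ULL << DEMO_PMM_SHIFT)  /* same bits as MENVCFG_PMM */
#define DEMO_PMM_RESERVED 1ULL

/*
 * Return the new register value after writing 'val' over 'old'.
 * Only the PMM field is modelled; a write of the reserved encoding
 * leaves the previous PMM value in place.
 */
static uint64_t demo_write_envcfg_pmm(uint64_t old, uint64_t val)
{
    uint64_t mask = 0;

    if (((val & DEMO_PMM_MASK) >> DEMO_PMM_SHIFT) != DEMO_PMM_RESERVED) {
        mask |= DEMO_PMM_MASK;       /* valid encoding: field is writable */
    }
    return (old & ~mask) | (val & mask);
}
```

In other words, writing PMM = 1 over an existing PMM = 2 keeps 2, while writing 0, 2 or 3 replaces the field.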
[PATCH v8 0/6] Pointer Masking update for Zjpm v0.8
From: Alexey Baturo

Hi,

Rebasing the patches on the current QEMU branch and resubmitting them.

Thanks.

[v7]:
I'm terribly sorry, but the previous rebase went wrong and somehow I missed it.
This time I double-checked the rebased version.
This patch series is properly rebased on https://github.com/alistair23/qemu/tree/riscv-to-apply.next

[v6]:
This patch series is rebased on https://github.com/alistair23/qemu/tree/riscv-to-apply.next

[v5]:
This patch series targets the Zjpm v0.8 extension.
The spec itself can be found here:
https://github.com/riscv/riscv-j-extension/blob/8088461d8d66a7676872b61c908cbeb7cf5c5d1d/zjpm-spec.pdf
This patch series is updated after the suggested comments:
- add "x-" to the extension names to indicate experimental

[v4]:
Patch series updated after the suggested comments:
- removed J-letter extension as it's unused
- renamed and fixed function to detect if address should be sign-extended
- zeroed unused context variables and moved computation logic to another patch
- bumped pointer masking version_id and minimum_version_id by 1

[v3]:
These patches are updated after Richard's comments:
- moved new tb flags to the end
- used tcg_gen_(s)extract to get the final address
- properly handle CONFIG_USER_ONLY

[v2]:
As per Richard's suggestion I made the pmm field part of tb_flags.
This made it possible to get rid of the global variable that stored pmlen and to simplify all the machinery around it.

[v1]:
It looks like Zjpm v0.8 is almost frozen and we don't expect it to change drastically anymore.
Compared to the original implementation with explicit base and mask CSRs, we now only have several fixed options for the number of masked bits, which are set using existing CSRs.
The changes have been tested with handwritten assembly tests and the LLVM HWASAN test suite.

Alexey Baturo (6):
  target/riscv: Remove obsolete pointer masking extension code.
  target/riscv: Add new CSR fields for S{sn,mn,m}pm extensions as part of Zjpm v0.8
  target/riscv: Add helper functions to calculate current number of masked bits for pointer masking
  target/riscv: Add pointer masking tb flags
  target/riscv: Update address modify functions to take into account pointer masking
  target/riscv: Enable updates for pointer masking variables and thus enable pointer masking extension

 target/riscv/cpu.c           |  21 +--
 target/riscv/cpu.h           |  45 +++--
 target/riscv/cpu_bits.h      |  90 +-
 target/riscv/cpu_cfg.h       |   3 +
 target/riscv/cpu_helper.c    |  97 +-
 target/riscv/csr.c           | 337 ++-
 target/riscv/machine.c       |  20 +--
 target/riscv/pmp.c           |  13 +-
 target/riscv/pmp.h           |  11 +-
 target/riscv/tcg/tcg-cpu.c   |   5 +-
 target/riscv/translate.c     |  46 ++---
 target/riscv/vector_helper.c |  15 +-
 12 files changed, 157 insertions(+), 546 deletions(-)

-- 
2.34.1
[PATCH v8 1/6] target/riscv: Remove obsolete pointer masking extension code.
From: Alexey Baturo Zjpm v0.8 is almost frozen and is much simpler than the existing implementation: the newer version does not allow specifying a custom mask or base for masking. Instead, it allows only a few fixed options for masking the top bits. Signed-off-by: Alexey Baturo Acked-by: Alistair Francis --- target/riscv/cpu.c | 13 +- target/riscv/cpu.h | 30 +--- target/riscv/cpu_bits.h | 87 -- target/riscv/cpu_helper.c| 52 -- target/riscv/csr.c | 326 --- target/riscv/machine.c | 14 +- target/riscv/tcg/tcg-cpu.c | 5 +- target/riscv/translate.c | 27 +-- target/riscv/vector_helper.c | 2 +- 9 files changed, 13 insertions(+), 543 deletions(-) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index c160b9216b..73c69f3d0a 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -42,7 +42,7 @@ /* RISC-V CPU definitions */ static const char riscv_single_letter_exts[] = "IEMAFDQCBPVH"; const uint32_t misa_bits[] = {RVI, RVE, RVM, RVA, RVF, RVD, RVV, - RVC, RVS, RVU, RVH, RVJ, RVG, RVB, 0}; + RVC, RVS, RVU, RVH, RVG, RVB, 0}; /* * From vector_helper.c @@ -793,13 +793,6 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags) CSR_MSCRATCH, CSR_SSCRATCH, CSR_SATP, -CSR_MMTE, -CSR_UPMBASE, -CSR_UPMMASK, -CSR_SPMBASE, -CSR_SPMMASK, -CSR_MPMBASE, -CSR_MPMMASK, }; for (i = 0; i < ARRAY_SIZE(dump_csrs); ++i) { @@ -979,8 +972,6 @@ static void riscv_cpu_reset_hold(Object *obj) } i++; } -/* mmte is supposed to have pm.current hardwired to 1 */ -env->mmte |= (EXT_STATUS_INITIAL | MMTE_M_PM_CURRENT); /* * Bits 10, 6, 2 and 12 of mideleg are read only 1 when the Hypervisor @@ -1002,7 +993,6 @@ static void riscv_cpu_reset_hold(Object *obj) pmp_unlock_entries(env); #endif env->xl = riscv_cpu_mxl(env); -riscv_cpu_update_mask(env); cs->exception_index = RISCV_EXCP_NONE; env->load_res = -1; set_default_nan_mode(1, &env->fp_status); @@ -1393,7 +1383,6 @@ static const MISAExtInfo misa_ext_info_arr[] = { MISA_EXT_INFO(RVS, "s", "Supervisor-level instructions"), MISA_EXT_INFO(RVU, "u",
"User-level instructions"), MISA_EXT_INFO(RVH, "h", "Hypervisor"), -MISA_EXT_INFO(RVJ, "x-j", "Dynamic translated languages"), MISA_EXT_INFO(RVV, "v", "Vector operations"), MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"), MISA_EXT_INFO(RVB, "x-b", "Bit manipulation (Zba_Zbb_Zbs)") diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index 3b1a02b944..cfad5281a1 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -68,7 +68,6 @@ typedef struct CPUArchState CPURISCVState; #define RVS RV('S') #define RVU RV('U') #define RVH RV('H') -#define RVJ RV('J') #define RVG RV('G') #define RVB RV('B') @@ -395,17 +394,6 @@ struct CPUArchState { /* True if in debugger mode. */ bool debugger; -/* - * CSRs for PointerMasking extension - */ -target_ulong mmte; -target_ulong mpmmask; -target_ulong mpmbase; -target_ulong spmmask; -target_ulong spmbase; -target_ulong upmmask; -target_ulong upmbase; - /* CSRs for execution environment configuration */ uint64_t menvcfg; uint64_t mstateen[SMSTATEEN_MAX_COUNT]; @@ -414,9 +402,6 @@ struct CPUArchState { target_ulong senvcfg; uint64_t henvcfg; #endif -target_ulong cur_pmmask; -target_ulong cur_pmbase; - /* Fields from here on are preserved across CPU reset. */ QEMUTimer *stimer; /* Internal timer for S-mode interrupt */ QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */ @@ -565,16 +550,14 @@ FIELD(TB_FLAGS, VSTART_EQ_ZERO, 15, 1) /* The combination of MXL/SXL/UXL that applies to the current cpu mode. 
*/ FIELD(TB_FLAGS, XL, 16, 2) /* If PointerMasking should be applied */ -FIELD(TB_FLAGS, PM_MASK_ENABLED, 18, 1) -FIELD(TB_FLAGS, PM_BASE_ENABLED, 19, 1) -FIELD(TB_FLAGS, VTA, 20, 1) -FIELD(TB_FLAGS, VMA, 21, 1) +FIELD(TB_FLAGS, VTA, 18, 1) +FIELD(TB_FLAGS, VMA, 19, 1) /* Native debug itrigger */ -FIELD(TB_FLAGS, ITRIGGER, 22, 1) +FIELD(TB_FLAGS, ITRIGGER, 20, 1) /* Virtual mode enabled */ -FIELD(TB_FLAGS, VIRT_ENABLED, 23, 1) -FIELD(TB_FLAGS, PRIV, 24, 2) -FIELD(TB_FLAGS, AXL, 26, 2) +FIELD(TB_FLAGS, VIRT_ENABLED, 21, 1) +FIELD(TB_FLAGS, PRIV, 22, 2) +FIELD(TB_FLAGS, AXL, 24, 2) #ifdef TARGET_RISCV32 #define riscv_cpu_mxl(env) ((void)(env), MXL_RV32) @@ -707,7 +690,6 @@ static inline uint32_t vext_get_vlmax(uint32_t vlenb, uint32_t vsew, void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, uint64_t *cs_base, uint32_t *pflags); -void riscv_cpu_update_mask(CPURISCVState *env); bool riscv_cpu_is_32bit(RISCVCPU *cpu);
Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'
On Wed, 20 Mar 2024 at 17:06, Philippe Mathieu-Daudé wrote: > > +Alex/Daniel > > On 20/3/24 17:53, Peter Maydell wrote: > > On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé > > wrote: > >> > >> 'info tlb' and 'info mem' commands don't scale in heterogeneous > >> emulation. They will be reworked after the next release, hidden > >> behind the 'info mmu' command. It is not too late to deprecate > >> commands, so add the 'info mmu' command as wrapper to the other > >> ones, but already deprecate them. > >> > >> Philippe Mathieu-Daudé (2): > >>target/monitor: Introduce 'info mmu' command > >>target/monitor: Deprecate 'info tlb' and 'info mem' commands > > > > This seems to replace "info tlb" and "info mem" with "info mmu -t" > > and "info mmu -m", but it doesn't really say anything about: > > * what the difference is between these two things > > I really don't know; I'm only trying to keep the monitor interface > identical. You don't, though: you change it from "info tlb" to "info mmu -t" etc. > > * which targets implement which and why > > This one is easy to answer: > > #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) > || \ > defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K) > { > .name = "tlb", > > #if defined(TARGET_I386) || defined(TARGET_RISCV) > { > .name = "mem", > > > * what the plan is for the future > > My problem is with linking a single QEMU binary, as these two symbols > (hmp_info_mem and hmp_info_tlb) clash. Yes, but they both (implicitly) operate on the current HMP CPU, so the problem with linking into a single binary is that they're not indirected through a method on the CPU object, not the syntax used in the monitor to invoke them, presumably. > I'm indeed only postponing the problem, without looking at what > this code does. 
I did it adding hmp_info_mmu_tlb/mem hooks in > TCGCPUOps ("hw/core/tcg-cpu-ops.h"), so the command can be > dispatched per target vcpu as target-agnostic code in > monitor/hmp-cmds.c: > > +#include "hw/core/tcg-cpu-ops.h" > + > +static void hmp_info_mmu_tlb(Monitor *mon, CPUState *cpu) > +{ > +const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops; > + > +if (tcg_ops->hmp_info_mmu_tlb) { > +tcg_ops->hmp_info_mmu_tlb(mon, cpu_env(cpu)); > +} else { > +monitor_puts(mon, "No per-CPU information available on this > target\n"); > +} > +} These aren't TCG specific though, so why TCGCPUOps ? > > I am definitely not a fan of either of these commands, because > > (as we currently implement them) they effectively require each > > target architecture to implement a second copy of the page table > > walking code. But before we can deprecate them we need to be > > pretty sure that "info mmu" is what we want to replace them with. > > An alternative is to just deprecate them, without adding "info mmu" :) > > It is OK to un-deprecate stuff if we realize its usefulness. The commands are there because some users find them useful. I just dislike them because I think they're a bit niche and annoying to implement and not consistent across target architectures and not very well documented... By the way, we have no obligation to follow the deprecate-and-drop process for HMP commands; unlike QMP, we give ourselves the license to vary it when we feel like it, because the users are humans, not programs or scripts. -- PMM
Re: [PATCH v2] vfio/pci: migration: Skip config space check for vendor specific capability during restore/load
On 18/03/24 8:28 pm, Alex Williamson wrote: On Fri, 15 Mar 2024 23:22:22 +0530 Vinayak Kale wrote: On 11/03/24 8:32 pm, Alex Williamson wrote: On Mon, 11 Mar 2024 17:45:19 +0530 Vinayak Kale wrote: In case of migration, during restore operation, qemu checks config space of the pci device with the config space in the migration stream captured during save operation. In case of config space data mismatch, restore operation is failed. config space check is done in function get_pci_config_device(). By default VSC (vendor-specific-capability) in config space is checked. Ideally qemu should not check VSC for VFIO-PCI device during restore/load as qemu is not aware of VSC ABI. It's disappointing that we can't seem to have a discussion about why it's not the responsibility of the underlying migration support in the vfio-pci variant driver to make the vendor specific capability consistent across migration. I think it is device vendor driver's responsibility to ensure that VSC is consistent across migration. Here consistency could mean that VSC format should be same on source and destination, however actual VSC contents may not be byte-to-byte identical. If a vfio-pci device is migration capable and if vfio-pci vendor driver is OK with volatile VSC contents as long as consistency is maintained for VSC format then QEMU should exempt config space check for VSC contents. I tend to agree that ultimately the variant driver is responsible for making the device consistent during migration and QEMU's policy that even vendor defined ABI needs to be byte for byte identical is somewhat arbitrary. Also, for future maintenance, specifically what device is currently broken by this and under what conditions? Under certain conditions VSC contents vary for NVIDIA vGPU devices in case of live migration.
Due to QEMU's current config space check for VSC, live migration is broken across NVIDIA vGPU devices. This is incredibly vague. We've been testing NVIDIA vGPU migration and have not experienced a migration failure due to VSC mismatch. Does this require a specific device? A specific workload? What specific conditions trigger this problem? In case of live migration, in a situation where source and destination host driver is different, Vendor Specific Information in VSC varies on the destination to ensure vGPU feature capabilities exposed to guest driver are compatible with destination host. This is applicable to all NVIDIA vGPU devices. While as above, I agree in theory that the responsibility lies on the migration support in the variant driver, there are risks involved, particularly if new dependencies on the VSC contents are developed in the guest. For future maintenance and development in this space, the commit log should describe exactly the scenario that requires this policy change. Thanks, I'll add aforementioned scenario (situation when live migration is broken for NVIDIA vGPU devices) in the commit description. Thanks. Alex This patch skips the check for VFIO-PCI device by clearing pdev->cmask[] for VSC offsets. If cmask[] is not set for an offset, then qemu skips config space check for that offset. Signed-off-by: Vinayak Kale --- Version History v1->v2: - Limited scope of change to vfio-pci devices instead of all pci devices. 
hw/vfio/pci.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d7fe06715c..9edaff4b37 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2132,6 +2132,22 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos) } } +static int vfio_add_vendor_specific_cap(VFIOPCIDevice *vdev, int pos, +uint8_t size, Error **errp) +{ +PCIDevice *pdev = &vdev->pdev; + +pos = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, size, errp); +if (pos < 0) { +return pos; +} + +/* Exempt config space check for VSC during restore/load */ +memset(pdev->cmask + pos, 0, size); This excludes the entire capability from comparison, including the capability ID, next pointer, and capability length. Even if the contents of the capability are considered volatile vendor information, the header is spec defined ABI which must be consistent. Thanks, This makes sense, I'll address this in V3. Thanks. Alex + +return pos; +} + static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp) { PCIDevice *pdev = &vdev->pdev; @@ -2199,6 +2215,9 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp) vfio_check_af_flr(vdev, pos); ret = pci_add_capability(pdev, cap_id, pos, size, errp); break; +case PCI_CAP_ID_VNDR: +ret = vfio_add_vendor_specific_cap(vdev, pos, size, errp); +break; default: ret =
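One possible way to address the review comment is sketched below on a plain byte array: keep the compare mask set for the capability header and clear it only for the vendor-defined payload. This is not the v3 patch. demo_exempt_vsc_payload() is a hypothetical helper, and the 3-byte header length (capability ID, next pointer, length) is an assumption based on the vendor-specific capability layout described in the review.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed header size: capability ID, next pointer, capability length. */
enum { DEMO_VSC_HEADER_LEN = 3 };

/*
 * Clear the compare mask only for the vendor-defined bytes of the
 * capability at 'pos', leaving the header bytes checked.
 */
static void demo_exempt_vsc_payload(uint8_t *cmask, unsigned pos, unsigned size)
{
    assert(cmask != NULL);
    if (size > DEMO_VSC_HEADER_LEN) {
        memset(cmask + pos + DEMO_VSC_HEADER_LEN, 0,
               size - DEMO_VSC_HEADER_LEN);
    }
}
```

With this shape, a mismatch in the capability ID, next pointer or length would still fail the config space check on restore, while the volatile vendor data is ignored.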
Re: [PATCH RFC v3 00/49] Add AMD Secure Nested Paging (SEV-SNP) support
On Wed, Mar 20, 2024 at 10:59 AM Paolo Bonzini wrote: > I will now focus on reviewing patches 6-20. This way we can prepare a > common tree for SEV_INIT2/SNP/TDX, for both vendors to build upon. Ok, the attachment is the delta that I have. The only major change is requiring discard (thus effectively blocking VFIO support for SEV-SNP/TDX, at least for now). I will push it shortly to the same sevinit2 branch, and will post the patches sometime soon. Xiaoyao, you can use that branch too (it's on https://gitlab.com/bonzini/qemu) as the basis for your TDX work. Paolo diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index bf0ae0c8adb..428468950d9 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -285,19 +285,8 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo { KVMState *s = kvm_state; struct kvm_userspace_memory_region2 mem; -static int cap_user_memory2 = -1; int ret; -if (cap_user_memory2 == -1) { -cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2); -} - -if (!cap_user_memory2 && slot->guest_memfd >= 0) { -error_report("%s, KVM doesn't support KVM_CAP_USER_MEMORY2," - " which is required by guest memfd!", __func__); -exit(1); -} - mem.slot = slot->slot | (kml->as_id << 16); mem.guest_phys_addr = slot->start_addr; mem.userspace_addr = (unsigned long)slot->ram; @@ -310,7 +299,7 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo * value. This is needed based on KVM commit 75d61fbc. 
*/ mem.memory_size = 0; -if (cap_user_memory2) { +if (kvm_guest_memfd_supported) { ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem); } else { ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem); @@ -320,7 +309,7 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo } } mem.memory_size = slot->memory_size; -if (cap_user_memory2) { +if (kvm_guest_memfd_supported) { ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem); } else { ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem); @@ -332,7 +321,7 @@ err: mem.userspace_addr, mem.guest_memfd, mem.guest_memfd_offset, ret); if (ret < 0) { -if (cap_user_memory2) { +if (kvm_guest_memfd_supported) { error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d," " start=0x%" PRIx64 ", size=0x%" PRIx64 "," " flags=0x%" PRIx32 ", guest_memfd=%" PRId32 "," @@ -502,6 +491,7 @@ static int kvm_mem_flags(MemoryRegion *mr) flags |= KVM_MEM_READONLY; } if (memory_region_has_guest_memfd(mr)) { +assert(kvm_guest_memfd_supported); flags |= KVM_MEM_GUEST_MEMFD; } return flags; @@ -1310,18 +1300,7 @@ static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr) struct kvm_memory_attributes attrs; int r; -if (kvm_supported_memory_attributes == 0) { -error_report("No memory attribute supported by KVM\n"); -return -EINVAL; -} - -if ((attr & kvm_supported_memory_attributes) != attr) { -error_report("memory attribute 0x%lx not supported by KVM," - " supported bits are 0x%lx\n", - attr, kvm_supported_memory_attributes); -return -EINVAL; -} - +assert((attr & kvm_supported_memory_attributes) == attr); attrs.attributes = attr; attrs.address = start; attrs.size = size; @@ -2488,11 +2467,14 @@ static int kvm_init(MachineState *ms) } s->as = g_new0(struct KVMAs, s->nr_as); -kvm_guest_memfd_supported = kvm_check_extension(s, KVM_CAP_GUEST_MEMFD); - ret = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES); kvm_supported_memory_attributes = ret > 0 ?
ret : 0; +kvm_guest_memfd_supported = +kvm_check_extension(s, KVM_CAP_GUEST_MEMFD) && +kvm_check_extension(s, KVM_CAP_USER_MEMORY2) && +(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE); + if (object_property_find(OBJECT(current_machine), "kvm-type")) { g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine), "kvm-type", @@ -2962,14 +2944,10 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) */ return 0; } else { -ret = ram_block_discard_is_disabled() - ? ram_block_discard_range(rb, offset, size) - : 0; +ret = ram_block_discard_range(rb, offset, size); } } else { -ret = ram_block_discard_is_disabled() - ? ram_block_discard_guest_memfd_range(rb, offset,
Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'
+Alex/Daniel On 20/3/24 17:53, Peter Maydell wrote: On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé wrote: 'info tlb' and 'info mem' commands don't scale in heterogeneous emulation. They will be reworked after the next release, hidden behind the 'info mmu' command. It is not too late to deprecate commands, so add the 'info mmu' command as wrapper to the other ones, but already deprecate them. Philippe Mathieu-Daudé (2): target/monitor: Introduce 'info mmu' command target/monitor: Deprecate 'info tlb' and 'info mem' commands This seems to replace "info tlb" and "info mem" with "info mmu -t" and "info mmu -m", but it doesn't really say anything about: * what the difference is between these two things I really don't know; I'm only trying to keep the monitor interface identical. * which targets implement which and why This one is easy to answer: #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \ defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K) { .name = "tlb", #if defined(TARGET_I386) || defined(TARGET_RISCV) { .name = "mem", * what the plan is for the future My problem is with linking a single QEMU binary, as these two symbols (hmp_info_mem and hmp_info_tlb) clash. 
Luckily for me these are the only 2 implemented by more than one target: $ git grep TARGET_ -- hmp-commands* hmp-commands-info.hx:116:#if defined(TARGET_I386) hmp-commands-info.hx:225:#if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \ hmp-commands-info.hx:226:defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K) hmp-commands-info.hx:241:#if defined(TARGET_I386) || defined(TARGET_RISCV) hmp-commands-info.hx:729:#if defined(TARGET_S390X) hmp-commands-info.hx:744:#if defined(TARGET_S390X) hmp-commands-info.hx:828:#if defined(TARGET_I386) hmp-commands-info.hx:882:#if defined(TARGET_I386) hmp-commands.hx:1126:#if defined(TARGET_S390X) hmp-commands.hx:1141:#if defined(TARGET_S390X) hmp-commands.hx:1489:#if defined(TARGET_I386) All the other ones are only implemented by a single target, so not a problem for now. I'm indeed only postponing the problem, without looking at what this code does. I did it adding hmp_info_mmu_tlb/mem hooks in TCGCPUOps ("hw/core/tcg-cpu-ops.h"), so the command can be dispatched per target vcpu as target-agnostic code in monitor/hmp-cmds.c: +#include "hw/core/tcg-cpu-ops.h" + +static void hmp_info_mmu_tlb(Monitor *mon, CPUState *cpu) +{ +const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops; + +if (tcg_ops->hmp_info_mmu_tlb) { +tcg_ops->hmp_info_mmu_tlb(mon, cpu_env(cpu)); +} else { +monitor_puts(mon, "No per-CPU information available on this target\n"); +} +} I am definitely not a fan of either of these commands, because (as we currently implement them) they effectively require each target architecture to implement a second copy of the page table walking code. But before we can deprecate them we need to be pretty sure that "info mmu" is what we want to replace them with. An alternative is to just deprecate them, without adding "info mmu" :) It is OK to un-deprecate stuff if we realize its usefulness. Regards, Phil.
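The dispatch pattern discussed in this thread (an optional per-CPU hook with a generic fallback) can be sketched without any QEMU types. Everything below is hypothetical: DemoCPU, DemoCPUOps and demo_info_mmu_tlb() only mirror the shape of the snippet above and say nothing about which ops structure the hook should finally live in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct DemoCPU DemoCPU;

/* Per-CPU-type operations; a NULL hook means "not implemented". */
typedef struct DemoCPUOps {
    void (*info_mmu_tlb)(FILE *out, DemoCPU *cpu);
} DemoCPUOps;

struct DemoCPU {
    const char *name;
    const DemoCPUOps *ops;
};

/* Example hook for one hypothetical target. */
static void demo_target_tlb(FILE *out, DemoCPU *cpu)
{
    fprintf(out, "%s: TLB dump would go here\n", cpu->name);
}

/*
 * Target-agnostic entry point: call the hook of the selected CPU if it
 * exists, otherwise print a fallback message.
 * Returns 1 if a hook ran, 0 if the fallback was used.
 */
static int demo_info_mmu_tlb(FILE *out, DemoCPU *cpu)
{
    if (cpu->ops && cpu->ops->info_mmu_tlb) {
        cpu->ops->info_mmu_tlb(out, cpu);
        return 1;
    }
    fputs("No per-CPU information available on this target\n", out);
    return 0;
}
```

Since the generic function only looks at the selected CPU object, two targets linked into one binary no longer need to export the same global symbol.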
Re: [PULL 0/6] QEMU bug fixes for 20240320
On Wed, 20 Mar 2024 at 10:32, Paolo Bonzini wrote: > > The following changes since commit ba49d760eb04630e7b15f423ebecf6c871b8f77b: > > Merge tag 'pull-maintainer-final-130324-1' of > https://gitlab.com/stsquad/qemu into staging (2024-03-13 15:12:14 +) > > are available in the Git repository at: > > https://gitlab.com/bonzini/qemu.git tags/for-upstream > > for you to fetch changes up to 05007258f02da253af370387b69fe98e9f37b320: > > meson: remove dead dictionary access (2024-03-20 11:30:49 +0100) > > > * fix use-after-free issue > * fix i386 TLB issue > * fix crash with wrong -M confidential-guest-support argument > * fix NULL pointer dereference in x86 MCE injection > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0 for any user-visible changes. -- PMM
Re: [PULL 0/5] Ui patches
On Wed, 20 Mar 2024 at 13:54, wrote: > > From: Marc-André Lureau > > The following changes since commit c62d54d0a8067ffb3d5b909276f7296d7df33fa7: > > Update version for v9.0.0-rc0 release (2024-03-19 19:13:52 +) > > are available in the Git repository at: > > https://gitlab.com/marcandre.lureau/qemu.git tags/ui-pull-request > > for you to fetch changes up to d4069a84a3380247c1b524096c6a807743bf687a: > > ui: compile dbus-display1.c with -fPIC as necessary (2024-03-20 10:28:00 > +0400) > > > UI: fixes > > - dbus-display shared-library compilation fix > - remove console_select() and fix related issues > > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0 for any user-visible changes. -- PMM
Re: [PULL 0/5] Edk2 20240320 patches
On Wed, 20 Mar 2024 at 07:09, Gerd Hoffmann wrote: > > The following changes since commit ba49d760eb04630e7b15f423ebecf6c871b8f77b: > > Merge tag 'pull-maintainer-final-130324-1' of > https://gitlab.com/stsquad/qemu into staging (2024-03-13 15:12:14 +) > > are available in the Git repository at: > > https://gitlab.com/kraxel/qemu.git tags/edk2-20240320-pull-request > > for you to fetch changes up to 4a1babe58a1b3cd2c493ee6e0d774e70f62ad9c3: > > update edk2 binaries for arm, risc-v and x86 secure boot. (2024-03-19 > 16:42:10 +0100) > > > edk2: cleanup fix, update build config, rebuild binaries. > > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0 for any user-visible changes. -- PMM
Re: [PATCH] libqos/virtio.c: Correct 'flags' reading in qvirtqueue_kick
Cc'ing Marc & Stefan for commit 1053587c3f ("libqos: Added EVENT_IDX support").

On 20/3/24 10:04, Zheyu Ma wrote:
> In qvirtqueue_kick(), the 'flags' were previously being incorrectly read
> from vq->avail instead of the correct vq->used location. This update
> ensures 'flags' are read from the correct location as per the virtio
> standard.
>
> Signed-off-by: Zheyu Ma
> ---
>  tests/qtest/libqos/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
> index 82a6e122bf..a21b6eee9c 100644
> --- a/tests/qtest/libqos/virtio.c
> +++ b/tests/qtest/libqos/virtio.c
> @@ -394,7 +394,7 @@ void qvirtqueue_kick(QTestState *qts, QVirtioDevice *d, QVirtQueue *vq,
>      qvirtio_writew(d, qts, vq->avail + 2, idx + 1);
>
>      /* Must read after idx is updated */
> -    flags = qvirtio_readw(d, qts, vq->avail);
> +    flags = qvirtio_readw(d, qts, vq->used);
>      avail_event = qvirtio_readw(d, qts, vq->used + 4 +
>                                  sizeof(struct vring_used_elem) * vq->size);

Reviewed-by: Philippe Mathieu-Daudé
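
For readers following the offsets in this patch, here is a minimal sketch of the split-virtqueue used ring layout that the reads rely on: a 16-bit flags field at offset 0, a 16-bit idx at offset 2, then the ring of 8-byte elements, with avail_event placed right after the ring. The struct and function names are hypothetical and exist only for illustration; they are not libqos code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one used-ring element: 32-bit id, 32-bit len. */
struct sketch_used_elem {
    uint32_t id;
    uint32_t len;
};

/* The offset arithmetic below assumes an 8-byte element. */
static_assert(sizeof(struct sketch_used_elem) == 8,
              "used ring element is expected to be 8 bytes");

/* Byte offset of the 16-bit flags field from the start of the used ring. */
static inline size_t sketch_used_flags_offset(void)
{
    return 0;
}

/* Byte offset of the 16-bit idx field, which follows flags. */
static inline size_t sketch_used_idx_offset(void)
{
    return 2;
}

/* Byte offset of avail_event, which follows the ring[] array. */
static inline size_t sketch_used_avail_event_offset(uint16_t queue_size)
{
    return 4 + sizeof(struct sketch_used_elem) * queue_size;
}
```

With these offsets, reading at vq->used corresponds to the used ring's flags, while vq->used + 4 + sizeof(elem) * size corresponds to avail_event, matching the two reads in the hunk.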
Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'
On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé wrote: > > 'info tlb' and 'info mem' commands don't scale in heterogeneous > emulation. They will be reworked after the next release, hidden > behind the 'info mmu' command. It is not too late to deprecate > commands, so add the 'info mmu' command as wrapper to the other > ones, but already deprecate them. > > Philippe Mathieu-Daudé (2): > target/monitor: Introduce 'info mmu' command > target/monitor: Deprecate 'info tlb' and 'info mem' commands This seems to replace "info tlb" and "info mem" with "info mmu -t" and "info mmu -m", but it doesn't really say anything about: * what the difference is between these two things * which targets implement which and why * what the plan is for the future I am definitely not a fan of either of these commands, because (as we currently implement them) they effectively require each target architecture to implement a second copy of the page table walking code. But before we can deprecate them we need to be pretty sure that "info mmu" is what we want to replace them with. thanks -- PMM
Re: [PATCH 3/4] hw/nmi: Remove @cpu_index argument from NMIClass::nmi_handler()
On 20/3/24 14:23, Peter Maydell wrote:
> On Tue, 20 Feb 2024 at 15:09, Philippe Mathieu-Daudé wrote:
>>
>> Only s390x was using the 'cpu_index' argument, but since the
>> previous commit it isn't anymore (it uses the first cpu).
>> Since this argument is now completely unused, remove it. Have
>> the callback return a boolean indicating failure.
>>
>> Signed-off-by: Philippe Mathieu-Daudé
>> ---
>>  include/hw/nmi.h | 11 ++-
>>  hw/core/nmi.c | 3 +--
>>  hw/hppa/machine.c | 8 +---
>>  hw/i386/x86.c | 7 ---
>>  hw/intc/m68k_irqc.c| 6 --
>>  hw/m68k/q800-glue.c| 6 --
>>  hw/misc/macio/gpio.c | 6 --
>>  hw/ppc/pnv.c | 6 --
>>  hw/ppc/spapr.c | 6 --
>>  hw/s390x/s390-virtio-ccw.c | 6 --
>>  10 files changed, 44 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/hw/nmi.h b/include/hw/nmi.h
>> index fff41bebc6..c70db941c9 100644
>> --- a/include/hw/nmi.h
>> +++ b/include/hw/nmi.h
>> @@ -37,7 +37,16 @@ typedef struct NMIState NMIState;
>>  struct NMIClass {
>>      InterfaceClass parent_class;
>>
>> -    void (*nmi_monitor_handler)(NMIState *n, int cpu_index, Error **errp);
>> +    /**
>> +     * nmi_handler: Callback to handle NMI notifications.
>> +     *
>> +     * @n: Class #NMIState state
>> +     * @errp: pointer to error object
>> +     *
>> +     * On success, return %true.
>> +     * On failure, store an error through @errp and return %false.
>> +     */
>> +    bool (*nmi_handler)(NMIState *n, Error **errp);
>
> Any particular reason to change the method name here?
>
> Do we really need to indicate failure both through the bool return
> and the Error** ?

No, but this is the style *recommended* by the Error API since commit e3fe3988d7 ("error: Document Error API usage rules"):

    error: Document Error API usage rules

    This merely codifies existing practice, with one exception: the rule
    advising against returning void, where existing practice is mixed.

    When the Error API was created, we adopted the (unwritten) rule to
    return void when the function returns no useful value on success,
    unlike GError, which recommends to return true on success and false
    on error then.

    [...]

    Make the rule advising against returning void official by putting it
    in writing. This will hopefully reduce confusion.

 * - Whenever practical, also return a value that indicates success /
 *   failure. This can make the error checking more concise, and can
 *   avoid useless error object creation and destruction. Note that
 *   we still have many functions returning void. We recommend
 *   • bool-valued functions return true on success / false on failure,
 *   • pointer-valued functions return non-null / null pointer, and
 *   • integer-valued functions return non-negative / negative.

Anyway I'll respin removing @cpu_index as a single change :)
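
As a rough illustration of the convention quoted above (return a bool and also fill in the error object), the sketch below uses a hypothetical stand-in type, SketchError, instead of QEMU's real Error API. All names here are assumptions made for the example; none of this is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct SketchError {
    const char *msg;
} SketchError;

static SketchError sketch_no_handler_error = { "no NMI handler registered" };

/* Hypothetical callback type in the bool-returning style. */
typedef bool (*sketch_nmi_handler)(void *state, SketchError **errp);

/*
 * On success, return true.
 * On failure, store an error through errp (if non-NULL) and return false.
 */
static inline bool sketch_deliver_nmi(sketch_nmi_handler handler, void *state,
                                      SketchError **errp)
{
    /* The caller must pass either NULL or a pointer to a NULL error. */
    assert(errp == NULL || *errp == NULL);

    if (handler == NULL) {
        if (errp != NULL) {
            *errp = &sketch_no_handler_error;
        }
        return false;
    }
    return handler(state, errp);
}

/* Trivial handler that always succeeds, used to exercise the success path. */
static inline bool sketch_ok_handler(void *state, SketchError **errp)
{
    (void)state;
    (void)errp;
    return true;
}
```

The practical benefit is on the caller side: the check becomes a plain `if (!sketch_deliver_nmi(...))`, and a caller that does not care about the message can pass NULL for errp without losing the success/failure result.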
[PATCH v2 2/2] target/riscv/csr: Added the ability to delegate LCOFI to VS
From: Vadim Shakirov In the AIA specification in the paragraph "Virtual interrupts for VS level" it is indicated for interrupts 13-63: if the bit in hideleg is enabled, then the corresponding vsip and vsie bits are aliases to sip and sie Signed-off-by: Vadim Shakirov Reviewed-by: Alistair Francis --- target/riscv/csr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/riscv/csr.c b/target/riscv/csr.c index 0c21145eaf..51b1099e10 100644 --- a/target/riscv/csr.c +++ b/target/riscv/csr.c @@ -1136,7 +1136,7 @@ static RISCVException write_stimecmph(CPURISCVState *env, int csrno, static const uint64_t delegable_ints = S_MODE_INTERRUPTS | VS_MODE_INTERRUPTS | MIP_LCOFIP; static const uint64_t vs_delegable_ints = -(VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS) & ~MIP_LCOFIP; +VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS; static const uint64_t all_ints = M_MODE_INTERRUPTS | S_MODE_INTERRUPTS | HS_MODE_INTERRUPTS | LOCAL_INTERRUPTS; #define DELEGABLE_EXCPS ((1ULL << (RISCV_EXCP_INST_ADDR_MIS)) | \ -- 2.25.1
[PATCH v2 1/2] target/riscv/csr.c: Add functional of hvictl CSR
CSR hvictl (Hypervisor Virtual Interrupt Control) provides further flexibility
for injecting interrupts into VS level in situations not fully supported by the
facilities described thus far, but only with more active involvement of the
hypervisor.

A hypervisor must use hvictl for any of the following:
• asserting for VS level a major interrupt not supported by hvien and hvip;
• implementing configurability of priorities at VS level for major interrupts
  beyond those supported by hviprio1 and hviprio2; or
• emulating an external interrupt controller for a virtual hart without the
  use of an IMSIC’s guest interrupt file, while also supporting configurable
  priorities both for external interrupts and for major interrupts to the
  virtual hart.

All hvictl fields together can affect the value of CSR vstopi (Virtual
Supervisor Top Interrupt) and therefore the interrupt identity reported in
vscause when an interrupt traps to VS-mode. When hvictl.VTI = 1, the absence
of an interrupt for VS level can be indicated only by setting hvictl.IID = 9.
Software might want to use the pair IID = 9, IPRIO = 0 generally to represent
no interrupt in hvictl.

(See riscv-interrupts-1.0: Interrupts at VS level)

Signed-off-by: Irina Ryapolova
---
Changes for v2:
 -added more information in commit message
---
 target/riscv/csr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..0c21145eaf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3585,6 +3585,21 @@ static int read_hvictl(CPURISCVState *env, int csrno, target_ulong *val)
 static int write_hvictl(CPURISCVState *env, int csrno, target_ulong val)
 {
     env->hvictl = val & HVICTL_VALID_MASK;
+    if (env->hvictl & HVICTL_VTI)
+    {
+        uint32_t hviid = get_field(env->hvictl, HVICTL_IID);
+        uint32_t hviprio = get_field(env->hvictl, HVICTL_IPRIO);
+        /* the pair IID = 9, IPRIO = 0 generally to represent no interrupt in hvictl. */
+        if (!(hviid == IRQ_S_EXT && hviprio == 0)) {
+            uint64_t new_val = BIT(hviid) ;
+            if (new_val & S_MODE_INTERRUPTS) {
+                rmw_hvip64(env, csrno, NULL, new_val << 1, new_val << 1);
+            } else if (new_val & LOCAL_INTERRUPTS) {
+                rmw_hvip64(env, csrno, NULL, new_val, new_val);
+            }
+        }
+    }
+
     return RISCV_EXCP_NONE;
 }
--
2.25.1
[PATCH-for-9.0 2/2] target/monitor: Deprecate 'info tlb' and 'info mem' commands
'info tlb' has been replaced by 'info mmu -t', and 'info mem' by 'info mmu -m'. Signed-off-by: Philippe Mathieu-Daudé --- docs/about/deprecated.rst| 10 ++ include/monitor/hmp-target.h | 2 ++ monitor/hmp-cmds-target.c| 20 target/i386/monitor.c| 4 ++-- target/m68k/monitor.c| 2 +- target/nios2/monitor.c | 2 +- target/ppc/ppc-qmp-cmds.c| 2 +- target/riscv/monitor.c | 2 +- target/sh4/monitor.c | 2 +- target/sparc/monitor.c | 2 +- target/xtensa/monitor.c | 2 +- hmp-commands-info.hx | 8 12 files changed, 41 insertions(+), 17 deletions(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 7b548519b5..4f5f4becbe 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -158,6 +158,16 @@ points was removed in 7.0. However QMP still exposed the vcpu parameter. This argument has now been deprecated and the remaining remaining trace points that used it are selected just by name. +Human Monitor Protocol (HMP) commands +- + +``info tlb`` and ``info mem`` (since 9.0) +' + +The ``info tlb`` and ``info mem`` commands have been replaced by +the ``info mmu`` command, which has the same behaviour but a less +misleading name. 
+ Host Architectures -- diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h index 2af84b1915..057f7c6841 100644 --- a/include/monitor/hmp-target.h +++ b/include/monitor/hmp-target.h @@ -46,7 +46,9 @@ CPUState *mon_get_cpu(Monitor *mon); void hmp_info_mmu(Monitor *mon, const QDict *qdict); void hmp_info_mem(Monitor *mon, const QDict *qdict); +void hmp_info_mem_deprecated(Monitor *mon, const QDict *qdict); void hmp_info_tlb(Monitor *mon, const QDict *qdict); +void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict); void hmp_mce(Monitor *mon, const QDict *qdict); void hmp_info_local_apic(Monitor *mon, const QDict *qdict); void hmp_info_sev(Monitor *mon, const QDict *qdict); diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c index 71bce4870a..086b58b8d6 100644 --- a/monitor/hmp-cmds-target.c +++ b/monitor/hmp-cmds-target.c @@ -382,19 +382,31 @@ void hmp_gpa2hpa(Monitor *mon, const QDict *qdict) #endif __attribute__((weak)) -void hmp_info_mem(Monitor *mon, const QDict *qdict) +void hmp_info_mem_deprecated(Monitor *mon, const QDict *qdict) { monitor_puts(mon, "No per-CPU mapping information available on this target\n"); } __attribute__((weak)) -void hmp_info_tlb(Monitor *mon, const QDict *qdict) +void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict) { monitor_puts(mon, "No per-CPU TLB information available on this target\n"); } +void hmp_info_mem(Monitor *mon, const QDict *qdict) +{ +monitor_puts(mon, "This command is deprecated, please use 'info mmu -m'\n"); +hmp_info_mem_deprecated(mon, qdict); +} + +void hmp_info_tlb(Monitor *mon, const QDict *qdict) +{ +monitor_puts(mon, "This command is deprecated, please use 'info mmu -t'\n"); +hmp_info_tlb_deprecated(mon, qdict); +} + void hmp_info_mmu(Monitor *mon, const QDict *qdict) { bool tlb = qdict_get_try_bool(qdict, "tlb", false); @@ -410,9 +422,9 @@ void hmp_info_mmu(Monitor *mon, const QDict *qdict) } if (mem) { -hmp_info_mem(mon, qdict); +hmp_info_mem_deprecated(mon, 
qdict); } if (tlb) { -hmp_info_tlb(mon, qdict); +hmp_info_tlb_deprecated(mon, qdict); } } diff --git a/target/i386/monitor.c b/target/i386/monitor.c index 3a281dab02..5da77b6b22 100644 --- a/target/i386/monitor.c +++ b/target/i386/monitor.c @@ -217,7 +217,7 @@ static void tlb_info_la57(Monitor *mon, CPUArchState *env) } #endif /* TARGET_X86_64 */ -void hmp_info_tlb(Monitor *mon, const QDict *qdict) +void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict) { CPUArchState *env; @@ -545,7 +545,7 @@ static void mem_info_la57(Monitor *mon, CPUArchState *env) } #endif /* TARGET_X86_64 */ -void hmp_info_mem(Monitor *mon, const QDict *qdict) +void hmp_info_mem_deprecated(Monitor *mon, const QDict *qdict) { CPUArchState *env; diff --git a/target/m68k/monitor.c b/target/m68k/monitor.c index 2bdf6acae0..ea303805c4 100644 --- a/target/m68k/monitor.c +++ b/target/m68k/monitor.c @@ -10,7 +10,7 @@ #include "monitor/hmp-target.h" #include "monitor/monitor.h" -void hmp_info_tlb(Monitor *mon, const QDict *qdict) +void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict) { CPUArchState *env1 = mon_get_cpu_env(mon); diff --git a/target/nios2/monitor.c b/target/nios2/monitor.c index 0152dec3fa..2e4efee1aa 100644 --- a/target/nios2/monitor.c +++ b/target/nios2/monitor.c @@ -27,7 +27,7 @@ #include "monitor/hmp-target.h" #include "monitor/hmp.h" -void hmp_info_tlb(Monitor *mon, const QDict *qdict) +void
[PATCH-for-9.0 1/2] target/monitor: Introduce 'info mmu' command
Introduce the 'info mmu' command. For now it only forward to the 'info tlb' and 'info mem' commands, which will be deprecated. Signed-off-by: Philippe Mathieu-Daudé --- include/monitor/hmp-target.h | 1 + monitor/hmp-cmds-target.c| 37 hmp-commands-info.hx | 14 ++ 3 files changed, 52 insertions(+) diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h index d78e979f05..2af84b1915 100644 --- a/include/monitor/hmp-target.h +++ b/include/monitor/hmp-target.h @@ -44,6 +44,7 @@ int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval); CPUArchState *mon_get_cpu_env(Monitor *mon); CPUState *mon_get_cpu(Monitor *mon); +void hmp_info_mmu(Monitor *mon, const QDict *qdict); void hmp_info_mem(Monitor *mon, const QDict *qdict); void hmp_info_tlb(Monitor *mon, const QDict *qdict); void hmp_mce(Monitor *mon, const QDict *qdict); diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c index 9338ae8440..71bce4870a 100644 --- a/monitor/hmp-cmds-target.c +++ b/monitor/hmp-cmds-target.c @@ -31,6 +31,7 @@ #include "qapi/error.h" #include "qapi/qmp/qdict.h" #include "sysemu/hw_accel.h" +#include "sysemu/tcg.h" /* Set the current CPU defined by the user. Callers must hold BQL. 
*/ int monitor_set_cpu(Monitor *mon, int cpu_index) @@ -379,3 +380,39 @@ void hmp_gpa2hpa(Monitor *mon, const QDict *qdict) memory_region_unref(mr); } #endif + +__attribute__((weak)) +void hmp_info_mem(Monitor *mon, const QDict *qdict) +{ +monitor_puts(mon, + "No per-CPU mapping information available on this target\n"); +} + +__attribute__((weak)) +void hmp_info_tlb(Monitor *mon, const QDict *qdict) +{ +monitor_puts(mon, + "No per-CPU TLB information available on this target\n"); +} + +void hmp_info_mmu(Monitor *mon, const QDict *qdict) +{ +bool tlb = qdict_get_try_bool(qdict, "tlb", false); +bool mem = qdict_get_try_bool(qdict, "mem", false); + +if (!tcg_enabled()) { +monitor_puts(mon, "This command is specific to TCG accelerator\n"); +return; +} + +if (!tlb && !mem) { +tlb = mem = true; +} + +if (mem) { +hmp_info_mem(mon, qdict); +} +if (tlb) { +hmp_info_tlb(mon, qdict); +} +} diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx index ad1b1306e3..e31f2467fb 100644 --- a/hmp-commands-info.hx +++ b/hmp-commands-info.hx @@ -208,6 +208,20 @@ SRST Show PCI information. ERST +{ +.name = "mmu", +.args_type = "tlb:-t,mem:-m", +.params = "[-t][-m]", +.help = "show virtual to physical memory " + "(-t: TLB; -m: active mapping)", +.cmd= hmp_info_mmu, +}, + +SRST + ``info mmu`` +Show virtual to physical memory mappings. +ERST + #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \ defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K) { -- 2.41.0
[PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'
'info tlb' and 'info mem' commands don't scale in heterogeneous emulation. They will be reworked after the next release, hidden behind the 'info mmu' command. It is not too late to deprecate commands, so add the 'info mmu' command as wrapper to the other ones, but already deprecate them. Philippe Mathieu-Daudé (2): target/monitor: Introduce 'info mmu' command target/monitor: Deprecate 'info tlb' and 'info mem' commands docs/about/deprecated.rst| 10 include/monitor/hmp-target.h | 3 +++ monitor/hmp-cmds-target.c| 49 target/i386/monitor.c| 4 +-- target/m68k/monitor.c| 2 +- target/nios2/monitor.c | 2 +- target/ppc/ppc-qmp-cmds.c| 2 +- target/riscv/monitor.c | 2 +- target/sh4/monitor.c | 2 +- target/sparc/monitor.c | 2 +- target/xtensa/monitor.c | 2 +- hmp-commands-info.hx | 22 +--- 12 files changed, 89 insertions(+), 13 deletions(-) -- 2.41.0
Re: [PATCH v3 06/49] RAMBlock: Add support of KVM private guest memfd
On 3/20/24 09:39, Michael Roth wrote:
> @@ -1842,6 +1842,17 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>          }
>      }
>
> +    if (kvm_enabled() && (new_block->flags & RAM_GUEST_MEMFD)) {
> +        assert(new_block->guest_memfd < 0);
> +
> +        new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
> +                                                        0, errp);
> +        if (new_block->guest_memfd < 0) {
> +            qemu_mutex_unlock_ramlist();
> +            return;
> +        }
> +    }
> +

This potentially leaks new_block->host. This can be squashed into the patch:

diff --git a/system/physmem.c b/system/physmem.c
index 3a4a3f10d5a..0836aff190e 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1810,6 +1810,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     const bool shared = qemu_ram_is_shared(new_block);
     RAMBlock *block;
     RAMBlock *last_block = NULL;
+    bool free_on_error = false;
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;

@@ -1839,6 +1841,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                 return;
             }
             memory_try_enable_merging(new_block->host, new_block->max_length);
+            free_on_error = true;
         }
     }

@@ -1849,7 +1852,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                                                         0, errp);
         if (new_block->guest_memfd < 0) {
             qemu_mutex_unlock_ramlist();
-            return;
+            goto out_free;
         }
     }

@@ -1901,6 +1904,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         ram_block_notify_add(new_block->host, new_block->used_length,
                              new_block->max_length);
     }
+    return;
+
+out_free:
+    if (free_on_error) {
+        qemu_anon_ram_free(new_block->host, new_block->max_length);
+        new_block->host = NULL;
+    }
 }

 #ifdef CONFIG_POSIX
Re: Intention to work on GSoC project
On Mon, Mar 18, 2024 at 8:47 PM Sahil wrote:
>
> Hi,
>
> I was reading the "Virtqueues and virtio ring: How the data travels"
> article [1]. There are a few things that I have not understood in the
> "avail rings" section.
>
> Q1.
> Step 2 in the "Process to make a buffer available" diagram depicts
> how the virtio driver writes the descriptor index in the avail ring.
> In the example, the descriptor index #0 is written in the first entry.
> But in figure 2, the number 0 is in the 4th position in the avail ring.
> Is the avail ring queue an array of "struct virtq_avail" which maintains
> metadata such as the number of descriptor indexes in the header?
>

Besides flags, struct virtq_avail has two members: uint16_t idx and ring[]. To be in the first position of the avail ring means to be in ring[0] there. Idx and ring[] are just headers in the figure, not actual positions. Same as Avail.

Now that you mention it, maybe there is a better way to represent that, yes. Let me know if I didn't explain it well.

> Also, in the second position, the number changes from 0 (figure 1) to
> 1 (figure 2). I haven't understood what idx, 0 (later 1) and ring[] represent
> in the figures. Does this number represent the number of descriptors
> that are currently in the avail ring?
>

It is the position in ring[] where the device needs to stop looking for descriptors. It starts at 0, and when the device sees 1 it means ring[0] has a descriptor to process.

Now you need to apply a "modulo virtqueue size" to that index. So if the virtqueue size is 256, avail_idx 257 means the last valid descriptor is at 0. This happens naturally when the driver keeps adding descriptors and wraps the queue.

The authoritative source of this is the VirtQueues section of the virtio standard [1], feel free to check it in case it clarifies something better.

> Q2.
>
> There's this paragraph in the article right below the above mentioned
> diagram:
>
> > The avail ring must be able to hold the same number of descriptors
> > as the descriptor area, and the descriptor area must have a size power
> > of two, so idx wraps naturally at some point. For example, if the ring
> > size is 256 entries, idx 1 references the same descriptor as idx 257, 513...
> > And it will wrap at a 16 bit boundary. This way, neither side needs to
> > worry about processing an invalid idx: They are all valid.
>
> I haven't really understood this. I have understood that idx is calculated
> as idx mod queue_length. But I haven't understood the "16 bit boundary"
> part.
>

avail_idx is a uint16_t, so ((uint16_t)-1) + 1 == 0 once the result is stored back into a uint16_t.

> I am also not very clear on how a queue length that is not a power of 2
> might cause trouble. Could you please expand on this?
>

That's a limitation in the standard, but I'm not sure where it comes from beyond being computationally easier to calculate ring position with a mask than with a remainder of a random non-power-of-two number. Packed virtqueue removes that limitation.

> Q3.
> I have started going through the source code in
> "drivers/virtio/virtio_ring.c".
> I have understood that the virtio driver runs in the guest's kernel. Does that
> mean the drivers in "drivers/virtio/*" are enabled when linux is being run in
> a guest VM?
>

For PCI devices, as long as it detects a device with vendor == Red Hat, Inc. (0x1AF4) and device ID 0x1000 through 0x107F inclusive, yes. You can also load and unload manually with modprobe as other drivers.

Let me know if you have more doubts. Thanks!

[1] https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html

> Thanks,
> Sahil
>
> [1] https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
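
The index arithmetic discussed in this reply can be sketched in a few lines of C. This is only an illustration under the assumptions stated in the thread (a free-running 16-bit idx and a power-of-two queue size); the function names are hypothetical and do not come from the kernel or QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a free-running 16-bit avail index to a slot in ring[].
 * Because queue_size is a power of two, a mask gives the same
 * result as "idx mod queue_size".
 */
static inline uint16_t sketch_ring_slot(uint16_t idx, uint16_t queue_size)
{
    assert(queue_size != 0 && (queue_size & (queue_size - 1)) == 0);
    return (uint16_t)(idx & (queue_size - 1));
}

/*
 * Number of entries published by the driver that the device has not
 * looked at yet. Truncating the difference to 16 bits keeps the result
 * correct across the 65535 -> 0 wrap of the index.
 */
static inline uint16_t sketch_pending(uint16_t avail_idx, uint16_t last_seen)
{
    return (uint16_t)(avail_idx - last_seen);
}
```

With a queue size of 256, sketch_ring_slot() maps both idx 1 and idx 257 to slot 1, which is the aliasing the article describes. Since 65536 is a multiple of every power-of-two queue size, the 16-bit wrap of idx never disturbs the slot sequence.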
RE: [PATCH v5 7/7] tests/migration-test: add qpl compression test
> -Original Message- > From: Daniel P. Berrangé > Sent: Wednesday, March 20, 2024 11:40 PM > To: Liu, Yuan1 > Cc: pet...@redhat.com; faro...@suse.de; qemu-devel@nongnu.org; > hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai > > Subject: Re: [PATCH v5 7/7] tests/migration-test: add qpl compression test > > On Wed, Mar 20, 2024 at 03:30:40PM +, Liu, Yuan1 wrote: > > > -Original Message- > > > From: Daniel P. Berrangé > > > Sent: Wednesday, March 20, 2024 6:46 PM > > > To: Liu, Yuan1 > > > Cc: pet...@redhat.com; faro...@suse.de; qemu-devel@nongnu.org; > > > hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai > > > > > > Subject: Re: [PATCH v5 7/7] tests/migration-test: add qpl compression > test > > > > > > On Wed, Mar 20, 2024 at 12:45:27AM +0800, Yuan Liu wrote: > > > > add qpl to compression method test for multifd migration > > > > > > > > the migration with qpl compression needs to access IAA hardware > > > > resource, please run "check-qtest" with sudo or root permission, > > > > otherwise migration test will fail > > > > > > That's not an acceptable requirement. > > > > > > If someone builds QEMU with QPL, the migration test *must* > > > pass 100% reliably when either running on a host without > > > the QPL required hardware, or when lacking permissions. > > > > > > The test case needs to detect these scenarios and automatically > > > skip the test if it is incapable of running successfully. > > > This raises another question though. If QPL migration requires > > > running as root, then it is effectively unusable for QEMU, as > > > no sane deployment ever runs QEMU as root. > > > > > > Is there a way to make QPL work for non-root users ? > > > > There are two issues here > > 1. I need to add an IAA resource detection before the QPL test begins > >In this way, when QPL resources are unavailable, the live migration > >test will not be affected. > > > > 2. 
I need to add some additional information about IAA configuration in > >the devel/qpl-compression.rst documentation. In addition to > configuring > >IAA resources, the system administrator also needs to assign IAA > resources > >to user groups. > >For example, the system administrator runs "chown -R user /dev/iax", > then > >all IAA resources can be accessed by "user", this method does not > require > >sudo or root permissions > > Ok, so in the test suite you likely should do something > approximately like > > #ifdef CONFIG_QPL > if (access("/dev/iax", R_OK|W_OK) == 0) { > migration_test_add("/migration/multifd/tcp/plain/qpl", >test_multifd_tcp_qpl); > } > #endif > > possibly more if you need to actually query supported features > of /dev/iax before trying to use it Yes, very thanks for your suggestion, I will fix this in the next version. > > > > Signed-off-by: Yuan Liu > > > > Reviewed-by: Nanhai Zou > > > > --- > > > > tests/qtest/migration-test.c | 24 > > > > 1 file changed, 24 insertions(+) > > > > > > > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration- > test.c > > > > index 71895abb7f..052d0d60fd 100644 > > > > --- a/tests/qtest/migration-test.c > > > > +++ b/tests/qtest/migration-test.c > > > > @@ -2815,6 +2815,15 @@ > > > test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from, > > > > } > > > > #endif /* CONFIG_ZSTD */ > > > > > > > > +#ifdef CONFIG_QPL > > > > +static void * > > > > +test_migrate_precopy_tcp_multifd_qpl_start(QTestState *from, > > > > +QTestState *to) > > > > +{ > > > > +return test_migrate_precopy_tcp_multifd_start_common(from, to, > > > "qpl"); > > > > +} > > > > +#endif /* CONFIG_QPL */ > > > > + > > > > static void test_multifd_tcp_none(void) > > > > { > > > > MigrateCommon args = { > > > > @@ -2880,6 +2889,17 @@ static void test_multifd_tcp_zstd(void) > > > > } > > > > #endif > > > > > > > > +#ifdef CONFIG_QPL > > > > +static void test_multifd_tcp_qpl(void) > > > > +{ > > > > +MigrateCommon args = { > > > > 
+.listen_uri = "defer", > > > > +.start_hook = test_migrate_precopy_tcp_multifd_qpl_start, > > > > +}; > > > > +test_precopy_common(); > > > > +} > > > > +#endif > > > > + > > > > #ifdef CONFIG_GNUTLS > > > > static void * > > > > test_migrate_multifd_tcp_tls_psk_start_match(QTestState *from, > > > > @@ -3789,6 +3809,10 @@ int main(int argc, char **argv) > > > > migration_test_add("/migration/multifd/tcp/plain/zstd", > > > > test_multifd_tcp_zstd); > > > > #endif > > > > +#ifdef CONFIG_QPL > > > > +migration_test_add("/migration/multifd/tcp/plain/qpl", > > > > + test_multifd_tcp_qpl); > > > > +#endif > > > > #ifdef CONFIG_GNUTLS > > > > migration_test_add("/migration/multifd/tcp/tls/psk/match", > > > > test_multifd_tcp_tls_psk_match); > > > > -- > > > > 2.39.3 > > > > > >
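
The skip logic suggested earlier in this thread boils down to a permission probe on the device node before the test is registered. A minimal sketch of such a probe, assuming a POSIX host, follows; the helper name is hypothetical, and a real check may need to query more than file permissions, as noted in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Return true when the node at path exists and the current user may
 * both read and write it. This says nothing about whether the hardware
 * behind the node is configured correctly.
 */
static inline bool sketch_device_usable(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK | W_OK) == 0;
}
```

A test suite could then wrap the registration of the qpl test in `if (sketch_device_usable("/dev/iax"))`, so that hosts without the device, or users without permission, skip the test instead of failing it.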
Re: [PATCH v3 19/49] kvm: Make kvm_convert_memory() obey ram_block_discard_is_enabled()
On 3/20/24 09:39, Michael Roth wrote: Some subsystems like VFIO might disable ram block discard for uncoordinated cases. Since kvm_convert_memory()/guest_memfd don't implement a RamDiscardManager handler to convey discard operations to various listeners like VFIO. > Because of this, sequences like the following can result due to stale IOMMU mappings: Alternatively, should guest-memfd memory regions call ram_block_discard_require(true)? This will prevent VFIO from operating, but it will avoid consuming twice the memory. If desirable, guest-memfd support can be changed to implement an extension of RamDiscardManager that notifies about private/shared memory changes, and then guest-memfd would be able to support coordinated discard. But I wonder if that's doable at all - how common are shared<->private flips, and is it feasible to change the IOMMU page tables every time? If the real solution is SEV-TIO (which means essentially guest_memfd support for VFIO), calling ram_block_discard_require(true) may be the simplest stopgap solution. Paolo - convert page shared->private - discard shared page - convert page private->shared - new page is allocated - issue DMA operations against that shared page Address this by taking ram_block_discard_is_enabled() into account when deciding whether or not to discard pages. Signed-off-by: Michael Roth --- accel/kvm/kvm-all.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 53ce4f091e..6ae03c880f 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -2962,10 +2962,14 @@ static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) */ return 0; } else { -ret = ram_block_discard_range(rb, offset, size); +ret = ram_block_discard_is_disabled() + ? ram_block_discard_range(rb, offset, size) + : 0; } } else { -ret = ram_block_discard_guest_memfd_range(rb, offset, size); +ret = ram_block_discard_is_disabled() + ? 
ram_block_discard_guest_memfd_range(rb, offset, size) + : 0; } } else { error_report("Convert non guest_memfd backed memory region "
RE: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression
> -----Original Message-----
> From: Peter Xu
> Sent: Wednesday, March 20, 2024 11:35 PM
> To: Liu, Yuan1
> Cc: Daniel P. Berrangé ; faro...@suse.de; qemu-de...@nongnu.org;
> hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
>
> On Wed, Mar 20, 2024 at 03:02:59PM +, Liu, Yuan1 wrote:
> > > > +static int alloc_zbuf(QplData *qpl, uint8_t chan_id, Error **errp)
> > > > +{
> > > > +    int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
> > > > +    uint32_t size = qpl->job_num * qpl->data_size;
> > > > +    uint8_t *buf;
> > > > +
> > > > +    buf = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
> > > > +    if (buf == MAP_FAILED) {
> > > > +        error_setg(errp, "multifd: %u: alloc_zbuf failed, job num %u, size %u",
> > > > +                   chan_id, qpl->job_num, qpl->data_size);
> > > > +        return -1;
> > > > +    }
> > >
> > > What's the reason for using mmap here, rather than a normal
> > > malloc ?
> >
> > I want to populate the memory accessed by the IAA device in the initialization
> > phase, and then avoid initiating I/O page faults through the IAA device during
> > migration, a large number of I/O page faults are not good for performance.
>
> mmap() doesn't populate pages, unless with MAP_POPULATE.  And even with
> that it shouldn't be guaranteed, as the populate phase should ignore all
> errors.
>
>        MAP_POPULATE (since Linux 2.5.46)
>               Populate (prefault) page tables for a mapping.  For a file
>               mapping, this causes read-ahead on the file.  This will help
>               to reduce blocking on page faults later.  The mmap() call
>               doesn't fail if the mapping cannot be populated (for example,
>               due to limitations on the number of mapped huge pages when
>               using MAP_HUGETLB).  Support for MAP_POPULATE in conjunction
>               with private mappings was added in Linux 2.6.23.
>
> OTOH, I think g_malloc0() should guarantee to prefault everything in as
> long as the call returned (even though they can be swapped out later, but
> that applies to all cases anyway).

Thanks, Peter. I will try the g_malloc0 method here.

> > This problem also occurs at the destination, therefore, I recommend that
> > customers need to add -mem-prealloc for destination boot parameters.
>
> I'm not sure what issue you hit when testing it, but -mem-prealloc flag
> should only control the guest memory backends not the buffers that QEMU
> internally use, afaiu.
>
> Thanks,
>
> --
> Peter Xu

Let me explain. During an IAA decompression operation, the IAA hardware can
write the decompressed data directly to the virtual address of the guest
memory, which avoids having the CPU copy the decompressed data into guest
memory.

Without -mem-prealloc, the guest memory is not populated, so the IAA hardware
has to trigger an I/O page fault first and only then can it output the
decompressed data to the guest memory region. In addition, CPU page faults
also trigger IOTLB flush operations when IAA devices use SVM.

Because a large number of I/O page faults and IOTLB flushes cannot be resolved
quickly, the decompression throughput of the IAA device decreases
significantly.
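To make the prefault discussion above concrete, here is a minimal sketch that does not depend on MAP_POPULATE being honoured: it maps an anonymous buffer and then writes one byte per page, so each page has been faulted in by the time the function returns. The helper name alloc_prefaulted() is hypothetical and is not part of the patch or of QEMU; it is only an illustration under the assumption of a Linux/POSIX host.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): map an anonymous, private
 * buffer and write one byte to every page so that each page is faulted
 * in before the buffer is handed to a device.
 * Returns NULL if size is 0 or the mapping fails.
 */
static uint8_t *alloc_prefaulted(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    uint8_t *buf;
    size_t off;

    if (size == 0 || page <= 0) {
        return NULL;
    }

    buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return NULL;
    }

    /* A write (not a read) is needed to get a private page allocated. */
    for (off = 0; off < size; off += (size_t)page) {
        ((volatile uint8_t *)buf)[off] = 0;
    }
    return buf;
}
```

As noted in the thread, pages touched this way can still be swapped out later; the sketch only covers the initial fault.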
[PATCH v4 3/7] qga/commands-posix: qmp_guest_shutdown: use ga_run_command helper
Also remove the G_GNUC_UNUSED attribute added in the previous commit from
the helper.

Signed-off-by: Andrey Drobyshev
Reviewed-by: Daniel P. Berrangé
---
 qga/commands-posix.c | 39 ++-
 1 file changed, 6 insertions(+), 33 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 9b1bdf194c..cb9eed9a0b 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -108,7 +108,6 @@ static ssize_t ga_pipe_read_str(int fd[2], char **str)
  * sending string to stdin and taking error message from
  * stdout/err.
  */
-G_GNUC_UNUSED
 static int ga_run_command(const char *argv[], const char *in_str,
                           const char *action, Error **errp)
 {
@@ -230,8 +229,6 @@ void qmp_guest_shutdown(const char *mode, Error **errp)
 {
     const char *shutdown_flag;
     Error *local_err = NULL;
-    pid_t pid;
-    int status;

 #ifdef CONFIG_SOLARIS
     const char *powerdown_flag = "-i5";
@@ -260,46 +257,22 @@ void qmp_guest_shutdown(const char *mode, Error **errp)
         return;
     }

-    pid = fork();
-    if (pid == 0) {
-        /* child, start the shutdown */
-        setsid();
-        reopen_fd_to_null(0);
-        reopen_fd_to_null(1);
-        reopen_fd_to_null(2);
-
+    const char *argv[] = {"/sbin/shutdown",
 #ifdef CONFIG_SOLARIS
-        execl("/sbin/shutdown", "shutdown", shutdown_flag, "-g0", "-y",
-              "hypervisor initiated shutdown", (char *)NULL);
+                          shutdown_flag, "-g0", "-y",
 #elif defined(CONFIG_BSD)
-        execl("/sbin/shutdown", "shutdown", shutdown_flag, "+0",
-               "hypervisor initiated shutdown", (char *)NULL);
+                          shutdown_flag, "+0",
 #else
-        execl("/sbin/shutdown", "shutdown", "-h", shutdown_flag, "+0",
-               "hypervisor initiated shutdown", (char *)NULL);
+                          "-h", shutdown_flag, "+0",
 #endif
-        _exit(EXIT_FAILURE);
-    } else if (pid < 0) {
-        error_setg_errno(errp, errno, "failed to create child process");
-        return;
-    }
+                          "hypervisor initiated shutdown", (char *) NULL};

-    ga_wait_child(pid, &status, &local_err);
+    ga_run_command(argv, NULL, "shutdown", &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }

-    if (!WIFEXITED(status)) {
-        error_setg(errp, "child process has terminated abnormally");
-        return;
-    }
-
-    if (WEXITSTATUS(status)) {
-        error_setg(errp, "child process has failed to shutdown");
-        return;
-    }
-
     /* succeeded */
 }
-- 
2.39.3
[PATCH v4 1/7] qga: guest-get-fsinfo: add optional 'total-bytes-privileged' field
Since the commit 25b5ff1a86 ("qga: add mountpoint usage info to
GuestFilesystemInfo") we have 2 values reported in guest-get-fsinfo:

  used = (f_blocks - f_bfree), total = (f_blocks - f_bfree + f_bavail)

as returned by statvfs(3).  While on Windows guests that's all we can get
with GetDiskFreeSpaceExA(), on POSIX guests we might also be interested in
total file system size, as it's visible for root user.  Let's add an
optional field 'total-bytes-privileged' to GuestFilesystemInfo struct,
which'd only be reported on POSIX and represent f_blocks value as returned
by statvfs(3).

While here, also tweak the docs to reflect better where those values come
from.

Signed-off-by: Andrey Drobyshev
---
 qga/commands-posix.c | 2 ++
 qga/commands-win32.c | 1 +
 qga/qapi-schema.json | 7 +--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 26008db497..7df2d72e9f 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1569,8 +1569,10 @@ static GuestFilesystemInfo *build_guest_fsinfo(struct FsMount *mount,
         nonroot_total = used + buf.f_bavail;
         fs->used_bytes = used * fr_size;
         fs->total_bytes = nonroot_total * fr_size;
+        fs->total_bytes_privileged = buf.f_blocks * fr_size;

         fs->has_total_bytes = true;
+        fs->has_total_bytes_privileged = true;
         fs->has_used_bytes = true;
     }

diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 6242737b00..6fee0e1e6f 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -1143,6 +1143,7 @@ static GuestFilesystemInfo *build_guest_fsinfo(char *guid, Error **errp)
     fs = g_malloc(sizeof(*fs));
     fs->name = g_strdup(guid);
     fs->has_total_bytes = false;
+    fs->has_total_bytes_privileged = false;
     fs->has_used_bytes = false;
     if (len == 0) {
         fs->mountpoint = g_strdup("System Reserved");
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 9554b566a7..dcc469b268 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1026,7 +1026,10 @@
 #
 # @used-bytes: file system used bytes (since 3.0)
 #
-# @total-bytes: non-root file system total bytes (since 3.0)
+# @total-bytes: filesystem capacity in bytes for unprivileged users (since 3.0)
+#
+# @total-bytes-privileged: filesystem capacity in bytes for privileged users
+#     (since 9.0)
 #
 # @disk: an array of disk hardware information that the volume lies
 #     on, which may be empty if the disk type is not supported
@@ -1036,7 +1039,7 @@
 { 'struct': 'GuestFilesystemInfo',
   'data': {'name': 'str', 'mountpoint': 'str', 'type': 'str',
            '*used-bytes': 'uint64', '*total-bytes': 'uint64',
-           'disk': ['GuestDiskAddress']} }
+           '*total-bytes-privileged': 'uint64', 'disk': ['GuestDiskAddress']} }

 ##
 # @guest-get-fsinfo:
-- 
2.39.3
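For readers who want to see how the three reported values relate, the following standalone sketch derives them from statvfs(3) in the same way the commit message describes. The struct fs_usage type and the get_fs_usage() function are hypothetical names used only for illustration, and the sketch assumes the block counts are scaled by f_frsize.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical container for the three values discussed above. */
struct fs_usage {
    uint64_t used_bytes;             /* (f_blocks - f_bfree) * f_frsize */
    uint64_t total_bytes;            /* (used + f_bavail) * f_frsize */
    uint64_t total_bytes_privileged; /* f_blocks * f_frsize */
};

/*
 * Fill *out for the file system containing 'path'.
 * Returns 0 on success, -1 if statvfs() fails.
 */
static int get_fs_usage(const char *path, struct fs_usage *out)
{
    struct statvfs buf;
    uint64_t fr_size, used;

    if (statvfs(path, &buf) != 0) {
        return -1;
    }

    fr_size = buf.f_frsize;
    used = buf.f_blocks - buf.f_bfree;

    out->used_bytes = used * fr_size;
    /* Capacity as seen by an unprivileged user: excludes reserved blocks. */
    out->total_bytes = (used + buf.f_bavail) * fr_size;
    /* Capacity as seen by root: every block of the file system. */
    out->total_bytes_privileged = buf.f_blocks * fr_size;
    return 0;
}
```

On file systems that reserve blocks for root, total_bytes_privileged is typically larger than total_bytes by the size of that reserve.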
[PATCH v4 5/7] qga/commands-posix: execute_fsfreeze_hook: use ga_run_command helper
There's no need to check for the existence of the hook executable, as the
exec() call will do that for us.

Signed-off-by: Andrey Drobyshev
Reviewed-by: Daniel P. Berrangé
---
 qga/commands-posix.c | 35 +++
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 545f3c99dc..9b993772f5 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -736,8 +736,6 @@ static const char *fsfreeze_hook_arg_string[] = {

 static void execute_fsfreeze_hook(FsfreezeHookArg arg, Error **errp)
 {
-    int status;
-    pid_t pid;
     const char *hook;
     const char *arg_str = fsfreeze_hook_arg_string[arg];
     Error *local_err = NULL;
@@ -746,42 +744,15 @@ static void execute_fsfreeze_hook(FsfreezeHookArg arg, Error **errp)
     if (!hook) {
         return;
     }
-    if (access(hook, X_OK) != 0) {
-        error_setg_errno(errp, errno, "can't access fsfreeze hook '%s'", hook);
-        return;
-    }

-    slog("executing fsfreeze hook with arg '%s'", arg_str);
-    pid = fork();
-    if (pid == 0) {
-        setsid();
-        reopen_fd_to_null(0);
-        reopen_fd_to_null(1);
-        reopen_fd_to_null(2);
-
-        execl(hook, hook, arg_str, NULL);
-        _exit(EXIT_FAILURE);
-    } else if (pid < 0) {
-        error_setg_errno(errp, errno, "failed to create child process");
-        return;
-    }
+    const char *argv[] = {hook, arg_str, NULL};

-    ga_wait_child(pid, &status, &local_err);
+    slog("executing fsfreeze hook with arg '%s'", arg_str);
+    ga_run_command(argv, NULL, "execute fsfreeze hook", &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
-
-    if (!WIFEXITED(status)) {
-        error_setg(errp, "fsfreeze hook has terminated abnormally");
-        return;
-    }
-
-    status = WEXITSTATUS(status);
-    if (status) {
-        error_setg(errp, "fsfreeze hook has failed with status %d", status);
-        return;
-    }
 }

 /*
-- 
2.39.3
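The point that exec() itself reports a missing executable can be illustrated with a small fork/exec/wait sketch. This is not QEMU's ga_run_command(); run_and_wait() is a hypothetical stand-in that omits the stdin/stdout plumbing, and the exit code 127 for a failed exec is an assumption borrowed from shell convention.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a run-and-wait helper (NOT ga_run_command()).
 * Returns the child's exit status (0..255), or -1 if fork()/waitpid()
 * failed or the child did not exit normally.
 */
static int run_and_wait(const char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        /*
         * Only reached when exec failed, e.g. the file is missing or not
         * executable, so no separate access(X_OK) pre-check is needed.
         */
        _exit(127);
    }

    /* Retry if the wait is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }

    if (!WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

A caller then only has to distinguish zero from non-zero, which covers both "the hook failed" and "the hook could not be executed".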
[PATCH v4 0/7] qga/commands-posix: replace code duplicating commands with a helper
v3 -> v4:
  * Patch 1/7:
    - Replaced "since 8.3" with "since 9.0" as we're now at v9.0.0-rc0;
    - Renamed the field to 'total-bytes-privileged';
    - Got rid of the implementation details in the docs;
  * Patch 6/7: added g_autoptr macro to local error declaration.

v3: https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg04068.html

Andrey Drobyshev (7):
  qga: guest-get-fsinfo: add optional 'total-bytes-privileged' field
  qga: introduce ga_run_command() helper for guest cmd execution
  qga/commands-posix: qmp_guest_shutdown: use ga_run_command helper
  qga/commands-posix: qmp_guest_set_time: use ga_run_command helper
  qga/commands-posix: execute_fsfreeze_hook: use ga_run_command helper
  qga/commands-posix: don't do fork()/exec() when suspending via sysfs
  qga/commands-posix: qmp_guest_set_user_password: use ga_run_command
    helper

 qga/commands-posix.c | 404 +++
 qga/commands-win32.c |   1 +
 qga/qapi-schema.json |   7 +-
 3 files changed, 187 insertions(+), 225 deletions(-)

-- 
2.39.3
[PATCH v4 6/7] qga/commands-posix: don't do fork()/exec() when suspending via sysfs
Since commit 246d76eba ("qga: guest_suspend: decoupling pm-utils and sys
logic") pm-utils logic is running in a separate child from the sysfs
logic.  Now when suspending via sysfs we don't really need to do that in
a separate process as we only need to perform one write to /sys/power/state.

Let's just use g_file_set_contents() to simplify things here.

Suggested-by: Daniel P. Berrangé
Signed-off-by: Andrey Drobyshev
Reviewed-by: Daniel P. Berrangé
---
 qga/commands-posix.c | 41 +
 1 file changed, 5 insertions(+), 36 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 9b993772f5..9910957ff5 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1928,52 +1928,21 @@ static bool linux_sys_state_supports_mode(SuspendMode mode, Error **errp)

 static void linux_sys_state_suspend(SuspendMode mode, Error **errp)
 {
-    Error *local_err = NULL;
+    g_autoptr(GError) local_gerr = NULL;
     const char *sysfile_strs[3] = {"disk", "mem", NULL};
     const char *sysfile_str = sysfile_strs[mode];
-    pid_t pid;
-    int status;

     if (!sysfile_str) {
         error_setg(errp, "unknown guest suspend mode");
         return;
     }

-    pid = fork();
-    if (!pid) {
-        /* child */
-        int fd;
-
-        setsid();
-        reopen_fd_to_null(0);
-        reopen_fd_to_null(1);
-        reopen_fd_to_null(2);
-
-        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
-        if (fd < 0) {
-            _exit(EXIT_FAILURE);
-        }
-
-        if (write(fd, sysfile_str, strlen(sysfile_str)) < 0) {
-            _exit(EXIT_FAILURE);
-        }
-
-        _exit(EXIT_SUCCESS);
-    } else if (pid < 0) {
-        error_setg_errno(errp, errno, "failed to create child process");
-        return;
-    }
-
-    ga_wait_child(pid, &status, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    if (!g_file_set_contents(LINUX_SYS_STATE_FILE, sysfile_str,
+                             -1, &local_gerr)) {
+        error_setg(errp, "suspend: cannot write to '%s': %s",
+                   LINUX_SYS_STATE_FILE, local_gerr->message);
         return;
     }
-
-    if (WEXITSTATUS(status)) {
-        error_setg(errp, "child process has failed to suspend");
-    }
-
 }

 static void guest_suspend(SuspendMode mode, Error **errp)
-- 
2.39.3
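As an illustration of "one write, no child process", the sketch below performs the same open()/write() sequence the removed child used, but in the calling process. It deliberately uses plain POSIX calls rather than g_file_set_contents(), so it is a stand-in and not what the patch does; write_str_to_file() is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'str' to an existing file in the calling
 * process, mirroring the open(O_WRONLY) + write() the old child did.
 * Returns 0 on success, -1 on failure (errno is set by open/write).
 */
static int write_str_to_file(const char *path, const char *str)
{
    size_t len = strlen(str);
    size_t done = 0;
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        return -1;
    }

    while (done < len) {
        ssize_t n = write(fd, str + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before writing anything; retry */
            }
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    return close(fd) == 0 ? 0 : -1;
}
```

Unlike the removed child, the caller gets errno directly and can report why the write failed.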
[PATCH v4 4/7] qga/commands-posix: qmp_guest_set_time: use ga_run_command helper
There's no need to check for the existence of "/sbin/hwclock", as the
exec() call will do that for us.

Signed-off-by: Andrey Drobyshev
Reviewed-by: Daniel P. Berrangé
---
 qga/commands-posix.c | 43 +++
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index cb9eed9a0b..545f3c99dc 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -279,21 +279,9 @@ void qmp_guest_shutdown(const char *mode, Error **errp)
 void qmp_guest_set_time(bool has_time, int64_t time_ns, Error **errp)
 {
     int ret;
-    int status;
-    pid_t pid;
     Error *local_err = NULL;
     struct timeval tv;
-    static const char hwclock_path[] = "/sbin/hwclock";
-    static int hwclock_available = -1;
-
-    if (hwclock_available < 0) {
-        hwclock_available = (access(hwclock_path, X_OK) == 0);
-    }
-
-    if (!hwclock_available) {
-        error_setg(errp, QERR_UNSUPPORTED);
-        return;
-    }
+    const char *argv[] = {"/sbin/hwclock", has_time ? "-w" : "-s", NULL};

     /* If user has passed a time, validate and set it. */
     if (has_time) {
@@ -324,37 +312,12 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, Error **errp)
      * just need to synchronize the hardware clock. However, if no time was
      * passed, user is requesting the opposite: set the system time from the
      * hardware clock (RTC). */
-    pid = fork();
-    if (pid == 0) {
-        setsid();
-        reopen_fd_to_null(0);
-        reopen_fd_to_null(1);
-        reopen_fd_to_null(2);
-
-        /* Use '/sbin/hwclock -w' to set RTC from the system time,
-         * or '/sbin/hwclock -s' to set the system time from RTC. */
-        execl(hwclock_path, "hwclock", has_time ? "-w" : "-s", NULL);
-        _exit(EXIT_FAILURE);
-    } else if (pid < 0) {
-        error_setg_errno(errp, errno, "failed to create child process");
-        return;
-    }
-
-    ga_wait_child(pid, &status, &local_err);
+    ga_run_command(argv, NULL, "set hardware clock to system time",
+                   &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
-
-    if (!WIFEXITED(status)) {
-        error_setg(errp, "child process has terminated abnormally");
-        return;
-    }
-
-    if (WEXITSTATUS(status)) {
-        error_setg(errp, "hwclock failed to set hardware clock to system time");
-        return;
-    }
 }

 typedef enum {
-- 
2.39.3
[PATCH v4 7/7] qga/commands-posix: qmp_guest_set_user_password: use ga_run_command helper
There's no need to check for the existence of the "chpasswd", "pw"
executables, as the exec() call will do that for us.

Signed-off-by: Andrey Drobyshev
Reviewed-by: Daniel P. Berrangé
---
 qga/commands-posix.c | 96 ++--
 1 file changed, 13 insertions(+), 83 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 9910957ff5..7a065c4085 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2151,14 +2151,8 @@ void qmp_guest_set_user_password(const char *username,
                                  Error **errp)
 {
     Error *local_err = NULL;
-    char *passwd_path = NULL;
-    pid_t pid;
-    int status;
-    int datafd[2] = { -1, -1 };
-    char *rawpasswddata = NULL;
+    g_autofree char *rawpasswddata = NULL;
     size_t rawpasswdlen;
-    char *chpasswddata = NULL;
-    size_t chpasswdlen;

     rawpasswddata = (char *)qbase64_decode(password, -1, &rawpasswdlen, errp);
     if (!rawpasswddata) {
@@ -2169,95 +2163,31 @@ void qmp_guest_set_user_password(const char *username,

     if (strchr(rawpasswddata, '\n')) {
         error_setg(errp, "forbidden characters in raw password");
-        goto out;
+        return;
     }

     if (strchr(username, '\n') ||
         strchr(username, ':')) {
         error_setg(errp, "forbidden characters in username");
-        goto out;
+        return;
     }

 #ifdef __FreeBSD__
-    chpasswddata = g_strdup(rawpasswddata);
-    passwd_path = g_find_program_in_path("pw");
+    g_autofree char *chpasswddata = g_strdup(rawpasswddata);
+    const char *crypt_flag = crypted ? "-H" : "-h";
+    const char *argv[] = {"pw", "usermod", "-n", username,
+                          crypt_flag, "0", NULL};
 #else
-    chpasswddata = g_strdup_printf("%s:%s\n", username, rawpasswddata);
-    passwd_path = g_find_program_in_path("chpasswd");
+    g_autofree char *chpasswddata = g_strdup_printf("%s:%s\n", username,
+                                                    rawpasswddata);
+    const char *crypt_flag = crypted ? "-e" : NULL;
+    const char *argv[] = {"chpasswd", crypt_flag, NULL};
 #endif

-    chpasswdlen = strlen(chpasswddata);
-
-    if (!passwd_path) {
-        error_setg(errp, "cannot find 'passwd' program in PATH");
-        goto out;
-    }
-
-    if (!g_unix_open_pipe(datafd, FD_CLOEXEC, NULL)) {
-        error_setg(errp, "cannot create pipe FDs");
-        goto out;
-    }
-
-    pid = fork();
-    if (pid == 0) {
-        close(datafd[1]);
-        /* child */
-        setsid();
-        dup2(datafd[0], 0);
-        reopen_fd_to_null(1);
-        reopen_fd_to_null(2);
-
-#ifdef __FreeBSD__
-        const char *h_arg;
-        h_arg = (crypted) ? "-H" : "-h";
-        execl(passwd_path, "pw", "usermod", "-n", username, h_arg, "0", NULL);
-#else
-        if (crypted) {
-            execl(passwd_path, "chpasswd", "-e", NULL);
-        } else {
-            execl(passwd_path, "chpasswd", NULL);
-        }
-#endif
-        _exit(EXIT_FAILURE);
-    } else if (pid < 0) {
-        error_setg_errno(errp, errno, "failed to create child process");
-        goto out;
-    }
-    close(datafd[0]);
-    datafd[0] = -1;
-
-    if (qemu_write_full(datafd[1], chpasswddata, chpasswdlen) != chpasswdlen) {
-        error_setg_errno(errp, errno, "cannot write new account password");
-        goto out;
-    }
-    close(datafd[1]);
-    datafd[1] = -1;
-
-    ga_wait_child(pid, &status, &local_err);
+    ga_run_command(argv, chpasswddata, "set user password", &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-        goto out;
-    }
-
-    if (!WIFEXITED(status)) {
-        error_setg(errp, "child process has terminated abnormally");
-        goto out;
-    }
-
-    if (WEXITSTATUS(status)) {
-        error_setg(errp, "child process has failed to set user password");
-        goto out;
-    }
-
-out:
-    g_free(chpasswddata);
-    g_free(rawpasswddata);
-    g_free(passwd_path);
-    if (datafd[0] != -1) {
-        close(datafd[0]);
-    }
-    if (datafd[1] != -1) {
-        close(datafd[1]);
+        return;
     }
 }
 #else /* __linux__ || __FreeBSD__ */
-- 
2.39.3