Hi Gavin,
On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <[email protected]> wrote:
>
> Hi Salil,
>
> On 10/23/25 4:50 AM, Salil Mehta wrote:
> > On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <[email protected]> wrote:
> >> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <[email protected]> wrote:
> >>>
> >>> Hi Salil,
> >>>
> >>> On 10/1/25 11:01 AM, [email protected] wrote:
> >>>> From: Salil Mehta <[email protected]>
> >>>>
> >>>> ARM CPU architecture does not allow CPUs to be plugged after the system
> >>>> has initialized. This is a constraint. Hence, the kernel must know all
> >>>> the CPUs being booted during its initialization. This applies to the
> >>>> guest kernel as well; therefore, the number of KVM vCPU descriptors in
> >>>> the host must be fixed at VM initialization time.
> >>>>
> >>>> Also, the GIC must know all the CPUs it is connected to during its
> >>>> initialization, and this cannot change afterward. This must also be
> >>>> ensured during the initialization of the VGIC in KVM. This is necessary
> >>>> because:
> >>>>
> >>>> 1. The association between GICR and MPIDR must be fixed at VM
> >>>>    initialization time. This is represented by the register
> >>>>    `GICR_TYPER(mp_affinity, proc_num)`.
> >>>> 2. Memory regions associated with GICR, etc., cannot be changed (added,
> >>>>    deleted, or modified) after the VM has been initialized. This is not
> >>>>    an ARM architectural constraint; rather, allowing it would require
> >>>>    difficult and messy changes to the VGIC data structures.
> >>>>
> >>>> To enable a hot-add–like model while preserving these constraints, the
> >>>> virt machine may enumerate more CPUs than are enabled at boot using
> >>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> >>>> administratively disabled at init). The topology remains fixed at VM
> >>>> creation time; only the online/offline status may change later.
> >>>>
> >>>> Administratively disabled vCPUs are not realized in QOM until first
> >>>> enabled, avoiding creation of unnecessary vCPU threads at boot. On large
> >>>> systems, this reduces startup time proportionally to the number of
> >>>> disabled vCPUs. Once a QOM vCPU is realized and its thread created,
> >>>> subsequent enable/disable actions do not unrealize it. This behaviour
> >>>> was adopted following review feedback and differs from earlier RFC
> >>>> versions.
> >>>>
> >>>> Co-developed-by: Keqian Zhu <[email protected]>
> >>>> Signed-off-by: Keqian Zhu <[email protected]>
> >>>> Signed-off-by: Salil Mehta <[email protected]>
> >>>> ---
> >>>> accel/kvm/kvm-all.c | 2 +-
> >>>> hw/arm/virt.c | 77 ++++++++++++++++++++++++++++++++++++++----
> >>>> hw/core/qdev.c | 17 ++++++++++
> >>>> include/hw/qdev-core.h | 19 +++++++++++
> >>>> include/system/kvm.h | 8 +++++
> >>>> target/arm/cpu.c | 2 ++
> >>>> target/arm/kvm.c | 40 +++++++++++++++++++++-
> >>>> target/arm/kvm_arm.h | 11 ++++++
> >>>> 8 files changed, 168 insertions(+), 8 deletions(-)
> >>>>
> >
> > [...]
> >
> >>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> >>>> +{
> >>>> +    CPUState *cs = CPU(cpu);
> >>>> +    unsigned long vcpu_id = cs->cpu_index;
> >>>> +    int ret;
> >>>> +
> >>>> +    ret = kvm_create_vcpu(cs);
> >>>> +    if (ret < 0) {
> >>>> +        error_report("Failed to create host vcpu %lu", vcpu_id);
> >>>> +        abort();
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * Initialize the vCPU in the host. This will reset the sys regs
> >>>> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> >>>> +     * get programmed during this call to host. These are referenced
> >>>> +     * later while setting device attributes of the GICR during GICv3
> >>>> +     * reset.
> >>>> +     */
> >>>> +    ret = kvm_arch_init_vcpu(cs);
> >>>> +    if (ret < 0) {
> >>>> +        error_report("Failed to initialize host vcpu %lu", vcpu_id);
> >>>> +        abort();
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * Park the created vCPU. It will be used by kvm_get_vcpu() when
> >>>> +     * threads are created during realization of ARM vCPUs.
> >>>> +     */
> >>>> +    kvm_park_vcpu(cs);
> >>>> +}
> >>>> +
> >>>
> >>> I don't think we're able to simply call kvm_arch_init_vcpu() in the
> >>> lazily realized path. Otherwise, it can trigger a crash dump on my
> >>> NVIDIA Grace Hopper machine, where SVE is supported by default.
> >>
> >> Thanks for reporting this. That is not true. As long as we initialize
> >> KVM correctly and finalize features like SVE, we should be fine. In fact,
> >> this is precisely what we are doing right now.
> >>
> >> To understand the crash, I need a bit more info.
> >>
> >> 1# Is it happening because KVM_ARM_VCPU_INIT is failing? If yes, can you
> >> check within KVM whether it is because:
> >>    a. the features specified by QEMU do not match the defaults within
> >>       KVM (hint: check kvm_vcpu_init_check_features()), or
> >>    b. KVM is complaining about a change in the init features
> >>       (kvm_vcpu_init_changed())?
> >> 2# Or is it happening while setting the vector length or finalizing the
> >> features?
> >>
> >> int kvm_arch_init_vcpu(CPUState *cs)
> >> {
> >>     [...]
> >>     /* Do KVM_ARM_VCPU_INIT ioctl */
> >>     ret = kvm_arm_vcpu_init(cpu);                       ---->[1]
> >>     if (ret) {
> >>         return ret;
> >>     }
> >>     if (cpu_isar_feature(aa64_sve, cpu)) {
> >>         ret = kvm_arm_sve_set_vls(cpu);                 ---->[2]
> >>         if (ret) {
> >>             return ret;
> >>         }
> >>         ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE); --->[3]
> >>         if (ret) {
> >>             return ret;
> >>         }
> >>     }
> >>     [...]
> >> }
> >>
> >> I think it's happening because the vector length is left uninitialized.
> >> This initialization happens in the context of arm_cpu_finalize_features(),
> >> which I forgot to call before calling KVM finalize.
> >>
> >>>
> >>> kvm_arch_init_vcpu() is supposed to be called in the realization path
> >>> in the current implementation (without this series) because the
> >>> parameters (features) to KVM_ARM_VCPU_INIT are populated at vCPU
> >>> realization time.
> >>
> >> Not necessarily. It is just meant to initialize KVM. If we take care of
> >> the KVM requirements in the same way the realize path does, we should be
> >> fine. Can you try adding the patch below to your code and test whether
> >> it works?
> >>
> >> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> >> index c4b68a0b17..1091593478 100644
> >> --- a/target/arm/kvm.c
> >> +++ b/target/arm/kvm.c
> >> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> >>          abort();
> >>      }
> >>
> >> +    /* finalize the features like SVE, SME etc */
> >> +    arm_cpu_finalize_features(cpu, &error_abort);
> >> +
> >>      /*
> >>       * Initialize the vCPU in the host. This will reset the sys regs
> >>       * for this vCPU and related registers like MPIDR_EL1 etc. also
> >>
> >>>
> >>> $ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> >>> --enable-kvm -machine virt,gic-version=3 -cpu host \
> >>> -smp cpus=4,disabledcpus=2 -m 1024M \
> >>> -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \
> >>> -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> >>> qemu-system-aarch64: Failed to initialize host vcpu 4
> >>> Aborted (core dumped)
> >>>
> >>> Backtrace
> >>> =========
> >>> (gdb) bt
> >>> #0 0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
> >>> #1 0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> >>> #2 0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> >>> #3 0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
> >>>     at ../target/arm/kvm.c:1081
> >>> #4 0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
> >>>     at ../hw/arm/virt.c:2483
> >>> #5 0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
> >>> #6 0x0000aaaab160f220 in machine_run_board_init (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8)
> >>>     at ../hw/core/machine.c:1722
> >>> #7 0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> >>> #8 0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
> >>>     at ../system/vl.c:2821
> >>> #9 0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
> >>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
> >>
> >>
> >> Thank you for this. Please let me know whether the above fix works, and
> >> also the return values in case you encounter errors.
> >
> > I've pushed the fix to the branch below for your convenience:
> >
> > Branch:
> > https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
> > Fix:
> > https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
> >
>
> I guess the rfc-v6.2 branch isn't ready for testing, because it runs into
> another crash dump, like below.
rfc-v6.2 does not crash on Kunpeng920, where I tested it. However, that
chip lacks some ARM extensions such as SVE, so unfortunately I can't test
SVE/SME/PAuth support. Can you disable SVE and check whether the guest
comes up, just to narrow down the problem?
>
> host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host,nvdimm=on \
> -cpu host,sve=on \
> -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
> -m 4096M,slots=16,maxmem=128G \
> -object memory-backend-ram,id=mem0,size=2048M \
> -object memory-backend-ram,id=mem1,size=2048M \
> -numa node,nodeid=0,memdev=mem0,cpus=0-1 \
> -numa node,nodeid=1,memdev=mem1,cpus=2-3 \
> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
> -monitor none -serial mon:stdio -nographic -gdb tcp::6666 \
> -qmp tcp:localhost:5555,server,wait=off \
> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
> -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \
> -initrd /home/gavin/sandbox/images/rootfs.cpio.xz \
> -append memhp_default_state=online_movable
> :
> :
> guest$ cd /sys/devices/system/cpu/
> guest$ cat present enabled online
> 0-3
> 0-1
> 0-1
> (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
> qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted
Ah, I see. I think I understand the issue: KVM is complaining about
finalize being called twice. Could you check this, since I do not have
a way to test it?
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
{
    switch (feature) {
    case KVM_ARM_VCPU_SVE:
        [...]
        if (kvm_arm_vcpu_sve_finalized(vcpu))
            return -EPERM;  /* <---- is this where it is failing? */
        [...]
    }
    [...]
}
>
> I picked the fix (the last patch in the rfc-v6.2 branch) onto the rfc-v6
> branch, and the same crash dump can be seen.
Are you getting the previously reported abort or the new problem above?
Thanks
Salil.