Hi Gavin, On Thu, Oct 23, 2025 at 12:35 AM Salil Mehta <[email protected]> wrote: > > HI Gavin, > > On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <[email protected]> wrote: > > > > Hi Salil, > > > > On 10/23/25 4:50 AM, Salil Mehta wrote: > > > On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <[email protected]> > > > wrote: > > >> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <[email protected]> wrote: > > >>> > > >>> Hi Salil, > > >>> > > >>> On 10/1/25 11:01 AM, [email protected] wrote: > > >>>> From: Salil Mehta <[email protected]> > > >>>> > > >>>> ARM CPU architecture does not allow CPUs to be plugged after system has > > >>>> initialized. This is a constraint. Hence, the Kernel must know all the > > >>>> CPUs > > >>>> being booted during its initialization. This applies to the Guest > > >>>> Kernel as > > >>>> well and therefore, the number of KVM vCPU descriptors in the host > > >>>> must be > > >>>> fixed at VM initialization time. > > >>>> > > >>>> Also, the GIC must know all the CPUs it is connected to during its > > >>>> initialization, and this cannot change afterward. This must also be > > >>>> ensured > > >>>> during the initialization of the VGIC in KVM. This is necessary > > >>>> because: > > >>>> > > >>>> 1. The association between GICR and MPIDR must be fixed at VM > > >>>> initialization > > >>>> time. This is represented by the register > > >>>> `GICR_TYPER(mp_affinity, proc_num)`. > > >>>> 2. Memory regions associated with GICR, etc., cannot be changed (added, > > >>>> deleted, or modified) after the VM has been initialized. This is > > >>>> not an > > >>>> ARM architectural constraint but rather invites a difficult and > > >>>> messy > > >>>> change in VGIC data structures. > > >>>> > > >>>> To enable a hot-add–like model while preserving these constraints, the > > >>>> virt > > >>>> machine may enumerate more CPUs than are enabled at boot using > > >>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e., > > >>>> administratively disabled at init). The topology remains fixed at VM > > >>>> creation time; only the online/offline status may change later. > > >>>> > > >>>> Administratively disabled vCPUs are not realized in QOM until first > > >>>> enabled, > > >>>> avoiding creation of unnecessary vCPU threads at boot. On large > > >>>> systems, this > > >>>> reduces startup time proportionally to the number of disabled vCPUs. > > >>>> Once a > > >>>> QOM vCPU is realized and its thread created, subsequent enable/disable > > >>>> actions > > >>>> do not unrealize it. This behaviour was adopted following review > > >>>> feedback and > > >>>> differs from earlier RFC versions. > > >>>> > > >>>> Co-developed-by: Keqian Zhu <[email protected]> > > >>>> Signed-off-by: Keqian Zhu <[email protected]> > > >>>> Signed-off-by: Salil Mehta <[email protected]> > > >>>> --- > > >>>> accel/kvm/kvm-all.c | 2 +- > > >>>> hw/arm/virt.c | 77 > > >>>> ++++++++++++++++++++++++++++++++++++++---- > > >>>> hw/core/qdev.c | 17 ++++++++++ > > >>>> include/hw/qdev-core.h | 19 +++++++++++ > > >>>> include/system/kvm.h | 8 +++++ > > >>>> target/arm/cpu.c | 2 ++ > > >>>> target/arm/kvm.c | 40 +++++++++++++++++++++- > > >>>> target/arm/kvm_arm.h | 11 ++++++ > > >>>> 8 files changed, 168 insertions(+), 8 deletions(-) > > >>>> > > > > > > [...] > > > > > >>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu) > > >>>> +{ > > >>>> + CPUState *cs = CPU(cpu); > > >>>> + unsigned long vcpu_id = cs->cpu_index; > > >>>> + int ret; > > >>>> + > > >>>> + ret = kvm_create_vcpu(cs); > > >>>> + if (ret < 0) { > > >>>> + error_report("Failed to create host vcpu %ld", vcpu_id); > > >>>> + abort(); > > >>>> + } > > >>>> + > > >>>> + /* > > >>>> + * Initialize the vCPU in the host. This will reset the sys regs > > >>>> + * for this vCPU and related registers like MPIDR_EL1 etc. also > > >>>> + * get programmed during this call to host. These are referenced > > >>>> + * later while setting device attributes of the GICR during GICv3 > > >>>> + * reset. > > >>>> + */ > > >>>> + ret = kvm_arch_init_vcpu(cs); > > >>>> + if (ret < 0) { > > >>>> + error_report("Failed to initialize host vcpu %ld", vcpu_id); > > >>>> + abort(); > > >>>> + } > > >>>> + > > >>>> + /* > > >>>> + * park the created vCPU. shall be used during kvm_get_vcpu() when > > >>>> + * threads are created during realization of ARM vCPUs. > > >>>> + */ > > >>>> + kvm_park_vcpu(cs); > > >>>> +} > > >>>> + > > >>> > > >>> I don't think we're able to simply call kvm_arch_init_vcpu() in the > > >>> lazily realized > > >>> path. Otherwise, it can trigger a crash dump on my Nvidia's > > >>> grace-hopper machine where > > >>> SVE is supported by default. > > >> > > >> Thanks for reporting this. That is not true. As long as we initialize > > >> KVM correctly and > > >> finalize the features like SVE we should be fine. In fact, this is > > >> precisely what we are > > >> doing right now. > > >> > > >> To understand the crash, I need a bit more info. > > >> > > >> 1# is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can > > >> you check > > >> within the KVM if it is happening because > > >> a. features specified by QEMU are not matching the defaults > > >> within the KVM > > >> (HInt: check kvm_vcpu_init_check_features())? > > >> b. or complaining about init feate change kvm_vcpu_init_changed()? > > >> 2# or it is happening during the setting of vector length or > > >> finalizing features? > > >> > > >> int kvm_arch_init_vcpu(CPUState *cs) > > >> { > > >> [...] > > >> /* Do KVM_ARM_VCPU_INIT ioctl */ > > >> ret = kvm_arm_vcpu_init(cpu); ---->[1] > > >> if (ret) { > > >> return ret; > > >> } > > >> if (cpu_isar_feature(aa64_sve, cpu)) { > > >> ret = kvm_arm_sve_set_vls(cpu); ---->[2] > > >> if (ret) { > > >> return ret; > > >> } > > >> ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3] > > >> if (ret) { > > >> return ret; > > >> } > > >> } > > >> [...] > > >> } > > >> > > >> I think it's happening because vector length is going uninitialized. > > >> This initialization > > >> happens in context to arm_cpu_finalize_features() which I forgot to > > >> call before > > >> calling KVM finalize. > > >> > > >>> > > >>> kvm_arch_init_vcpu() is supposed to be called in the realization path > > >>> in current > > >>> implementation (without this series) because the parameters (features) > > >>> to KVM_ARM_VCPU_INIT > > >>> is populated at vCPU realization time. > > >> > > >> Not necessarily. It is just meant to initialize the KVM. If we take care > > >> of the > > >> KVM requirements in the similar way the realize path does we should be > > >> fine. Can you try to add the patch below in your code and test if it > > >> works? > > >> > > >> diff --git a/target/arm/kvm.c b/target/arm/kvm.c > > >> index c4b68a0b17..1091593478 100644 > > >> --- a/target/arm/kvm.c > > >> +++ b/target/arm/kvm.c > > >> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu) > > >> abort(); > > >> } > > >> > > >> + /* finalize the features like SVE, SME etc */ > > >> + arm_cpu_finalize_features(cpu, &error_abort); > > >> + > > >> /* > > >> * Initialize the vCPU in the host. This will reset the sys regs > > >> * for this vCPU and related registers like MPIDR_EL1 etc. also > > >> > > >> > > >> > > >> > > >>> > > >>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ > > >>> --enable-kvm -machine virt,gic-version=3 -cpu host \ > > >>> -smp cpus=4,disabledcpus=2 -m 1024M \ > > >>> -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \ > > >>> -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic > > >>> qemu-system-aarch64: Failed to initialize host vcpu 4 > > >>> Aborted (core dumped) > > >>> > > >>> Backtrace > > >>> ========= > > >>> (gdb) bt > > >>> #0 0x0000ffff9106bc80 in __pthread_kill_implementation () at > > >>> /lib64/libc.so.6 > > >>> #1 0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6 > > >>> #2 0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6 > > >>> #3 0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu > > >>> (cpu=0xaaaab9ab1bc0) > > >>> at ../target/arm/kvm.c:1081 > > >>> #4 0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization > > >>> (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0) > > >>> at ../hw/arm/virt.c:2483 > > >>> #5 0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at > > >>> ../hw/arm/virt.c:2777 > > >>> #6 0x0000aaaab160f220 in machine_run_board_init > > >>> (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at > > >>> ../hw/core/machine.c:1722 > > >>> #7 0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723 > > >>> #8 0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 > > >>> <error_fatal>) > > >>> at ../system/vl.c:2821 > > >>> #9 0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at > > >>> ../system/vl.c:3882 > > >>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at > > >>> ../system/main.c:71 > > >> > > >> > > >> Thank you for this. Please let me know if the above fix works and also > > >> the return values in > > >> case you encounter errors. > > > > > > I've pushed the fix to below branch for your convenience: > > > > > > Branch: > > > https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2 > > > Fix: > > > https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669 > > > > > > > I guess rfc-v6.2 branch isn't ready for test because it runs into another > > crash > > dump with rfc-v6.2 branch, like below. > > > rfc-6.2 is not crashing on Kunpeng920 where I tested. But this > chip does not have some ARM extensions like SVE etc so > Unfortunately, I can't test SVE/SME/PAuth etc support. > > Can you disable SVE and then try if it comes up just to corner > the case? > > > > > host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 > > \ > > -accel kvm -machine virt,gic-version=host,nvdimm=on > > \ > > -cpu host,sve=on > > \ > > -smp > > maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \ > > -m 4096M,slots=16,maxmem=128G > > \ > > -object memory-backend-ram,id=mem0,size=2048M > > \ > > -object memory-backend-ram,id=mem1,size=2048M > > \ > > -numa node,nodeid=0,memdev=mem0,cpus=0-1 > > \ > > -numa node,nodeid=1,memdev=mem1,cpus=2-3 > > \ > > -L /home/gavin/sandbox/qemu.main/build/pc-bios > > \ > > -monitor none -serial mon:stdio -nographic -gdb tcp::6666 > > \ > > -qmp tcp:localhost:5555,server,wait=off > > \ > > -bios > > /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \ > > -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image > > \ > > -initrd /home/gavin/sandbox/images/rootfs.cpio.xz > > \ > > -append memhp_default_state=online_movable > > : > > : > > guest$ cd /sys/devices/system/cpu/ > > guest$ cat present enabled online > > 0-3 > > 0-1 > > 0-1 > > (qemu) device_set > > host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable > > qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): > > Operation not permitted > > > Ah, I see. I think I understand the issue. It's complaining > about calling the finalize twice. Is it possible to check as > I do not have a way to test it? > > > int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature) > { > switch (feature) { > case KVM_ARM_VCPU_SVE: > [...] > if (kvm_arm_vcpu_sve_finalized(vcpu)) > return -EPERM;-----> this where it must be popping? > [...] > }
I've pushed the fix to avoid calling the finalizing SVE feature (KVM_ARM_VCPU_FINALIZE) twice on the same RFC-V6.2 branch. May I kindly request you to validate the fix again and check SVE works on NVIDIA grace-hopper? Many thanks! Best regards Salil.
