Hi Gavin,

On Thu, Oct 23, 2025 at 12:35 AM Salil Mehta <[email protected]> wrote:
>
> Hi Gavin,
>
> On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <[email protected]> wrote:
> >
> > Hi Salil,
> >
> > On 10/23/25 4:50 AM, Salil Mehta wrote:
> > > On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <[email protected]> wrote:
> > >> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <[email protected]> wrote:
> > >>>
> > >>> Hi Salil,
> > >>>
> > >>> On 10/1/25 11:01 AM, [email protected] wrote:
> > >>>> From: Salil Mehta <[email protected]>
> > >>>>
> > >>>> The ARM CPU architecture does not allow CPUs to be plugged in after the
> > >>>> system has initialized. This is a constraint. Hence, the kernel must know
> > >>>> all the CPUs being booted during its initialization. This applies to the
> > >>>> guest kernel as well, and therefore the number of KVM vCPU descriptors in
> > >>>> the host must be fixed at VM initialization time.
> > >>>>
> > >>>> Also, the GIC must know all the CPUs it is connected to during its
> > >>>> initialization, and this cannot change afterward. This must also be
> > >>>> ensured during the initialization of the VGIC in KVM. This is necessary
> > >>>> because:
> > >>>>
> > >>>> 1. The association between GICR and MPIDR must be fixed at VM
> > >>>>    initialization time. This is represented by the register
> > >>>>    `GICR_TYPER(mp_affinity, proc_num)`.
> > >>>> 2. Memory regions associated with GICR, etc., cannot be changed (added,
> > >>>>    deleted, or modified) after the VM has been initialized. This is not
> > >>>>    an ARM architectural constraint but would invite a difficult and messy
> > >>>>    change in the VGIC data structures.
> > >>>>
> > >>>> To enable a hot-add-like model while preserving these constraints, the
> > >>>> virt machine may enumerate more CPUs than are enabled at boot using
> > >>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> > >>>> administratively disabled at init). The topology remains fixed at VM
> > >>>> creation time; only the online/offline status may change later.
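> > >>>>
> > >>>> For example, a guest with four enabled vCPUs and two administratively
> > >>>> disabled (offline) vCPUs could be started with something like:
> > >>>>
> > >>>>   $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
> > >>>>         -cpu host -smp cpus=4,disabledcpus=2 ...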
> > >>>>
> > >>>> Administratively disabled vCPUs are not realized in QOM until first
> > >>>> enabled, avoiding the creation of unnecessary vCPU threads at boot. On
> > >>>> large systems, this reduces startup time proportionally to the number of
> > >>>> disabled vCPUs. Once a QOM vCPU is realized and its thread created,
> > >>>> subsequent enable/disable actions do not unrealize it. This behaviour was
> > >>>> adopted following review feedback and differs from earlier RFC versions.
> > >>>>
> > >>>> Co-developed-by: Keqian Zhu <[email protected]>
> > >>>> Signed-off-by: Keqian Zhu <[email protected]>
> > >>>> Signed-off-by: Salil Mehta <[email protected]>
> > >>>> ---
> > >>>>    accel/kvm/kvm-all.c    |  2 +-
> > >>>>    hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
> > >>>>    hw/core/qdev.c         | 17 ++++++++++
> > >>>>    include/hw/qdev-core.h | 19 +++++++++++
> > >>>>    include/system/kvm.h   |  8 +++++
> > >>>>    target/arm/cpu.c       |  2 ++
> > >>>>    target/arm/kvm.c       | 40 +++++++++++++++++++++-
> > >>>>    target/arm/kvm_arm.h   | 11 ++++++
> > >>>>    8 files changed, 168 insertions(+), 8 deletions(-)
> > >>>>
> > >
> > > [...]
> > >
> > >>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> > >>>> +{
> > >>>> +    CPUState *cs = CPU(cpu);
> > >>>> +    unsigned long vcpu_id = cs->cpu_index;
> > >>>> +    int ret;
> > >>>> +
> > >>>> +    ret = kvm_create_vcpu(cs);
> > >>>> +    if (ret < 0) {
> > >>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
> > >>>> +        abort();
> > >>>> +    }
> > >>>> +
> > >>>> +    /*
> > >>>> +     * Initialize the vCPU in the host. This resets the sys regs for
> > >>>> +     * this vCPU; related registers like MPIDR_EL1 also get programmed
> > >>>> +     * in the host during this call. These are referenced later while
> > >>>> +     * setting the device attributes of the GICR during GICv3 reset.
> > >>>> +     */
> > >>>> +    ret = kvm_arch_init_vcpu(cs);
> > >>>> +    if (ret < 0) {
> > >>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> > >>>> +        abort();
> > >>>> +    }
> > >>>> +
> > >>>> +    /*
> > >>>> +     * Park the created vCPU. It will be fetched via kvm_get_vcpu() when
> > >>>> +     * the vCPU thread is created during realization of the ARM vCPU.
> > >>>> +     */
> > >>>> +    kvm_park_vcpu(cs);
> > >>>> +}
> > >>>> +
> > >>>
> > >>> I don't think we're able to simply call kvm_arch_init_vcpu() in the
> > >>> lazily realized path. Otherwise, it can trigger a crash dump on my
> > >>> NVIDIA Grace Hopper machine, where SVE is supported by default.
> > >>
> > >> Thanks for reporting this. That is not true. As long as we initialize
> > >> KVM correctly and finalize the features like SVE, we should be fine.
> > >> In fact, this is precisely what we are doing right now.
> > >>
> > >> To understand the crash, I need a bit more info.
> > >>
> > >> 1#  Is it happening because KVM_ARM_VCPU_INIT is failing? If yes, can you
> > >>       check within KVM whether it is failing because
> > >>       a. the features specified by QEMU do not match the defaults within
> > >>           KVM (Hint: check kvm_vcpu_init_check_features())?
> > >>       b. or it is complaining about an init feature change
> > >>           (kvm_vcpu_init_changed())?
> > >> 2#  Or is it happening during the setting of the vector length or while
> > >>       finalizing the features?
> > >>
> > >> int kvm_arch_init_vcpu(CPUState *cs)
> > >> {
> > >>     [...]
> > >>     /* Do KVM_ARM_VCPU_INIT ioctl */
> > >>     ret = kvm_arm_vcpu_init(cpu);                           ----> [1]
> > >>     if (ret) {
> > >>         return ret;
> > >>     }
> > >>
> > >>     if (cpu_isar_feature(aa64_sve, cpu)) {
> > >>         ret = kvm_arm_sve_set_vls(cpu);                     ----> [2]
> > >>         if (ret) {
> > >>             return ret;
> > >>         }
> > >>         ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE); ----> [3]
> > >>         if (ret) {
> > >>             return ret;
> > >>         }
> > >>     }
> > >>     [...]
> > >> }
> > >>
> > >> I think it's happening because the vector length is left uninitialized.
> > >> This initialization happens as part of arm_cpu_finalize_features(),
> > >> which I forgot to call before calling the KVM finalize.
> > >>
> > >>>
> > >>> kvm_arch_init_vcpu() is supposed to be called in the realization path in
> > >>> the current implementation (without this series) because the parameters
> > >>> (features) for KVM_ARM_VCPU_INIT are populated at vCPU realization time.
> > >>
> > >> Not necessarily. It is just meant to initialize KVM. If we take care of
> > >> the KVM requirements in a similar way to the realize path, we should be
> > >> fine. Can you try adding the patch below to your code and test whether
> > >> it works?
> > >>
> > >>   diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > >> index c4b68a0b17..1091593478 100644
> > >> --- a/target/arm/kvm.c
> > >> +++ b/target/arm/kvm.c
> > >> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> > >>           abort();
> > >>       }
> > >>
> > >> +     /* finalize the features like SVE, SME etc */
> > >> +     arm_cpu_finalize_features(cpu, &error_abort);
> > >> +
> > >>       /*
> > >>        * Initialize the vCPU in the host. This will reset the sys regs
> > >>        * for this vCPU and related registers like MPIDR_EL1 etc. also
> > >>
> > >>
> > >>
> > >>
> > >>>
> > >>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
> > >>>     --enable-kvm -machine virt,gic-version=3 -cpu host               \
> > >>>     -smp cpus=4,disabledcpus=2 -m 1024M                              \
> > >>>     -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
> > >>>     -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> > >>> qemu-system-aarch64: Failed to initialize host vcpu 4
> > >>> Aborted (core dumped)
> > >>>
> > >>> Backtrace
> > >>> =========
> > >>> (gdb) bt
> > >>> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
> > >>> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> > >>> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> > >>> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
> > >>>       at ../target/arm/kvm.c:1081
> > >>> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
> > >>>       at ../hw/arm/virt.c:2483
> > >>> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
> > >>> #6  0x0000aaaab160f220 in machine_run_board_init
> > >>>       (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
> > >>> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> > >>> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
> > >>>       at ../system/vl.c:2821
> > >>> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
> > >>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
> > >>
> > >>
> > >> Thank you for this. Please let me know whether the above fix works, and
> > >> also the return values in case you encounter errors.
> > >
> > > I've pushed the fix to the branch below for your convenience:
> > >
> > > Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
> > > Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
> > >
> >
> > I guess the rfc-v6.2 branch isn't ready for test, because it runs into
> > another crash dump, as shown below.
>
>
> rfc-v6.2 is not crashing on the Kunpeng 920 where I tested. But this
> chip does not have some ARM extensions like SVE, so unfortunately
> I can't test SVE/SME/PAuth etc. support.
>
> Can you disable SVE and then check whether the guest comes up, just
> to corner the case?
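>
> (For instance, switching -cpu host,sve=on to -cpu host,sve=off in the
> command line below should be enough to rule SVE in or out.)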
>
> >
> > host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> >        -accel kvm -machine virt,gic-version=host,nvdimm=on \
> >        -cpu host,sve=on \
> >        -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
> >        -m 4096M,slots=16,maxmem=128G \
> >        -object memory-backend-ram,id=mem0,size=2048M \
> >        -object memory-backend-ram,id=mem1,size=2048M \
> >        -numa node,nodeid=0,memdev=mem0,cpus=0-1 \
> >        -numa node,nodeid=1,memdev=mem1,cpus=2-3 \
> >        -L /home/gavin/sandbox/qemu.main/build/pc-bios \
> >        -monitor none -serial mon:stdio -nographic -gdb tcp::6666 \
> >        -qmp tcp:localhost:5555,server,wait=off \
> >        -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
> >        -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \
> >        -initrd /home/gavin/sandbox/images/rootfs.cpio.xz \
> >        -append memhp_default_state=online_movable
> >          :
> >          :
> > guest$ cd /sys/devices/system/cpu/
> > guest$ cat present enabled online
> > 0-3
> > 0-1
> > 0-1
> > (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
> > qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted
>
>
> Ah, I see. I think I understand the issue. It's complaining about
> finalize being called twice. Is it possible for you to check, as I do
> not have a way to test it myself?
>
>
> int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
> {
>     switch (feature) {
>     case KVM_ARM_VCPU_SVE:
>         [...]
>         if (kvm_arm_vcpu_sve_finalized(vcpu))
>             return -EPERM;   -----> this is where it must be failing?
>         [...]
>     }
> }

I've pushed a fix to the same rfc-v6.2 branch to avoid finalizing the
SVE feature (KVM_ARM_VCPU_FINALIZE) twice.
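
Conceptually, the change is along these lines (just an illustrative
sketch, not the exact commit; 'sve_finalized' is only a placeholder
flag name for "SVE was already finalized when this vCPU was created
and parked early"):

int kvm_arch_init_vcpu(CPUState *cs)
{
    [...]
    if (cpu_isar_feature(aa64_sve, cpu)) {
        ret = kvm_arm_sve_set_vls(cpu);
        if (ret) {
            return ret;
        }
        /*
         * Skip KVM_ARM_VCPU_FINALIZE if SVE was already finalized for
         * this vCPU in kvm_arm_create_host_vcpu(); otherwise KVM
         * returns -EPERM on the second attempt.
         */
        if (!cpu->sve_finalized) {
            ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);
            if (ret) {
                return ret;
            }
            cpu->sve_finalized = true;
        }
    }
    [...]
}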

Could you please validate the fix again and check that SVE works on
the NVIDIA Grace Hopper?

Many thanks!

Best regards
Salil.
