HI Gavin,

On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <[email protected]> wrote:
>
> Hi Salil,
>
> On 10/23/25 4:50 AM, Salil Mehta wrote:
> > On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <[email protected]> wrote:
> >> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <[email protected]> wrote:
> >>>
> >>> Hi Salil,
> >>>
> >>> On 10/1/25 11:01 AM, [email protected] wrote:
> >>>> From: Salil Mehta <[email protected]>
> >>>>
> >>>> ARM CPU architecture does not allow CPUs to be plugged after system has
> >>>> initialized. This is a constraint. Hence, the Kernel must know all the 
> >>>> CPUs
> >>>> being booted during its initialization. This applies to the Guest Kernel 
> >>>> as
> >>>> well and therefore, the number of KVM vCPU descriptors in the host must 
> >>>> be
> >>>> fixed at VM initialization time.
> >>>>
> >>>> Also, the GIC must know all the CPUs it is connected to during its
> >>>> initialization, and this cannot change afterward. This must also be 
> >>>> ensured
> >>>> during the initialization of the VGIC in KVM. This is necessary because:
> >>>>
> >>>> 1. The association between GICR and MPIDR must be fixed at VM 
> >>>> initialization
> >>>>      time. This is represented by the register
> >>>>      `GICR_TYPER(mp_affinity, proc_num)`.
> >>>> 2. Memory regions associated with GICR, etc., cannot be changed (added,
> >>>>      deleted, or modified) after the VM has been initialized. This is 
> >>>> not an
> >>>>      ARM architectural constraint but rather invites a difficult and 
> >>>> messy
> >>>>      change in VGIC data structures.
> >>>>
> >>>> To enable a hot-add–like model while preserving these constraints, the 
> >>>> virt
> >>>> machine may enumerate more CPUs than are enabled at boot using
> >>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> >>>> administratively disabled at init). The topology remains fixed at VM
> >>>> creation time; only the online/offline status may change later.
> >>>>
> >>>> Administratively disabled vCPUs are not realized in QOM until first 
> >>>> enabled,
> >>>> avoiding creation of unnecessary vCPU threads at boot. On large systems, 
> >>>> this
> >>>> reduces startup time proportionally to the number of disabled vCPUs. 
> >>>> Once a
> >>>> QOM vCPU is realized and its thread created, subsequent enable/disable 
> >>>> actions
> >>>> do not unrealize it. This behaviour was adopted following review 
> >>>> feedback and
> >>>> differs from earlier RFC versions.
> >>>>
> >>>> Co-developed-by: Keqian Zhu <[email protected]>
> >>>> Signed-off-by: Keqian Zhu <[email protected]>
> >>>> Signed-off-by: Salil Mehta <[email protected]>
> >>>> ---
> >>>>    accel/kvm/kvm-all.c    |  2 +-
> >>>>    hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
> >>>>    hw/core/qdev.c         | 17 ++++++++++
> >>>>    include/hw/qdev-core.h | 19 +++++++++++
> >>>>    include/system/kvm.h   |  8 +++++
> >>>>    target/arm/cpu.c       |  2 ++
> >>>>    target/arm/kvm.c       | 40 +++++++++++++++++++++-
> >>>>    target/arm/kvm_arm.h   | 11 ++++++
> >>>>    8 files changed, 168 insertions(+), 8 deletions(-)
> >>>>
> >
> > [...]
> >
> >>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> >>>> +{
> >>>> +    CPUState *cs = CPU(cpu);
> >>>> +    unsigned long vcpu_id = cs->cpu_index;
> >>>> +    int ret;
> >>>> +
> >>>> +    ret = kvm_create_vcpu(cs);
> >>>> +    if (ret < 0) {
> >>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
> >>>> +        abort();
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * Initialize the vCPU in the host. This will reset the sys regs
> >>>> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> >>>> +     * get programmed during this call to host. These are referenced
> >>>> +     * later while setting device attributes of the GICR during GICv3
> >>>> +     * reset.
> >>>> +     */
> >>>> +    ret = kvm_arch_init_vcpu(cs);
> >>>> +    if (ret < 0) {
> >>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> >>>> +        abort();
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * park the created vCPU. shall be used during kvm_get_vcpu() when
> >>>> +     * threads are created during realization of ARM vCPUs.
> >>>> +     */
> >>>> +    kvm_park_vcpu(cs);
> >>>> +}
> >>>> +
> >>>
> >>> I don't think we're able to simply call kvm_arch_init_vcpu() in the 
> >>> lazily realized
> >>> path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper 
> >>> machine where
> >>> SVE is supported by default.
> >>
> >> Thanks for reporting this. That is not true. As long as we initialize
> >> KVM correctly and
> >> finalize the features like SVE we should be fine. In fact, this is
> >> precisely what we are
> >> doing right now.
> >>
> >> To understand the crash, I need a bit more info.
> >>
> >> 1#  is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can you 
> >> check
> >>        within the KVM if it is happening because
> >>       a.  features specified by QEMU are not matching the defaults within 
> >> the KVM
> >>             (HInt: check kvm_vcpu_init_check_features())?
> >>       b. or complaining about init feate change kvm_vcpu_init_changed()?
> >> 2#  or it is happening during the setting of vector length or
> >> finalizing features?
> >>
> >> int kvm_arch_init_vcpu(CPUState *cs)
> >> {
> >>     [...]
> >>           /* Do KVM_ARM_VCPU_INIT ioctl */
> >>          ret = kvm_arm_vcpu_init(cpu);   ---->[1]
> >>          if (ret) {
> >>             return ret;
> >>         }
> >>            if (cpu_isar_feature(aa64_sve, cpu)) {
> >>          ret = kvm_arm_sve_set_vls(cpu); ---->[2]
> >>          if (ret) {
> >>              return ret;
> >>          }
> >>          ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3]
> >>          if (ret) {
> >>              return ret;
> >>          }
> >>      }
> >> [...]
> >> }
> >>
> >> I think it's happening because vector length is going uninitialized.
> >> This initialization
> >> happens in context to  arm_cpu_finalize_features() which I forgot to call 
> >> before
> >> calling KVM finalize.
> >>
> >>>
> >>> kvm_arch_init_vcpu() is supposed to be called in the realization path in 
> >>> current
> >>> implementation (without this series) because the parameters (features) to 
> >>> KVM_ARM_VCPU_INIT
> >>> is populated at vCPU realization time.
> >>
> >> Not necessarily. It is just meant to initialize the KVM. If we take care 
> >> of the
> >> KVM requirements in the similar way the realize path does we should be
> >> fine. Can you try to add the patch below in your code and test if it works?
> >>
> >>   diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> >> index c4b68a0b17..1091593478 100644
> >> --- a/target/arm/kvm.c
> >> +++ b/target/arm/kvm.c
> >> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> >>           abort();
> >>       }
> >>
> >> +     /* finalize the features like SVE, SME etc */
> >> +     arm_cpu_finalize_features(cpu, &error_abort);
> >> +
> >>       /*
> >>        * Initialize the vCPU in the host. This will reset the sys regs
> >>        * for this vCPU and related registers like MPIDR_EL1 etc. also
> >>
> >>
> >>
> >>
> >>>
> >>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
> >>>     --enable-kvm -machine virt,gic-version=3 -cpu host               \
> >>>     -smp cpus=4,disabledcpus=2 -m 1024M                              \
> >>>     -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
> >>>     -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> >>> qemu-system-aarch64: Failed to initialize host vcpu 4
> >>> Aborted (core dumped)
> >>>
> >>> Backtrace
> >>> =========
> >>> (gdb) bt
> >>> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at 
> >>> /lib64/libc.so.6
> >>> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> >>> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> >>> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu 
> >>> (cpu=0xaaaab9ab1bc0)
> >>>       at ../target/arm/kvm.c:1081
> >>> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization 
> >>> (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
> >>>       at ../hw/arm/virt.c:2483
> >>> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at 
> >>> ../hw/arm/virt.c:2777
> >>> #6  0x0000aaaab160f220 in machine_run_board_init
> >>>       (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at 
> >>> ../hw/core/machine.c:1722
> >>> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> >>> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 
> >>> <error_fatal>)
> >>>       at ../system/vl.c:2821
> >>> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at 
> >>> ../system/vl.c:3882
> >>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at 
> >>> ../system/main.c:71
> >>
> >>
> >> Thank you for this. Please let me know if the above fix works and also
> >> the return values in
> >> case you encounter errors.
> >
> > I've pushed the fix to below branch for your convenience:
> >
> > Branch: 
> > https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
> > Fix: 
> > https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
> >
>
> I guess rfc-v6.2 branch isn't ready for test because it runs into another 
> crash
> dump with rfc-v6.2 branch, like below.


rfc-6.2 is not crashing on Kunpeng920 where I tested. But this
chip does not have some ARM extensions like SVE etc so
Unfortunately, I can't test SVE/SME/PAuth etc support.

Can you disable SVE and then try if it comes up just to corner
the case?

>
> host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                 
>     \
>        -accel kvm -machine virt,gic-version=host,nvdimm=on                    
>      \
>        -cpu host,sve=on                                                       
>      \
>        -smp 
> maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
>        -m 4096M,slots=16,maxmem=128G                                          
>      \
>        -object memory-backend-ram,id=mem0,size=2048M                          
>      \
>        -object memory-backend-ram,id=mem1,size=2048M                          
>      \
>        -numa node,nodeid=0,memdev=mem0,cpus=0-1                               
>      \
>        -numa node,nodeid=1,memdev=mem1,cpus=2-3                               
>      \
>        -L /home/gavin/sandbox/qemu.main/build/pc-bios                         
>      \
>        -monitor none -serial mon:stdio -nographic -gdb tcp::6666              
>      \
>        -qmp tcp:localhost:5555,server,wait=off                                
>      \
>        -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd 
>      \
>        -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image          
>      \
>        -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                      
>      \
>        -append memhp_default_state=online_movable
>          :
>          :
> guest$ cd /sys/devices/system/cpu/
> guest$ cat present enabled online
> 0-3
> 0-1
> 0-1
> (qemu) device_set 
> host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
> qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation 
> not permitted


Ah, I see. I think I understand the issue. It's complaining
about calling the  finalize twice. Is it possible to check as
I do not have a way to test it?


int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
{
switch (feature) {
case KVM_ARM_VCPU_SVE:
[...]
if (kvm_arm_vcpu_sve_finalized(vcpu))
return -EPERM;-----> this where it must be popping?
[...]
}


>
> I picked the fix (the last patch in rfc-v6.2 branch) to rfc-v6 branch, same 
> crash dump
> can be seen.

Are you getting previously reported abort or above new problem?


Thanks
Salil.

Reply via email to