> From: Peter Maydell <[email protected]>
> Sent: Tuesday, October 14, 2025 4:00 PM
> To: Salil Mehta <[email protected]>
>
> On Tue, 14 Oct 2025 at 15:48, Salil Mehta <[email protected]> wrote:
> >
> > Hi Peter,
> >
> > > From: Peter Maydell <[email protected]>
> > > Sent: Tuesday, October 14, 2025 3:29 PM
> > > To: Salil Mehta <[email protected]>
> > >
> > > On Tue, 14 Oct 2025 at 15:22, Salil Mehta <[email protected]> wrote:
> > > >
> > > > Hi Peter,
> > > >
> > > > > From: Peter Maydell <[email protected]>
> > > > > Sent: Tuesday, October 14, 2025 2:50 PM
> > > > > To: Salil Mehta <[email protected]>
> > > > >
> > > > > On Tue, 14 Oct 2025 at 14:41, Salil Mehta <[email protected]> wrote:
> > > > > > I thought you asked me to validate the fix by replacing below:
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.meh[email protected]/
> > > > > >
> > > > > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch
> > > > > > I've pushed to the community.
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.meht[email protected]/
> > > > >
> > > > > That's the one with the "lazy realize" hack, right? I imagine
> > > > > what's happening is that we realize the GIC, and the code in
> > > > > this patch assumes that all the CPUs are already realized at
> > > > > that point. When we try to get the register value for a
> > > > > not-yet-realized CPU the kernel complains.
> > > >
> > > > Even if we realize all of the vCPUs the problem will not go away.
> > > > This problem is happening because we have recently started to Exit
> > > > Hypercalls to userspace.
> > > > This means we are now accessing the system register in a
> > > > non-atomic context.
> > >
> > > The point of this patch is that it moves the read of ICC_CTLR_EL1
> > > out of the reset path and into the GIC realize method, at which
> > > point no vCPUs should have started running. But it does assume that
> > > you don't have half-created VCPUs connected to the GIC.
> >
> > This Is not true. Actually, inner cpu_exec() (in kvm-all..c) loop
> > keeps on dipping into the KVM_RUN IOCTL and exiting back with INTR
> > continuously as the realized vCPUs are in RUNNABLE state initially.
> > The actual "start-powered-off" policy only gets applied after first
> > system-reset happens.
>
> In what situation do we ever start running a VCPU before the *GIC* has
> been realized? The GIC should get realized as part of creating the virt
> board, which must complete before we do anything like running a vcpu.

Just after realization of a vCPU in machvirt_init(), you can see that the
default power_state is PSCI CPU_ON, which means KVM_MP_STATE_RUNNABLE.
Since the thread is up and not doing an IO wait in userspace, it gets into
the cpu_exec() loop and actually runs the KVM_RUN IOCTL. Inside KVM it
momentarily takes the vCPU mutex, but later exits and releases it. This
keeps going on for all of the vCPU threads realized early.

Sure, but the GIC is not being used by any of the vCPU threads at that
point. The guest kernel, and hence the VGIC driver, does not exist yet.
Doesn't it need to do its initialization first, before we can even think
of any interrupt handling?

>
> > > > The observation you are seeing has got nothing to do with lazy
> > > > realization.
> > > > The problem happens even after threads are realized and then we
> > > > try to access the ICC_CTLR_EL1 register during cpu_reset()
> > >
> > > With this patch, we should not be accessing ICC_CTLR_EL1 during CPU
> > > reset.
> > > The backtrace you posted does not have CPU reset in it, so whatever
> > > is going wrong there must be something else.
> >
> > Yes, but its crashing in the realization of the GIC i.e. in context of
> > machvirt_init() First reset of the vCPUs happens much later than this.
> > Hence, the reason of this contention is different than the one you are
> > trying to solve using this patch.
>
> Yes, and my suggestion is that the failure you are seeing is only because
> you have got half-created vcpu objects. Your backtrace shows that the
> error here is not EBUSY, but ENOTTY.

Let me revisit that part again.

Many thanks!

Best regards
Salil.
