Hi Peter,

> From: Peter Maydell <[email protected]>
> Sent: Tuesday, October 14, 2025 2:50 PM
> To: Salil Mehta <[email protected]>
>
> On Tue, 14 Oct 2025 at 14:41, Salil Mehta <[email protected]> wrote:
> > I thought you asked me to validate the fix by replacing below:
> >
> > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.meh
> > [email protected]/
> >
> >
> > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've
> > pushed to the community.
> >
> > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.meht
> > [email protected]/
>
> That's the one with the "lazy realize" hack, right? I imagine what's happening
> is that we realize the GIC, and the code in this patch assumes that all the
> CPUs are already realized at that point. When we try to get the register value
> for a not-yet-realized CPU the kernel complains.

Even if we realize all of the vCPUs, the problem will not go away. It is
happening because we have recently started to exit hypercalls to userspace,
which means we are now accessing the system register in a non-atomic context.

In fact, contrary to the above, lazy realization actually helps reduce vCPU
lock contention: there are no threads running inside the KVM_RUN IOCTL, so
those threads do not take the lock and hence cannot contend for it.

If we handle the HVC and reset the system register in vCPU thread context,
then we are already in atomic context, as the vCPU mutexes are taken inside
KVM. The problem we are seeing arises only when we try to access the system
registers without holding the vCPU mutex, because we are not in the KVM_RUN
IOCTL. For example:

1. When we exit the HVC/SMC hypercall into userspace and access the
   ICC_CTLR_EL1 system register via a KVM device IOCTL.

OR

2. Like in the current patch, when we try to access ICC_CTLR_EL1 while we
   are not in any vCPU context running inside the KVM_RUN IOCTL. Here, we
   will most probably contend with the mutex held by CPU0 (at least).

>
> (I strongly agree with Igor's review remarks here
> https://lore.kernel.org/qemu-devel/20251006160027.20067fe4@fedora/
> that lazy realizing of CPU objects is a bad idea.)

The observation you are seeing has nothing to do with lazy realization. The
problem happens even after the threads are realized and we then try to access
the ICC_CTLR_EL1 register during cpu_reset().

Many thanks!

Best regards
Salil.
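
For illustration only, the contention described above can be modelled in plain
userspace C. This is a toy sketch, not KVM or QEMU code: the names
toy_enter_run(), toy_exit_run() and toy_sysreg_access() are hypothetical, and
the "try to lock every vCPU, otherwise fail with -EBUSY" behaviour is an
assumption about how the device IOCTL path behaves, not something taken from
the patches.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define NR_VCPUS 4

/* Toy stand-ins for the per-vCPU mutexes (hypothetical, not KVM structures). */
static pthread_mutex_t vcpu_mutex[NR_VCPUS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Model a vCPU thread entering KVM_RUN: it holds its own mutex. */
static void toy_enter_run(int cpu)
{
    pthread_mutex_lock(&vcpu_mutex[cpu]);
}

/* Model the vCPU thread returning from KVM_RUN. */
static void toy_exit_run(int cpu)
{
    int rc = pthread_mutex_unlock(&vcpu_mutex[cpu]);
    assert(rc == 0);
}

/*
 * Model a system register access made outside any vCPU context (assumed
 * behaviour): try to take every vCPU mutex, and back out with -EBUSY if
 * any of them is currently held.
 */
static int toy_sysreg_access(void)
{
    int i;

    for (i = 0; i < NR_VCPUS; i++) {
        if (pthread_mutex_trylock(&vcpu_mutex[i]) != 0) {
            /* Release the mutexes taken so far before failing. */
            while (--i >= 0)
                pthread_mutex_unlock(&vcpu_mutex[i]);
            return -EBUSY;
        }
    }

    /* The ICC_CTLR_EL1 read or write would happen here. */

    for (i = 0; i < NR_VCPUS; i++)
        pthread_mutex_unlock(&vcpu_mutex[i]);
    return 0;
}
```

In this model, toy_sysreg_access() succeeds while no vCPU is "running", and
fails with -EBUSY as soon as toy_enter_run(0) has been called, whether or not
the other vCPUs have been realized.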
