On Tue, 14 Oct 2025 at 15:48, Salil Mehta <[email protected]> wrote:
>
> Hi Peter,
>
> > From: Peter Maydell <[email protected]>
> > Sent: Tuesday, October 14, 2025 3:29 PM
> > To: Salil Mehta <[email protected]>
> >
> > On Tue, 14 Oct 2025 at 15:22, Salil Mehta <[email protected]> wrote:
> > >
> > > Hi Peter,
> > >
> > > > From: Peter Maydell <[email protected]>
> > > > Sent: Tuesday, October 14, 2025 2:50 PM
> > > > To: Salil Mehta <[email protected]>
> > > >
> > > > On Tue, 14 Oct 2025 at 14:41, Salil Mehta <[email protected]> wrote:
> > > > > I thought you asked me to validate the fix by replacing below:
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil
> > > > > .meh
> > > > > [email protected]/
> > > > >
> > > > >
> > > > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've
> > > > > pushed to the community.
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.
> > > > > meht
> > > > > [email protected]/
> > > >
> > > > That's the one with the "lazy realize" hack, right? I imagine what's
> > > > happening is that we realize the GIC, and the code in this patch
> > > > assumes that all the CPUs are already realized at that point. When
> > > > we try to get the register value for a not-yet-realized CPU the kernel
> > > > complains.
> > >
> > >
> > > Even if we realize all of the vCPUs, the problem will not go away. This
> > > problem is happening because we have recently started to exit hypercalls
> > > to userspace. This means we are now accessing the system register in a
> > > non-atomic context.
> >
> > The point of this patch is that it moves the read of ICC_CTLR_EL1 out of the
> > reset path and into the GIC realize method, at which point no vCPUs should
> > have started running. But it does assume that you don't have half-created
> > VCPUs connected to the GIC.
>
>
> This is not true. Actually, the inner cpu_exec() loop (in kvm-all.c) keeps
> dipping into the KVM_RUN IOCTL and exiting back with INTR continuously, as
> the realized vCPUs are in RUNNABLE state initially. The actual
> "start-powered-off" policy only gets applied after the first system reset
> happens.

In what situation do we ever start running a VCPU before
the *GIC* has been realized? The GIC should get realized
as part of creating the virt board, which must complete
before we do anything like running a vcpu.

> > > The observation you are seeing has got nothing to do with lazy realization.
> > > The problem happens even after threads are realized and then we try to
> > > access the ICC_CTLR_EL1 register during cpu_reset()
> >
> > With this patch, we should not be accessing ICC_CTLR_EL1 during CPU reset.
> > The backtrace you posted does not have CPU reset in it, so whatever is going
> > wrong there must be something else.
>
> Yes, but it's crashing in the realization of the GIC, i.e. in the context of
> machvirt_init(). The first reset of the vCPUs happens much later than this.
> Hence, the reason for this contention is different from the one you are
> trying to solve using this patch.

Yes, and my suggestion is that the failure you are seeing is only
because you have got half-created vcpu objects. Your backtrace
shows that the error here is not EBUSY, but ENOTTY.

-- PMM
