arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset

Salil Mehta via Sat, 18 Oct 2025 01:09:28 -0700

Hi Peter,

> From: Peter Maydell <[email protected]>
> Sent: Wednesday, October 15, 2025 1:06 PM
> To: Salil Mehta <[email protected]>
> On Wed, 15 Oct 2025 at 11:58, Salil Mehta <[email protected]> wrote:
> >
> > Hi Peter,
> >
> > > From: [email protected]
> <qemu-
> > > [email protected]> On Behalf Of
> Peter
> > > Maydell
> > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > To: [email protected]
> > > Cc: Salil Mehta <[email protected]>; Marc Zyngier
> > > <[email protected]>
> > > Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1
> > > from kernel in cpuif reset
> > >
> > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > > works, but we're actually breaking an assumption the kernel makes
> > > that userspace only accesses the in-kernel GIC data when the VM is
> > > totally paused, which may not be the case if a single vCPU is being
> > > reset.  The effect is that it's possible that the read attempt returns 
> > > EBUSY.
> > >
> > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > once in device realize. This brings ICC_CTLR_EL1 into line with the
> > > other cpuif registers, where we assume we know what the kernel is
> > > resetting them to and just update QEMU's data structures in
> arm_gicv3_icc_reset().
> > >
> > > Signed-off-by: Peter Maydell <[email protected]>
> > > ---
> > > I've only tested this fairly lightly, but it seems to work.
> > > Salil, does this fix the EBUSY issues you were seeing ?
> > >
> > >  include/hw/intc/arm_gicv3_common.h |  3 ++
> > >  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
> > >  2 files changed, 38 insertions(+), 14 deletions(-)
> > >
> > [...]
> > > +
> > > +    /*
> > > +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> > > +     * we will need if a CPU interface is reset. If the kernel is ancient
> > > +     * and doesn't support writing the GIC state then we don't need to
> > > +     * care what reset does to QEMU's data structures.
> > > +     */
> > > +    if (!s->migration_blocker) {
> > > +        for (i = 0; i < s->num_cpu; i++) {
> > > +            GICv3CPUState *c = &s->cpu[i];
> > > +
> > > +            kvm_device_access(s->dev_fd,
> > > KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> > > +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> > > +                              &c->kvm_reset_icc_ctlr_el1, false, 
> > > &error_abort);
> > > +        }
> > > +    }
> > >  }
> >
> >
> > Above changes assume that the driver's configured value of the
> > ICC_CTLR_EL1 system register is the same as the default value. I've
> > verified that this currently the case. However, it safe to assume that
> > this will remain true in the future as well?
> 
> I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
> at VM startup, when we know it will be the reset value, because we haven't
> run any VCPUs yet.



System register fetches its value from ICH_VMCR_EL2 and ICH_VTR_EL2.
In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are from
the VMCR register. Later gets configured when driver gets loaded and again
re-configured in context to each CPU ON request(via in-kernel  CPU Hotplug
state machine; CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR
again and again. Although, the value as of now is same. 

You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c

My question was, is it *future* safe to ignore these updates to VMCR fields
which can happen in context to guest kernel actions at point of time.

https://lore.kernel.org/all/[email protected]/

The extra complexity which you saw in above proposed patch was to handle
that although it looked unnecessary as the value was always static.

The kernel patch proposed ensured that cache is always up to date (just in case)

https://lore.kernel.org/lkml/[email protected]/


Internally, before posting RFC V6, I had discussed the implementation you've
proposed in this patch with Marc and Jonathan earlier but because of the issues
of hang I faced and above mentioned unaddressed issue of the driver update
I thought to push the version present in the RFC V6 to start with for your 
humble
consideration and let you decide 😊


I've tested this patch and I can confirm it is working along with RFC V6

https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.1/



I've also figured out the reason(s) why it did not work earlier i.e. below 
problems
and fixed them as well:
1. hang problem which I saw earlier
2. crash or no ENOTTY which I recently posted

I'll share the details of above problems in the context where the questions
were asked.


Many thanks!


> > We can instead cache the value once during the first cpu_reset() for
> > each CPU interface and reuse that cached value for all subsequent
> > cpu_reset() operations.  This would simplify the implementation while
> > retaining major goals.
> 
> No, we should read the value at realize time, as this patch does.
> Trying to do it at cpu_reset runs into the same problem that we have now
> that the kernel doesn't expect us to be reading the register when another
> vcpu might be running.


Agreed. we can. Problem was a miss in RFC V6. I fixed that.

Yes, the vCPU lock contention is an issue. Its better to avoid it as far as 
possible. 


> 
> > Calling pause_vcpus_all() at VM initialization time--during  the first
> > cpu_reset()--should not cause any issues, as all secondary vCPUs are
> > idle (i.e. their PCs are not initialized and kernel has not yet
> > loaded). Pausing such vCPUs at this stage should not be a disruptive action.
> 
> We should not need to run pause_vcpus_all() at all here. CPU reset should
> be fine done for one vcpu with the others running.


Yes, with this implementation working now, we can avoid it, so no issues.


> 
> > Moreover, if this stage poses a problem for a KVM device IOCTL access,
> > it similarly affect other components like GICv3 ITS, which also uses
> > KVM device IOCTL during GICv3 reset hold. At this stage all the vCPUs
> > should still be in a pre-boot or idle state
> 
> > static void kvm_arm_its_reset_hold(Object *obj, ResetType type) {
> > [...]
> >
> >     if (kvm_device_check_attr(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CTRL,
> >                                KVM_DEV_ARM_ITS_CTRL_RESET)) {
> >         kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> >                           KVM_DEV_ARM_ITS_CTRL_RESET, NULL, true, 
> > &error_abort);
> >         return;
> >     }
> > [...]
> >
> > }
> >
> > We are not even pausing vCPUs here, yet we expect above KVM device
> > IOCTL to succeed on the first attempt. The operation also ends up
> > acquiring the mutex for all vCPUs within the KVM.
> 
> ITS reset should only happen (a) at start of the run, before any vcpus run or
> (b) as part of a full system reset. We should check that the vcpus are paused
> when we're doing system reset, but I would expect that to be the case
> because it would be likely to be buggy if not.


Agreed. I took bit of time to review what we stumbled upon in August and
September. The issue actually predated the "parking in user space for disabled
vCPU threads". It was first reported during testing, where enabling and 
disabling
a vCPU - followed by a reboot - led to a crash in GICv3 ITS because of the
lock contention due to the disabled vCPU thread sleeping inside the KVM.

As you rightly pointed, we might not see the same issue in (a) & (b).

Thanks for the clarification.

Best regards
Salil.

RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset

Reply via email to