Hi Julien,

Thank you for the review.

On Mon, Jan 12, 2026 at 4:04 PM Julien Grall <[email protected]> wrote:
>
> Hi,
>
> On 12/01/2026 12:50, Mykola Kvach wrote:
> > From: Mykola Kvach <[email protected]>
> >
> > If a spurious virtual timer interrupt occurs (i.e. the interrupt fires
> > but CNTV_CTL_EL0 does not report it as pending), Xen masks the virtual
> > timer and injects the vtimer IRQ into the guest. For Linux guests, the
> > timer interrupt is unmasked only after programming a new CVAL value from
> > the timer interrupt handler. When the interrupt is not reported as
> > pending, the handler can skip that programming step, leaving the timer
> > masked and stalling the affected CPU.
>
> I guess this is happening if Linux is trying to modify CVAL with the
> local interrupt masked?

CVAL/TVAL programming in Linux is indeed done with local IRQs disabled.
The virtual timer IRQ handler runs in hardirq context with local IRQs
disabled.

However, the failure here is not "Linux modifies CVAL while IRQs are
masked". The problematic case is that Xen injects a vtimer IRQ while the
architectural state already says it is *not pending* (CNTV_CTL_EL0 has
ISTATUS=0, and CNTV_TVAL_EL0 is positive). In my Xen trace the injection
happens with ctl=0x1 and cntv_tval=0xf320, yet Xen still injects vIRQ 27.

When Linux receives such an injected interrupt, it reads ISTATUS=0 and
treats it as spurious, so it skips the normal re-arm path that would
program a new CVAL and leave the timer enabled. If Xen masked the vtimer
output before injection, it can remain masked, and the CPU stops getting
timer events.

So local IRQ masking is expected in Linux; the root cause is injecting
when ISTATUS is already 0.

To add more data points, here is a partial log from the stuck case.
In my instrumentation:
- "snap ... index 0" is from leave_hypervisor_to_guest (Xen->guest)
- "snap ... index 1" is from enter_hypervisor_from_guest (guest->Xen)

We do a normal timer expiry/injection path (vtimer: virt timer expired,
then vgic_inject_irq), return to the guest (snap index 0), and later
re-enter Xen (snap index 1). The issue is that on a subsequent entry from
the guest, Xen hits vtimer_interrupt with CNTV_CTL_EL0=0x1 and
CNTV_TVAL_EL0 positive, but still injects vIRQ 27:

[513559989049] CPU1: snap: vcpu d1v0 index 0 now 513559988779 seq 4008771 pcpu 1 time ctl 1 cval 542163190853 cntvct 542159034528
[513559989679] CPU1: [1] b0a0020200000202
[513559989964] CPU1:
[513559995049] CPU1: snap: vcpu d1v0 index 1 now 513559994374 seq 4008771 pcpu 1 time ctl 1 cval 542159103072 cntvct 542159040944
[513559999009] CPU1: virt_timer_save: vcpu d1v0 ctl=0x1 cval=0x7e3b336460 cntvct=0x7e3b328200
[513559999519] CPU1: virt_timer_save: setting timer for vcpu d1v0 513560053249 513559999459
[513560054630] CPU1: vtimer: virt timer expired for vcpu d1v0 ctl 0 cval 542159103072 cntvct 542159104512
[513560055185] CPU1: vgic_inject_irq: vcpu d1v0 virq 27
[513560056100] CPU1: VCPU unblock d1v0 pause 1 poll_evtchn 0
[513560057105] CPU1: VCPU wake d1v0
[513560058710] CPU1: virt_timer_restore: vcpu d1v0 ctl=0x3 cval=0x7e3b336460 CPU 1 status 1 expires=513560053249 now 513560058650 cntvct=0x7e3b337b00
[513560061050] CPU1: snap: vcpu d1v0 index 0 now 513560060795 seq 4008772 pcpu 1 time ctl 7 cval 542159103072 cntvct 542159111344
[513560061755] CPU1: [0] 50a000000000001b
[513560062025] CPU1:
[513560067785] CPU1: snap: vcpu d1v0 index 1 now 513560067185 seq 4008772 pcpu 1 time ctl 1 cval 542159180688 cntvct 542159118528
[513560068415] CPU1:
[513560070215] CPU1: WFI blocking d1v0
[513560072180] CPU1: virt_timer_save: vcpu d1v0 ctl=0x1 cval=0x7e3b349390 cntvct=0x7e3b33b2e0
[513560072690] CPU1: virt_timer_save: setting timer for vcpu d1v0 513560126014 513560072630
[513560127275] CPU1: vtimer: virt timer expired for vcpu d1v0 ctl 0 cval 542159180688 cntvct 542159181984
[513560128250] CPU1: vgic_inject_irq: vcpu d1v0 virq 27
[513560132015] CPU1: virt_timer_restore: vcpu d1v0 ctl=0x3 cval=0x7e3b349390 CPU 1 status 1 expires=513560126014 now 513560131955 cntvct=0x7e3b34ac70
[513560134340] CPU1: snap: vcpu d1v0 index 0 now 513560134070 seq 4008773 pcpu 1 time ctl 7 cval 542159180688 cntvct 542159189504
[513560135045] CPU1: [0] 50a000000000001b
[513560135345] CPU1:
[513560140940] CPU1: vtimer_interrupt: vcpu d1v0 ctl=0x1 cval cache=0x7e3b349390 cval reg 0x7e3b35c4b0 cntvct=0x7e3b34d180 cntv_tval=0xf320 LR0=0x10a000000000001b LR1=0 LR2=0 LR3=0
[513560142230] CPU1: vgic_inject_irq: vcpu d1v0 virq 27
[513560143595] CPU1: vgic_raise_guest_irq: vcpu d1v0 virq 27 lr 0
[513560144540] CPU1: VCPU unblock d1v0 pause 0 poll_evtchn 0
[513560146175] CPU1: snap: vcpu d1v0 index 1 now 513560145740 seq 4008773 pcpu 1 time ctl 3 cval 542159258800 cntvct 542159202128
[513560146820] CPU1: [0] 50a000000000001b
[513560147090] CPU1:
[513560148875] CPU1: WFI blocking d1v0
[513560149310] CPU1: VCPU unblock d1v0 pause 1 poll_evtchn 0
[513560150885] CPU1: VCPU wake d1v0
[513560152370] CPU1: snap: vcpu d1v0 index 0 now 513560152115 seq 4008774 pcpu 1 time ctl 3 cval 542159258800 cntvct 542159208736
[513560153060] CPU1: [0] 50a000000000001b
[513560153360] CPU1:
[513560157080] CPU1: snap: vcpu d1v0 index 1 now 513560156435 seq 4008774 pcpu 1 time ctl 3 cval 542159258800 cntvct 542159213760

The same line also shows that CNTV_CVAL_EL0 already differs from Xen's
cached value:

  cval cache=0x7e3b349390, cval reg=0x7e3b35c4b0

So the injection is not aligned with the current CVAL/ISTATUS state. This
matches Linux treating the interrupt as spurious (ISTATUS=0) and skipping
the normal re-arm path, which can leave the vtimer output masked and the
CPU without further timer events.

This is why the fix is at the injection boundary: if CNTV_CTL_EL0
indicates ISTATUS=0, Xen should not mask and inject the vtimer IRQ.

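To make the ctl values in the trace easier to read, here is a minimal
standalone sketch (not the Xen code) that decodes CNTV_CTL_EL0 and applies
the "only inject when ISTATUS is set" rule to the values from the log. The
bit positions (ENABLE=bit 0, IMASK=bit 1, ISTATUS=bit 2) follow the Arm
architecture; the macro and helper names are made up for illustration and
are not Xen identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL_EL0 bits; macro names are illustrative, not Xen's. */
#define CTL_ENABLE  (UINT64_C(1) << 0)  /* timer enabled */
#define CTL_IMASK   (UINT64_C(1) << 1)  /* timer output masked */
#define CTL_ISTATUS (UINT64_C(1) << 2)  /* timer condition met */

/* Hypothetical helper: an enabled timer whose condition is met. */
static bool vtimer_is_pending(uint64_t ctl)
{
    return (ctl & CTL_ENABLE) && (ctl & CTL_ISTATUS);
}

/*
 * Hypothetical helper: ticks left until CVAL is reached. A positive
 * result means the timer has not expired yet. The cast assumes the
 * usual two's complement conversion.
 */
static int64_t ticks_until_expiry(uint64_t cval, uint64_t cntvct)
{
    return (int64_t)(cval - cntvct);
}

/* The rule argued for above: no ISTATUS, nothing to mask and inject. */
static bool should_inject_vtimer(uint64_t ctl)
{
    return vtimer_is_pending(ctl);
}

/* Values copied from the trace in this mail. */
static void check_trace_samples(void)
{
    /* ctl 7: enabled, masked, pending -> a real expiry. */
    assert(should_inject_vtimer(0x7));

    /* ctl=0x1 at vtimer_interrupt: enabled, ISTATUS clear -> spurious. */
    assert(!should_inject_vtimer(0x1));

    /* ctl 3: enabled and masked, ISTATUS clear -> the stuck state. */
    assert(!should_inject_vtimer(0x3));

    /* cval reg vs cntvct on the vtimer_interrupt line: still in the future. */
    assert(ticks_until_expiry(UINT64_C(0x7e3b35c4b0),
                              UINT64_C(0x7e3b34d180)) > 0);
}
```

With the register values from the vtimer_interrupt line, CVAL minus CNTVCT
is 0xf330, slightly above the logged cntv_tval=0xf320, which would be
consistent with TVAL having been read a few ticks after CNTVCT.
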
>
> >
> > This patch mirrors the Linux arm generic timer handler: if the interrupt
> > fires but the pending bit is not set, treat it as spurious and ignore it.
>
> Have you considered fixing properly our virtual timer emulation? I know
> this requires more code, but at least we are not adding more
> non-compliant code which requires patching the Guest OS.
>
> IIRC there was a series from Stewart to solve it and it was in pretty
> good shape at the time it was posted.
>

I don’t think this patch adds more non-compliant behavior or requires any
guest patching.

Quite the opposite: it prevents Xen from delivering a virtual timer
interrupt when the architectural state says it is not pending
(CNTV_CTL_EL0.ISTATUS=0 / CNTV_TVAL_EL0 > 0). The current behavior
(mask + inject in that situation) is what produces the spurious interrupt
in the guest and can leave the timer output masked.

So the intent is a minimal, self-contained Xen-side bug fix at the
injection boundary: If ISTATUS==0, there is no vtimer interrupt to
deliver, so we should not mask and inject one. This does not rely on
guest spurious handling; it avoids creating the spurious condition in the
first place.

Also, could you please share the subject/Message-ID or a link to the
series you have in mind, so I can make sure I’m looking at the right one?

> >
> > This issue is reproducible under heavy load on the R-Car X5H board
> > (Cortex-A720AE r0p0).
> >
> > Signed-off-by: Mykola Kvach <[email protected]>
> > ---
> >  xen/arch/arm/include/asm/perfc_defn.h |  7 ++++---
> >  xen/arch/arm/time.c                   | 11 ++++++++++-
> >  2 files changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/arm/include/asm/perfc_defn.h
> > b/xen/arch/arm/include/asm/perfc_defn.h
> > index effd25b69e..f83989d95a 100644
> > --- a/xen/arch/arm/include/asm/perfc_defn.h
> > +++ b/xen/arch/arm/include/asm/perfc_defn.h
> > @@ -69,9 +69,10 @@ PERFCOUNTER(ppis, "#PPIs")
> >  PERFCOUNTER(spis, "#SPIs")
> >  PERFCOUNTER(guest_irqs, "#GUEST-IRQS")
> >
> > -PERFCOUNTER(hyp_timer_irqs, "Hypervisor timer interrupts")
> > -PERFCOUNTER(virt_timer_irqs, "Virtual timer interrupts")
> > -PERFCOUNTER(maintenance_irqs, "Maintenance interrupts")
> > +PERFCOUNTER(hyp_timer_irqs, "Hypervisor timer interrupts")
> > +PERFCOUNTER(virt_timer_irqs, "Virtual timer interrupts")
> > +PERFCOUNTER(virt_timer_spurious_irqs, "Virtual timer spurious
> > interrupts")
> > +PERFCOUNTER(maintenance_irqs, "Maintenance interrupts")
> >
> >  PERFCOUNTER(atomics_guest, "atomics: guest access")
> >  PERFCOUNTER(atomics_guest_paused, "atomics: guest paused")
> > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> > index cc3fcf47b6..d18d6568bb 100644
> > --- a/xen/arch/arm/time.c
> > +++ b/xen/arch/arm/time.c
> > @@ -258,6 +258,8 @@ static void htimer_interrupt(int irq, void *dev_id)
> >
> >  static void vtimer_interrupt(int irq, void *dev_id)
> >  {
> > +    register_t ctl;
> > +
> >      /*
> >       * Edge-triggered interrupts can be used for the virtual timer. Even
> >       * if the timer output signal is masked in the context switch, the
> > @@ -271,9 +273,16 @@ static void vtimer_interrupt(int irq, void *dev_id)
> >      if ( unlikely(is_idle_vcpu(current)) )
> >          return;
> >
> > +    ctl = READ_SYSREG(CNTV_CTL_EL0);
> > +    if ( unlikely(!(ctl & CNTx_CTL_PENDING)) )
>
> For the others, the Armv8 specification names this field ISTATUS.
> Regardless what I wrote above, the change look alright. Before I ack,
> can you confirm whether you checked other OSes (I am thinking at least
> Zephyr) will also ignore spurious interrupt?

Regarding other OSes: I haven't reproduced this with a Zephyr domain yet,
but I did take a look at Zephyr's Arm generic timer IRQ handling. At
first glance it follows the usual pattern, and I don't immediately see
anything that would break whether or not a spurious vtimer IRQ is
delivered.

That said, to be confident I will run a small domU test where I
artificially inject spurious vtimer interrupts into the guest (i.e.
deliver the IRQ while CNTV_CTL_EL0.ISTATUS is 0) and verify that Zephyr
does not get stuck and continues to re-arm the timer correctly. I'll
report the results in a follow-up.

Best regards,
Mykola

>
> Cheers,
>
> --
> Julien Grall
>
