On 16.11.25 13:24, Julien Grall wrote:
> Hi,
>
> On 12/11/2025 10:51, Mykyta Poturai wrote:
>> Move IRQs from a dying CPU to the online ones.
>> Guest-bound IRQs are already handled by the scheduler in the process of
>> moving vCPUs to active pCPUs, so we only need to handle IRQs used by Xen
>> itself.
>>
>> If an IRQ is to be migrated, its affinity is set to a mask of all online
>> CPUs. With the current GIC implementation, this means it is routed to a
>> random online CPU. This may cause extra moves if multiple cores are
>> disabled in sequence, but should prevent all interrupts from piling up
>> on CPU0 in case of repeated up-down cycles on different cores.
>
> Wouldn't they eventually all move to CPU0 in the case of suspend/resume
> or if all the CPUs but CPU0 are turned off and then on? If so,
> shouldn't we try to rebalance the interrupts?
>
In case of disabling/enabling all cores in a sequence, yes. This was
designed with the idea of achieving some balancing when enabling/disabling
some cores for power-saving reasons. I agree that proper balancing should
be implemented, but it is a complex task on its own and requires a
substantial amount of testing on different hardware to prove it is close
to optimal, so I think implementing it is out of scope for what I'm trying
to do here. If that is okay, I can implement a relatively simple solution
for now: just adding onlined CPUs back to the affinity mask. I think it
should improve the situation for the "switching all cores" case.

>>
>> IRQs from CPU 0 are never migrated, as a dying CPU 0 means we are either
>> shutting down completely or entering system suspend.
>
> I can't find a place where __cpu_disable() is called on CPU0. Do you
> have any pointer? In any case, I am not sure I want to bake that
> assumption in more places of the code.
>

I assume it would be called when suspend is implemented. In any case, I
will rework this to replace the hard check for CPU 0 with an "is it the
last CPU online" one.

>>
>> Considering that all Xen-used IRQs are currently allocated during init
>> on CPU 0, and setup_irq uses smp_processor_id for the initial affinity.
>
> Looking at the SMMU driver, we seem to request IRQs at the time the
> device is attached. So are you sure about this?
>

Indeed, I have missed that one. I will remove those statements then.

>> This change is not strictly required for correct operation for now, but
>> it should future-proof CPU hotplug and system suspend support in case
>> some kind of IRQ balancing is implemented later.
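To make the "add onlined CPUs back" idea above a bit more concrete, here is a toy, compile-checkable model of the mask handling I have in mind (affinities modelled as plain bitmasks and hypothetical model_* helpers, not the real Xen cpumask API):

```c
/* Toy model: bit n of a mask stands for pCPU n.  These are NOT the real
 * Xen cpumask types/helpers, just enough to illustrate the idea. */
static unsigned long online_mask = 0xfUL;       /* CPUs 0-3 online */
static unsigned long irq_affinity = 1UL << 2;   /* IRQ bound to CPU2 */

/* cpu_down path: if the dying CPU is in the IRQ's affinity, widen the
 * affinity to all remaining online CPUs (what evacuate_irq() does). */
static void model_cpu_down(unsigned int cpu)
{
    online_mask &= ~(1UL << cpu);
    if ( irq_affinity & (1UL << cpu) )
        irq_affinity = online_mask;
}

/* cpu_up path: the proposed simple rebalancing - just add the newly
 * onlined CPU back into the affinity mask. */
static void model_cpu_up(unsigned int cpu)
{
    online_mask |= 1UL << cpu;
    irq_affinity |= 1UL << cpu;
}
```

Taking CPU2 down leaves the IRQ eligible on CPUs {0,1,3}; bringing CPU2 back up makes it a candidate again, so repeated up/down cycles on different cores should not funnel everything onto CPU0.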
>>
>> Signed-off-by: Mykyta Poturai <[email protected]>
>>
>> v3->v4:
>> * patch introduced
>> ---
>>  xen/arch/arm/include/asm/irq.h |  2 ++
>>  xen/arch/arm/irq.c             | 39 ++++++++++++++++++++++++++++++++++
>>  xen/arch/arm/smpboot.c         |  2 ++
>>  3 files changed, 43 insertions(+)
>>
>> diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h
>> index 09788dbfeb..6e6e27bb80 100644
>> --- a/xen/arch/arm/include/asm/irq.h
>> +++ b/xen/arch/arm/include/asm/irq.h
>> @@ -126,6 +126,8 @@ bool irq_type_set_by_domain(const struct domain *d);
>>  void irq_end_none(struct irq_desc *irq);
>>  #define irq_end_none irq_end_none
>> +void evacuate_irqs(unsigned int from);
>> +
>>  #endif /* _ASM_HW_IRQ_H */
>>  /*
>>   * Local variables:
>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
>> index 28b40331f7..b383d71930 100644
>> --- a/xen/arch/arm/irq.c
>> +++ b/xen/arch/arm/irq.c
>> @@ -158,6 +158,45 @@ static int init_local_irq_data(unsigned int cpu)
>>      return 0;
>>  }
>> +static void evacuate_irq(int irq, unsigned int from)
>
> Any reason why the 'irq' is signed?
>
>> +{
>> +    struct irq_desc *desc = irq_to_desc(irq);
>> +    unsigned long flags;
>> +
>> +    /* Don't move irqs from CPU 0 as it is always last to be disabled */
>
> Per above, I am not convinced that we should special case CPU 0. But if
> we do, then shouldn't this be part of evacuate_irqs() so we don't
> pointlessly go through all the IRQs?
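Agreed on both points. For v5 I plan to drop the CPU 0 special case and instead have evacuate_irqs() bail out early when the dying CPU was the last one online. A stubbed, compile-checkable sketch of just that check (the cpumask bits here are stand-ins, not Xen's real types):

```c
#include <stdbool.h>

/* Minimal stand-ins for Xen's cpumask API, only to show the shape of
 * the check; the real code would use the real cpumask_t and helpers. */
typedef struct { unsigned long bits; } cpumask_t;
static cpumask_t cpu_online_map;
static bool cpumask_empty(const cpumask_t *m) { return m->bits == 0; }

/*
 * Proposed check: __cpu_disable() has already cleared 'from' from
 * cpu_online_map, so an empty map means we are taking down the last
 * online CPU (full shutdown or suspend) and there is nowhere to move
 * IRQs to - no need to walk them at all.
 */
static bool should_evacuate(unsigned int from)
{
    (void)from; /* 'from' itself is no longer special-cased as CPU 0 */
    return !cpumask_empty(&cpu_online_map);
}
```

This also addresses the second point: the early return sits in evacuate_irqs(), before the per-IRQ loop, rather than being re-evaluated inside evacuate_irq() for every IRQ.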
>
>> +    if ( from == 0 )
>> +        return;
>> +
>> +    ASSERT(!cpumask_empty(&cpu_online_map));
>> +    ASSERT(!cpumask_test_cpu(from, &cpu_online_map));
>> +
>> +    spin_lock_irqsave(&desc->lock, flags);
>> +    if ( likely(!desc->action) )
>> +        goto out;
>> +
>> +    if ( likely(test_bit(_IRQ_GUEST, &desc->status) ||
>> +                test_bit(_IRQ_MOVE_PENDING, &desc->status)) )
>> +        goto out;
>> +
>> +    if ( cpumask_test_cpu(from, desc->affinity) )
>> +        irq_set_affinity(desc, &cpu_online_map);
>
> I think it would be worth explaining why we are routing to any online
> CPU rather than checking whether the affinity has other online CPUs.
>
> Just to note, I don't have a strong opinion either way. It mainly needs
> to be documented.
>
>> +
>> +out:
>> +    spin_unlock_irqrestore(&desc->lock, flags);
>> +    return;
>> +}
>> +
>> +void evacuate_irqs(unsigned int from)
>> +{
>> +    int irq;
>> +
>> +    for ( irq = NR_LOCAL_IRQS; irq < NR_IRQS; irq++ )
>> +        evacuate_irq(irq, from);
>> +
>> +    for ( irq = ESPI_BASE_INTID; irq < ESPI_MAX_INTID; irq++ )
>
> AFAICT, irq_to_desc() would not be able to cope with ESPI interrupts
> when CONFIG_GICV3_ESPI is not set. Has this been tested?
>
>> +        evacuate_irq(irq, from);
>> +}
>> +
>>  static int cpu_callback(struct notifier_block *nfb, unsigned long action,
>>                          void *hcpu)
>>  {
>> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
>> index 7f3cfa812e..46b24783dd 100644
>> --- a/xen/arch/arm/smpboot.c
>> +++ b/xen/arch/arm/smpboot.c
>> @@ -425,6 +425,8 @@ void __cpu_disable(void)
>>      smp_mb();
>> +    evacuate_irqs(cpu);
>
> I think it would be worth explaining why evacuate_irqs() is called this
> late in the process.
>
>> +
>>      /* Return to caller; eventually the IPI mechanism will unwind and the
>>       * scheduler will drop to the idle loop, which will call stop_cpu(). */
>>  }
>
> Cheers,

-- 
Mykyta
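P.S. Regarding the ESPI question: no, builds without CONFIG_GICV3_ESPI were not covered; my intention for v5 is to compile the ESPI loop out when CONFIG_GICV3_ESPI is not set, so irq_to_desc() is never handed an ESPI INTID on such builds. A standalone sketch of the shape (the INTID values and the evacuate_irq() stub below are placeholders, the real ones come from Xen's headers):

```c
/* Hypothetical stand-ins so this compiles on its own. */
#define CONFIG_GICV3_ESPI 1
#define ESPI_BASE_INTID 4096
#define ESPI_MAX_INTID  5120

static int evacuated;
static void evacuate_irq(int irq, unsigned int from)
{
    (void)irq; (void)from;
    evacuated++;
}

/* Walk the ESPI range only when ESPI support is compiled in. */
static void evacuate_espi_irqs(unsigned int from)
{
#ifdef CONFIG_GICV3_ESPI
    int irq;

    for ( irq = ESPI_BASE_INTID; irq < ESPI_MAX_INTID; irq++ )
        evacuate_irq(irq, from);
#endif
}
```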
