On Thu, Oct 16, 2025 at 09:31:21AM +0200, Jan Beulich wrote:
> Despite 1db7829e5657 ("x86/hpet: do local APIC EOI after interrupt
> processing") we can still observe nested invocations of
> hpet_interrupt_handler(). This is, afaict, a result of previously used
> channels retaining their IRQ affinity until some other CPU re-uses them.

But the underlying problem here is not so much the affinity itself, but
rather that the channel is not stopped after firing?

> Such nesting is increasingly problematic with higher CPU counts, as both
> handle_hpet_broadcast() and cpumask_raise_softirq() have a cpumask_t local
> variable. IOW already a single level of nesting may require more stack
> space (2 times above 4k) than we have available (8k), when NR_CPUS=16383
> (the maximum value presently possible).
> 
> Signed-off-by: Jan Beulich <[email protected]>
> ---
> Whether this is still worthwhile with "x86/HPET: use single, global, low-
> priority vector for broadcast IRQ" isn't quite clear to me.
> 
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -442,6 +442,8 @@ static void __init hpet_fsb_cap_lookup(v
>             num_hpets_used, num_chs);
>  }
>  
> +static DEFINE_PER_CPU(struct hpet_event_channel *, lru_channel);
> +
>  static struct hpet_event_channel *hpet_get_channel(unsigned int cpu)
>  {
>      static unsigned int next_channel;
> @@ -454,9 +456,21 @@ static struct hpet_event_channel *hpet_g
>      if ( num_hpets_used >= nr_cpu_ids )
>          return &hpet_events[cpu];
>  
> +    /*
> +     * Try the least recently used channel first.  It may still have its
> +     * IRQ's affinity set to the desired CPU.  This way we also limit
> +     * having multiple of our IRQs raised on the same CPU, in possibly a
> +     * nested manner.
> +     */
> +    ch = per_cpu(lru_channel, cpu);
> +    if ( ch && !test_and_set_bit(HPET_EVT_USED_BIT, &ch->flags) )
> +    {
> +        ch->cpu = cpu;
> +        return ch;
> +    }
> +
> +    /* Then look for an unused channel. */
>      next = arch_fetch_and_add(&next_channel, 1) % num_hpets_used;
>  
> -    /* try unused channel first */
>      for ( i = next; i < next + num_hpets_used; i++ )
>      {
>          ch = &hpet_events[i % num_hpets_used];
> @@ -479,6 +493,8 @@ static void set_channel_irq_affinity(str
>  {
>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
>  
> +    per_cpu(lru_channel, ch->cpu) = ch;
> +
>      ASSERT(!local_irq_is_enabled());
>      spin_lock(&desc->lock);
>      hpet_msi_mask(desc);

Maybe I'm missing the point here, but the MSI affinity is reset here
unconditionally, so is there much point in attempting to re-use the
same channel when Xen still goes through the process of setting the
affinity anyway?

Thanks, Roger.
