On Tue, Oct 24, 2023 at 11:33:21AM +0200, Jan Beulich wrote:
> On 23.10.2023 14:46, Roger Pau Monne wrote:
> > Sporadically we have seen the following during AP bringup on AMD platforms
> > only:
> > 
> > microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = 
> > 2023-05-17
> > microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = 
> > 2023-05-17
> > CPU60: No irq handler for vector 27 (IRQ -2147483648)
> > microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = 
> > 2023-05-17
> > 
> > This is similar to the issue raised on Linux commit 36e9e1eab777e, where 
> > they
> > observed i8259 (active) vectors getting delivered to CPUs different than 0.
> > 
> > On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt
> > descriptors to contain all possible CPUs, so that APs will reserve the 
> > vector
> > at startup if any legacy IRQ is still delivered through the i8259.  Note 
> > that
> > if the IO-APIC takes over those interrupt descriptors the CPU mask will be
> > reset.
> > 
> > Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected 
> > even
> > when all i8259 pins are masked, and hence would need to be handled on all 
> > CPUs.
> > 
> > Do not reserve the PIC spurious vectors on all CPUs, but do check for such
> > spurious interrupts on all CPUs if the vendor is AMD or Hygon.
> 
> The first half of this sentence reads as if it was describing a change,
> but aiui there's no change to existing behavior in this regard anymore.
> Maybe use something like "continue to reserve PIC vectors on CPU0 only"?
> 
> >  Note that once
> > the vectors get used by devices detecting PIC spurious interrupts will no
> > longer be possible, however the device should be able to cope with spurious
> > interrupt.  Such PIC spurious interrupts occurring when the vector is in 
> > use by
> > a local APIC routed source will lead to an extra EOI, which might
> > unintentionally clear a different vector from ISR.  Note this is already the
> > current behavior, so assume it's infrequent enough to not cause real issues.
> > 
> > Finally, adjust the printed message to display the CPU where the spurious
> > interrupt has been received, so it looks like:
> > 
> > microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = 
> > 2023-05-17
> > cpu1: spurious 8259A interrupt: IRQ7
> > microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = 
> > 2023-05-17
> > 
> > Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs')
> > Signed-off-by: Roger Pau Monné <[email protected]>
> > ---
> > Changes since v1:
> >  - Do not reserved spurious PIC vectors on APs, but still check for spurious
> >    PIC interrupts.
> >  - Reword commit message.
> > ---
> > Not sure if the Fixes tag is the most appropriate here, since AFAICT this 
> > is a
> > hardware glitch, but it makes it easier to see to which versions the fix 
> > should
> > be backported, because Xen previous behavior was to reserve all legacy 
> > vectors
> > on all CPUs.
> 
> I'm inclined to suggest to (informally) invent an Amends: tag.
> 
> > --- a/xen/arch/x86/i8259.c
> > +++ b/xen/arch/x86/i8259.c
> > @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
> >  
> >  bool bogus_8259A_irq(unsigned int irq)
> >  {
> > +    if ( smp_processor_id() &&
> > +         !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) 
> > )
> > +        /*
> > +         * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, 
> > as it
> > +         * has been observed that during unknown circumstances spurious PIC
> > +         * interrupts have been delivered to CPUs different than the BSP.
> > +         */
> > +        return false;
> > +
> >      return !_mask_and_ack_8259A_irq(irq);
> >  }
> 
> I don't think this function should be changed; imo the adjustment belongs
> at the call site.

It makes the call site much more complex to follow, in fact I was
considering pulling the PIC vector range checks into
bogus_8259A_irq().  Would that convince you into placing the check here
rather than in the caller context?

> > @@ -349,7 +359,22 @@ void __init init_IRQ(void)
> >              continue;
> >          desc->handler = &i8259A_irq_type;
> >          per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
> > -        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
> > +
> > +        /*
> > +         * The interrupt affinity logic never targets interrupts to offline
> > +         * CPUs, hence it's safe to use cpumask_all here.
> > +         *
> > +         * Legacy PIC interrupts are only targeted to CPU0, but depending 
> > on
> > +         * the platform they can be distributed to any online CPU in 
> > hardware.
> > +         * Note this behavior has only been observed on AMD hardware. In 
> > order
> > +         * to cope install all active legacy vectors on all CPUs.
> > +         *
> > +         * IO-APIC will change the destination mask if/when taking 
> > ownership of
> > +         * the interrupt.
> > +         */
> > +        cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor &
> > +                                          (X86_VENDOR_AMD | 
> > X86_VENDOR_HYGON) ?
> > +                                          &cpumask_all : cpumask_of(cpu));
> 
> Nit: Imo this would better be wrapped as
> 
>         cpumask_copy(desc->arch.cpu_mask,
>                      boot_cpu_data.x86_vendor &
>                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ?
>                      &cpumask_all : cpumask_of(cpu));
> 
> or (considering how we often prefer to wrap conditional operators)
> 
>         cpumask_copy(desc->arch.cpu_mask,
>                      boot_cpu_data.x86_vendor &
>                      (X86_VENDOR_AMD | X86_VENDOR_HYGON)
>                      ? &cpumask_all : cpumask_of(cpu));
> 
> or
> 
>         cpumask_copy(desc->arch.cpu_mask,
>                      boot_cpu_data.x86_vendor &
>                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
>                                                          : cpumask_of(cpu));
> 
> , possibly with the argument spanning three lines additionally
> parenthesized as a whole.
> 
> (Of course an amd_like() construct like we have in the emulator would
> further help readability here, but the way that works it cannot be
> easily generalized for use outside of the emulator.)

I think the last one could be my preferred indentation, will adjust to
that.

Thanks, Roger.

Reply via email to