On Wed, Nov 22, 2023 at 02:40:02PM +0100, Roger Pau Monné wrote:
> On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote:
> > Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
> > registers are derivable from each other through a fixed formula.
> > 
> > Xen uses that formula, but applies it to vCPU IDs (which are sequential)
> > rather than x2APIC_IDs (which are not, at the moment). As I understand it,
> > this is an attempt to tightly pack vCPUs into clusters so each cluster has
> > 16 vCPUs rather than 8, but this is problematic for OSs that might read the
> > x2APIC_ID and internally derive LDR (or the other way around)
> > 
> > This patch fixes the implementation so we follow the rules in the x2APIC
> > spec(s).
> > 
> > The patch also covers migrations from broken hypervisors, so LDRs are
> > preserved even for hotppluggable CPUs and across APIC resets.
> > 
> > Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
> > Signed-off-by: Alejandro Vallejo <alejandro.vall...@cloud.com>
> 
> LGTM, just a couple of style comments.
> 
> > ---
> > I tested this by creating 3 checkpoints.
> >  1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs)
> >  2. One with 4.4 onwards broken state (LDRs packed in their clusters)
> >  3. One with correct LDR values
> > 
> > (1) and (3) restores to the same thing. Consistent APIC_ID+LDR
> > (2) restores to what it previously had and hotplugs follow the same logic
> > ---
> >  xen/arch/x86/hvm/vlapic.c             | 81 +++++++++++++++++++--------
> >  xen/arch/x86/include/asm/hvm/domain.h | 13 +++++
> >  2 files changed, 72 insertions(+), 22 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > index a8e87c4446..7f169f1e5f 100644
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = {
> >      .write = vlapic_mmio_write,
> >  };
> >  
> > +static uint32_t x2apic_ldr_from_id(uint32_t id)
> > +{
> > +    return ((id & ~0xF) << 12) | (1 << (id & 0xF));
> 
> Seeing other usages in vlapic.c I think the preference is to use lower
> case for hex numbers.
I thought it was covered by a MISRA rule, but it seems to apply only to the
literal suffixes. Fair enough, I'll revert to lowercase.

> 
> > +}
> > +
> >  static void set_x2apic_id(struct vlapic *vlapic)
> >  {
> > -    u32 id = vlapic_vcpu(vlapic)->vcpu_id;
> > -    u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf));
> > +    uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id;
> > +    uint32_t apic_id = vcpu_id * 2;
> > +    uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
> >  
> > -    vlapic_set_reg(vlapic, APIC_ID, id * 2);
> > -    vlapic_set_reg(vlapic, APIC_LDR, ldr);
> > +    /* This is a migration bug workaround. See wall of text in 
> > lapic_load_fixup() */
> > +    if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug )
> > +        apic_ldr = x2apic_ldr_from_id(vcpu_id);
> > +
> > +    vlapic_set_reg(vlapic, APIC_ID, apic_id);
> > +    vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
> >  }
> >  
> >  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> > @@ -1495,30 +1505,57 @@ static int cf_check lapic_save_regs(struct vcpu *v, 
> > hvm_domain_context_t *h)
> >  /*
> >   * Following lapic_load_hidden()/lapic_load_regs() we may need to
> >   * correct ID and LDR when they come from an old, broken hypervisor.
> > + *
> > + * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode
> > + * got LDR = 1. This was fixed back then, but another bug was introduced
> > + * causing APIC ID and LDR to break the consistency they are meant to have
> > + * according to the specs (LDR was derived from vCPU ID, rather than APIC
> > + * ID)
> > + *
> > + * Long story short, we can detect both cases here. For the LDR=1 case we
> > + * want to fix it up on migrate, as IPIs just don't work on non-physical
> > + * mode otherwise. For the other case we actually want to preserve previous
> > + * behaviour so that existing running instances that may have already read
> > + * the LDR at the source host aren't surprised when IPIs stop working as
> > + * they did at the other end.
> > + *
> > + * Note that "x2apic_id == 0" has always been correct and can't be used to
> > + * discriminate these cases.
> > + *
> 
> I think it's best if this big comment was split between the relevant
> parts of the if below used to detect the broken states, as that makes
> the comments more in-place with the code.
Ack
> 
> > + * Yuck!
> >   */
> >  static void lapic_load_fixup(struct vlapic *vlapic)
> >  {
> > -    uint32_t id = vlapic->loaded.id;
> > +    /*
> > +     * This LDR would be present in broken versions of Xen 4.4 through 
> > 4.18.
> > +     * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for
> > +     * any other.
> > +     */
> > +    uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id);
> >  
> > -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
> > -    {
> > +    /*
> > +     * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has
> > +     * always been correct.
> > +     */
> > +    if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id )
> 
> You could replace the !vlapic->loaded.id check and instead use:
> 
> vlapic->loaded.ldr == x2apic_ldr_from_id(vlapic->loaded.id)
> 
> As that will allow returning early from the function if the LDR is
> correct.  Then if none of the fixups below apply we could print a
> warning message that the LDR is incorrect, but cannot be fixed up.
Sounds good.

> 
> > +        return;
> > +
> > +    if ( vlapic->loaded.ldr == 1 )
> > +       /*
> > +        * Migration from a broken Xen 4.4 or earlier. We can't leave it
> > +        * as-is because it assigned the same LDR to every CPU. We'll fix
> > +        * the bug now and assign LDR values consistent with the APIC ID.
> > +        */
> > +        set_x2apic_id(vlapic);
> 
> Previous code also did some checks here related to APIC ID sanity,
> which are now dropped?
> 
> Might be worth mentioning in the commit message, if that was intended.
> 
It was intentional, yes. And it's true it warrants something in the commit
message. For reference, the checks previously done where...

> > +    else if ( bad_ldr == vlapic->loaded.ldr )
> >          /*
> > -         * This is optional: ID != 0 contradicts LDR == 1. It's being added
> > -         * to aid in eventual debugging of issues arising from the fixup 
> > done
> > -         * here, but can be dropped as soon as it is found to conflict with
> > -         * other (future) changes.
> > +         * This is a migration from a broken Xen between 4.4 and 4.18 and 
> > we
> > +         * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In
> > +         * this case we set this domain boolean so future CPU hotplugs
> > +         * derive an LDR consistent with the older Xen's broken idea of
> > +         * consistency.
> >           */
> > -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
> > -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
> > -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
> > -                   vlapic_vcpu(vlapic), id);
... these. But they _seem_ bogus for several reasons. And I just got rid of
them on these grounds:

  * It's using the GET_xAPIC_ID() macro on the register, but the LAPIC is
    not in xAPIC mode (due to the previous check), so the legacy APIC must
    be derived from the lowest octet, not the highest. That macro is meant
    to be used on the MMIO register, not the MSR one.
  * The printk (wants to be) triggered when the ID field is not "canonical"
    OR the x2APIC ID is not a pure xAPIC ID. Neither of these things are
    problems per se, the former just happens to be true at the moment, but
    the latter may very well not be, and that shouldn't trigger a scary
    printk.
  * The error message seems to imply the effective APIC ID eventually left
    loaded is the bogus one, but that's not the case.

Actually, I might just move the printk into a separate else in line with
your previous suggestion.

> > -        set_x2apic_id(vlapic);
> > -    }
> > -    else /* Undo an eventual earlier fixup. */
> > -    {
> > -        vlapic_set_reg(vlapic, APIC_ID, id);
> > -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
> > -    }
> > +        vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = 
> > true;
> >  }
> >  
> >  static int cf_check lapic_load_hidden(struct domain *d, 
> > hvm_domain_context_t *h)
> > diff --git a/xen/arch/x86/include/asm/hvm/domain.h 
> > b/xen/arch/x86/include/asm/hvm/domain.h
> > index 6e53ce4449..a42a6e99bb 100644
> > --- a/xen/arch/x86/include/asm/hvm/domain.h
> > +++ b/xen/arch/x86/include/asm/hvm/domain.h
> > @@ -61,6 +61,19 @@ struct hvm_domain {
> >      /* Cached CF8 for guest PCI config cycles */
> >      uint32_t                pci_cf8;
> >  
> > +    /*
> > +     * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was
> > +     * derived from the vcpu_id rather than the x2APIC ID. This is wrong,
> > +     * but changing this behaviour is tricky because guests might have
> > +     * already read the LDR and used it accordingly. In the interest of not
> > +     * breaking migrations from those hypervisors we track here whether
> > +     * this domain suffers from this or not so a hotplugged vCPU or an APIC
> > +     * reset can recover the same LDR it would've had on the older host.
> > +     *
> > +     * Yuck!
> > +     */
> > +    bool has_inconsistent_x2apic_ldr_bug;
> 
> Could you place the new field after the existing boolean fields in the
> struct? (AFAICT there's plenty of padding left there)
Sure. I placed it somewhere with unused padding. (that u32 is sandwiched
between an "unsigned long" and a pointer), but I'm happy to stash it with
the other booleans.

> 
> I also think the field name is too long, I would rather use
> x2apic_ldr_vcpu_id for example (to note that LDR is calculated from
> vCPU ID rather than APIC ID).
I made it intentionally long so it can't be missed that it's a workaround
and not architectural behaviour. Would you consider
"x2apic_ldr_bug_vcpu_id" acceptable? I'd rather keep at least the "bug"
part of the identifier around so it's not lost to time that this is a
workaround marker and nothing else.

> 
> I think it would be good if we could trim a bit the comments, as I get
> the impression it's a bit repetitive.
> 
> So I would leave the big explanation in lapic_load_fixup(), and just
> comment here:
> 
> /* Compatibility setting for a bug in x2APIC LDR format. */
Sure, I'll put that comment on a diet

> 
> Thanks, Roger.

Cheers,
Alejandro

Reply via email to