On 18/08/2025 10:35 am, Jan Beulich wrote:
> On 08.08.2025 22:23, Andrew Cooper wrote:
>> With the shadow stack and exception handling adjustements in place, we can now
>> activate FRED when appropriate.  Note that opt_fred is still disabled by
>> default.
>>
>> Introduce init_fred() to set up all the MSRs relevant for FRED.  FRED uses
>> MSR_STAR (entries from Ring3 only), and MSR_FRED_SSP_SL0 aliases MSR_PL0_SSP
>> when CET-SS is active.  Otherwise, they're all new MSRs.
>>
>> With init_fred() existing, load_system_tables() and legacy_syscall_init()
>> should only be used when setting up IDT delivery.  Insert ASSERT()s to this
>> effect, and adjust the various *_init() functions to make this property true.
>>
>> Per the documentation, ap_early_traps_init() is responsible for switching off
>> the boot GDT, which needs doing even in FRED mode.
>>
>> Finally, set CR4.FRED in {bsp,ap}_early_traps_init().
> Probably you've done that already, but these last two paragraphs will need
> updating following patch 08 v1.1.

It's on my list, but not done yet.

>
>> Xen can now boot in FRED mode up until starting a PV guest, where it faults
>> because IRET is not permitted to change privilege.
>>
>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> Reviewed-by: Jan Beulich <jbeul...@suse.com>

Thanks, but I fear this patch has changed too much.  I'll take a
decision when I've cleaned up the integration of the PV work.

>
>> @@ -274,6 +279,44 @@ static void __init init_ler(void)
>>      setup_force_cpu_cap(X86_FEATURE_XEN_LBR);
>>  }
>>
>> +/*
>> + * Set up all MSRs relevant for FRED event delivery.
>> + *
>> + * Xen does not use any of the optional config in MSR_FRED_CONFIG, so all that
>> + * is needed is the entrypoint.
>> + *
>> + * Because FRED always provides a good stack, NMI and #DB do not need any
>> + * special treatment.  Only #DF needs another stack level, and #MC for the
>> + * offchance that Xen's main stack suffers an uncorrectable error.
>> + *
>> + * FRED reuses MSR_STAR to provide the segment selector values to load on
>> + * entry from Ring3.  Entry from Ring0 leave %cs and %ss unmodified.
>> + */
>> +static void init_fred(void)
>> +{
>> +    unsigned long stack_top = get_stack_bottom() & ~(STACK_SIZE - 1);
>> +
>> +    ASSERT(opt_fred == 1);
>> +
>> +    wrmsrns(MSR_STAR, XEN_MSR_STAR);
>> +    wrmsrns(MSR_FRED_CONFIG, (unsigned long)entry_FRED_R3);
>> +
>> +    wrmsrns(MSR_FRED_RSP_SL0, (unsigned long)(&get_cpu_info()->_fred + 1));
>> +    wrmsrns(MSR_FRED_RSP_SL1, 0);
> In the event of a bug somewhere causing this slot to be accessed, is the
> wrapping behavior well-defined, resulting in an attempt to write to the
> top end of VA space? (Then again, if the wrapping itself caused a fault,
> the overall effect would be largely the same - in many cases #DF.)

The wrapping is well defined - like other cases, it goes to the top of
address space, but that's owned by PV guests.  SMAP ought to mitigate
what would otherwise be a priv-esc.
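
To make the wrapping concrete, here is a rough standalone sketch (not Xen
code).  It models event delivery as a single 8-byte push, which is a
simplification: the real FRED frame is larger, but the modulo-2^64
arithmetic is the same.  The helper names push_target() and
is_canonical_48() are made up for illustration, and 48-bit virtual
addresses (4-level paging) are assumed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the address an 8-byte push would write to, given the
 * stack pointer before the push.  Unsigned arithmetic wraps modulo 2^64.
 */
static uint64_t push_target(uint64_t rsp)
{
    return rsp - 8;
}

/*
 * Hypothetical helper: canonical check assuming 48-bit virtual addresses.
 * Bits 63:47 (17 bits) must be all zeros or all ones.
 */
static int is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}
```

With a stack pointer of 0, push_target(0) is 0xfffffffffffffff8, which
is_canonical_48() accepts, so the write lands at the very top of the
address space rather than faulting on a non-canonical address.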

With IDT, we poisoned the unused pointers with non-canonical addresses,
but that's not possible here, as they're MSRs and checked at this point,
rather than when they're used.

I suspect the best we can do is reuse the #DB or NMI stacks, and
intentionally reverse the regular and shadow stack pointers, meaning
that any attempt to use SL1 will hit a guard page and escalate to #DF.

~Andrew