On 18.08.2025 11:47, Andrew Cooper wrote:
> On 18/08/2025 10:35 am, Jan Beulich wrote:
>> On 08.08.2025 22:23, Andrew Cooper wrote:
>>> With the shadow stack and exception handling adjustments in place, we can
>>> now activate FRED when appropriate. Note that opt_fred is still disabled by
>>> default.
>>>
>>> Introduce init_fred() to set up all the MSRs relevant for FRED. FRED uses
>>> MSR_STAR (entries from Ring3 only), and MSR_FRED_SSP_SL0 aliases MSR_PL0_SSP
>>> when CET-SS is active. Otherwise, they're all new MSRs.
>>>
>>> With init_fred() existing, load_system_tables() and legacy_syscall_init()
>>> should only be used when setting up IDT delivery. Insert ASSERT()s to this
>>> effect, and adjust the various *_init() functions to make this property
>>> true.
>>>
>>> Per the documentation, ap_early_traps_init() is responsible for switching
>>> off the boot GDT, which needs doing even in FRED mode.
>>>
>>> Finally, set CR4.FRED in {bsp,ap}_early_traps_init().
>> Probably you've done that already, but these last two paragraphs will need
>> updating following patch 08 v1.1.
>
> It's on my list, but not done yet.
>
>>
>>> Xen can now boot in FRED mode up until starting a PV guest, where it faults
>>> because IRET is not permitted to change privilege.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>> Reviewed-by: Jan Beulich <jbeul...@suse.com>
>
> Thanks, but I fear this patch has changed too much. I'll take a
> decision when I've cleaned up the integration of the PV work.
>
>>
>>> @@ -274,6 +279,44 @@ static void __init init_ler(void)
>>>      setup_force_cpu_cap(X86_FEATURE_XEN_LBR);
>>>  }
>>>
>>> +/*
>>> + * Set up all MSRs relevant for FRED event delivery.
>>> + *
>>> + * Xen does not use any of the optional config in MSR_FRED_CONFIG, so all that
>>> + * is needed is the entrypoint.
>>> + *
>>> + * Because FRED always provides a good stack, NMI and #DB do not need any
>>> + * special treatment. Only #DF needs another stack level, and #MC for the
>>> + * off chance that Xen's main stack suffers an uncorrectable error.
>>> + *
>>> + * FRED reuses MSR_STAR to provide the segment selector values to load on
>>> + * entry from Ring3. Entry from Ring0 leaves %cs and %ss unmodified.
>>> + */
>>> +static void init_fred(void)
>>> +{
>>> +    unsigned long stack_top = get_stack_bottom() & ~(STACK_SIZE - 1);
>>> +
>>> +    ASSERT(opt_fred == 1);
>>> +
>>> +    wrmsrns(MSR_STAR, XEN_MSR_STAR);
>>> +    wrmsrns(MSR_FRED_CONFIG, (unsigned long)entry_FRED_R3);
>>> +
>>> +    wrmsrns(MSR_FRED_RSP_SL0, (unsigned long)(&get_cpu_info()->_fred + 1));
>>> +    wrmsrns(MSR_FRED_RSP_SL1, 0);
>> In the event of a bug somewhere causing this slot to be accessed, is the
>> wrapping behavior well-defined, resulting in an attempt to write to the
>> top end of VA space? (Then again, if the wrapping itself caused a fault,
>> the overall effect would be largely the same - in many cases #DF.)
>
> The wrapping is well defined - like other cases, it goes to the top of the
> address space, but that's owned by PV guests. SMAP ought to mitigate
> what would otherwise be a priv-esc.
>
> With IDT, we poisoned the unused pointers with non-canonical addresses,
> but that's not possible here, as they're MSRs and checked at this point,
> rather than when they're used.
>
> I suspect the best we can do is reuse the #DB or NMI stacks, and
> intentionally reverse the regular and shadow stack pointers, meaning
> that any attempt to use SL1 will hit a guard page and escalate to #DF.
I was wondering whether to store the upper end of zero_page[], or else to
point into entirely unmapped space.

Jan
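To make the address-space arithmetic discussed in this thread concrete, here is a small, self-contained C sketch. It is not Xen code: it assumes 4-level paging (48-bit canonical addresses, not LA57), and the helpers is_canonical_48() and stack_push(), as well as the 64-byte frame size used below, are hypothetical. It illustrates why an SL1 value of 0 is accepted (0 is canonical) yet wraps to the top of the address space as soon as anything is pushed, and why a non-canonical poison value of the kind used for the IDT stacks is not an option if, as Andrew notes, the MSR value is checked when it is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only. Assumes 4-level paging (48-bit linear
 * addresses); with LA57 the sign-extension point moves from bit 47 to bit 56.
 */

/* Canonical: bits 63:47 are all copies of bit 47, i.e. all 0s or all 1s. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;    /* the 17 bits 63:47 */

    return upper == 0 || upper == 0x1ffffULL;
}

/*
 * Hypothetical model of a stack pointer moving down by 'bytes'. Unsigned
 * arithmetic in C wraps modulo 2^64, as a 64-bit stack pointer does, so
 * starting from 0 the result lands just below the top of the address space.
 * 'bytes' is an arbitrary frame size, not FRED's real frame layout.
 */
static uint64_t stack_push(uint64_t rsp, uint64_t bytes)
{
    return rsp - bytes;
}
```

For example, stack_push(0, 64) yields 0xffffffffffffffc0, which is canonical and sits at the very top of the address space, whereas a value such as 0x8000000000000000 fails is_canonical_48() and so cannot serve as a poisoned stack pointer here.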