Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
On 14.06.2023 19:25, Andrew Cooper wrote:
> On 13/06/2023 10:30 am, Jan Beulich wrote:
>> On 12.06.2023 18:13, Andrew Cooper wrote:
>>> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>>>          return false;
>>>
>>>      /*
>>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>>> -     * processor which isn't retpoline-safe.
>>> +     * The meaning of the RSBA and RRSBA bits have evolved over time. The
>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>> +     *
>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>> +     *   alternative predictor on underflow. Skylake uarch and later all have
>>> +     *   this property. Broadwell too, when running microcode versions prior
>>> +     *   to Jan 2018.
>>> +     *
>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>>> +     *   tagging of predictions with the mode in which they were learned. So
>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>> +     *
>>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>>> +     *
>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>> +     * behaviour directly. Other parts have differing enumeration with
>>> +     * microcode version. Fix up Xen's idea, so we can advertise them safely
>>> +     * to guests, and so toolstacks can level a VM safety for migration.
>>> +     *
>>> +     * The following states exist:
>>> +     *
>>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>>> +     * |---+------+-------+-------+--------------------+---------------|
>>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>> You've kept the Action column as you had it originally, despite no longer
>> applying all the fixups. Wouldn't it make sense to mark those we don't do,
>> e.g. by enclosing in parentheses?
>
> Hmm, yes. How does this look?
>
> |   | RSBA | EIBRS | RRSBA | Notes              | Action (in principle) |
> |---+------+-------+-------+--------------------+-----------------------|
> | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA           |
> | 2 |    0 |     0 |     1 | Broken             | (+RSBA, -RRSBA)       |
> | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA                |
> | 4 |    0 |     1 |     1 | OK                 |                       |
> | 5 |    1 |     0 |     0 | OK                 |                       |
> | 6 |    1 |     0 |     1 | Broken             | (-RRSBA)              |
> | 7 |    1 |     1 |     0 | Broken             | (-RSBA, +RRSBA)       |
> | 8 |    1 |     1 |     1 | Broken             | (-RSBA)               |

Yes, I think it's better to have it this way, thanks.

Jan
Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
On 13/06/2023 10:30 am, Jan Beulich wrote:
> On 12.06.2023 18:13, Andrew Cooper wrote:
>> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>>          return false;
>>
>>      /*
>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>> -     * processor which isn't retpoline-safe.
>> +     * The meaning of the RSBA and RRSBA bits have evolved over time. The
>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>> +     *
>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>> +     *   alternative predictor on underflow. Skylake uarch and later all have
>> +     *   this property. Broadwell too, when running microcode versions prior
>> +     *   to Jan 2018.
>> +     *
>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>> +     *   tagging of predictions with the mode in which they were learned. So
>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>> +     *
>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>> +     *
>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>> +     * behaviour directly. Other parts have differing enumeration with
>> +     * microcode version. Fix up Xen's idea, so we can advertise them safely
>> +     * to guests, and so toolstacks can level a VM safety for migration.
>> +     *
>> +     * The following states exist:
>> +     *
>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>> +     * |---+------+-------+-------+--------------------+---------------|
>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
> You've kept the Action column as you had it originally, despite no longer
> applying all the fixups. Wouldn't it make sense to mark those we don't do,
> e.g. by enclosing in parentheses?

Hmm, yes. How does this look?

|   | RSBA | EIBRS | RRSBA | Notes              | Action (in principle) |
|---+------+-------+-------+--------------------+-----------------------|
| 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA           |
| 2 |    0 |     0 |     1 | Broken             | (+RSBA, -RRSBA)       |
| 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA                |
| 4 |    0 |     1 |     1 | OK                 |                       |
| 5 |    1 |     0 |     0 | OK                 |                       |
| 6 |    1 |     0 |     1 | Broken             | (-RRSBA)              |
| 7 |    1 |     1 |     0 | Broken             | (-RSBA, +RRSBA)       |
| 8 |    1 |     1 |     1 | Broken             | (-RSBA)               |

>> +     * further investigation.
>> +     */
>> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
>> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
>> +    {
>> +        printk(XENLOG_ERR
>> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
>> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
>> +               boot_cpu_data.x86_mask, ucode_rev,
>> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
> Perhaps with adjustments (as you deem them sensible)
> Reviewed-by: Jan Beulich

Thanks.

~Andrew
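
To illustrate the "Action (in principle)" column above, the sketch below applies every listed fixup to an (RSBA, EIBRS, RRSBA) triple, including the parenthesised ones that the patch deliberately does not perform. This is not code from the patch: `struct rsba_state` and `fixup_in_principle()` are hypothetical names, and row 1's "Maybe +RSBA" is left out because it depends on model and microcode knowledge that the table does not capture.

```c
#include <stdbool.h>

/* Hypothetical container for the three enumerated bits; not a Xen type. */
struct rsba_state {
    bool rsba, eibrs, rrsba;
};

/*
 * Apply the "Action (in principle)" column, parenthesised entries included.
 * Row 1's "Maybe +RSBA" needs model/ucode knowledge and is not modelled here.
 */
static struct rsba_state fixup_in_principle(struct rsba_state s)
{
    if ( s.eibrs )
    {
        /* Rows 3, 4, 7, 8: with eIBRS, only RRSBA should be enumerated. */
        s.rsba = false;
        s.rrsba = true;
    }
    else
    {
        /*
         * Rows 2, 6: without eIBRS, a stray RRSBA folds into RSBA.
         * Rows 1, 5 pass through unchanged.
         */
        s.rsba = s.rsba || s.rrsba;
        s.rrsba = false;
    }

    return s;
}
```

Under these assumptions every input lands on row 1, 4 or 5, and any state that had RSBA || RRSBA set still has one of the two set afterwards, which is the property the patch actually relies on.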
Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
On 12.06.2023 18:13, Andrew Cooper wrote:
> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>          return false;
>
>      /*
> -     * RSBA may be set by a hypervisor to indicate that we may move to a
> -     * processor which isn't retpoline-safe.
> +     * The meaning of the RSBA and RRSBA bits have evolved over time. The
> +     * agreed upon meaning at the time of writing (May 2023) is thus:
> +     *
> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
> +     *   alternative predictor on underflow. Skylake uarch and later all have
> +     *   this property. Broadwell too, when running microcode versions prior
> +     *   to Jan 2018.
> +     *
> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
> +     *   tagging of predictions with the mode in which they were learned. So
> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
> +     *
> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
> +     *
> +     * Some parts (Broadwell) are not expected to ever enumerate this
> +     * behaviour directly. Other parts have differing enumeration with
> +     * microcode version. Fix up Xen's idea, so we can advertise them safely
> +     * to guests, and so toolstacks can level a VM safety for migration.
> +     *
> +     * The following states exist:
> +     *
> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
> +     * |---+------+-------+-------+--------------------+---------------|
> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
> +     * | 4 |    0 |     1 |     1 | OK                 |               |
> +     * | 5 |    1 |     0 |     0 | OK                 |               |
> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |

You've kept the Action column as you had it originally, despite no longer
applying all the fixups. Wouldn't it make sense to mark those we don't do,
e.g. by enclosing in parentheses?

> +     * However, we don't need perfect adherence to the spec. We only need
> +     * RSBA || RRSBA to indicate "alternative predictors potentially in use".
> +     * Rows 1 & 3 are fixed up by later logic, as they're known configurations
> +     * which exist in the world.
>       *
> +     * Complain loudly at the broken cases. They're safe for Xen to use (so we
> +     * don't attempt to correct), and may or may not exist in reality, but if
> +     * we ever encoutner them in practice, something is wrong and needs

Nit: "encounter"

> +     * further investigation.
> +     */
> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
> +    {
> +        printk(XENLOG_ERR
> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
> +               boot_cpu_data.x86_mask, ucode_rev,
> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);

Perhaps with adjustments (as you deem them sensible)
Reviewed-by: Jan Beulich

Jan
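
The conditional quoted above can be checked against the table mechanically. The following is a minimal sketch, assuming the three bits are plain booleans rather than Xen's `cpu_has_*` macros; `struct rsba_state`, `table_row()` and `is_broken()` are hypothetical helpers and not Xen functions.

```c
#include <stdbool.h>

/* Hypothetical container for the three enumerated bits; not a Xen type. */
struct rsba_state {
    bool rsba, eibrs, rrsba;
};

/*
 * Row number (1-8) in the state table: RSBA is the most significant
 * bit, EIBRS the middle bit, RRSBA the least significant bit.
 */
static unsigned int table_row(struct rsba_state s)
{
    return 1 + (s.rsba << 2) + (s.eibrs << 1) + s.rrsba;
}

/* Same shape as the quoted condition, on plain booleans. */
static bool is_broken(struct rsba_state s)
{
    return s.eibrs ? s.rsba   /* Rows 7, 8 */
                   : s.rrsba; /* Rows 2, 6 */
}
```

Looping over all eight combinations, `is_broken()` is true exactly for the rows marked "Broken" in the table (2, 6, 7 and 8), matching the row comments in the patch.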
[PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

See the code comment for details.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Wei Liu

v3:
 * Add a taint for bad EIBRS vs RSBA/RRSBA.
 * Minor comment improvements.

v2:
 * Rewrite almost from scratch.
---
 xen/arch/x86/include/asm/cpufeature.h |   1 +
 xen/arch/x86/spec_ctrl.c              | 100 --
 2 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index ace31e3b1f1a..e2cb8f3cc728 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -193,6 +193,7 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_tsx_ctrl        boot_cpu_has(X86_FEATURE_TSX_CTRL)
 #define cpu_has_taa_no          boot_cpu_has(X86_FEATURE_TAA_NO)
 #define cpu_has_fb_clear        boot_cpu_has(X86_FEATURE_FB_CLEAR)
+#define cpu_has_rrsba           boot_cpu_has(X86_FEATURE_RRSBA)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 3892ce4d20ba..fb1b59b4d7e3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -579,7 +579,10 @@ static bool __init check_smt_enabled(void)
     return false;
 }
 
-/* Calculate whether Retpoline is known-safe on this CPU. */
+/*
+ * Calculate whether Retpoline is known-safe on this CPU. Fix up the
+ * RSBA/RRSBA bits as necessary.
+ */
 static bool __init retpoline_calculations(void)
 {
     unsigned int ucode_rev = this_cpu(cpu_sig).rev;
@@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
         return false;
 
     /*
-     * RSBA may be set by a hypervisor to indicate that we may move to a
-     * processor which isn't retpoline-safe.
+     * The meaning of the RSBA and RRSBA bits have evolved over time. The
+     * agreed upon meaning at the time of writing (May 2023) is thus:
+     *
+     * - RSBA (RSB Alternative) means that an RSB may fall back to an
+     *   alternative predictor on underflow. Skylake uarch and later all have
+     *   this property. Broadwell too, when running microcode versions prior
+     *   to Jan 2018.
+     *
+     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
+     *   tagging of predictions with the mode in which they were learned. So
+     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
+     *
+     * - CPUs are not expected to enumerate both RSBA and RRSBA.
+     *
+     * Some parts (Broadwell) are not expected to ever enumerate this
+     * behaviour directly. Other parts have differing enumeration with
+     * microcode version. Fix up Xen's idea, so we can advertise them safely
+     * to guests, and so toolstacks can level a VM safety for migration.
+     *
+     * The following states exist:
+     *
+     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
+     * |---+------+-------+-------+--------------------+---------------|
+     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
+     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
+     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
+     * | 4 |    0 |     1 |     1 | OK                 |               |
+     * | 5 |    1 |     0 |     0 | OK                 |               |
+     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
+     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
+     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
+     *
+     * However, we don't need perfect adherence to the spec. We only need
+     * RSBA || RRSBA to indicate "alternative predictors potentially in use".
+     * Rows 1 & 3 are fixed up by later logic, as they're known configurations
+     * which exist in the world.
      *
+     * Complain loudly at the broken cases. They're safe for Xen to use (so we
+     * don't attempt to correct), and may or may not exist in reality, but if
+     * we ever encoutner them in practice, something is wrong and needs
+     * further investigation.
+     */
+    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
+                       : cpu_has_rrsba /* Rows 2, 6 */ )
+    {
+        printk(XENLOG_ERR
+               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
+               boot_cpu_data.x86, boot_cpu_data.x86_model,
+               boot_cpu_data.x86_mask, ucode_rev,
+               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
+        add_taint(TAINT_CPU_OUT_OF_SPEC);
+    }
+
+    /*
      * Processors offering Enhanced IBRS are not guarenteed to be
      * repoline-safe.
      */
-    if ( cpu_has_rsba || cpu_has_eibrs )
+    if ( cpu_has_eibrs )
+