Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-15 Thread Jan Beulich
On 14.06.2023 19:25, Andrew Cooper wrote:
> On 13/06/2023 10:30 am, Jan Beulich wrote:
>> On 12.06.2023 18:13, Andrew Cooper wrote:
>>> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>>>          return false;
>>>  
>>>      /*
>>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>>> -     * processor which isn't retpoline-safe.
>>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>> +     *
>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>>> +     *   this property.  Broadwell too, when running microcode versions prior
>>> +     *   to Jan 2018.
>>> +     *
>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>>> +     *   tagging of predictions with the mode in which they were learned.  So
>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>> +     *
>>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>>> +     *
>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>> +     * behaviour directly.  Other parts have differing enumeration with
>>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>>> +     * to guests, and so toolstacks can level a VM safety for migration.
>>> +     *
>>> +     * The following states exist:
>>> +     *
>>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>>> +     * |---+------+-------+-------+--------------------+---------------|
>>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>> You've kept the Action column as you had it originally, despite no longer
>> applying all the fixups. Wouldn't it make sense to mark those we don't do,
>> e.g. by enclosing in parentheses?
> 
> Hmm, yes.  How does this look?
> 
> |   | RSBA | EIBRS | RRSBA | Notes              | Action (in principle) |
> |---+------+-------+-------+--------------------+-----------------------|
> | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA           |
> | 2 |    0 |     0 |     1 | Broken             | (+RSBA, -RRSBA)       |
> | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA                |
> | 4 |    0 |     1 |     1 | OK                 |                       |
> | 5 |    1 |     0 |     0 | OK                 |                       |
> | 6 |    1 |     0 |     1 | Broken             | (-RRSBA)              |
> | 7 |    1 |     1 |     0 | Broken             | (-RSBA, +RRSBA)       |
> | 8 |    1 |     1 |     1 | Broken             | (-RSBA)               |

Yes, I think it's better to have it this way, thanks.

Jan



Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-14 Thread Andrew Cooper
On 13/06/2023 10:30 am, Jan Beulich wrote:
> On 12.06.2023 18:13, Andrew Cooper wrote:
>> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>>          return false;
>>  
>>      /*
>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>> -     * processor which isn't retpoline-safe.
>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>> +     *
>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>> +     *   this property.  Broadwell too, when running microcode versions prior
>> +     *   to Jan 2018.
>> +     *
>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>> +     *   tagging of predictions with the mode in which they were learned.  So
>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>> +     *
>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>> +     *
>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>> +     * behaviour directly.  Other parts have differing enumeration with
>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>> +     * to guests, and so toolstacks can level a VM safety for migration.
>> +     *
>> +     * The following states exist:
>> +     *
>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>> +     * |---+------+-------+-------+--------------------+---------------|
>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
> You've kept the Action column as you had it originally, despite no longer
> applying all the fixups. Wouldn't it make sense to mark those we don't do,
> e.g. by enclosing in parentheses?

Hmm, yes.  How does this look?

|   | RSBA | EIBRS | RRSBA | Notes              | Action (in principle) |
|---+------+-------+-------+--------------------+-----------------------|
| 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA           |
| 2 |    0 |     0 |     1 | Broken             | (+RSBA, -RRSBA)       |
| 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA                |
| 4 |    0 |     1 |     1 | OK                 |                       |
| 5 |    1 |     0 |     0 | OK                 |                       |
| 6 |    1 |     0 |     1 | Broken             | (-RRSBA)              |
| 7 |    1 |     1 |     0 | Broken             | (-RSBA, +RRSBA)       |
| 8 |    1 |     1 |     1 | Broken             | (-RSBA)               |
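
For illustration only (this is not what the patch does, since the parenthesised
actions are deliberately not applied): a minimal sketch of the "in principle"
Action column as a pure function, to check that applying every action would
leave no state that the FIRMWARE BUG condition flags.  The names spec_bits,
flagged() and fixup_in_principle() are hypothetical and do not exist in Xen;
row 1's "Maybe +RSBA" depends on the CPU model and is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical container for the three enumerated bits; not a Xen type. */
struct spec_bits {
    bool rsba, eibrs, rrsba;
};

/* Same shape as the condition in the patch: true for rows 2, 6, 7 and 8. */
static bool flagged(struct spec_bits s)
{
    return s.eibrs ? s.rsba : s.rrsba;
}

/*
 * Apply the Action column as if every fixup were performed.
 * Assumption: row 1's model-dependent "Maybe +RSBA" is left out.
 */
static struct spec_bits fixup_in_principle(struct spec_bits s)
{
    if ( s.eibrs )
    {
        /* Rows 3, 7, 8 (row 4 is already in this form): -RSBA, +RRSBA. */
        s.rsba = false;
        s.rrsba = true;
    }
    else if ( s.rrsba )
    {
        /* Rows 2, 6: RRSBA without eIBRS becomes plain RSBA. */
        s.rsba = true;
        s.rrsba = false;
    }
    /* Rows 1 and 5 are returned unchanged. */

    return s;
}
```

Under these assumptions, every one of the eight input states maps to row 1, 4
or 5, none of which is flagged.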


>> +     * further investigation.
>> +     */
>> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
>> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
>> +    {
>> +        printk(XENLOG_ERR
>> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
>> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
>> +               boot_cpu_data.x86_mask, ucode_rev,
>> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
> Perhaps with adjustments (as you deem them sensible)
> Reviewed-by: Jan Beulich 

Thanks.

~Andrew



Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-13 Thread Jan Beulich
On 12.06.2023 18:13, Andrew Cooper wrote:
> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>          return false;
>  
>      /*
> -     * RSBA may be set by a hypervisor to indicate that we may move to a
> -     * processor which isn't retpoline-safe.
> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
> +     * agreed upon meaning at the time of writing (May 2023) is thus:
> +     *
> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
> +     *   alternative predictor on underflow.  Skylake uarch and later all have
> +     *   this property.  Broadwell too, when running microcode versions prior
> +     *   to Jan 2018.
> +     *
> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
> +     *   tagging of predictions with the mode in which they were learned.  So
> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
> +     *
> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
> +     *
> +     * Some parts (Broadwell) are not expected to ever enumerate this
> +     * behaviour directly.  Other parts have differing enumeration with
> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
> +     * to guests, and so toolstacks can level a VM safety for migration.
> +     *
> +     * The following states exist:
> +     *
> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
> +     * |---+------+-------+-------+--------------------+---------------|
> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
> +     * | 4 |    0 |     1 |     1 | OK                 |               |
> +     * | 5 |    1 |     0 |     0 | OK                 |               |
> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |

You've kept the Action column as you had it originally, despite no longer
applying all the fixups. Wouldn't it make sense to mark those we don't do,
e.g. by enclosing in parentheses?

> +     * However, we don't need perfect adherence to the spec.  We only need
> +     * RSBA || RRSBA to indicate "alternative predictors potentially in use".
> +     * Rows 1 & 3 are fixed up by later logic, as they're known configurations
> +     * which exist in the world.
>       *
> +     * Complain loudly at the broken cases. They're safe for Xen to use (so we
> +     * don't attempt to correct), and may or may not exist in reality, but if
> +     * we ever encoutner them in practice, something is wrong and needs

Nit: "encounter"

> +     * further investigation.
> +     */
> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
> +    {
> +        printk(XENLOG_ERR
> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
> +               boot_cpu_data.x86_mask, ucode_rev,
> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);

Perhaps with adjustments (as you deem them sensible)
Reviewed-by: Jan Beulich 

Jan



[PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-12 Thread Andrew Cooper
In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

See the code comment for details.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v3:
 * Add a taint for bad EIBRS vs RSBA/RRSBA.
 * Minor comment improvements.

v2:
 * Rewrite almost from scratch.
---
 xen/arch/x86/include/asm/cpufeature.h |   1 +
 xen/arch/x86/spec_ctrl.c  | 100 --
 2 files changed, 96 insertions(+), 5 deletions(-)
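
Illustration only, not part of the patch: a minimal standalone C sketch that
walks all eight RSBA/EIBRS/RRSBA combinations from the code comment and applies
the same ternary condition the patch uses for its FIRMWARE BUG message.  The
names rsb_state, table_row(), is_broken() and broken_rows() are hypothetical
and do not exist in Xen; the real code tests cpu_has_rsba, cpu_has_eibrs and
cpu_has_rrsba directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three enumerated bits; not a Xen type. */
struct rsb_state {
    bool rsba, eibrs, rrsba;
};

/* Table row (1..8), reading RSBA/EIBRS/RRSBA as a 3-bit binary number. */
static unsigned int table_row(struct rsb_state s)
{
    return ((s.rsba << 2) | (s.eibrs << 1) | s.rrsba) + 1;
}

/* Same shape as the condition in the patch: true for the "Broken" rows. */
static bool is_broken(struct rsb_state s)
{
    return s.eibrs ? s.rsba   /* Rows 7, 8 */
                   : s.rrsba; /* Rows 2, 6 */
}

/* Bitmask of flagged rows: bit N set means row N of the table is broken. */
static unsigned int broken_rows(void)
{
    unsigned int mask = 0;

    for ( unsigned int i = 0; i < 8; i++ )
    {
        struct rsb_state s = { (i & 4) != 0, (i & 2) != 0, (i & 1) != 0 };

        /* Sanity check that the enumeration order matches the table. */
        assert(table_row(s) == i + 1);

        if ( is_broken(s) )
            mask |= 1u << table_row(s);
    }

    return mask;
}
```

With this encoding, broken_rows() sets exactly bits 2, 6, 7 and 8, matching the
rows marked "Broken" in the table.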

diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index ace31e3b1f1a..e2cb8f3cc728 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -193,6 +193,7 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_tsx_ctrl        boot_cpu_has(X86_FEATURE_TSX_CTRL)
 #define cpu_has_taa_no          boot_cpu_has(X86_FEATURE_TAA_NO)
 #define cpu_has_fb_clear        boot_cpu_has(X86_FEATURE_FB_CLEAR)
+#define cpu_has_rrsba           boot_cpu_has(X86_FEATURE_RRSBA)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 3892ce4d20ba..fb1b59b4d7e3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -579,7 +579,10 @@ static bool __init check_smt_enabled(void)
     return false;
 }
 
-/* Calculate whether Retpoline is known-safe on this CPU. */
+/*
+ * Calculate whether Retpoline is known-safe on this CPU.  Fix up the
+ * RSBA/RRSBA bits as necessary.
+ */
 static bool __init retpoline_calculations(void)
 {
     unsigned int ucode_rev = this_cpu(cpu_sig).rev;
@@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
         return false;
 
     /*
-     * RSBA may be set by a hypervisor to indicate that we may move to a
-     * processor which isn't retpoline-safe.
+     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
+     * agreed upon meaning at the time of writing (May 2023) is thus:
+     *
+     * - RSBA (RSB Alternative) means that an RSB may fall back to an
+     *   alternative predictor on underflow.  Skylake uarch and later all have
+     *   this property.  Broadwell too, when running microcode versions prior
+     *   to Jan 2018.
+     *
+     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
+     *   tagging of predictions with the mode in which they were learned.  So
+     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
+     *
+     * - CPUs are not expected to enumerate both RSBA and RRSBA.
+     *
+     * Some parts (Broadwell) are not expected to ever enumerate this
+     * behaviour directly.  Other parts have differing enumeration with
+     * microcode version.  Fix up Xen's idea, so we can advertise them safely
+     * to guests, and so toolstacks can level a VM safety for migration.
+     *
+     * The following states exist:
+     *
+     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
+     * |---+------+-------+-------+--------------------+---------------|
+     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
+     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
+     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
+     * | 4 |    0 |     1 |     1 | OK                 |               |
+     * | 5 |    1 |     0 |     0 | OK                 |               |
+     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
+     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
+     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
+     *
+     * However, we don't need perfect adherence to the spec.  We only need
+     * RSBA || RRSBA to indicate "alternative predictors potentially in use".
+     * Rows 1 & 3 are fixed up by later logic, as they're known configurations
+     * which exist in the world.
      *
+     * Complain loudly at the broken cases. They're safe for Xen to use (so we
+     * don't attempt to correct), and may or may not exist in reality, but if
+     * we ever encoutner them in practice, something is wrong and needs
+     * further investigation.
+     */
+    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
+                       : cpu_has_rrsba /* Rows 2, 6 */ )
+    {
+        printk(XENLOG_ERR
+               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
+               boot_cpu_data.x86, boot_cpu_data.x86_model,
+               boot_cpu_data.x86_mask, ucode_rev,
+               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
+        add_taint(TAINT_CPU_OUT_OF_SPEC);
+    }
+
+    /*
      * Processors offering Enhanced IBRS are not guarenteed to be
      * repoline-safe.
      */
-    if ( cpu_has_rsba || cpu_has_eibrs )
+    if ( cpu_has_eibrs )
+