On 27/11/2025 7:58 am, Jan Beulich wrote:
> On 26.11.2025 18:56, Andrew Cooper wrote:
>> On 26/11/2025 4:55 pm, Andrew Cooper wrote:
>>> On 26/11/2025 3:07 pm, Jan Beulich wrote:
>>>> On 26.11.2025 14:22, Andrew Cooper wrote:
>>>>> @@ -1075,6 +966,112 @@ static void cf_check fam17_disable_c6(void *arg)
>>>>> wrmsrl(MSR_AMD_CSTATE_CFG, val & mask);
>>>>> }
>>>>>
>>>>> +static bool zenbleed_use_chickenbit(void)
>>>>> +{
>>>>> + unsigned int curr_rev;
>>>>> + uint8_t fixed_rev;
>>>>> +
>>>>> + /*
>>>>> + * If we're virtualised, we can't do family/model checks safely, and
>>>>> + * we likely wouldn't have access to DE_CFG even if we could see a
>>>>> + * microcode revision.
>>>>> + *
>>>>> + * A hypervisor may hide AVX as a stopgap mitigation. We're not in a
>>>>> + * position to care either way. An admin doesn't want to be disabling
>>>>> + * AVX as a mitigation on any build of Xen with this logic present.
>>>>> + */
>>>>> + if ( cpu_has_hypervisor || boot_cpu_data.family != 0x17 )
>>>>> + return false;
>>>>> +
>>>>> + curr_rev = this_cpu(cpu_sig).rev;
>>>>> + switch ( curr_rev >> 8 )
>>>>> + {
>>>>> + case 0x083010: fixed_rev = 0x7a; break;
>>>>> + case 0x086001: fixed_rev = 0x0b; break;
>>>>> + case 0x086081: fixed_rev = 0x05; break;
>>>>> + case 0x087010: fixed_rev = 0x32; break;
>>>>> + case 0x08a000: fixed_rev = 0x08; break;
>>>>> + default:
>>>>> + /*
>>>>> + * With the Fam17h check above, most parts getting here are Zen1.
>>>>> + * They're not affected. Assume Zen2 ones making it here are affected
>>>>> + * regardless of microcode version.
>>>>> + */
>>>>> + return is_zen2_uarch();
>>>>> + }
>>>>> +
>>>>> + return (uint8_t)curr_rev >= fixed_rev;
>>>>> +}
>>>>> +
>>>>> +void amd_init_de_cfg(const struct cpuinfo_x86 *c)
>>>>> +{
>>>>> + uint64_t val, new = 0;
>>>>> +
>>>>> + /* The MSR doesn't exist on Fam 0xf/0x11. */
>>>>> + if ( c->family != 0xf && c->family != 0x11 )
>>>>> + return;
>>>> Comment and code don't match. Did you mean
>>>>
>>>> if ( c->family == 0xf || c->family == 0x11 )
>>>> return;
>>>>
>>>> (along the lines of what you have in amd_init_lfence_dispatch())?
>>> Oh - that was a last minute refactor which I didn't do quite correctly.
>>> Yes, it should match amd_init_lfence_dispatch().
>>>
>>>>> + /*
>>>>> + * On Zen3 (Fam 0x19) and later CPUs, LFENCE is unconditionally dispatch
>>>>> + * serialising, and is enumerated in CPUID. Hypervisors may also
>>>>> + * enumerate it when the setting is in place and MSR_AMD64_DE_CFG isn't
>>>>> + * available.
>>>>> + */
>>>>> + if ( !test_bit(X86_FEATURE_LFENCE_DISPATCH, c->x86_capability) )
>>>>> + new |= AMD64_DE_CFG_LFENCE_SERIALISE;
>>>>> +
>>>>> + /*
>>>>> + * If vulnerable to Zenbleed and not mitigated in microcode, use the
>>>>> + * bigger hammer.
>>>>> + */
>>>>> + if ( zenbleed_use_chickenbit() )
>>>>> + new |= (1 << 9);
>>>>> +
>>>>> + if ( !new )
>>>>> + return;
>>>>> +
>>>>> + if ( rdmsr_safe(MSR_AMD64_DE_CFG, &val) ||
>>>>> + (val & new) == new )
>>>>> + return;
>>>>> +
>>>>> + /*
>>>>> + * DE_CFG is a Core-scoped MSR, and this write is racy. However, both
>>>>> + * threads calculate the new value from state which expected to be
>>>>> + * consistent across CPUs and unrelated to the old value, so the result
>>>>> + * should be consistent.
>>>>> + */
>>>>> + wrmsr_safe(MSR_AMD64_DE_CFG, val | new);
>>>> Either of the bits may be the cause of #GP. In that case we wouldn't set the
>>>> other bit, even if it may be possible to set it.
>>> This MSR does not #GP on real hardware.
> I consider this unexpected / inconsistent, at least as long as some of the
> bits would be documented as reserved. "Would be" because the particular
> Fam17 and Fam19 PPRs I'm looking at don't even mention DE_CFG (or BP_CFG,
> for that matter).
You need the even-more-NDA manual to find those details.
Reserved doesn't mean #GP. It means "don't rely on the behaviour".
>>> Also, both of these bits come from instructions AMD have provided,
>>> saying "set $X in case $Y", which we have honoured as part of the
>>> conditions for setting up new, which I consider to be a reasonable
>>> guarantee that no #GP will ensue.
> The AMD instructions are for particular models, aren't they? While that
> may mean the bits are fine to blindly (try to) set on other models, pretty
> likely this can't be extended to other families. (While
> zenbleed_use_chickenbit() is family-specific, the LFENCE bit is tried
> without regard to family.)
The Managing Speculation whitepaper says "set bit 1 on Fam 10, 12, 14,
15-17".
It also says that AMD will treat the MSR and bit 1 as architectural
moving forwards. In reality, on Zen3 (post-dating the whitepaper) and
later, it's write-discard, read-as-1, and this is the behaviour we
provide to all VMs.
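
To illustrate "write-discard, read-as-1", here is a rough sketch of a
guest-visible DE_CFG. This is not Xen's actual code: the handler names
are hypothetical, and reporting every other bit as zero is an assumption
of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of DE_CFG: LFENCE is dispatch serialising. */
#define DE_CFG_LFENCE_SERIALISE (1ULL << 1)

/* Hypothetical read handler: bit 1 always reads as set. */
static uint64_t guest_rdmsr_de_cfg(void)
{
    /* Assumption: all other bits are reported as zero. */
    return DE_CFG_LFENCE_SERIALISE;
}

/*
 * Hypothetical write handler: the write is accepted (no #GP is raised)
 * but the value is dropped, so later reads are unaffected.
 */
static bool guest_wrmsr_de_cfg(uint64_t val)
{
    (void)val;   /* write-discard */
    return true; /* true == no fault */
}
```
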
The Zenbleed instructions say "set bit 9 on Zen2".
So, the logic in this patch is following AMD's written instructions.
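
Read literally, the two instructions quoted above amount to something
like the sketch below. It is only an illustration: the helper and the
bit-9 macro name are made up, the family numbers are assumed to be
hexadecimal, and it deliberately ignores the microcode revision check
and the CPUID LFENCE_DISPATCH shortcut that the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define DE_CFG_LFENCE_SERIALISE (1ULL << 1) /* "set bit 1" */
#define DE_CFG_ZENBLEED_CHICKEN (1ULL << 9) /* "set bit 9"; name is made up */

/* Hypothetical helper: which DE_CFG bits AMD's written text asks for. */
static uint64_t de_cfg_bits_per_amd(unsigned int family, bool is_zen2)
{
    uint64_t bits = 0;

    /* Whitepaper: bit 1 on Fam 10h, 12h, 14h, 15h-17h. */
    switch ( family )
    {
    case 0x10: case 0x12: case 0x14:
    case 0x15: case 0x16: case 0x17:
        bits |= DE_CFG_LFENCE_SERIALISE;
        break;

    default:
        break;
    }

    /* Zenbleed instructions: bit 9 on Zen2. */
    if ( is_zen2 )
        bits |= DE_CFG_ZENBLEED_CHICKEN;

    return bits;
}
```
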
~Andrew