Re: [PATCH 3/3] x86/amd: Fix race editing DE_CFG

Andrew Cooper Thu, 27 Nov 2025 09:43:23 -0800

On 27/11/2025 7:58 am, Jan Beulich wrote:
> On 26.11.2025 18:56, Andrew Cooper wrote:
>> On 26/11/2025 4:55 pm, Andrew Cooper wrote:
>>> On 26/11/2025 3:07 pm, Jan Beulich wrote:
>>>> On 26.11.2025 14:22, Andrew Cooper wrote:
>>>>> @@ -1075,6 +966,112 @@ static void cf_check fam17_disable_c6(void *arg)
>>>>>   wrmsrl(MSR_AMD_CSTATE_CFG, val & mask);
>>>>>  }
>>>>>  
>>>>> +static bool zenbleed_use_chickenbit(void)
>>>>> +{
>>>>> +    unsigned int curr_rev;
>>>>> +    uint8_t fixed_rev;
>>>>> +
>>>>> +    /*
>>>>> +     * If we're virtualised, we can't do family/model checks safely, and
>>>>> +     * we likely wouldn't have access to DE_CFG even if we could see a
>>>>> +     * microcode revision.
>>>>> +     *
>>>>> +     * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
>>>>> +     * position to care either way.  An admin doesn't want to be disabling
>>>>> +     * AVX as a mitigation on any build of Xen with this logic present.
>>>>> +     */
>>>>> +    if ( cpu_has_hypervisor || boot_cpu_data.family != 0x17 )
>>>>> +        return false;
>>>>> +
>>>>> +    curr_rev = this_cpu(cpu_sig).rev;
>>>>> +    switch ( curr_rev >> 8 )
>>>>> +    {
>>>>> +    case 0x083010: fixed_rev = 0x7a; break;
>>>>> +    case 0x086001: fixed_rev = 0x0b; break;
>>>>> +    case 0x086081: fixed_rev = 0x05; break;
>>>>> +    case 0x087010: fixed_rev = 0x32; break;
>>>>> +    case 0x08a000: fixed_rev = 0x08; break;
>>>>> +    default:
>>>>> +        /*
>>>>> +         * With the Fam17h check above, most parts getting here are Zen1.
>>>>> +         * They're not affected.  Assume Zen2 ones making it here are affected
>>>>> +         * regardless of microcode version.
>>>>> +         */
>>>>> +        return is_zen2_uarch();
>>>>> +    }
>>>>> +
>>>>> +    return (uint8_t)curr_rev >= fixed_rev;
>>>>> +}
>>>>> +
>>>>> +void amd_init_de_cfg(const struct cpuinfo_x86 *c)
>>>>> +{
>>>>> +    uint64_t val, new = 0;
>>>>> +
>>>>> +    /* The MSR doesn't exist on Fam 0xf/0x11. */
>>>>> +    if ( c->family != 0xf && c->family != 0x11 )
>>>>> +        return;
>>>> Comment and code don't match. Did you mean
>>>>
>>>>     if ( c->family == 0xf || c->family == 0x11 )
>>>>         return;
>>>>
>>>> (along the lines of what you have in amd_init_lfence_dispatch())?
>>> Oh - that was a last minute refactor which I didn't do quite correctly. 
>>> Yes, it should match amd_init_lfence_dispatch().
>>>
>>>>> +    /*
>>>>> +     * On Zen3 (Fam 0x19) and later CPUs, LFENCE is unconditionally dispatch
>>>>> +     * serialising, and is enumerated in CPUID.  Hypervisors may also
>>>>> +     * enumerate it when the setting is in place and MSR_AMD64_DE_CFG isn't
>>>>> +     * available.
>>>>> +     */
>>>>> +    if ( !test_bit(X86_FEATURE_LFENCE_DISPATCH, c->x86_capability) )
>>>>> +        new |= AMD64_DE_CFG_LFENCE_SERIALISE;
>>>>> +
>>>>> +    /*
>>>>> +     * If vulnerable to Zenbleed and not mitigated in microcode, use the
>>>>> +     * bigger hammer.
>>>>> +     */
>>>>> +    if ( zenbleed_use_chickenbit() )
>>>>> +        new |= (1 << 9);
>>>>> +
>>>>> +    if ( !new )
>>>>> +        return;
>>>>> +
>>>>> +    if ( rdmsr_safe(MSR_AMD64_DE_CFG, &val) ||
>>>>> +         (val & new) == new )
>>>>> +        return;
>>>>> +
>>>>> +    /*
>>>>> +     * DE_CFG is a Core-scoped MSR, and this write is racy.  However, both
>>>>> +     * threads calculate the new value from state which is expected to be
>>>>> +     * consistent across CPUs and unrelated to the old value, so the result
>>>>> +     * should be consistent.
>>>>> +     */
>>>>> +    wrmsr_safe(MSR_AMD64_DE_CFG, val | new);
>>>> Either of the bits may be the cause of #GP. In that case we wouldn't set the
>>>> other bit, even if it may be possible to set it.
>>> This MSR does not #GP on real hardware.
> I consider this unexpected / inconsistent, at least as long as some of the
> bits would be documented as reserved. "Would be" because the particular
> Fam17 and Fam19 PPRs I'm looking at don't even mention DE_CFG (or BP_CFG,
> for that matter).


You need the even-more-NDA manual to find those details.

Reserved doesn't mean #GP. It means "don't rely on the behaviour".

>>> Also, both of these bits come from instructions AMD have provided,
>>> saying "set $X in case $Y", which we have honoured as part of the
>>> conditions for setting up new, which I consider to be a reasonable
>>> guarantee that no #GP will ensue.
> The AMD instructions are for particular models, aren't they? While that
> may mean the bits are fine to blindly (try to) set on other models, pretty
> likely this can't be extended to other families. (While
> zenbleed_use_chickenbit() is family-specific, the LFENCE bit is tried
> without regard to family.)

The Managing Speculation whitepaper says "set bit 1 on Fam 10, 12, 14,
15-17".

It also says that AMD will treat the MSR and bit 1 as architectural
moving forwards.  In reality, on Zen3 (post-dating the whitepaper) and
later, it's write-discard, read-as-1, and this is the behaviour we
provide to all VMs.
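
A minimal sketch of what "write-discard, read-as-1" means for a guest's
view of the MSR.  The function and macro names are hypothetical and not
taken from Xen, and returning 0 for every other bit is an assumption made
only to keep the example small.

```c
#include <stdint.h>

/* Bit 1 of DE_CFG, as discussed in this thread; the macro name is illustrative. */
#define DE_CFG_LFENCE_SERIALISE (UINT64_C(1) << 1)

/*
 * Hypothetical guest read handler.  Read-as-1: bit 1 always reads as set.
 * Assumption: all other bits read as 0 in this sketch.
 */
static uint64_t guest_read_de_cfg(void)
{
    return DE_CFG_LFENCE_SERIALISE;
}

/*
 * Hypothetical guest write handler.  Write-discard: any value is accepted
 * without a fault and without being stored.  Returns 0 for "no fault".
 */
static int guest_write_de_cfg(uint64_t val)
{
    (void)val; /* Deliberately ignored. */
    return 0;
}
```

With handlers shaped like this, a guest that writes 0 and then reads the
MSR back still sees bit 1 set.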

The Zenbleed instructions say "set bit 9 on Zen2".

So, the logic in this patch follows AMD's written instructions.
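
A rough sketch of those two written instructions as a mask calculation.
This is a narrower reading than the patch itself, which tries the LFENCE
bit on every family other than 0xf/0x11 when LFENCE_DISPATCH is not
enumerated.  The helper names are hypothetical, the family numbers are
assumed to be hexadecimal, and is_zen2/ucode_fixed are plain inputs
standing in for is_zen2_uarch() and the microcode revision table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in this thread; the macro names are illustrative. */
#define DE_CFG_LFENCE_SERIALISE (UINT64_C(1) << 1)
#define DE_CFG_ZEN2_CHICKENBIT  (UINT64_C(1) << 9)

/* Hypothetical helper: families listed for bit 1, assumed to be hex. */
static bool whitepaper_lists_family(unsigned int family)
{
    return family == 0x10 || family == 0x12 || family == 0x14 ||
           (family >= 0x15 && family <= 0x17);
}

/* Hypothetical helper: the bits the written guidance asks to be set. */
static uint64_t de_cfg_bits_wanted(unsigned int family, bool is_zen2,
                                   bool ucode_fixed)
{
    uint64_t bits = 0;

    /* "set bit 1 on Fam 10, 12, 14, 15-17" */
    if ( whitepaper_lists_family(family) )
        bits |= DE_CFG_LFENCE_SERIALISE;

    /* "set bit 9 on Zen2", only when microcode has not fixed Zenbleed. */
    if ( is_zen2 && !ucode_fixed )
        bits |= DE_CFG_ZEN2_CHICKENBIT;

    return bits;
}
```

For example, an unfixed Zen2 part (Fam 0x17) asks for both bits, while a
Fam 0x19 part asks for neither under this reading.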

~Andrew

Re: [PATCH 3/3] x86/amd: Fix race editing DE_CFG
