On 21.01.2025 15:25, Andrew Cooper wrote:
> Logic using performance counters needs to look at
> MSR_MISC_ENABLE.PERF_AVAILABLE before touching any other resources.
> 
> When virtualised under ESX, Xen dies with a #GP fault trying to read
> MSR_CORE_PERF_GLOBAL_CTRL.
> 
> Factor this logic out into a separate function (it's already too squashed to
> the RHS), and insert a check of MSR_MISC_ENABLE.PERF_AVAILABLE.
> 
> This also limits setting X86_FEATURE_ARCH_PERFMON, although oprofile (the only
> consumer of this flag) cross-checks too.
> 
> Reported-by: Jonathan Katz <jonathan.k...@aptar.com>
> Link: https://xcp-ng.org/forum/topic/10286/nesting-xcp-ng-on-esx-8
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Roger Pau Monné <roger....@citrix.com>
> CC: Oleksii Kurochko <oleksii.kuroc...@gmail.com>
> 
> Untested, but this is the same pattern used by oprofile and watchdog setup.

Wow, in the oprofile case with pretty bad open-coding.

> I've intentionally stopped using Intel style.  This file is already mixed (as
> visible even in context), and it doesn't remotely resemble its Linux origin
> any more.

I guess you mean s/Intel/Linux/ here? (Yes, I'm entirely fine with using Xen
style there.)

> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -535,39 +535,49 @@ static void intel_log_freq(const struct cpuinfo_x86 *c)
>      printk("%u MHz\n", (factor * max_ratio + 50) / 100);
>  }
>  
> +static void init_intel_perf(struct cpuinfo_x86 *c)
> +{
> +    uint64_t val;
> +    unsigned int eax, ver, nr_cnt;
> +
> +    if ( c->cpuid_level <= 9 ||
> +         rdmsr_safe(MSR_IA32_MISC_ENABLE, val) ||

In e.g. intel_unlock_cpuid_leaves() and early_init_intel() and in particular
also in boot/head.S we access this MSR without recovery attached. Is there a
reason rdmsr_safe() needs using here?

> +         !(val & MSR_IA32_MISC_ENABLE_PERF_AVAIL) )
> +        return;
> +
> +    eax = cpuid_eax(10);
> +    ver = eax & 0xff;
> +    nr_cnt = (eax >> 8) & 0xff;
> +
> +    if ( ver && nr_cnt > 1 && nr_cnt <= 32 )
> +    {
> +        unsigned int cnt_mask = (1UL << nr_cnt) - 1;
> +
> +        /*
> +         * On (some?) Sapphire/Emerald Rapids platforms each package-BSP
> +         * starts with all the enable bits for the general-purpose PMCs
> +         * cleared.  Adjust so counters can be enabled from EVNTSEL.
> +         */
> +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, val);
> +
> +        if ( (val & cnt_mask) != cnt_mask )
> +        {
> +            printk("FIRMWARE BUG: CPU%u invalid PERF_GLOBAL_CTRL: %#"PRIx64" adjusting to %#"PRIx64"\n",
> +                   smp_processor_id(), val, val | cnt_mask);
> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, val | cnt_mask);
> +        }
> +    }
> +
> +    __set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);

This moved, without the description suggesting the move is intentional.
It did live at the end of the earlier scope before, ...

> +}
> +
>  static void cf_check init_intel(struct cpuinfo_x86 *c)
>  {
>       /* Detect the extended topology information if available */
>       detect_extended_topology(c);
>  
>       init_intel_cacheinfo(c);
> -     if (c->cpuid_level > 9) {
> -             unsigned eax = cpuid_eax(10);
> -             unsigned int cnt = (eax >> 8) & 0xff;
> -
> -             /* Check for version and the number of counters */
> -             if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
> -                     uint64_t global_ctrl;
> -                     unsigned int cnt_mask = (1UL << cnt) - 1;
> -
> -                     /*
> -                      * On (some?) Sapphire/Emerald Rapids platforms each
> -                      * package-BSP starts with all the enable bits for the
> -                      * general-purpose PMCs cleared.  Adjust so counters
> -                      * can be enabled from EVNTSEL.
> -                      */
> -                     rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> -                     if ((global_ctrl & cnt_mask) != cnt_mask) {
> -                             printk("CPU%u: invalid PERF_GLOBAL_CTRL: %#"
> -                                    PRIx64 " adjusting to %#" PRIx64 "\n",
> -                                    smp_processor_id(), global_ctrl,
> -                                    global_ctrl | cnt_mask);
> -                             wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
> -                                    global_ctrl | cnt_mask);
> -                     }
> -                     __set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
> -             }
> -     }

... as can be seen here.
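
To make the difference concrete, here is a minimal standalone sketch (not Xen code) modelling when X86_FEATURE_ARCH_PERFMON ends up set before the patch and as posted. All helper names are hypothetical, and the MSR_MISC_ENABLE.PERF_AVAILABLE read is reduced to a plain boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers decoding CPUID leaf 0xA EAX the way the patch does. */
static unsigned int perf_version(uint32_t eax)
{
    return eax & 0xff;
}

static unsigned int perf_nr_counters(uint32_t eax)
{
    return (eax >> 8) & 0xff;
}

/* Enable-bit mask for the general-purpose counters; valid for 1..32 only. */
static uint64_t perf_cnt_mask(unsigned int nr_cnt)
{
    assert(nr_cnt >= 1 && nr_cnt <= 32);
    return (UINT64_C(1) << nr_cnt) - 1;
}

/* The "version and number of counters" check shared by both variants. */
static bool counters_usable(uint32_t eax)
{
    unsigned int nr_cnt = perf_nr_counters(eax);

    return perf_version(eax) && nr_cnt > 1 && nr_cnt <= 32;
}

/* Before the patch: the flag is set at the end of the inner scope. */
static bool arch_perfmon_old(unsigned int cpuid_level, uint32_t eax)
{
    return cpuid_level > 9 && counters_usable(eax);
}

/* As posted: the flag is set whenever the early return is not taken. */
static bool arch_perfmon_new(unsigned int cpuid_level, bool perf_avail,
                             uint32_t eax)
{
    (void)eax; /* the version/counter check no longer gates the flag */

    return cpuid_level > 9 && perf_avail;
}
```

With a leaf 0xA EAX reporting version 0, arch_perfmon_old() yields false while arch_perfmon_new() yields true as long as PERF_AVAILABLE is set, which is the behavioural change the description does not mention.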

Jan

> +     init_intel_perf(c);
>  
>       if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
>       {
> 
> base-commit: c3f5d1bb40b57d467cb4051eafa86f5933ec9003

