On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
> Today, checking for non-fatal MCE errors on AMD is very invasive: it
> involves a periodic timer interrupting the physical CPU execution at
> regular intervals. Moreover, when the timer fires, the handler sends an
> IPI to all physical CPUs.
>
> Both these actions are disruptive in terms of latency and deterministic
> execution times for real-time workloads. They might miss a deadline due
> to one of these IPIs. Make it possible to disable non-fatal MCE errors
> checking with a new Kconfig option (AMD_MCE_NONFATAL).
>
> Signed-off-by: Stefano Stabellini <stefano.stabell...@amd.com>
> ---
> RFC. I couldn't find a better way to do this.
> ---
>  xen/arch/x86/Kconfig.cpu               | 15 +++++++++++++++
>  xen/arch/x86/cpu/mcheck/amd_nonfatal.c |  3 ++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
> index 5fb18db1aa..14e20ad19d 100644
> --- a/xen/arch/x86/Kconfig.cpu
> +++ b/xen/arch/x86/Kconfig.cpu
> @@ -10,6 +10,21 @@ config AMD
>         May be turned off in builds targetting other vendors.  Otherwise,
>         must be enabled for Xen to work suitably on AMD platforms.
>  
> +config AMD_MCE_NONFATAL
> +     bool "Check for non-fatal MCEs on AMD CPUs"
> +     default y
> +     depends on AMD
> +     help
> +       Check for non-fatal MCE errors.
> +
> +       When this option is on (default), Xen regularly checks for
> +       non-fatal MCEs potentially occurring on all physical CPUs. The
> +       checking is done via timers and IPI interrupts, which is
> +       acceptable in most configurations, but not for real-time.
> +
> +       Turn this option off if you plan on deploying real-time workloads
> +       on Xen.
> +

This being in the CPU vendor submenu seems off. I'd expect only a list of
silicon vendors here. I think it ought to be in the regular Kconfig file.

>  config INTEL
>       bool "Support Intel CPUs"
>       default y
> diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> index 7d48c9ab5f..812e18f612 100644
> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> @@ -191,7 +191,8 @@ static void cf_check mce_amd_work_fn(void *data)
>  
>  void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
>  {
> -     if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
> +     if ( !IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
> +          (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) )
>               return;
>  
>       /* Assume we are on K8 or newer AMD or Hygon CPU here */
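
For reference, a minimal standalone sketch of how an IS_ENABLED()-style
guard like the one above behaves. The macro is an illustrative
re-implementation, not copied from the Xen tree, and CONFIG_DEMO_ON,
CONFIG_DEMO_OFF and the demo_* functions are hypothetical names:

```c
/*
 * Standalone sketch of an IS_ENABLED()-style helper. This is an
 * illustrative re-implementation, not code copied from the Xen tree.
 * It evaluates to 1 when the option macro is defined as 1, else 0.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/* Hypothetical options: one switched on, one left undefined. */
#define CONFIG_DEMO_ON 1
/* CONFIG_DEMO_OFF is deliberately not defined. */

/* Both results are integer constant expressions, checked at build time. */
_Static_assert(IS_ENABLED(CONFIG_DEMO_ON) == 1, "defined option reads as 1");
_Static_assert(IS_ENABLED(CONFIG_DEMO_OFF) == 0, "undefined option reads as 0");

/* Returns 1 if polling would be set up, 0 if the guard returned early. */
int demo_init_with_option_on(void)
{
    if ( !IS_ENABLED(CONFIG_DEMO_ON) )
        return 0;
    return 1; /* stands in for arming the timer */
}

int demo_init_with_option_off(void)
{
    if ( !IS_ENABLED(CONFIG_DEMO_OFF) )
        return 0; /* constant condition: everything below is dead code */
    return 1;
}
```

With a guard of this kind the condition is a compile-time constant, but the
file is still compiled and linked whether the option is on or off.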

This can be generalised to compile out more code. What do you think of
dropping all of the non-fatal MCE polling (AMD and Intel alike) and getting
rid of the initcall altogether?

        diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
        index 5fb18db1aa..a4b892a1aa 100644
        --- a/xen/arch/x86/Kconfig.cpu
        +++ b/xen/arch/x86/Kconfig.cpu
        @@ -10,6 +10,20 @@ config AMD
                  May be turned off in builds targetting other vendors.  Otherwise,
                  must be enabled for Xen to work suitably on AMD platforms.

        +config MCE_NONFATAL
        +       bool "Check for non-fatal MCEs"
        +       default y
        +       help
        +         Check for non-fatal MCE errors.
        +
        +         When this option is on (default), Xen regularly checks for
        +         non-fatal MCEs potentially occurring on all physical CPUs. The
        +         checking is done via timers and IPI interrupts, which is
        +         acceptable in most configurations, but not for real-time.
        +
        +         Turn this option off if you plan on deploying real-time workloads
        +         on Xen.
        +
         config INTEL
                bool "Support Intel CPUs"
                default y
        diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
        index e6cb4dd503..c70b441888 100644
        --- a/xen/arch/x86/cpu/mcheck/Makefile
        +++ b/xen/arch/x86/cpu/mcheck/Makefile
        @@ -1,12 +1,12 @@
        -obj-$(CONFIG_AMD) += amd_nonfatal.o
        +obj-$(filter $(CONFIG_AMD),$(CONFIG_MCE_NONFATAL)) += amd_nonfatal.o
         obj-$(CONFIG_AMD) += mce_amd.o
         obj-y += mcaction.o
         obj-y += barrier.o
        -obj-$(CONFIG_INTEL) += intel-nonfatal.o
        +obj-$(filter $(CONFIG_INTEL),$(CONFIG_MCE_NONFATAL)) += intel-nonfatal.o
         obj-y += mctelem.o
         obj-y += mce.o
         obj-y += mce-apei.o
         obj-$(CONFIG_INTEL) += mce_intel.o
        -obj-y += non-fatal.o
        +obj-$(CONFIG_MCE_NONFATAL) += non-fatal.o
         obj-y += util.o
         obj-y += vmce.o

... with the Kconfig option probably living in the regular x86 Kconfig rather
than in Kconfig.cpu.

Thoughts?

Cheers,
Alejandro
