On Wed,  5 Nov 2025 21:44:53 +1000
Gavin Shan <[email protected]> wrote:

> In the combination of 64KiB host and 4KiB guest, a problematic host
> page affects 16x guest pages that can be owned by different threads.
> It means 16x memory errors can be raised at once due to the parallel
> accesses to those 16x guest pages on the guest. Unfortunately, QEMU
> can't deliver them one by one because we just one GHES error block,

we have just one

> corresponding one read acknowledgement register. It can eventually
> cause QEMU crash dump due to the contention on that register, meaning
> the current memory error can't be delivered before the previous error
> isn't acknowledged.
> 
> Imporve push_ghes_memory_errors() to push 16x consecutive memory errors
Improve

> under this situation to avoid the contention on the read acknowledgement
> register.
> 
> Signed-off-by: Gavin Shan <[email protected]>
Hi Gavin,

Silly question that never occurred to me before:
What happens if we just report a single larger error?

The CPER record has a Physical Address Mask that I think lets us say we
are only reporting at a 64KiB granularity.

In Linux, drivers/edac/ghes_edac.c seems to handle this via e->grain.
https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/edac/ghes_edac.c#L346

I haven't chased the whole path through to check whether this does
appropriate poisoning on the guest, though.

> ---
>  target/arm/kvm.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 5b151eda3c..d7de8262da 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include <sys/ioctl.h>
>  
>  #include <linux/kvm.h>
> @@ -2432,12 +2433,59 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
>  static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
>                                      uint64_t paddr, Error **errp)
>  {
> +    uint64_t val, start, end, guest_pgsz, host_pgsz;
>      uint64_t addresses[16];
> +    uint32_t num_of_addresses;
> +    int ret;
> +
> +    /*
> +     * Sort out the guest page size from TCR_EL1, which can be modified
> +     * by the guest from time to time. So we have to sort it out dynamically.
> +     */
> +    ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
> +    if (ret) {
> +        error_setg(errp, "Error %" PRId32 " to read TCR_EL1 register", ret);
> +        return;
> +    }
> +
> +    switch (extract64(val, 14, 2)) {
> +    case 0:
> +        guest_pgsz = 4 * KiB;
> +        break;
> +    case 1:
> +        guest_pgsz = 64 * KiB;
> +        break;
> +    case 2:
> +        guest_pgsz = 16 * KiB;
> +        break;
> +    default:
> +        error_setg(errp, "Unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
> +        return;
> +    }
> +
> +    host_pgsz = qemu_real_host_page_size();
> +    start = paddr & ~(host_pgsz - 1);
> +    end = start + host_pgsz;
> +    num_of_addresses = 0;
>  
> -    addresses[0] = paddr;
> +    while (start < end) {
> +        /*
> +         * The precise physical address is provided for the affected
> +         * guest page that contains @paddr. Otherwise, the starting
> +         * address of the guest page is provided.
> +         */
> +        if (paddr >= start && paddr < (start + guest_pgsz)) {
> +            addresses[num_of_addresses++] = paddr;
> +        } else {
> +            addresses[num_of_addresses++] = start;
> +        }
> +
> +        start += guest_pgsz;
> +    }
>  
>      kvm_cpu_synchronize_state(c);
> -    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses, 1, errp);
> +    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> +                            addresses, num_of_addresses, errp);
>      kvm_inject_arm_sea(c);
>  }
>  

