On Wed, 5 Nov 2025 21:44:53 +1000 Gavin Shan <[email protected]> wrote:
> In the combination of 64KiB host and 4KiB guest, a problematic host
> page affects 16x guest pages that can be owned by different threads.
> It means 16x memory errors can be raised at once due to the parallel
> accesses to those 16x guest pages on the guest. Unfortunately, QEMU
> can't deliver them one by one because we just one GHES error block,

we have just one

> corresponding one read acknowledgement register. It can eventually
> cause QEMU crash dump due to the contention on that register, meaning
> the current memory error can't be delivered before the previous error
> isn't acknowledged.
>
> Imporve push_ghes_memory_errors() to push 16x consecutive memory errors

Improve

> under this situation to avoid the contention on the read acknowledgement
> register.
>
> Signed-off-by: Gavin Shan <[email protected]>

Hi Gavin

Silly question that never occurred to me before:
What happens if we just report a single larger error?

The CPER record has a Physical Address Mask that I think lets us say
we are only reporting at a 64KiB granularity.

In Linux, drivers/edac/ghes_edac.c seems to handle this via e->grain.
https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/edac/ghes_edac.c#L346

I haven't chased the whole path through to check whether this does
appropriate poisoning on the guest though.
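
To make the question concrete, here is a rough sketch of the arithmetic
I have in mind. The helper names are made up for illustration (they are
not QEMU or kernel functions), and the grain computation only mirrors
what ghes_edac.c appears to do with the mask (~mask + 1), so treat it as
an assumption rather than a statement of how the guest will behave.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a CPER Physical Address Mask that marks
 * the low bits of the address as insignificant for a given granule.
 * Assumes granule is a power of two, e.g. the 64KiB host page size.
 */
static uint64_t cper_pa_mask(uint64_t granule)
{
    return ~(granule - 1);
}

/*
 * Hypothetical helper: recover the error grain from the mask, in the
 * same way ghes_edac.c seems to (grain = ~mask + 1).
 */
static uint64_t cper_grain(uint64_t pa_mask)
{
    return ~pa_mask + 1;
}

/* Hypothetical helper: base address of the granule containing paddr. */
static uint64_t cper_granule_base(uint64_t paddr, uint64_t pa_mask)
{
    return paddr & pa_mask;
}
```

With a 64KiB granule the mask comes out as 0xFFFFFFFFFFFF0000 and the
grain derived from it is 64KiB again, so one record would cover the
whole host page instead of 16 separate 4KiB records.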

> ---
>  target/arm/kvm.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 5b151eda3c..d7de8262da 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -11,6 +11,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include <sys/ioctl.h>
>
>  #include <linux/kvm.h>
> @@ -2432,12 +2433,59 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
>  static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
>                                      uint64_t paddr, Error **errp)
>  {
> +    uint64_t val, start, end, guest_pgsz, host_pgsz;
>      uint64_t addresses[16];
> +    uint32_t num_of_addresses;
> +    int ret;
> +
> +    /*
> +     * Sort out the guest page size from TCR_EL1, which can be modified
> +     * by the guest from time to time. So we have to sort it out dynamically.
> +     */
> +    ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
> +    if (ret) {
> +        error_setg(errp, "Error %" PRId32 " to read TCR_EL1 register", ret);
> +        return;
> +    }
> +
> +    switch (extract64(val, 14, 2)) {
> +    case 0:
> +        guest_pgsz = 4 * KiB;
> +        break;
> +    case 1:
> +        guest_pgsz = 64 * KiB;
> +        break;
> +    case 2:
> +        guest_pgsz = 16 * KiB;
> +        break;
> +    default:
> +        error_setg(errp, "Unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
> +        return;
> +    }
> +
> +    host_pgsz = qemu_real_host_page_size();
> +    start = paddr & ~(host_pgsz - 1);
> +    end = start + host_pgsz;
> +    num_of_addresses = 0;
>
> -    addresses[0] = paddr;
> +    while (start < end) {
> +        /*
> +         * The precise physical address is provided for the affected
> +         * guest page that contains @paddr. Otherwise, the starting
> +         * address of the guest page is provided.
> +         */
> +        if (paddr >= start && paddr < (start + guest_pgsz)) {
> +            addresses[num_of_addresses++] = paddr;
> +        } else {
> +            addresses[num_of_addresses++] = start;
> +        }
> +
> +        start += guest_pgsz;
> +    }
>
>      kvm_cpu_synchronize_state(c);
> -    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses, 1, errp);
> +    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> +                            addresses, num_of_addresses, errp);
>      kvm_inject_arm_sea(c);
>  }
>
