With a 64KiB-page host and a 4KiB-page guest, a single problematic host
page affects 16 guest pages. Those 16 guest pages are most likely owned
by separate threads and accessed by them in parallel, so 16 memory
errors can be raised at once. However, the current design can't handle
this situation because the only error source has a single read
acknowledgement register. QEMU crashes in the following path because
the previously delivered error hasn't been acknowledged by the guest
when another error is about to be delivered:
  kvm_vcpu_thread_fn
  kvm_cpu_exec
  kvm_arch_on_sigbus_vcpu
  kvm_cpu_synchronize_state
  acpi_ghes_memory_errors
  abort
This series fixes the issue by sending 16 consecutive CPERs contained
in a single GHES error block.
PATCH[1-4] Increases the maximal GHES raw data length from 1KiB to 4KiB
PATCH[5]   Supports multiple error records in a single error block
PATCH[6-7] Improves the error handling in the error delivery path
PATCH[8]   Sends 16 consecutive CPERs in a single block if needed
Changelog
=========
v4:
* v3: https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00264.html
* Pick up r-b tags from Jonathan
* Add compat property 'x-error-block-size' for migration (Igor)
* Code and commit log improvements (Igor/Jonathan)
* Use error_fatal in the memory error delivery path (Markus)
* Use APIs from registerfields.h (Philippe)
v3:
* v2: https://lists.nongnu.org/archive/html/qemu-arm/2025-10/msg00372.html
* Code and changelog improvements (Jonathan)
* Fixed GHES error block status field and improved error
  handling in the error delivery path (Igor)
* Fixed ACPI HEST table and documentation (Mauro)
v2:
* v1: https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg00897.html
* Send 16x memory errors for the specific case (Jonathan)
Gavin Shan (8):
  acpi/ghes: Make GHES max raw data length dynamic
  tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  acpi/ghes: Increase GHES raw data maximal length to 4KiB
  tests/qtest/bios-tables-test: Update HEST table
  acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs
  acpi/ghes: Bail early on error from get_ghes_source_offsets()
  acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
  target/arm/kvm: Support multiple memory CPERs injection
 docs/specs/acpi_hest_ghes.rst     |   6 +-
 hw/acpi/generic_event_device.c    |   2 +
 hw/acpi/ghes-stub.c               |   7 +-
 hw/acpi/ghes.c                    | 127 +++++++++++++++++-------------
 hw/core/machine.c                 |   1 +
 include/hw/acpi/ghes.h            |   8 +-
 target/arm/kvm.c                  |  72 +++++++++++++++--
 tests/data/acpi/aarch64/virt/HEST | Bin 224 -> 224 bytes
 8 files changed, 153 insertions(+), 70 deletions(-)
--
2.51.1