Hi Igor,

On 11/12/25 10:32 PM, Igor Mammedov wrote:
On Tue, 11 Nov 2025 14:05:23 +1000
Gavin Shan <[email protected]> wrote:
On 11/11/25 12:11 AM, Igor Mammedov wrote:
On Wed,  5 Nov 2025 21:44:47 +1000
Gavin Shan <[email protected]> wrote:
The current GHES raw data maximal length isn't enough for 16 consecutive
CPER errors, which will be sent to a guest with 4KiB page size on a
erroneous 64KiB host page. Note those 16 CPER errors will be contained
in one single error block, meaning all CPER errors should be identical
in terms of type and severity and all of them should be delivered in
one shot.

Increase GHES raw data maximal length from 1KiB to 4KiB so that the
error block has enough storage space for 16 consecutive CPER errors.

Signed-off-by: Gavin Shan <[email protected]>
---
   docs/specs/acpi_hest_ghes.rst | 2 +-
   hw/acpi/ghes.c                | 2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index aaf7b1ad11..acf31d6eeb 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -68,7 +68,7 @@ Design Details
       and N Read Ack Register entries. The size for each entry is 8-byte.
       The Error Status Data Block table contains N Error Status Data Block
       entries. The size for each entry is defined at the source code as
-    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 4096 bytes). The total size

is it safe to bump without compat glue?

consider VM migrated from old QEMU to new one,
it will have  etc/hardware_errors allocated with 1K GESB,
and more importantly error_block_addressN will have 1K offsets as well

however with ACPI_GHES_MAX_RAW_DATA_LENGTH all length checks will
let >1K blocks to be written into into 1K 'formated' etc/hardware_errors.

Thanks to previous refactoring we get all addresses right (1K version),
but if you write large GESB there it will either overlap with the next GESB
or a smaller GESB might overwrite tail of preceding large one.
And in works case it's OOB when writing large GESB in the last block.

Given we have to write GESB successfully or abort, there is no point
in adding compat knobs. But we still need to check if GEBS will fit into
whatever block size etc/hardware_errors inside guest RAM is laid out originally.

Good point. You're right that we're not safe for migration from old QEMU to
and new QEMU. So I think I need to bump vmstate_hest_state::minimum_version_id
in generic_event_device.c ?

that won't help,
what would help is creating compat property (in the owner of GHES MMIO 
registers),
and lower limits (to former value) for older machine types.
That way sizes would match even if you do ping pong migration
between old qemu and new one, since one would still be using old machine type
for that.


In v4, a compat property 'x-error-block-size' has been added, as the fence
between QEMU 10.1 and 10.2 (1KiB vs 4KiB GHES error block size).

https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html



       for the "etc/hardware_errors" fw_cfg blob is
       (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
       N is the number of the kinds of hardware error sources.
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 06555905ce..a9c08e73c0 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -33,7 +33,7 @@
   #define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
/* The max size in bytes for one error block */
-#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
+#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (4 * KiB)
/* Generic Hardware Error Source version 2 */
   #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10


Thanks,
Gavin


Reply via email to