On Sat, 14 Sep 2024 07:33:14 +0200 Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> Hi Igor, > > Em Fri, 13 Sep 2024 15:25:18 +0200 > Igor Mammedov <imamm...@redhat.com> escreveu: > > > > > in addition to this, it needs a patch on top to make sure > > > > that we migrate hest_addr_le. > > > > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address' > > > > and fixes on top of that for an example. > > > > > > Hmm... If I understood such change well, vmstate_ghes_state() will > > > use this structure as basis to do migration: > > > > > > /* ghes.h */ > > > typedef struct AcpiGhesState { > > > uint64_t hest_addr_le; > > > uint64_t ghes_addr_le; > > > bool present; /* True if GHES is present at all on this board */ > > > } AcpiGhesState; > > > > > > /* generic_event_device.c */ > > > static const VMStateDescription vmstate_ghes_state = { > > > .name = "acpi-ged/ghes", > > > .version_id = 1, > > > .minimum_version_id = 1, > > > .needed = ghes_needed, > > > .fields = (VMStateField[]) { > > > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1, > > > vmstate_ghes_state, AcpiGhesState), > > > VMSTATE_END_OF_LIST() > > > } > > > }; > > > > current code looks like that: > > > > > > static const VMStateDescription vmstate_ghes = { > > > > .name = "acpi-ghes", > > > > .version_id = 1, > > > > .minimum_version_id = 1, > > > > .fields = (const VMStateField[]) { > > > > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<=== > > > > VMSTATE_END_OF_LIST() > > > > }, > > > > }; > > > > > > > > static bool ghes_needed(void *opaque) > > > > { > > > > AcpiGedState *s = opaque; > > > > return s->ghes_state.ghes_addr_le; > > > > } > > > > > > > > static const VMStateDescription vmstate_ghes_state = { > > > > .name = "acpi-ged/ghes", > > > > .version_id = 1, > > > > .minimum_version_id = 1, > > > > .needed = ghes_needed, > > > > .fields = (const VMStateField[]) { > > > > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1, > > > > vmstate_ghes, AcpiGhesState), > > > > VMSTATE_END_OF_LIST() > > > > } > > > > }; > > > > where > > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), > > explicitly defines field(s) within structure to be sent over wire. > > > > we need to add a conditional field for ghes_addr_le > > which will be sent only with new machine types, but not with old ones > > to avoid migration breakage. > > > > I don't know much about migration, but maybe we can get away with > > similar condition as in ghes_needed(), or enabling QMP error injection > > based on machine type version. > > > > Or maybe adding a CLI option to enable QMP error injection in which > > case the explicit option would serve as a trigger enable QMP command and > > to migrate hest_addr_le. > > It might be even better this way, since machine wouldn't need to > > carry extra error source that will be used only for testing > > and practically never in production VMs (aka reduced attack surface). > > > > You can easily test it locally: > > new-qemu: with your patches > > old-qemu: qemu-9.1 > > > > and then try to do forth & back migration for following cases: > > 1. (ping-pong case with OLD firmware/ACPI tables) > > start old-qemu with 9.1 machine type -> > > migrate to file -> > > start new-qemu with 9.1 machine type -> restore from file -> > > migrate to file -> > > As I never used migration, I'm a little stuck with the command line > parameters. > > I guess I got the one to do the migration at the monitor: > > (qemu) migrate file://tmp/migrate > > But no idea how to start a machine using a saved state. see https://www.linux-kvm.org/page/Migration 'savevm/loadvm to an external state file (using pseudo-migration)' section > > > start old-qemu with 9.1 machine type ->restore from file -> > > > > 2. (ping-pong case with NEW firmware/ACPI tables) > > do the same as #1 but starting with new-qemu binary > > > > (from upstream pov #2 is optional, but not implementing it > > is pain for downstream so it's better to have it if it's not > > too much work) > > If I understood the migration documentation, every when new fields > are added, we should increment .version_id. If new version is > not backward-compatible, .minimum_version_id is also incremented. > > So, for a migration-compatible code with a 9.1 VM, the code needs to > handle the case where hest_addr_le is not defined, e. g. use offsets > relative to ghes_addr_le, just like the current version, e.g.: > > uint64_t cper_addr, read_ack_start_addr; > > AcpiGedState *acpi_ged_state = > ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL)); > AcpiGhesState *ags = &acpi_ged_state->ghes_state; > > if (!ags->hest_addr_le) { > // Backward-compatible migration code > uint64_t base = le64_to_cpu(ags->ghes_addr_le); > > *read_ack_start_addr = base + > ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + > error_source_to_index[notify] * sizeof(uint64_t); > > *cper_addr = base + > ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + > ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + > error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH; > } else { > // Use the new logic from ags->hest_addr_le > } > > There are two problems with that: > > 1. On your reviews, if I understood right, the code above is not > migration safe. So, while implementing it would be technically > correct, migration still won't work; > > 2. With the new code, ACPI_GHES_ERROR_SOURCE_COUNT is not > defined anymore, as the size of the error source structure can > be different on different architectures, being 2 for new > VMs and 1 for old ones. > > Basically the new code gets it right because it can see a > pointer to the HEST table, so it can get the number from there: > > hest_addr = le64_to_cpu(ags->hest_addr_le); > cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources)); > > But, without hest_addr_le, getting num_sources is not possible. > > An alternative would be to add a hacky code that works only for > arm machines (as new versions may support more archs). > > Something like: > #define V1_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 1 > #define V2_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 2 > > And have a hardcoded logic that would work before/after this > changeset but may break on newer versions, if the number of > source IDs change, if we add other HEST types, etc. > > Now, assuming that such hack would work, it sounds too hacky to > my taste. > > So, IMO it is a lot safer to not support migrations from v1 (only > ghes_addr_le), using a patch like the enclosed one to ensure that. > > Btw, checking existing migration structs, it sounds that for almost all > structures, .version_id is identical to .minimum_version_id, meaning that > migration between different versions aren't supported on most cases. let me try to find more examples on how to implement migration bits hopefully more comprehensible. > > Thanks, > Mauro > > --- > > [PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr > > The GHES migration logic at GED should now support HEST table > location too. > > Increase migration version and change needed to check for both > ghes_addr_le and hest_addr_le. > > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c > index b4c83a089a02..efae0ff62c7b 100644 > --- a/hw/acpi/generic_event_device.c > +++ b/hw/acpi/generic_event_device.c > @@ -351,10 +351,11 @@ static const VMStateDescription vmstate_ged_state = { > > static const VMStateDescription vmstate_ghes = { > .name = "acpi-ghes", > - .version_id = 1, > - .minimum_version_id = 1, > + .version_id = 2, > + .minimum_version_id = 2, > .fields = (const VMStateField[]) { > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), > + VMSTATE_UINT64(hest_addr_le, AcpiGhesState), > VMSTATE_END_OF_LIST() > }, > }; > @@ -362,13 +363,13 @@ static const VMStateDescription vmstate_ghes = { > static bool ghes_needed(void *opaque) > { > AcpiGedState *s = opaque; > - return s->ghes_state.ghes_addr_le; > + return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le; > } > > static const VMStateDescription vmstate_ghes_state = { > .name = "acpi-ged/ghes", > - .version_id = 1, > - .minimum_version_id = 1, > + .version_id = 2, > + .minimum_version_id = 2, > .needed = ghes_needed, > .fields = (const VMStateField[]) { > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1, > >