On Thu, Sep 18, 2025 at 08:53:17PM +0530, Arun Menon wrote: > Hello, > > Currently, when a migration of a VM with an encrypted vTPM > fails on the destination host (e.g., due to a mismatch in secret values), > the error message displayed on the source host is generic and unhelpful. > > For example, a typical error looks like this: > "operation failed: job 'migration out' failed: Sibling indicated error 1. > operation failed: job 'migration in' failed: load of migration failed: > Input/output error" > > This message does not provide any specific indication of a vTPM failure. > Such generic errors are logged using error_report(), which prints to > the console/monitor but does not make the detailed error accessible via > the QMP query-migrate command. > > This series addresses the issue, by ensuring that specific TPM error > messages are propagated via the QEMU Error object. > To make this possible, > - A set of functions in the call stack is changed > to incorporate an Error object as an additional parameter. > - Also, the TPM backend makes use of a new hook called post_load_errp() > that explicitly passes an Error object. > > It is organized as follows, > - Patches 1-23 focuses on pushing Error object into the functions > that are important in the call stack where TPM errors are observed. > We still need to make changes in rest of the functions in savevm.c > such that they also incorporate the errp object for propagating errors. > - Patches 12, 13, 20, are minor refactoring changes. > - Patch 24 removes error variant of vmstate_save_state() function. > - Patch 25 renames post_save() to cleanup_save() > - Patch 26 introduces the new variants of the hooks in VMStateDescription > structure. These hooks should be used in future implementations. > - Patch 27 focuses on changing the TPM backend such that the errors are > set in the Error object. > > While this series focuses specifically on TPM error reporting during > live migration, it lays the groundwork for broader improvements. > A lot of methods in savevm.c that previously returned an integer now capture > errors in the Error object, enabling other modules to adopt the > post_load_errp hook in the future. > > One such change previously attempted: > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01727.html > > Resolves: https://issues.redhat.com/browse/RHEL-82826 > > Signed-off-by: Arun Menon <[email protected]> > --- > Changes in v14: > - Addressed two regressions introduced in > [PATCH 07/27] migration: push Error **errp into qemu_loadvm_state() > thanks to Akihiko's suggestions. > a) duplicate error reporting: The previous patch introduced a duplicate > error message when 'exit_on_error' is false and the migration failed. > This is not trivial to fix because there is one such scenario in postcopy > migration where querying the error message using the QMP client or > libvirt > is not possible as the destination abruptly exits. > However, the logic to print error is put into a separate specific path > and > a comment is added in this path, which can be improved in future to > catch the > specific case. > b) Ignoring the error in the migration state and reporting local_err: > Solved this by restoring the patch 07. > - Link to v13: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v13: > - Akihiko suggested to use error_report_err() instead of warn_report_err(). > We must not prefix error messages with "warning:" unless the error is a > non-critical > failure that can be logged while the program continues to function. > - Removed error_prepend() with ERRP_GUARD() where it is not required within > the series. > - Link to v12: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v12: > - Remove error_prepend() calls where no additional information is appended to > the error string. This also allows us to remove unnecessary ERRP_GUARD(). > - Avoid ambiguity by propagating clear messages in errp. > - Add clarity to commit messages throughout the series. > - Link to v11: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v11: > - Remove unnecessary NULL check in postcopy_ram_listen_thread. > - Change error_warn to error_fatal or pass local_err wherever appropriate, > because, > > https://lore.kernel.org/qemu-devel/[email protected]/ > Most changes are in patches 2,24. > - Link to v10: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v10: > - Remove the patch to propagate most recent error and the patch of refactoring > vmstate_save_state_v(): 23,24. They are not required because we intend to > keep > the design as is. > - Added 2 new patches > - patch 25: Rename post_save() to cleanup_save() and make it void > - patch 20: Return -1 on memory allocation failure in ram.c > - Pass &error_warn or &error_fatal to capture error or exit on error. > - Link to v9: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v9: > - Re ordering patches such that error is reported in each one of them. > - format specifier enclosed in '' changed i.e. '%d' changed to %d > - Reporting errors where they were missed before. Setting errp to NULL > in case of retry. > - Link to v8: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v8: > - 3 new patches added: > - patch 23: > - Changes the error propagation by returning the most recent error > to the caller when both save device state and post_save fails. > - patch 24: > - Refactors the vmstate_save_state_v() function by adding wrapper > functions to separate concerns. > - patch 25: > - Removes the error variant of the vmstate_save_state() > function introduced in commit 969298f9d7. > - Use ERRP_GUARD() where there is an errp dereference or an error_prepend > call. > - Pass &error_warn in place of NULL, in vmstate_load_state() calls so > that the caller knows about the error. > - Remove unnecessary null check before setting errp. Dereferencing it is not > required. > - Documentation for the new variants of post/pre save/load hooks added. > - Some patches, although they received a 'Reviewed-by' tag, have undergone > few minor changes, > Patch 1 : removed extra space > Patch 2 : Commit message changed, refactoring the function to > always set errp and return. > Patch 8 : Commit message changed. > Patch 9 : use error_setg_errno instead of error_setg. > Patch 27 : use error_setg_errno instead of error_setg. > - Link to v7: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v7: > - Fix propagating errors in post_save_errp. The latest error encountered is > propagated. > - user-strings in error_prepend() calls now end with a ': ' so that the print > is pretty. > - Change the order of one of the patches. > - Link to v6: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v6: > - Incorporated review comments from Daniel and Akihiko, related to few > semantic errors and improve error logging. > - Add one more patch that removes NULL checks after calling > qemu_file_get_return_path() because it does not fail. > - Link to v5: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v5: > - Solve a bug that set errp even though it was not NULL, pointed out by > Fabiano in v4. > - Link to v4: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v4: > - Split the patches into smaller ones based on functions. Pass NULL in the > caller until errp is made available. Every function that has an > Error **errp object passed to it, ensures that it sets the errp object > in case of failure. > - A few more functions within loadvm_process_command() now handle errors using > the errp object. I've converted these for consistency, taking Daniel's > patches (link above) as a reference. > - Along with the post_load_errp() hook, other duplicate hooks are also > introduced. > This will enable us to migrate to the newer versions eventually. > - Fix some semantic errors, like using error_propagate_prepend() in places > where > we need to preserve existing behaviour of accumulating the error in > local_err > and then propagating it to errp. This can be refactored in a later commit. > - Add more information in commit messages explaining the changes. > - Link to v3: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v3: > - Split the 2nd patch into 2. Introducing post_load_with_error() hook > has been separated from using it in the backends TPM module. This is > so that it can be acknowledged. > - Link to v2: > https://lore.kernel.org/qemu-devel/[email protected] > > Changes in v2: > - Combine the first two changes into one, focusing on passing the > Error object (errp) consistently through functions involved in > loading the VM's state. Other functions are not yet changed. > - As suggested in the review comment, add null checks for errp > before adding error messages, preventing crashes. > We also now correctly set errors when post-copy migration fails. > - In process_incoming_migration_co(), switch to error_prepend > instead of error_setg. This means we now null-check local_err in > the "fail" section before using it, preventing dereferencing issues. > - Link to v1: > https://lore.kernel.org/qemu-devel/[email protected] > > --- > Arun Menon (27): > migration: push Error **errp into vmstate_subsection_load() > migration: push Error **errp into vmstate_load_state() > migration: push Error **errp into qemu_loadvm_state_header() > migration: push Error **errp into vmstate_load() > migration: push Error **errp into loadvm_process_command() > migration: push Error **errp into loadvm_handle_cmd_packaged() > migration: push Error **errp into qemu_loadvm_state() > migration: push Error **errp into qemu_load_device_state() > migration: push Error **errp into qemu_loadvm_state_main() > migration: push Error **errp into qemu_loadvm_section_start_full() > migration: push Error **errp into qemu_loadvm_section_part_end() > migration: Update qemu_file_get_return_path() docs and remove dead > checks > migration: make loadvm_postcopy_handle_resume() void > migration: push Error **errp into ram_postcopy_incoming_init() > migration: push Error **errp into loadvm_postcopy_handle_advise() > migration: push Error **errp into loadvm_postcopy_handle_listen() > migration: push Error **errp into loadvm_postcopy_handle_run() > migration: push Error **errp into loadvm_postcopy_ram_handle_discard() > migration: push Error **errp into loadvm_handle_recv_bitmap() > migration: Return -1 on memory allocation failure in ram.c > migration: push Error **errp into loadvm_process_enable_colo() > migration: push Error **errp into > loadvm_postcopy_handle_switchover_start() > migration: Capture error in postcopy_ram_listen_thread() > migration: Remove error variant of vmstate_save_state() function > migration: Rename post_save() to cleanup_save() and make it void > migration: Add error-parameterized function variants in VMSD struct > backends/tpm: Propagate vTPM error on migration failure > > backends/tpm/tpm_emulator.c | 40 ++--- > docs/devel/migration/main.rst | 21 ++- > hw/display/virtio-gpu.c | 5 +- > hw/pci/pci.c | 5 +- > hw/ppc/spapr_pci.c | 5 +- > hw/s390x/virtio-ccw.c | 4 +- > hw/scsi/spapr_vscsi.c | 6 +- > hw/vfio/pci.c | 9 +- > hw/virtio/virtio-mmio.c | 5 +- > hw/virtio/virtio-pci.c | 4 +- > hw/virtio/virtio.c | 13 +- > include/migration/colo.h | 2 +- > include/migration/vmstate.h | 20 ++- > migration/colo.c | 10 +- > migration/cpr.c | 6 +- > migration/migration.c | 38 ++--- > migration/postcopy-ram.c | 9 +- > migration/postcopy-ram.h | 2 +- > migration/qemu-file.c | 1 - > migration/ram.c | 16 +- > migration/ram.h | 4 +- > migration/savevm.c | 334 > ++++++++++++++++++++++++------------------ > migration/savevm.h | 7 +- > migration/vmstate-types.c | 53 ++++--- > migration/vmstate.c | 115 ++++++++++----- > target/arm/machine.c | 6 +- > tests/unit/test-vmstate.c | 83 +++++++++-- > ui/vdagent.c | 8 +- > 28 files changed, 523 insertions(+), 308 deletions(-) > --- > base-commit: f0007b7f03e2d7fc33e71c3a582f2364c51a226b > change-id: 20250624-propagate_tpm_error-bf4ae6c23d30 > > Best regards, > -- > Arun Menon <[email protected]> >
Hi, Gentle ping for the series. Is there something more to be done to improve this before queueing it? TIA. Regards, Arun Menon
