On Thu, Sep 18, 2025 at 08:53:17PM +0530, Arun Menon wrote:
> Hello,
> 
> Currently, when a migration of a VM with an encrypted vTPM
> fails on the destination host (e.g., due to a mismatch in secret values),
> the error message displayed on the source host is generic and unhelpful.
> 
> For example, a typical error looks like this:
> "operation failed: job 'migration out' failed: Sibling indicated error 1.
> operation failed: job 'migration in' failed: load of migration failed:
> Input/output error"
> 
> This message does not provide any specific indication of a vTPM failure.
> Such generic errors are logged using error_report(), which prints to
> the console/monitor but does not make the detailed error accessible via
> the QMP query-migrate command.
> 
> This series addresses the issue, by ensuring that specific TPM error
> messages are propagated via the QEMU Error object.
> To make this possible,
> - A set of functions in the call stack is changed
>   to incorporate an Error object as an additional parameter.
> - Also, the TPM backend makes use of a new hook called post_load_errp()
>   that explicitly passes an Error object.
> 
> It is organized as follows,
>  - Patches 1-23 focuses on pushing Error object into the functions
>    that are important in the call stack where TPM errors are observed.
>    We still need to make changes in rest of the functions in savevm.c
>    such that they also incorporate the errp object for propagating errors.
>  - Patches 12, 13, 20, are minor refactoring changes.
>  - Patch 24 removes error variant of vmstate_save_state() function.
>  - Patch 25 renames post_save() to cleanup_save()
>  - Patch 26 introduces the new variants of the hooks in VMStateDescription
>    structure. These hooks should be used in future implementations.
>  - Patch 27 focuses on changing the TPM backend such that the errors are
>    set in the Error object.
> 
> While this series focuses specifically on TPM error reporting during
> live migration, it lays the groundwork for broader improvements.
> A lot of methods in savevm.c that previously returned an integer now capture
> errors in the Error object, enabling other modules to adopt the
> post_load_errp hook in the future.
> 
> One such change previously attempted:
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01727.html
> 
> Resolves: https://issues.redhat.com/browse/RHEL-82826
> 
> Signed-off-by: Arun Menon <[email protected]>
> ---
> Changes in v14:
> - Addressed two regressions introduced in
>   [PATCH 07/27] migration: push Error **errp into qemu_loadvm_state()
>   thanks to Akihiko's suggestions.
>   a) duplicate error reporting: The previous patch introduced a duplicate
>      error message when 'exit_on_error' is false and the migration failed.
>      This is not trivial to fix because there is one such scenario in postcopy
>      migration where querying the error message using the QMP client or 
> libvirt
>      is not possible as the destination abruptly exits.
>      However, the logic to print error is put into a separate specific path 
> and
>      a comment is added in this path, which can be improved in future to 
> catch the
>      specific case.
>   b) Ignoring the error in the migration state and reporting local_err:
>      Solved this by restoring the patch 07.
> - Link to v13: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v13:
> - Akihiko suggested to use error_report_err() instead of warn_report_err().
>   We must not prefix error messages with "warning:" unless the error is a 
> non-critical
>   failure that can be logged while the program continues to function.
> - Removed error_prepend() with ERRP_GUARD() where it is not required within
>   the series.
> - Link to v12: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v12:
> - Remove error_prepend() calls where no additional information is appended to
>   the error string. This also allows us to remove unnecessary ERRP_GUARD().
> - Avoid ambiguity by propagating clear messages in errp.
> - Add clarity to commit messages throughout the series.
> - Link to v11: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v11:
> - Remove unnecessary NULL check in postcopy_ram_listen_thread.
> - Change error_warn to error_fatal or pass local_err wherever appropriate, 
> because,
>   
> https://lore.kernel.org/qemu-devel/[email protected]/
>   Most changes are in patches 2,24.
> - Link to v10: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v10:
> - Remove the patch to propagate most recent error and the patch of refactoring
>   vmstate_save_state_v(): 23,24. They are not required because we intend to 
> keep
>   the design as is.
> - Added 2 new patches
>   - patch 25: Rename post_save() to cleanup_save() and make it void
>   - patch 20: Return -1 on memory allocation failure in ram.c
> - Pass &error_warn or &error_fatal to capture error or exit on error.
> - Link to v9: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v9:
> - Re ordering patches such that error is reported in each one of them.
> - format specifier enclosed in '' changed i.e. '%d' changed to %d
> - Reporting errors where they were missed before. Setting errp to NULL
>   in case of retry.
> - Link to v8: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v8:
> - 3 new patches added:
>   - patch 23:
>       - Changes the error propagation by returning the most recent error
>         to the caller when both save device state and post_save fails.
>   - patch 24:
>       - Refactors the vmstate_save_state_v() function by adding wrapper
>         functions to separate concerns.
>   - patch 25:
>       - Removes the error variant of the vmstate_save_state()
>         function introduced in commit 969298f9d7.
> - Use ERRP_GUARD() where there is an errp dereference or an error_prepend 
> call.
> - Pass &error_warn in place of NULL, in vmstate_load_state() calls so
>   that the caller knows about the error.
> - Remove unnecessary null check before setting errp. Dereferencing it is not 
> required.
> - Documentation for the new variants of post/pre save/load hooks added.
> - Some patches, although they received a 'Reviewed-by' tag, have undergone 
> few minor changes,
>       Patch 1 : removed extra space
>       Patch 2 : Commit message changed, refactoring the function to
>               always set errp and return.
>       Patch 8 : Commit message changed.
>       Patch 9 : use error_setg_errno instead of error_setg.
>       Patch 27 : use error_setg_errno instead of error_setg.
> - Link to v7: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v7:
> - Fix propagating errors in post_save_errp. The latest error encountered is
>   propagated.
> - user-strings in error_prepend() calls now end with a ': ' so that the print 
> is pretty.
> - Change the order of one of the patches.
> - Link to v6: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v6:
> - Incorporated review comments from Daniel and Akihiko, related to few
>   semantic errors and improve error logging.
> - Add one more patch that removes NULL checks after calling
>   qemu_file_get_return_path() because it does not fail.
> - Link to v5: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v5:
> - Solve a bug that set errp even though it was not NULL, pointed out by 
> Fabiano in v4.
> - Link to v4: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v4:
> - Split the patches into smaller ones based on functions. Pass NULL in the
>   caller until errp is made available. Every function that has an
>   Error **errp object passed to it, ensures that it sets the errp object
>   in case of failure.
> - A few more functions within loadvm_process_command() now handle errors using
>   the errp object. I've converted these for consistency, taking Daniel's
>   patches (link above) as a reference.
> - Along with the post_load_errp() hook, other duplicate hooks are also 
> introduced.
>   This will enable us to migrate to the newer versions eventually.
> - Fix some semantic errors, like using error_propagate_prepend() in places 
> where
>   we need to preserve existing behaviour of accumulating the error in 
> local_err
>   and then propagating it to errp. This can be refactored in a later commit.
> - Add more information in commit messages explaining the changes.
> - Link to v3: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v3:
> - Split the 2nd patch into 2. Introducing post_load_with_error() hook
>   has been separated from using it in the backends TPM module. This is
>   so that it can be acknowledged.
> - Link to v2: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> Changes in v2:
> - Combine the first two changes into one, focusing on passing the
>   Error object (errp) consistently through functions involved in
>   loading the VM's state. Other functions are not yet changed.
> - As suggested in the review comment, add null checks for errp
>   before adding error messages, preventing crashes.
>   We also now correctly set errors when post-copy migration fails.
> - In process_incoming_migration_co(), switch to error_prepend
>   instead of error_setg. This means we now null-check local_err in
>   the "fail" section before using it, preventing dereferencing issues.
> - Link to v1: 
> https://lore.kernel.org/qemu-devel/[email protected]
> 
> ---
> Arun Menon (27):
>       migration: push Error **errp into vmstate_subsection_load()
>       migration: push Error **errp into vmstate_load_state()
>       migration: push Error **errp into qemu_loadvm_state_header()
>       migration: push Error **errp into vmstate_load()
>       migration: push Error **errp into loadvm_process_command()
>       migration: push Error **errp into loadvm_handle_cmd_packaged()
>       migration: push Error **errp into qemu_loadvm_state()
>       migration: push Error **errp into qemu_load_device_state()
>       migration: push Error **errp into qemu_loadvm_state_main()
>       migration: push Error **errp into qemu_loadvm_section_start_full()
>       migration: push Error **errp into qemu_loadvm_section_part_end()
>       migration: Update qemu_file_get_return_path() docs and remove dead 
> checks
>       migration: make loadvm_postcopy_handle_resume() void
>       migration: push Error **errp into ram_postcopy_incoming_init()
>       migration: push Error **errp into loadvm_postcopy_handle_advise()
>       migration: push Error **errp into loadvm_postcopy_handle_listen()
>       migration: push Error **errp into loadvm_postcopy_handle_run()
>       migration: push Error **errp into loadvm_postcopy_ram_handle_discard()
>       migration: push Error **errp into loadvm_handle_recv_bitmap()
>       migration: Return -1 on memory allocation failure in ram.c
>       migration: push Error **errp into loadvm_process_enable_colo()
>       migration: push Error **errp into 
> loadvm_postcopy_handle_switchover_start()
>       migration: Capture error in postcopy_ram_listen_thread()
>       migration: Remove error variant of vmstate_save_state() function
>       migration: Rename post_save() to cleanup_save() and make it void
>       migration: Add error-parameterized function variants in VMSD struct
>       backends/tpm: Propagate vTPM error on migration failure
> 
>  backends/tpm/tpm_emulator.c   |  40 ++---
>  docs/devel/migration/main.rst |  21 ++-
>  hw/display/virtio-gpu.c       |   5 +-
>  hw/pci/pci.c                  |   5 +-
>  hw/ppc/spapr_pci.c            |   5 +-
>  hw/s390x/virtio-ccw.c         |   4 +-
>  hw/scsi/spapr_vscsi.c         |   6 +-
>  hw/vfio/pci.c                 |   9 +-
>  hw/virtio/virtio-mmio.c       |   5 +-
>  hw/virtio/virtio-pci.c        |   4 +-
>  hw/virtio/virtio.c            |  13 +-
>  include/migration/colo.h      |   2 +-
>  include/migration/vmstate.h   |  20 ++-
>  migration/colo.c              |  10 +-
>  migration/cpr.c               |   6 +-
>  migration/migration.c         |  38 ++---
>  migration/postcopy-ram.c      |   9 +-
>  migration/postcopy-ram.h      |   2 +-
>  migration/qemu-file.c         |   1 -
>  migration/ram.c               |  16 +-
>  migration/ram.h               |   4 +-
>  migration/savevm.c            | 334 
> ++++++++++++++++++++++++------------------
>  migration/savevm.h            |   7 +-
>  migration/vmstate-types.c     |  53 ++++---
>  migration/vmstate.c           | 115 ++++++++++-----
>  target/arm/machine.c          |   6 +-
>  tests/unit/test-vmstate.c     |  83 +++++++++--
>  ui/vdagent.c                  |   8 +-
>  28 files changed, 523 insertions(+), 308 deletions(-)
> ---
> base-commit: f0007b7f03e2d7fc33e71c3a582f2364c51a226b
> change-id: 20250624-propagate_tpm_error-bf4ae6c23d30
> 
> Best regards,
> -- 
> Arun Menon <[email protected]>
> 

Hi,
Gentle ping for the series.
Is there something more to be done to improve this before queueing it?
TIA.


Regards,
Arun Menon


Reply via email to