On 2025/08/06 3:25, Arun Menon wrote:
We consider,
- the saving of the device state (save or subsection save) and
- running the cleanup after (post_save)
as an atomic operation. And that is why, post_save() is called regardless
of whether there is a preceeding error. This means that, it is possible
that we have 2 distict errors, one from the preceeding function and the
other from the post_save() function.
This commit changes the error handling behavior when two errors occur during
a save operation.
1) Preceding errors were propagated before, but they are now dismissed, if there
is a new post_save() error.
- A failure during the main save operation, means the system failed to
reach a new desired state. A failure during the post-save cleanup,
however, is a more critical issue as it can leave the system in an
inconsistent or corrupted state. At present, all post_save() calls
succeed. However, the introduction of error handling variants of these
functions (post_save_errp) in the following commit, imply that we need
to handle and propagate these errors. Therefore, we prioritize reporting
the more severe error.
This explains why the post_save() error is propagated instead of
propagating the preceding error. But the preceding errors still do not
have to be dismissed if we report them with error_report_err() instead.
2) post_save() errors were dismissed before, but they are now propagated.
- The original design implicitly assumed post-save hooks were infallible
cleanup routines.
This will not be the case after introducing the post/pre save/load errp
variant functions. Dismissing these errors prevents users from
identifying the specific operation that failed.
Re-iterating previous comments, if introducing post save errp variant
functions break the assumption, we just don't have to introduce one. The
reason to introduce one needs to be explained.
The entire save-and-cleanup process is treated as a single
logical operation, and all potential failures are communicated.
Signed-off-by: Arun Menon <arme...@redhat.com>
---
migration/vmstate.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/migration/vmstate.c b/migration/vmstate.c
index
fbc59caadbbcc75fe6de27b459aa9aa25e76aa0a..ef78a1e62ad92e9608de72d125da80ea496c8dd1
100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -554,6 +554,12 @@ static int vmstate_save_dispatch(QEMUFile *f,
error_setg(errp, "Save of field %s/%s failed",
vmsd->name, field->name);
ps_ret = post_save_dispatch(vmsd, opaque, &local_err);
+ if (ps_ret) {
+ ret = ps_ret;
+ error_free_or_abort(errp);
+ error_setg(errp, "post-save for %s failed, ret: %d",
+ vmsd->name, ret);
+ }
return ret;
}
@@ -603,10 +609,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
}
ret = vmstate_subsection_save(f, vmsd, opaque, vmdesc, errp);
+
ps_ret = post_save_dispatch(vmsd, opaque, &local_err);
- if (!ret && ps_ret) {
+ if (ps_ret) {
+ if (ret) {
+ error_free_or_abort(errp);
+ }
ret = ps_ret;
- error_setg(errp, "post-save failed: %s", vmsd->name);
+ error_propagate(errp, local_err);
}
return ret;