Hello Peter
On 2/8/24 06:57, Peter Xu wrote:
On Wed, Feb 07, 2024 at 02:33:47PM +0100, Cédric Le Goater wrote:
In case of error, close_return_path_on_source() can perform a shutdown
to exit the return-path thread. However, in migrate_fd_cleanup(),
'to_dst_file' is closed before calling close_return_path_on_source()
and the shutdown fails, leaving the source and destination waiting for
an event to occur.
Close the file after calling close_return_path_on_source() so that the
shutdown succeeds and the return-path thread exits.
Signed-off-by: Cédric Le Goater <c...@redhat.com>
---
This is an RFC because the correct fix implies reworking the QEMUFile
construct, built on top of the QEMU I/O channel.
migration/migration.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index
5f55af3d7624750ca416c4177781241b3e291e5d..de329f2c553288935d824748286e79e535929b8b
100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1313,6 +1313,8 @@ void migrate_set_state(int *state, int old_state, int
new_state)
static void migrate_fd_cleanup(MigrationState *s)
{
+ QEMUFile *tmp = NULL;
+
g_free(s->hostname);
s->hostname = NULL;
json_writer_free(s->vmdesc);
@@ -1321,8 +1323,6 @@ static void migrate_fd_cleanup(MigrationState *s)
qemu_savevm_state_cleanup();
if (s->to_dst_file) {
- QEMUFile *tmp;
-
trace_migrate_fd_cleanup();
bql_unlock();
if (s->migration_thread_running) {
@@ -1341,15 +1341,14 @@ static void migrate_fd_cleanup(MigrationState *s)
* critical section won't block for long.
*/
migration_ioc_unregister_yank_from_file(tmp);
- qemu_fclose(tmp);
}
- /*
- * We already cleaned up to_dst_file, so errors from the return
- * path might be due to that, ignore them.
- */
close_return_path_on_source(s);
+ if (tmp) {
+ qemu_fclose(tmp);
+ }
+
assert(!migration_is_active(s));
if (s->state == MIGRATION_STATUS_CANCELLING) {
I think this is okay to me for a short term plan. I'll see how others
think, also add Dan into the loop.
If so, would you please add a rich comment explaining why tmp needs to be
closed later? Especially, explicit comment on the ordering requirement
would be helpful: IMHO here it's an order that qemu_fclose() must happen
after close_return_path_on_source(). So when others work on this code we
don't easily break it without noticing.
Sure. I will when we have clarified with Fabiano what is the best
approach.
Also please feel free to post separately on migration patches if you'd like
us to merge the patches when repost.
This series is a collection of multiple (related) changes :
* extra Error** parameter to save_setup() migration handlers.
This change has consequences on the various callers which are not
fully analyzed.
* similar changes for memory logging handlers. These looks more self
contained and I will see if I can send then separately.
* return-path thread termination
and then, in background we have open questions regarding :
* the QEMUfile implementation and its QIOChannel usage for migration
streams
* qemu_file_set_error* vs. migrate_set_error. It is confusing, at least
for me. Do we have some documentation on best practices ?
Thanks,
C.