Re: [PULL 26/34] migration/multifd: Join the TLS thread
Michael Tokarev writes: > 14.02.2024 16:27, Fabiano Rosas : >> Michael Tokarev writes: > ..>>> This change, which is suggested for -stable, while simple by its own, > seems to depend on the previous changes in this series, which are not for -stable. In particular, whole "Finally recycle all the threads" loop in multifd_send_terminate_threads() (to which the join is being added by this change) is moved from elsewhere by 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in this same series). >>> We can probably add the missing join right into the previous location of >>> this >>> loop (before 12808db3b8). I did this in the attached variant for 8.2, is >>> this correct? > > I forgot to attach the patch. It just moves the join from > multifd_send_terminate_threads() > back to multifd_save_cleanup. Attached now. > >> It should work. This was originally developed without the rest of the >> changes on this PR. >> >>> And this does not pass even the basic tests, so it's not that simple :) >> >> Do you have a log of what failed? > > Re-running it again... I haven't even tried to push it somewhere for CI to > run, > I run local `ninja test', which painted some migration tests in red. Here: > > 202/844 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test ERROR > 70.26s killed by signal 6 SIGABRT > 330/844 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR > 85.33s killed by signal 6 SIGABRT > 454/844 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR > 101.02s killed by signal 6 SIGABRT > > Unfortunately I don't see anything interesting in the log: > > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-463614.sock > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-463614.qmp,id=char0 > -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel > tcg -machine pc-q35-8.2, -name target,debug-threads=on -m 150M -serial > file:/tmp/migration-test-SPJTI2/dest_serial -incoming defer -drive > if=none,id=d0,file=/tmp/migration-test-SPJTI2/bootsect,format=raw -device > ide-hd,drive=d0,secs=1,cyls=1,heads=12>/dev/null -accel qtest > --- stderr --- > ../../build/qemu/8.2/tests/qtest/libqtest.c:204: kill_qemu() detected QEMU > death from signal 6 (Aborted) > (test program exited with status code -6) > > Without the attached patch it works. Ah ok, this is hitting the bug fixed by patch 31. Let's leave patches 26, 27 and 31 out of stable, it would be too risky to backport.
Re: [PULL 26/34] migration/multifd: Join the TLS thread
14.02.2024 16:27, Fabiano Rosas : Michael Tokarev writes: ..>>> This change, which is suggested for -stable, while simple by its own, seems to depend on the previous changes in this series, which are not for -stable. In particular, whole "Finally recycle all the threads" loop in multifd_send_terminate_threads() (to which the join is being added by this change) is moved from elsewhere by 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in this same series). We can probably add the missing join right into the previous location of this loop (before 12808db3b8). I did this in the attached variant for 8.2, is this correct? I forgot to attach the patch. It just moves the join from multifd_send_terminate_threads() back to multifd_save_cleanup. Attached now. It should work. This was originally developed without the rest of the changes on this PR. And this does not pass even the basic tests, so it's not that simple :) Do you have a log of what failed? Re-running it again... I haven't even tried to push it somewhere for CI to run, I run local `ninja test', which painted some migration tests in red. Here: 202/844 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test ERROR 70.26s killed by signal 6 SIGABRT 330/844 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR 85.33s killed by signal 6 SIGABRT 454/844 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 101.02s killed by signal 6 SIGABRT Unfortunately I don't see anything interesting in the log: # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-463614.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-463614.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pc-q35-8.2, -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-SPJTI2/dest_serial -incoming defer -drive if=none,id=d0,file=/tmp/migration-test-SPJTI2/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=12>/dev/null -accel qtest --- stderr --- ../../build/qemu/8.2/tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from signal 6 (Aborted) (test program exited with status code -6) Without the attached patch it works. Anyway, I could prepare a backport on top of 8.2 for you. Well, that would definitely be helpful, if you think it's worth to provide backports for 8.2 for these. As my attempt apparently isn't very successful :) The following patch (27/34) is more questionable than this one. Thank you! /mjt From 6d4aae84a06fc7e26dcb1d986a4de3c6d65eb064 Mon Sep 17 00:00:00 2001 From: Fabiano Rosas Date: Tue, 6 Feb 2024 18:51:13 -0300 Subject: [PATCH] migration/multifd: Join the TLS thread We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de Signed-off-by: Peter Xu (cherry picked from commit e1921f10d9afe651f4887284e85f6789b37e67d3) Signed-off-by: Michael Tokarev (Mjt: fixup for before v8.2.0-1142-g12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()") --- migration/multifd.c | 8 +++- migration/multifd.h | 2 ++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index 409460684f..3183aa9e82 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -525,6 +525,10 @@ void multifd_save_cleanup(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = _send_state->params[i]; +if (p->tls_thread_created) { +qemu_thread_join(>tls_thread); +} + if (p->running) { qemu_thread_join(>thread); } @@ -826,7 +830,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); p->c = QIO_CHANNEL(tioc); -qemu_thread_create(>thread, "multifd-tls-handshake-worker", + +p->tls_thread_created = true; +qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker", multifd_tls_handshake_thread, p, QEMU_THREAD_JOINABLE); return true; diff --git a/migration/multifd.h b/migration/multifd.h index a835643b48..8fbffbaa5a 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -75,6 +75,8 @@ typedef struct { char *name; /* channel thread id */ QemuThread thread; +QemuThread tls_thread; +bool tls_thread_created; /* communication channel */ QIOChannel *c; /* is the yank function registered */ -- 2.39.2
Re: [PULL 26/34] migration/multifd: Join the TLS thread
Michael Tokarev writes: > 10.02.2024 12:18, Michael Tokarev: >> 08.02.2024 06:05, pet...@redhat.com : >>> From: Fabiano Rosas >>> >>> We're currently leaking the resources of the TLS thread by not joining >>> it and also overwriting the p->thread pointer altogether. >>> >>> Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to >>> blocking handshake") >>> Cc: qemu-stable >>> Reviewed-by: Peter Xu >>> Signed-off-by: Fabiano Rosas >>> Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de >>> Signed-off-by: Peter Xu >> >> This change, which is suggested for -stable, while simple by its own, seems >> to depend on the previous changes in this series, which are not for -stable. >> In particular, whole "Finally recycle all the threads" loop in >> multifd_send_terminate_threads() >> (to which the join is being added by this change) is moved from elsewhere by >> 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in >> this same series). >> > We can probably add the missing join right into the previous location of this > loop (before 12808db3b8). I did this in the attached variant for 8.2, is > this correct? It should work. This was originally developed without the rest of the changes on this PR. > > And this does not pass even the basic tests, so it's not that simple :) Do you have a log of what failed? Anyway, I could prepare a backport on top of 8.2 for you. > > The following patch (27/34) is more questionable than this one. > > Thanks! > > /mjt
Re: [PULL 26/34] migration/multifd: Join the TLS thread
10.02.2024 12:18, Michael Tokarev: 08.02.2024 06:05, pet...@redhat.com : From: Fabiano Rosas We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de Signed-off-by: Peter Xu This change, which is suggested for -stable, while simple by its own, seems to depend on the previous changes in this series, which are not for -stable. In particular, whole "Finally recycle all the threads" loop in multifd_send_terminate_threads() (to which the join is being added by this change) is moved from elsewhere by 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in this same series). We can probably add the missing join right into the previous location of this loop (before 12808db3b8). I did this in the attached variant for 8.2, is this correct? And this does not pass even the basic tests, so it's not that simple :) The following patch (27/34) is more questionable than this one. Thanks! /mjt
Re: [PULL 26/34] migration/multifd: Join the TLS thread
08.02.2024 06:05, pet...@redhat.com : From: Fabiano Rosas We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de Signed-off-by: Peter Xu This change, which is suggested for -stable, while simple by its own, seems to depend on the previous changes in this series, which are not for -stable. In particular, whole "Finally recycle all the threads" loop in multifd_send_terminate_threads() (to which the join is being added by this change) is moved from elsewhere by 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in this same series). We can probably add the missing join right into the previous location of this loop (before 12808db3b8). I did this in the attached variant for 8.2, is this correct? Should we pick this up for 7.2 too? Thanks, /mjt migration/multifd.h | 2 ++ migration/multifd.c | 8 +++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/migration/multifd.h b/migration/multifd.h index 78a2317263..720c9d50db 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -73,6 +73,8 @@ typedef struct { char *name; /* channel thread id */ QemuThread thread; +QemuThread tls_thread; +bool tls_thread_created; /* communication channel */ QIOChannel *c; /* is the yank function registered */ diff --git a/migration/multifd.c b/migration/multifd.c index fbdb129088..5551711a2a 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -630,6 +630,10 @@ static void multifd_send_terminate_threads(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = _send_state->params[i]; +if (p->tls_thread_created) { +qemu_thread_join(>tls_thread); +} + if (p->running) { qemu_thread_join(>thread); } @@ -921,7 +925,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); p->c = QIO_CHANNEL(tioc); -qemu_thread_create(>thread, "multifd-tls-handshake-worker", + +p->tls_thread_created = true; +qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker", multifd_tls_handshake_thread, p, QEMU_THREAD_JOINABLE); return true;
[PULL 26/34] migration/multifd: Join the TLS thread
From: Fabiano Rosas We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 2 ++ migration/multifd.c | 8 +++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/migration/multifd.h b/migration/multifd.h index 78a2317263..720c9d50db 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -73,6 +73,8 @@ typedef struct { char *name; /* channel thread id */ QemuThread thread; +QemuThread tls_thread; +bool tls_thread_created; /* communication channel */ QIOChannel *c; /* is the yank function registered */ diff --git a/migration/multifd.c b/migration/multifd.c index fbdb129088..5551711a2a 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -630,6 +630,10 @@ static void multifd_send_terminate_threads(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = _send_state->params[i]; +if (p->tls_thread_created) { +qemu_thread_join(>tls_thread); +} + if (p->running) { qemu_thread_join(>thread); } @@ -921,7 +925,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); p->c = QIO_CHANNEL(tioc); -qemu_thread_create(>thread, "multifd-tls-handshake-worker", + +p->tls_thread_created = true; +qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker", multifd_tls_handshake_thread, p, QEMU_THREAD_JOINABLE); return true; -- 2.43.0