Re: [PULL 26/34] migration/multifd: Join the TLS thread

2024-02-15 Thread Fabiano Rosas
Michael Tokarev  writes:

> 14.02.2024 16:27, Fabiano Rosas :
>> Michael Tokarev  writes:
> ..>>> This change, which is suggested for -stable, while simple by its own, 
> seems
 to depend on the previous changes in this series, which are not for 
 -stable.
 In particular, whole "Finally recycle all the threads" loop in 
 multifd_send_terminate_threads()
 (to which the join is being added by this change) is moved from elsewhere 
 by
 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in
 this same series).

>>> We can probably add the missing join right into the previous location of 
>>> this
>>> loop (before 12808db3b8).  I did this in the attached variant for 8.2, is
>>> this correct?
>
> I forgot to attach the patch.  It just moves the join from 
> multifd_send_terminate_threads()
> back to multifd_save_cleanup.  Attached now.
>
>> It should work. This was originally developed without the rest of the
>> changes on this PR.
>> 
>>> And this does not pass even the basic tests, so it's not that simple :)
>> 
>> Do you have a log of what failed?
>
> Re-running it again...  I haven't even tried to push it somewhere for CI to 
> run,
> I run local `ninja test', which painted some migration tests in red.  Here:
>
> 202/844 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test   ERROR   
> 70.26s   killed by signal 6 SIGABRT
> 330/844 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR   
> 85.33s   killed by signal 6 SIGABRT
> 454/844 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR  
> 101.02s   killed by signal 6 SIGABRT
>
> Unfortunately I don't see anything interesting in the log:
>
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-463614.sock 
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-463614.qmp,id=char0 
> -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel 
> tcg -machine pc-q35-8.2, -name target,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-SPJTI2/dest_serial -incoming defer -drive 
> if=none,id=d0,file=/tmp/migration-test-SPJTI2/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=12>/dev/null -accel qtest
> --- stderr ---
> ../../build/qemu/8.2/tests/qtest/libqtest.c:204: kill_qemu() detected QEMU 
> death from signal 6 (Aborted)
> (test program exited with status code -6)
>
> Without the attached patch it works.

Ah ok, this is hitting the bug fixed by patch 31. Let's leave patches
26, 27 and 31 out of stable, it would be too risky to backport.




Re: [PULL 26/34] migration/multifd: Join the TLS thread

2024-02-14 Thread Michael Tokarev

14.02.2024 16:27, Fabiano Rosas :

Michael Tokarev  writes:

..>>> This change, which is suggested for -stable, while simple by its own, 
seems

to depend on the previous changes in this series, which are not for -stable.
In particular, whole "Finally recycle all the threads" loop in 
multifd_send_terminate_threads()
(to which the join is being added by this change) is moved from elsewhere by
12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in
this same series).


We can probably add the missing join right into the previous location of this
loop (before 12808db3b8).  I did this in the attached variant for 8.2, is
this correct?


I forgot to attach the patch.  It just moves the join from 
multifd_send_terminate_threads()
back to multifd_save_cleanup.  Attached now.


It should work. This was originally developed without the rest of the
changes on this PR.


And this does not pass even the basic tests, so it's not that simple :)


Do you have a log of what failed?


Re-running it again...  I haven't even tried to push it somewhere for CI to run,
I run local `ninja test', which painted some migration tests in red.  Here:

202/844 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test   ERROR   
70.26s   killed by signal 6 SIGABRT
330/844 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR   
85.33s   killed by signal 6 SIGABRT
454/844 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR  
101.02s   killed by signal 6 SIGABRT

Unfortunately I don't see anything interesting in the log:

# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-463614.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-463614.qmp,id=char0 
-mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pc-q35-8.2, -name target,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-SPJTI2/dest_serial -incoming defer -drive if=none,id=d0,file=/tmp/migration-test-SPJTI2/bootsect,format=raw -device 
ide-hd,drive=d0,secs=1,cyls=1,heads=12>/dev/null -accel qtest

--- stderr ---
../../build/qemu/8.2/tests/qtest/libqtest.c:204: kill_qemu() detected QEMU 
death from signal 6 (Aborted)
(test program exited with status code -6)

Without the attached patch it works.


Anyway, I could prepare a backport on top of 8.2 for you.


Well, that would definitely be helpful, if you think it's worth to
provide backports for 8.2 for these.   As my attempt apparently isn't
very successful :)


The following patch (27/34) is more questionable than this one.


Thank you!

/mjt
From 6d4aae84a06fc7e26dcb1d986a4de3c6d65eb064 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas 
Date: Tue, 6 Feb 2024 18:51:13 -0300
Subject: [PATCH] migration/multifd: Join the TLS thread

We're currently leaking the resources of the TLS thread by not joining
it and also overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
Cc: qemu-stable 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de
Signed-off-by: Peter Xu 
(cherry picked from commit e1921f10d9afe651f4887284e85f6789b37e67d3)
Signed-off-by: Michael Tokarev 
(Mjt: fixup for before v8.2.0-1142-g12808db3b8
 "migration/multifd: Cleanup multifd_save_cleanup()")
---
 migration/multifd.c | 8 +++-
 migration/multifd.h | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 409460684f..3183aa9e82 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -525,6 +525,10 @@ void multifd_save_cleanup(void)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = _send_state->params[i];
 
+if (p->tls_thread_created) {
+qemu_thread_join(>tls_thread);
+}
+
 if (p->running) {
 qemu_thread_join(>thread);
 }
@@ -826,7 +830,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
 trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
 qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
 p->c = QIO_CHANNEL(tioc);
-qemu_thread_create(>thread, "multifd-tls-handshake-worker",
+
+p->tls_thread_created = true;
+qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker",
multifd_tls_handshake_thread, p,
QEMU_THREAD_JOINABLE);
 return true;
diff --git a/migration/multifd.h b/migration/multifd.h
index a835643b48..8fbffbaa5a 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -75,6 +75,8 @@ typedef struct {
 char *name;
 /* channel thread id */
 QemuThread thread;
+QemuThread tls_thread;
+bool tls_thread_created;
 /* communication channel */
 QIOChannel *c;
 /* is the yank function registered */
-- 
2.39.2



Re: [PULL 26/34] migration/multifd: Join the TLS thread

2024-02-14 Thread Fabiano Rosas
Michael Tokarev  writes:

> 10.02.2024 12:18, Michael Tokarev:
>> 08.02.2024 06:05, pet...@redhat.com :
>>> From: Fabiano Rosas 
>>>
>>> We're currently leaking the resources of the TLS thread by not joining
>>> it and also overwriting the p->thread pointer altogether.
>>>
>>> Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to 
>>> blocking handshake")
>>> Cc: qemu-stable 
>>> Reviewed-by: Peter Xu 
>>> Signed-off-by: Fabiano Rosas 
>>> Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de
>>> Signed-off-by: Peter Xu 
>> 
>> This change, which is suggested for -stable, while simple by its own, seems
>> to depend on the previous changes in this series, which are not for -stable.
>> In particular, whole "Finally recycle all the threads" loop in 
>> multifd_send_terminate_threads()
>> (to which the join is being added by this change) is moved from elsewhere by
>> 12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in
>> this same series).
>> 
> We can probably add the missing join right into the previous location of this
> loop (before 12808db3b8).  I did this in the attached variant for 8.2, is
> this correct?

It should work. This was originally developed without the rest of the
changes on this PR.

>
> And this does not pass even the basic tests, so it's not that simple :)

Do you have a log of what failed?

Anyway, I could prepare a backport on top of 8.2 for you.

>
> The following patch (27/34) is more questionable than this one.
>
> Thanks!
>
> /mjt



Re: [PULL 26/34] migration/multifd: Join the TLS thread

2024-02-10 Thread Michael Tokarev

10.02.2024 12:18, Michael Tokarev:

08.02.2024 06:05, pet...@redhat.com :

From: Fabiano Rosas 

We're currently leaking the resources of the TLS thread by not joining
it and also overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking 
handshake")
Cc: qemu-stable 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de
Signed-off-by: Peter Xu 


This change, which is suggested for -stable, while simple by its own, seems
to depend on the previous changes in this series, which are not for -stable.
In particular, whole "Finally recycle all the threads" loop in 
multifd_send_terminate_threads()
(to which the join is being added by this change) is moved from elsewhere by
12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in
this same series).

We can probably add the missing join right into the previous location of this
loop (before 12808db3b8).  I did this in the attached variant for 8.2, is
this correct?


And this does not pass even the basic tests, so it's not that simple :)

The following patch (27/34) is more questionable than this one.

Thanks!

/mjt



Re: [PULL 26/34] migration/multifd: Join the TLS thread

2024-02-10 Thread Michael Tokarev

08.02.2024 06:05, pet...@redhat.com :

From: Fabiano Rosas 

We're currently leaking the resources of the TLS thread by not joining
it and also overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking 
handshake")
Cc: qemu-stable 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de
Signed-off-by: Peter Xu 


This change, which is suggested for -stable, while simple by its own, seems
to depend on the previous changes in this series, which are not for -stable.
In particular, whole "Finally recycle all the threads" loop in 
multifd_send_terminate_threads()
(to which the join is being added by this change) is moved from elsewhere by
12808db3b8 "migration/multifd: Cleanup multifd_save_cleanup()" (patch 24 in
this same series).

We can probably add the missing join right into the previous location of this
loop (before 12808db3b8).  I did this in the attached variant for 8.2, is
this correct?

Should we pick this up for 7.2 too?

Thanks,

/mjt


  migration/multifd.h | 2 ++
  migration/multifd.c | 8 +++-
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 78a2317263..720c9d50db 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -73,6 +73,8 @@ typedef struct {
  char *name;
  /* channel thread id */
  QemuThread thread;
+QemuThread tls_thread;
+bool tls_thread_created;
  /* communication channel */
  QIOChannel *c;
  /* is the yank function registered */
diff --git a/migration/multifd.c b/migration/multifd.c
index fbdb129088..5551711a2a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -630,6 +630,10 @@ static void multifd_send_terminate_threads(void)
  for (i = 0; i < migrate_multifd_channels(); i++) {
  MultiFDSendParams *p = _send_state->params[i];
  
+if (p->tls_thread_created) {

+qemu_thread_join(>tls_thread);
+}
+
  if (p->running) {
  qemu_thread_join(>thread);
  }
@@ -921,7 +925,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams 
*p,
  trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
  qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
  p->c = QIO_CHANNEL(tioc);
-qemu_thread_create(>thread, "multifd-tls-handshake-worker",
+
+p->tls_thread_created = true;
+qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker",
 multifd_tls_handshake_thread, p,
 QEMU_THREAD_JOINABLE);
  return true;





[PULL 26/34] migration/multifd: Join the TLS thread

2024-02-07 Thread peterx
From: Fabiano Rosas 

We're currently leaking the resources of the TLS thread by not joining
it and also overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to 
blocking handshake")
Cc: qemu-stable 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h | 2 ++
 migration/multifd.c | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 78a2317263..720c9d50db 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -73,6 +73,8 @@ typedef struct {
 char *name;
 /* channel thread id */
 QemuThread thread;
+QemuThread tls_thread;
+bool tls_thread_created;
 /* communication channel */
 QIOChannel *c;
 /* is the yank function registered */
diff --git a/migration/multifd.c b/migration/multifd.c
index fbdb129088..5551711a2a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -630,6 +630,10 @@ static void multifd_send_terminate_threads(void)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = _send_state->params[i];
 
+if (p->tls_thread_created) {
+qemu_thread_join(>tls_thread);
+}
+
 if (p->running) {
 qemu_thread_join(>thread);
 }
@@ -921,7 +925,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams 
*p,
 trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
 qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
 p->c = QIO_CHANNEL(tioc);
-qemu_thread_create(>thread, "multifd-tls-handshake-worker",
+
+p->tls_thread_created = true;
+qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker",
multifd_tls_handshake_thread, p,
QEMU_THREAD_JOINABLE);
 return true;
-- 
2.43.0