On Thu, Aug 07, 2025 at 10:41:17AM +0800, yong.hu...@smartx.com wrote:
> From: Hyman Huang <yong.hu...@smartx.com>
> 
> When network issues such as missing TCP ACKs occur on the send side
> during a multifd live migration, the error "Connection timed out" is
> raised and the source QEMU process stops sending data. On the receive
> side, the IO channels may block in recvmsg(), so the main loop gets
> stuck and consequently fails to respond to QMP commands.

The core contract of the main event loop thread is that *NOTHING*
must ever go into a blocking sleep/wait state, precisely because
this breaks other functionality using the event loop such as QMP.

> The QEMU backtrace at the receive side with the main thread and two
> multi-channel threads is displayed as follows:

snip

> main thread:
> Thread 1 (Thread 0x7fd45f1fbe40 (LWP 1413088)):
> 0  0x00007fd46066b616 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x5556d7604e80) at ../sysdeps/unix/sysv/linux/futex-internal.h:216
> 1  do_futex_wait (sem=sem@entry=0x5556d7604e80, abstime=0x0) at sem_waitcommon.c:111
> 2  0x00007fd46066b708 in __new_sem_wait_slow (sem=sem@entry=0x5556d7604e80, abstime=0x0) at sem_waitcommon.c:183
> 3  0x00007fd46066b779 in __new_sem_wait (sem=sem@entry=0x5556d7604e80) at sem_wait.c:42
> 4  0x00005556d5415524 in qemu_sem_wait (sem=0x5556d7604e80) at ../util/qemu-thread-posix.c:358
> 5  0x00005556d4fa5e99 in multifd_recv_sync_main () at ../migration/multifd.c:1052
> 6  0x00005556d521ed65 in ram_load_precopy (f=f@entry=0x5556d75dfb90) at ../migration/ram.c:4446
> 7  0x00005556d521f1dd in ram_load (f=0x5556d75dfb90, opaque=<optimized out>, version_id=4) at ../migration/ram.c:4495
> 8  0x00005556d4faa3e7 in vmstate_load (f=f@entry=0x5556d75dfb90, se=se@entry=0x5556d6083070) at ../migration/savevm.c:909
> 9  0x00005556d4fae7a0 in qemu_loadvm_section_part_end (mis=0x5556d6082cc0, f=0x5556d75dfb90) at ../migration/savevm.c:2475
> 10 qemu_loadvm_state_main (f=f@entry=0x5556d75dfb90, mis=mis@entry=0x5556d6082cc0) at ../migration/savevm.c:2634
> 11 0x00005556d4fafbd5 in qemu_loadvm_state (f=0x5556d75dfb90) at ../migration/savevm.c:2706
> 12 0x00005556d4f9ebdb in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:561
> 13 0x00005556d542513b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:186
> 14 0x00007fd4604ef970 in ?? () from target:/lib64/libc.so.6

Here we see the main event thread is running a migration
coroutine, and the migration code has gone into a blocking
sleep via qemu_sem_wait, which is a violation of the main
event thread contract.

> 
> Once the QEMU process falls into the above state in the presence of
> the network errors, live migration cannot be canceled gracefully,
> leaving the destination VM in the "paused" state, since the QEMU
> process on the destination side doesn't respond to the QMP command
> "migrate_cancel".
> 
> To fix that, make the main thread yield to the main loop after waiting
> too long for the multifd channels to finish receiving data during one
> iteration. 10 seconds is a sufficient timeout period.
> 
> Signed-off-by: Hyman Huang <yong.hu...@smartx.com>
> ---
>  migration/multifd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b255778855..aca0aeb341 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1228,6 +1228,16 @@ void multifd_recv_sync_main(void)
>              }
>          }
>          trace_multifd_recv_sync_main_signal(p->id);
> +        do {
> +            if (qemu_sem_timedwait(&multifd_recv_state->sem_sync, 10000) == 0) {
> +                break;
> +            }
> +            if (qemu_in_coroutine()) {
> +                aio_co_schedule(qemu_get_current_aio_context(),
> +                                qemu_coroutine_self());
> +                qemu_coroutine_yield();
> +            }
> +        } while (1);

This tries to work around the violation of the event loop contract using
short timeouts for the semaphore wait, but IMHO that is just papering
over the design flaw.

The migration code should not be using semaphores at all for sync purposes
if it wants to be running in a coroutine from the event loop thread. It
either needs to use some synchronization mechanism that can be polled by
the event thread in a non-blocking manner, or this code needs to move to
a background thread instead of a coroutine.

>          qemu_sem_post(&p->sem_sync);
>      }
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> -- 
> 2.27.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

