Hi,

On Fri, 11 Apr 2025 at 01:48, Fabiano Rosas <faro...@suse.de> wrote:
> That's what it looks like. It could be some error condition that is not
> being propagated properly. The thread hits an error and exits without
> informing the rest of migration.

* The gdb(1) hang in postcopy_ram_fault_thread() is not conclusive. I tried
  setting the following breakpoints:

    (gdb) break postcopy-ram.c:998
        - poll_result = poll(pfd, pfd_len, -1 /* Wait forever */);
    (gdb) break postcopy-ram.c:1057
        - rb = qemu_ram_block_from_host(...);

  gdb(1) hangs for both of them, so there might be another reason for it.
  Live migration also stalls with it.

> Some combination of the postcopy traces should give you that. Sorry,
> Peter Xu really is the expert on postcopy, I just tag along.

* I see. Maybe it could be logged with a --migration-debug=<level> option.

> The snippet I posted shows that it's the same page:
>
> (gdb) x/i $pc
> => 0x7ffff5399d14 <__memcpy_evex_unaligned_erms+86>:    rep movsb
> %ds:(%rsi),%es:(%rdi)
> (gdb) p/x $rsi
> $1 = 0x7fffd68cc000
>
===
>> Thread 1 (Thread 0x7fbc4849df80 (LWP 7487) "qemu-system-x86"):
...
>> Thread 10 (Thread 0x7fffce7fc700 (LWP 11778) "mig/dst/listen"):
...
>> Thread 9 (Thread 0x7fffceffd700 (LWP 11777) "mig/dst/fault"):
#0  0x00007ffff5314a89 in __GI___poll (fds=0x7fffc0000b60, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
...
postcopy_ram_fault_thread_request Request for HVA=0x7fffd68cc000 rb=pc.ram offset=0xcc000 pid=11754
===

* Looking at the above data, it seems the missing page fault occurred in
  thread 11754; it may not be the memcpy(3) in Thread 1 (pid/tid=7487)
  that triggered the fault.

* Secondly, if the 'mig/dst/fault' thread is waiting in the poll(2) call,
  then no fault notification has arrived yet on the mis->userfault_fd or
  mis->userfault_event_fd descriptors. So the "Request for HVA=0x7fffd..."
  logged via postcopy_ram_fault_thread_request() could be an already
  served request.

> Send your next version and I'll set some time aside to debug this.
>
> heads-up: I'll be off from 2025/04/18 until 2025/05/05. Peter should be
> already back in the meantime.

* Okay, I'll send the next version.

Thank you.
---
  - Prasad