* Peter Maydell (peter.mayd...@linaro.org) wrote:
> migration-test hung again during 'make check'.

ccing in Peter Xu

> Process tree:
> 
> ubuntu   42067  0.0  0.0   5460  3156 ?        S    13:55   0:00
>            \_ bash -o pipefail -c echo
> 'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-aarch64
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> tests/qtest/migration-test --tap -k' &&
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-aarch64
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> tests/qtest/migration-test --tap -k -m quick < /dev/null |
> ./scripts/tap-driver.pl --test-name="qtest-aarch64/migration-test"
> ubuntu   42070  0.0  0.0  78844  3184 ?        Sl   13:55   0:01
>                \_ tests/qtest/migration-test --tap -k -m quick
> ubuntu   43248  0.0  4.1 1401368 160852 ?      Sl   13:55   0:03
>                |   \_ ./qemu-system-aarch64 -qtest
> unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg
> -machine virt,gic-version=max -name source,debug-threads=on -m 150M
> -serial file:/tmp/migration-test-SL7EkN/src_serial -cpu max -kernel
> /tmp/migration-test-SL7EkN/bootsect -accel qtest
> ubuntu   43256  0.0  1.4 1394208 57648 ?       Sl   13:55   0:00
>                |   \_ ./qemu-system-aarch64 -qtest
> unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg
> -machine virt,gic-version=max -name target,debug-threads=on -m 150M
> -serial file:/tmp/migration-test-SL7EkN/dest_serial -incoming
> unix:/tmp/migration-test-SL7EkN/migsocket -cpu max -kernel
> /tmp/migration-test-SL7EkN/bootsect -accel qtest
> ubuntu   42071  0.0  0.2  14116 11372 ?        S    13:55   0:00
>                \_ perl ./scripts/tap-driver.pl
> --test-name=qtest-aarch64/migration-test
> 
> Backtrace, migration-test process:
> 
> Thread 2 (Thread 0x3ff8ff7f910 (LWP 42075)):
> #0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> #1  0x000002aa1d0d1c1c in qemu_futex_wait (val=<optimized out>,
> f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> #2  qemu_event_wait (ev=ev@entry=0x2aa1d0fda58 <rcu_call_ready_event>)
> at ../../util/qemu-thread-posix.c:480
> #3  0x000002aa1d0d0884 in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../../util/rcu.c:258
> #4  0x000002aa1d0d0c32 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ff90207aa8 in start_thread (arg=0x3ff8ff7f910) at
> pthread_create.c:463
> #6  0x000003ff900fa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ff905f7c00 (LWP 42070)):
> #0  0x000003ff90212978 in __waitpid (pid=<optimized out>,
> stat_loc=stat_loc@entry=0x2aa1d5e06bc, options=options@entry=0)
>     at ../sysdeps/unix/sysv/linux/waitpid.c:30
> #1  0x000002aa1d0a2176 in qtest_kill_qemu (s=0x2aa1d5e06b0) at
> ../../tests/qtest/libqtest.c:144
> #2  0x000003ff903c0de6 in g_hook_list_invoke () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  <signal handler called>
> #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #5  0x000003ff9003f3ca in __GI_abort () at abort.c:79
> #6  0x000003ff903fcbb2 in g_assertion_message () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #7  0x000003ff903fd2b4 in g_assertion_message_cmpstr () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #8  0x000002aa1d0a10f6 in check_migration_status
> (ungoals=0x3ffffdfe198, goal=0x2aa1d0d75c0 "postcopy-paused",
> who=0x2aa1d5dee90)

            g_assert_cmpstr(current_status, !=,  *ungoal);

>     at ../../tests/qtest/migration-helpers.c:146
> #9  wait_for_migration_status (who=0x2aa1d5dee90, goal=<optimized
> out>, ungoals=0x3ffffdfe198)
>     at ../../tests/qtest/migration-helpers.c:156
> #10 0x000002aa1d09f610 in test_postcopy_recovery () at
> ../../tests/qtest/migration-test.c:773

    wait_for_migration_status(from, "postcopy-paused",
                              (const char * []) { "failed", "active",
                                                  "completed", NULL });

so I think that means it hit one of failed/active/completed rather than
the postcopy-paused it expected (shame we can't tell which)

> #11 0x000003ff903fc70c in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #12 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #13 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #14 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #15 0x000003ff903fc8ee in g_test_run_suite () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #16 0x000003ff903fc928 in g_test_run () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #17 0x000002aa1d09d8f2 in main (argc=<optimized out>, argv=<optimized
> out>) at ../../tests/qtest/migration-test.c:1495
> 
> 
> Backtrace, QEMU process 43248:
> 
> Thread 5 (Thread 0x3ff8d7fe910 (LWP 43361)):
> #0  0x000003ff9d110582 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, expected=0, futex_word=0x2aa1e27dfac)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  0x000003ff9d110582 in do_futex_wait (sem=sem@entry=0x2aa1e27dfa8,
> abstime=0x0) at sem_waitcommon.c:111
> #2  0x000003ff9d11068c in __new_sem_wait_slow
> (sem=sem@entry=0x2aa1e27dfa8, abstime=0x0) at sem_waitcommon.c:181
> #3  0x000003ff9d110734 in __new_sem_wait (sem=sem@entry=0x2aa1e27dfa8)
> at sem_wait.c:42
> #4  0x000002aa1b6dfb96 in qemu_sem_wait (sem=0x2aa1e27dfa8,
> sem@entry=<error reading variable: value has been optimized out>)
>     at ../../util/qemu-thread-posix.c:357
> #5  0x000002aa1aefe5b0 in postcopy_pause_return_path_thread
> (s=0x2aa1e27dc00) at ../../migration/migration.c:2654
> #6  0x000002aa1aefe5b0 in source_return_path_thread
> (opaque=opaque@entry=0x2aa1e27dc00) at
> ../../migration/migration.c:2854
> #7  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #8  0x000003ff9d107aa8 in start_thread (arg=0x3ff8d7fe910) at
> pthread_create.c:463
> #9  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

OK, source waiting for recovery - my reading is we don't send the
recovery until the next command in the test.

> 
> Thread 4 (Thread 0x3ff8ed42910 (LWP 43252)):
> #0  0x000003ff9d10d9fe in futex_wait_cancelable (private=<optimized
> out>, expected=0, futex_word=0x2aa1e49ca68)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> #1  0x000003ff9d10d9fe in __pthread_cond_wait_common (abstime=0x0,
> mutex=0x2aa1c1a4840 <qemu_global_mutex>, cond=0x2aa1e49ca40)
>     at pthread_cond_wait.c:502
> #2  0x000003ff9d10d9fe in __pthread_cond_wait
> (cond=cond@entry=0x2aa1e49ca40, mutex=mutex@entry=0x2aa1c1a4840
> <qemu_global_mutex>)
>     at pthread_cond_wait.c:655
> #3  0x000002aa1b6df59e in qemu_cond_wait_impl (cond=0x2aa1e49ca40,
> mutex=0x2aa1c1a4840 <qemu_global_mutex>, file=0x2aa1b8a93c6
> "../../softmmu/cpus.c", line=<optimized out>) at
> ../../util/qemu-thread-posix.c:194
> #4  0x000002aa1b40764e in qemu_wait_io_event
> (cpu=cpu@entry=0x2aa1e43aa50) at ../../softmmu/cpus.c:419
> #5  0x000002aa1b406916 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x2aa1e43aa50) at
> ../../accel/tcg/tcg-accel-ops-mttcg.c:98
> #6  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #7  0x000003ff9d107aa8 in start_thread (arg=0x3ff8ed42910) at
> pthread_create.c:463
> #8  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 3 (Thread 0x3ff8f735910 (LWP 43251)):
> #0  0x000003ff9cfef1b0 in __GI___poll (fds=0x3ff64003620, nfds=3,
> timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x000003ff9e852624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #2  0x000003ff9e852aa8 in g_main_loop_run () at
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  0x000002aa1b4eaf0e in iothread_run
> (opaque=opaque@entry=0x2aa1df83b80) at ../../iothread.c:73
> #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff8f735910) at
> pthread_create.c:463
> #6  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 2 (Thread 0x3ff90037910 (LWP 43250)):
> #0  0x000003ff9cff52ea in syscall () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> #1  0x000002aa1b6dfd5c in qemu_futex_wait (val=<optimized out>,
> f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> #2  0x000002aa1b6dfd5c in qemu_event_wait (ev=ev@entry=0x2aa1c1c3888
> <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> #3  0x000002aa1b6d917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../../util/rcu.c:258
> #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff90037910) at
> pthread_create.c:463
> #6  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ff9feb5470 (LWP 43248)):
> #0  0x000003ff9cfef302 in __GI_ppoll (fds=0x2aa1ed28d00, nfds=6,
> timeout=<optimized out>,
>     timeout@entry=0x3fff517ecf8, sigmask=sigmask@entry=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1  0x000002aa1b701fec in ppoll (__ss=0x0, __timeout=0x3fff517ecf8,
> __nfds=<optimized out>, __fds=<optimized out>)
>     at /usr/include/s390x-linux-gnu/bits/poll2.h:77
> #2  0x000002aa1b701fec in qemu_poll_ns (fds=<optimized out>,
> nfds=<optimized out>, timeout=timeout@entry=1000000000)
>     at ../../util/qemu-timer.c:348
> #3  0x000002aa1b6d2f98 in os_host_main_loop_wait (timeout=1000000000)
> at ../../util/main-loop.c:250
> #4  0x000002aa1b6d2f98 in main_loop_wait
> (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:531
> #5  0x000002aa1b496d44 in qemu_main_loop () at ../../softmmu/runstate.c:726
> #6  0x000002aa1ae6d3f0 in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at ../../softmmu/main.c:50
> 
> Backtrace, QEMU process 43256:
> 
> Thread 6 (Thread 0x3ff726ff910 (LWP 43440)):
> #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, expected=0, futex_word=0x2aa1a75d2d4)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2d0,
> abstime=0x0) at sem_waitcommon.c:111
> #2  0x000003ffb9f1068c in __new_sem_wait_slow
> (sem=sem@entry=0x2aa1a75d2d0, abstime=0x0) at sem_waitcommon.c:181
> #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> at sem_wait.c:42
> #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> at ../../util/qemu-thread-posix.c:357
> #5  0x000002aa17599f06 in postcopy_pause_incoming (mis=<optimized
> out>) at ../../migration/savevm.c:2600

So the dest is still waiting to be woken up from pause? Hmm I thought
the test had already given it a kick.

> #6  0x000002aa17599f06 in qemu_loadvm_state_main
> (f=f@entry=0x2aa1a751610, mis=mis@entry=0x2aa1a75d130)
>     at ../../migration/savevm.c:2674
> #7  0x000002aa1759af4a in postcopy_ram_listen_thread
> (opaque=opaque@entry=0x0) at ../../migration/savevm.c:1872
> #8  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #9  0x000003ffb9f07aa8 in start_thread (arg=0x3ff726ff910) at
> pthread_create.c:463
> #10 0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 5 (Thread 0x3ff88ffe910 (LWP 43439)):
> #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, expected=0, futex_word=0x2aa1a75d2fc)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2f8,
> abstime=0x0) at sem_waitcommon.c:111
> #2  0x000003ffb9f1068c in __new_sem_wait_slow
> (sem=sem@entry=0x2aa1a75d2f8, abstime=0x0) at sem_waitcommon.c:181
> #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2f8)
> at sem_wait.c:42
> #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=0x2aa1a75d2f8,
> sem@entry=<error reading variable: value has been optimized out>)
>     at ../../util/qemu-thread-posix.c:357
> #5  0x000002aa177ae6aa in postcopy_pause_fault_thread (mis=<optimized
> out>) at ../../migration/postcopy-ram.c:847
> #6  0x000002aa177ae6aa in postcopy_ram_fault_thread
> (opaque=opaque@entry=0x2aa1a75d130) at
> ../../migration/postcopy-ram.c:911
> #7  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #8  0x000003ffb9f07aa8 in start_thread (arg=0x3ff88ffe910) at
> pthread_create.c:463
> #9  0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 4 (Thread 0x3ff89b18910 (LWP 43271)):
> Python Exception <class 'gdb.error'> PC not saved:
> 
> Thread 3 (Thread 0x3ff8a319910 (LWP 43270)):
> #0  0x000003ffb9def1b0 in __GI___poll (fds=0x3ff84003620, nfds=3,
> timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x000003ffbb652624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #2  0x000003ffbb652aa8 in g_main_loop_run () at
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  0x000002aa17beaf0e in iothread_run
> (opaque=opaque@entry=0x2aa1a3aeb80) at ../../iothread.c:73
> #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ffb9f07aa8 in start_thread (arg=0x3ff8a319910) at
> pthread_create.c:463
> #6  0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 2 (Thread 0x3fface37910 (LWP 43267)):
> #0  0x000003ffb9df52ea in syscall () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> #1  0x000002aa17ddfd5c in qemu_futex_wait (val=<optimized out>,
> f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> #2  0x000002aa17ddfd5c in qemu_event_wait (ev=ev@entry=0x2aa188c3888
> <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> #3  0x000002aa17dd917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../../util/rcu.c:258
> #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ffb9f07aa8 in start_thread (arg=0x3fface37910) at
> pthread_create.c:463
> #6  0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ffbccb5470 (LWP 43256)):
> #0  0x000003ffb9f0d9fe in futex_wait_cancelable (private=<optimized
> out>, expected=0, futex_word=0x2aa188a47e8 <qemu_pause_cond+40>)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> #1  0x000003ffb9f0d9fe in __pthread_cond_wait_common (abstime=0x0,
> mutex=0x2aa188a4840 <qemu_global_mutex>, cond=0x2aa188a47c0
> <qemu_pause_cond>) at pthread_cond_wait.c:502
> #2  0x000003ffb9f0d9fe in __pthread_cond_wait
> (cond=cond@entry=0x2aa188a47c0 <qemu_pause_cond>,
> mutex=mutex@entry=0x2aa188a4840 <qemu_global_mutex>) at
> pthread_cond_wait.c:655
> #3  0x000002aa17ddf59e in qemu_cond_wait_impl (cond=0x2aa188a47c0
> <qemu_pause_cond>, mutex=0x2aa188a4840 <qemu_global_mutex>,
> file=0x2aa17fa93c6 "../../softmmu/cpus.c", line=<optimized out>) at
> ../../util/qemu-thread-posix.c:194
> #4  0x000002aa17b079e2 in pause_all_vcpus () at ../../softmmu/cpus.c:562
> #5  0x000002aa17b07aec in do_vm_stop
> (state=state@entry=RUN_STATE_SHUTDOWN,
> send_stop=send_stop@entry=false)
>     at ../../softmmu/cpus.c:261

Hang on; how is the dest (?) in shutdown?

> #6  0x000002aa17b07bb6 in vm_shutdown () at ../../softmmu/cpus.c:280
> #7  0x000002aa17b9716e in qemu_cleanup () at ../../softmmu/runstate.c:812
> #8  0x000002aa1756d3f6 in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at ../../softmmu/main.c:51
> 
> No, I dunno why thread 4 was odd. Backtracing that thread seems to
> only work about one time in two. Here's the backtrace from a
> success:

That maybe due to postcopy userfault; it can confuse things sometimes.

> #0  0x000002aa17b1ca2e in load_memop (op=MO_8, haddr=0x3ff80c6d000) at
> ../../accel/tcg/cputlb.c:1860

So just a read, on a page boundary?  So possibly triggered a page read
over postcopy?

Dave

> #1  0x000002aa17b1ca2e in load_helper (full_load=<optimized out>,
> code_read=false, op=MO_8, retaddr=4396070052416, oi=<optimized out>,
> addr=1177997312, env=0x2aa1a86f890) at ../../accel/tcg/cputlb.c:1980
> #2  0x000002aa17b1ca2e in full_ldub_mmu (env=0x2aa1a86f890,
> addr=1177997312, oi=<optimized out>, retaddr=4396070052416)
>     at ../../accel/tcg/cputlb.c:1996
> #3  0x000003ff8a31a35e in code_gen_buffer ()
> #4  0x000002aa17bbabb4 in cpu_tb_exec (tb_exit=<synthetic pointer>,
> itb=<optimized out>, cpu=<optimized out>)
>     at ../../accel/tcg/cpu-exec.c:353
> #5  0x000002aa17bbabb4 in cpu_loop_exec_tb (tb_exit=<synthetic
> pointer>, last_tb=<synthetic pointer>, tb=<optimized out>,
> cpu=<optimized out>) at ../../accel/tcg/cpu-exec.c:812
> #6  0x000002aa17bbabb4 in cpu_exec (cpu=<optimized out>) at
> ../../accel/tcg/cpu-exec.c:970
> #7  0x000002aa18764620 in  ()
> 
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Reply via email to