Paolo,

While debugging hangs on ARM64 while doing a simple:

qemu-img convert -f qcow2 -O qcow2 file.qcow2 output.qcow2

I might have found 2 issues which I'd like you to review, if possible.

ISSUE #1
========

I've caught the following stack trace after a hang in qemu-img convert:

(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The value UINT_MAX (4294967295) seems WRONG for qemu_futex_wait():

- EV_BUSY, being -1 and passed as an argument to
  qemu_futex_wait(void *, unsigned), is converted to unsigned (two's
  complement), turning the argument into UINT_MAX when that's not what
  is expected (unless I missed something).

*** If that is the case, I'm unsure whether you, Paolo, would prefer
declaring *(QemuEvent)->value as a signed integer, or whether changing
EV_BUSY to "2" would be okay here ***

Bug description: https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

ISSUE #2
========

I found this when debugging lockups while in futex() on a specific ARM64
server - https://bugs.launchpad.net/qemu/+bug/1805256 - which I'm still
investigating.

After fixing the issue above, I'm still getting stuck in:

qemu_event_wait() -> qemu_futex_wait()

*** As if qemu_event_set() had run before qemu_futex_wait() ever started
running ***

The other threads are waiting: thread #1 in poll() on a pipe coming from
this stuck thread, and thread #3 in sigwait():

(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...

(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()

(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start

QUESTION:

- Should qemu_event_set() check the return code from
  qemu_futex_wake()->qemu_futex()->syscall() in order to know whether ANY
  waiter was ever woken up? Maybe even loop until at least one is woken?

Thanks in advance,

Rafael D. Tinoco
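
P.S. For reference on ISSUE #1, here is a minimal sketch of the conversion in question. It is not taken from the QEMU source: EV_BUSY_SKETCH, as_futex_arg() and show_conversion() are hypothetical names, and the only thing assumed from above is that the constant is -1 and the callee takes an "unsigned" parameter.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for EV_BUSY; assumed to be -1 as described above. */
#define EV_BUSY_SKETCH (-1)

/* Hypothetical callee with an 'unsigned' parameter, mirroring the
 * qemu_futex_wait(void *, unsigned) signature quoted in ISSUE #1.
 * It simply returns the value it received. */
static unsigned as_futex_arg(unsigned val)
{
    return val;
}

/* Passes the signed constant to the unsigned parameter and reports
 * what the callee actually sees. */
static unsigned show_conversion(void)
{
    /* C converts -1 to unsigned modulo UINT_MAX + 1, giving UINT_MAX. */
    unsigned seen = as_futex_arg(EV_BUSY_SKETCH);

    assert(seen == UINT_MAX);
    printf("callee sees %u (UINT_MAX = %u, INT_MAX = %d)\n",
           seen, UINT_MAX, INT_MAX);
    return seen;
}
```

Calling show_conversion() from a small main() prints the value the callee receives next to UINT_MAX and INT_MAX. The conversion itself is well defined in C, so an unsigned field holding the same constant compares equal to the argument; whether that matters for the hang is exactly what I'd like confirmed.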