[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, here is what is happening:

Whenever the program is stuck, the thread #2 backtrace is this:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xaabd41b0 in qemu_futex_wait (val=, f=) at ./util/qemu-thread-posix.c:438
#2  qemu_event_wait (ev=ev@entry=0xaac87ce8 ) at ./util/qemu-thread-posix.c:442
#3  0xaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4  0xaabd34c8 in qemu_thread_start (args=) at ./util/qemu-thread-posix.c:498
#5  0xbf26a880 in start_thread (arg=0xf5bf) at pthread_create.c:486
#6  0xbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Meaning that the code is waiting for a futex inside the kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to 4294967295 (0xffffffff, which is UINT_MAX rather than INT_MAX) and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init()  -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set()   -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait()  -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)

And I did not expect to see such a value in "ev->value".

qemu_event_init() is the one initializing the global:

static QemuEvent rcu_call_ready_event;

and it is called by "rcu_init_complete()", which is called by "rcu_init()":

static void __attribute__((__constructor__)) rcu_init(void)

a constructor function.

So, "fixing" this issue by:

(gdb) print rcu_call_ready_event
$8 = {value = 4294967295, initialized = true}
(gdb) watch rcu_call_ready_event
Hardware watchpoint 1: rcu_call_ready_event
(gdb) set rcu_call_ready_event.initialized = 1
(gdb) set rcu_call_ready_event.value = 0

and note that I added a watchpoint to the rcu_call_ready_event global:

Thread 1 "qemu-img" received signal SIGINT, Interrupt.
(gdb) thread 2
[Switching to thread 2 (Thread 0xbec61d90 (LWP 33625))]
(gdb) bt
#0  0xaabd4110 in qemu_event_reset (ev=ev@entry=0xaac87ce8 )
#1  0xaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255
#2  0xaabd34c8 in qemu_thread_start (args=) at ./util/qemu-thread-posix.c:498
#3  0xbf26a880 in start_thread (arg=0xf5bf) at pthread_create.c:486
#4  0xbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) print rcu_call_ready_event
$9 = {value = 0, initialized = true}

You can see I advanced in the qemu_event_{reset,set,wait} logic.

(gdb) disassemble /m 0xaabd4110
Dump of assembler code for function qemu_event_reset:
408     in ./util/qemu-thread-posix.c
409     in ./util/qemu-thread-posix.c
410     in ./util/qemu-thread-posix.c
411     in ./util/qemu-thread-posix.c
   0xaabd40f0 <+0>:     ldrb    w1, [x0, #4]
   0xaabd40f4 <+4>:     cbz     w1, 0xaabd411c
   0xaabd411c <+44>:    stp     x29, x30, [sp, #-16]!
   0xaabd4120 <+48>:    adrp    x3, 0xaac2
   0xaabd4124 <+52>:    add     x3, x3, #0x908
   0xaabd4128 <+56>:    mov     x29, sp
   0xaabd412c <+60>:    adrp    x1, 0xaac2
   0xaabd4130 <+64>:    adrp    x0, 0xaac2
   0xaabd4134 <+68>:    add     x3, x3, #0x290
   0xaabd4138 <+72>:    add     x1, x1, #0xc00
   0xaabd413c <+76>:    add     x0, x0, #0xd40
   0xaabd4140 <+80>:    mov     w2, #0x19b      // #411
   0xaabd4144 <+84>:    bl      0xaaaff190 <__assert_fail@plt>
412     in ./util/qemu-thread-posix.c
   0xaabd40f8 <+8>:     ldr     w1, [x0]
413     in ./util/qemu-thread-posix.c
   0xaabd40fc <+12>:    dmb     ishld
414     in ./util/qemu-thread-posix.c
   0xaabd4100 <+16>:    cbz     w1, 0xaabd4108
   0xaabd4104 <+20>:    ret
   0xaabd4108 <+24>:    ldaxr   w1, [x0]
   0xaabd410c <+28>:    orr     w1, w1, #0x1
=> 0xaabd4110 <+32>:    stlxr   w2, w1, [x0]
   0xaabd4114 <+36>:    cbnz    w2, 0xaabd4108
   0xaabd4118 <+40>:    ret

And I'm currently inside the LDAXR/STLXR logic. To make sure my program counter is advancing, I added a breakpoint at 0xaabd4108: the CBNZ instruction keeps branching back to the LDAXR instruction until the LDAXR<->STLXR pair is satisfied (inside qemu_event_wait()).
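The LDAXR / ORR / STLXR / CBNZ sequence in the disassembly above is the shape compilers typically emit for an atomic OR on AArch64 without LSE atomics: load-exclusive, modify, store-exclusive, retry if the store failed. A minimal C11 sketch of the same retry pattern follows. The helper name or_with_retry and its attempts counter are hypothetical and only for illustration; this is not QEMU's atomic_or().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: OR `bits` into *p using an explicit retry loop.
 * Returns the value seen before the OR; *attempts (if non-NULL) receives
 * how many compare-exchange attempts were needed.
 */
unsigned or_with_retry(atomic_uint *p, unsigned bits, unsigned *attempts)
{
    unsigned old = atomic_load_explicit(p, memory_order_relaxed);
    unsigned tries = 1;

    /*
     * The weak compare-exchange may fail spuriously, much like a failed
     * STLXR. On failure `old` is refreshed with the current value and
     * the loop tries again, which mirrors the CBNZ branch back to LDAXR.
     */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old | bits,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        tries++;
    }
    if (attempts != NULL) {
        *attempts = tries;
    }
    return old;
}
```

A breakpoint on the load side of such a loop fires once per attempt, so hitting it repeatedly can mean either repeated calls or repeated failed exclusive stores; counting attempts separately, as the sketch does, is one way to tell the two apart.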
(gdb) break *(0xaabd4108)
Breakpoint 2 at 0xaabd4108: file ./util/qemu-thread-posix.c, line 414.

which is basically this:

    if (value == EV_SET) {                  /* EV_SET == 0 */
        atomic_or(&ev->value, EV_FREE);     /* EV_FREE == 1 */
    }

and we can see this logic being called one time after another:

(gdb) c
Thread 2 "qemu-img" hit Breakpoint 3, 0xaabd4108 in qemu_eve
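The four accessors listed earlier in this message form a small three-state machine, and modelling it may help when reading the gdb output. The sketch below is a single-threaded model with hypothetical names (ModelEvent, model_event_*), and it leaves out the futex calls and memory barriers. It assumes EV_BUSY is -1 and that the value field is unsigned, which is how I believe util/qemu-thread-posix.c defines them; if that assumption holds, the 4294967295 printed by gdb is simply EV_BUSY, the state a waiter sets just before sleeping on the futex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed state values; EV_BUSY is -1 stored in an unsigned field. */
#define EV_SET   0u
#define EV_FREE  1u
#define EV_BUSY  ((unsigned)-1)

typedef struct {
    atomic_uint value;
    bool initialized;
} ModelEvent;

void model_event_init(ModelEvent *ev, bool init)
{
    atomic_init(&ev->value, init ? EV_SET : EV_FREE);
    ev->initialized = true;
}

/* SET -> FREE; FREE and BUSY are left untouched. */
void model_event_reset(ModelEvent *ev)
{
    assert(ev->initialized);
    if (atomic_load(&ev->value) == EV_SET) {
        atomic_fetch_or(&ev->value, EV_FREE);
    }
}

/* Any state -> SET; returns true when a sleeping waiter must be woken. */
bool model_event_set(ModelEvent *ev)
{
    assert(ev->initialized);
    if (atomic_load(&ev->value) != EV_SET) {
        return atomic_exchange(&ev->value, EV_SET) == EV_BUSY;
    }
    return false;
}

/* FREE -> BUSY; returns true when the caller would sleep on the futex. */
bool model_event_wait_would_block(ModelEvent *ev)
{
    unsigned value;

    assert(ev->initialized);
    value = atomic_load(&ev->value);
    if (value == EV_SET) {
        return false;
    }
    if (value == EV_FREE) {
        unsigned expected = EV_FREE;
        /* If a setter won the race, `expected` now holds EV_SET. */
        if (!atomic_compare_exchange_strong(&ev->value, &expected, EV_BUSY)
            && expected == EV_SET) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions a thread parked in qemu_futex_wait() with value == 4294967295 is in the expected waiting state, so the open question becomes why no qemu_event_set() arrives to wake it, rather than how the value got there.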
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, I'm still investigating this but wanted to share some findings... I haven't got a kernel dump yet after the task is frozen; I have analyzed only the userland part of it (although I have checked whether code was running inside the kernel with perf cycles:u/cycles:k at some point).

The big picture is this: whenever qemu-img hangs, we basically have 3 hung tasks with these stacks:

THREAD #1

__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39)
ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77)
qemu_poll_ns (./util/qemu-timer.c:322)
os_host_main_loop_wait (./util/main-loop.c:233)
main_loop_wait (./util/main-loop.c:497)
convert_do_copy (./qemu-img.c:1981)
img_convert (./qemu-img.c:2457)
main (./qemu-img.c:4976)

got stack traces:

./33293/stack                             ./33293/stack
[<0>] __switch_to+0xc0/0x218              [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0             [<0>] do_sys_poll+0x508/0x5c0
[<0>] get_signal+0x5a4/0x730              [<0>] __arm64_sys_ppoll+0xc0/0x118
[<0>] do_notify_resume+0x158/0x358        [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10               [<0>] el0_svc_handler+0x38/0x78
                                          [<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F -e cycles:u -p 33293 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object        Symbol
# ........  ........  ...................  ......
#
    37.82%  qemu-img  libc-2.29.so         [.] 0x000df710
    21.81%  qemu-img  [unknown]            [k] 0x10099504
    14.23%  qemu-img  [unknown]            [k] 0x10085dc0
     9.13%  qemu-img  [unknown]            [k] 0x1008fff8
     6.47%  qemu-img  libc-2.29.so         [.] 0x000df708
     5.69%  qemu-img  qemu-img             [.] qemu_event_reset
     2.57%  qemu-img  libc-2.29.so         [.] 0x000df678
     0.63%  qemu-img  libc-2.29.so         [.] 0x000df700
     0.49%  qemu-img  libc-2.29.so         [.] __sigtimedwait
     0.42%  qemu-img  libpthread-2.29.so   [.] __libc_sigwait

THREAD #3

__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29)
__sigwait (linux/sigwait.c:28)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33303/stack                             ./33303/stack
[<0>] __switch_to+0xc0/0x218              [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0             [<0>] do_sigtimedwait.isra.9+0x194/0x288
[<0>] get_signal+0x5a4/0x730              [<0>] __arm64_sys_rt_sigtimedwait+0xac/0x110
[<0>] do_notify_resume+0x158/0x358        [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10               [<0>] el0_svc_handler+0x38/0x78
                                          [<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F -e cycles:u -p 33303 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.905 MB perf.data (49647 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object        Symbol
# ........  ........  ...................  ......
#
    45.37%  qemu-img  libc-2.29.so         [.] 0x000df710
    23.52%  qemu-img  [unknown]            [k] 0x10099504
     9.08%  qemu-img  [unknown]            [k] 0x1008fff8
     8.89%  qemu-img  [unknown]            [k] 0x10085dc0
     5.56%  qemu-img  libc-2.29.so         [.] 0x000df708
     3.66%  qemu-img  libc-2.29.so         [.] 0x000df678
     1.01%  qemu-img  libc-2.29.so         [.] __sigtimedwait
     0.80%  qemu-img  libc-2.29.so         [.] 0x000df700
     0.64%  qemu-img  qemu-img             [.] qemu_event_reset
     0.55%  qemu-img  libc-2.29.so         [.] 0x000df718
     0.52%  qemu-img  libpthread-2.29.so   [.] __libc_sigwait

THREAD #2

syscall (linux/aarch64/syscall.S:38)
qemu_futex_wait (./util/qemu-thread-posix.c:438)
qemu_event_wait (./util/qemu-thread-posix.c:442)
call_rcu_thread (./util/rcu.c:261)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33302/stack                             ./33302/stack
[<0>] __switch_to+0xc0/0x218              [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0             [<0>] ptrace_stop+0x148/0x2b0
[<0>] get_signal+0x5a4/0x730              [<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x1c4/0x358        [<0>] do_notify_resume+0x1c4/0x358
[<0>] work_pending+0x8/0x10               [<0>] work_pending+0x8/0x10

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object        Symbol
# ..
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, with a d06 aarch64 machine I was able to reproduce it after 8 attempts. I'll debug it today and provide feedback on my findings.

(gdb) bt full
#0  0xb0b2181c in __GI_ppoll (fds=0xce5ab770, nfds=4, timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
        _x3tmp = 0
        _x0tmp = 187650583213936
        _x0 = 187650583213936
        _x3 = 0
        _x4tmp = 8
        _x1tmp = 4
        _x1 = 4
        _x4 = 8
        _x2tmp =
        _x2 = 0
        _x8 = 73
        _sys_result =
        _sys_result =
        sc_cancel_oldtype = 0
        sc_ret =
        tval = {tv_sec = 0, tv_nsec = 187650583137792}
#1  0xcd2a773c in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
No locals.
#2  qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at ./util/qemu-timer.c:322
No locals.
#3  0xcd2a8764 in os_host_main_loop_wait (timeout=-1) at ./util/main-loop.c:233
        context = 0xce599d90
        ret =
        context =
        ret =
#4  main_loop_wait (nonblocking=) at ./util/main-loop.c:497
        ret =
        timeout = 4294967295
        timeout_ns =
#5  0xcd1df454 in convert_do_copy (s=0xf9b2b1d8) at ./qemu-img.c:1981
        ret =
        i =
        n =
        sector_num =
        ret =
        i =
        n =
        sector_num =
#6  img_convert (argc=, argv=) at ./qemu-img.c:2457
        c =
        bs_i =
        flags = 16898
        src_flags = 0
        fmt = 0xf9b2bad1 "qcow2"
        out_fmt =
        cache = 0xcd2cb1c8 "unsafe"
        src_cache = 0xcd2ca9c0 "writeback"
        out_baseimg =
        out_filename =
        out_baseimg_param =
        snapshot_name = 0x0
        drv =
        proto_drv =
        bdi = {cluster_size = 65536, vm_state_offset = 32212254720, is_dirty = false,
          unallocated_blocks_are_zero = true, needs_compressed_writes = false}
        out_bs =
        opts = 0xce5ab390
        sn_opts = 0x0
        create_opts = 0xce5ab0c0
        open_opts =
        options = 0x0
        local_err = 0x0
        writethrough = false
        src_writethrough = false
        quiet =
        image_opts = false
        skip_create = false
        progress =
        tgt_image_opts = false
        ret =
        force_share = false
        explict_min_sparse = false
        s = {src = 0xce577240, src_sectors = 0xce577300, src_num = 1, total_sectors = 62914560,
          allocated_sectors = 9572096, allocated_done = 6541440, sector_num = 8863744, wr_offs = 8859776,
          status = BLK_DATA, sector_next_status = 8863744, target = 0xce5bd2a0, has_zero_init = true,
          compressed = false, unallocated_blocks_are_zero = true, target_has_backing = false,
          target_backing_sectors = -1, wr_in_order = true, copy_range = false, min_sparse = 8, alignment = 8,
          cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 8, running_coroutines = 8,
          co = {0xce5ceda0, 0xce5cef50, 0xce5cf100, 0xce5cf2b0, 0xce5cf460, 0xce5cf610, 0xce5cf7c0,
            0xce5cf970, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
          wait_sector_num = {-1, 8859904, 8860928, 8863360, 8861952, 8862976, 8862592, 8861440,
            0, 0, 0, 0, 0, 0, 0, 0},
          lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0},
            handoff = 0, sequence = 0, holder = 0x0}, ret = -115}
        __PRETTY_FUNCTION__ = "img_convert"
#7  0xcd1d8400 in main (argc=7, argv=) at ./qemu-img.c:4976
        cmd = 0xcd34ad78
        cmdname =
        local_error = 0x0
        trace_file = 0x0
        c =
        long_options = {{name = 0xcd2cbbb0 "help", has_arg = 0, flag = 0x0, val = 104},
          {name = 0xcd2cbc78 "version", has_arg = 0, flag = 0x0, val = 86},
          {name = 0xcd2cbc80 "trace", has_arg = 1, flag = 0x0, val = 84},
          {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, I couldn't reproduce this yet. I'm running the same test case on a 24-core box while causing lots of context switches and CPU migrations in parallel (trying to exhaust the logic). I will leave this running for some time to check.

Unfortunately this can be related to QEMU AIO BH locking/primitives and cache coherency in the HW in question (whose specs I got from https://en.wikichip.org/wiki/hisilicon/kunpeng/hi1616):

l1$ size    8 MiB
l1d$ size   4 MiB
l1i$ size   4 MiB
l2$ size    32 MiB
l3$ size    64 MiB

for example when having 2 threads in different NUMA domains, or some other situation. I can't simulate the same since I have a SoC with:

Cortex-A53 MPCore, 24 cores
L1 I/D = 32KB/32KB
L2     = 256KB
L3     = 4MB

and I'm not even close to the L1/L2/L3 cache numbers from the D06 =o).

Just got a note that I'll be able to reproduce this on the real HW; I will get back soon with real gdb debugging.

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6  img_convert (argc=, argv=) at qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Ooh, never mind the virtual environment test: I just remembered we don't have nested (2nd level) KVM for aarch64 yet (at least on ARMv8 implementing the virtualization extension). I'll try to reproduce in the real environment only.
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Hello Liz,

I'll try to reproduce this issue in a real Cortex-A53 aarch64 environment (w/ 24 HW threads) AND in a virtual environment w/ lots of vCPUs... but, if it's a missing barrier - or a lack of atomicity and/or ordering in a primitive - then I'm afraid the context switching between vCPUs might not behave the same as on real CPUs (IPIs are sent and handled differently, and the host kernel delays IPI delivery because of its own callbacks, before scheduling, etc.), and I might need a qemu dump from your environment.

Would that be feasible? Can you still reproduce this nowadays? This bug has aged a little, so I'm not sure!

Could you provide me with a dump produced by the latest package available for your Ubuntu version? This way I have the debug symbols to work with.

Meanwhile, I'll be trying to reproduce on my side.
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Changed in: qemu (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: qemu (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: qemu (Ubuntu)
   Importance: Undecided => Medium
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Also affects: qemu (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu)
       Status: New => Confirmed
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
I can reproduce this problem with qemu.git/master; it still exists there.

I found that when an I/O request returns in a worker thread and the thread wants to call aio_notify to wake up the main loop, it can find that ctx->notify_me has already been cleared to 0 by the main loop, which calls atomic_and(&ctx->notify_me, ~1) in aio_ctx_check. In that case the worker thread does not write to the eventfd to notify the main loop. If such a sequence happens, the main loop will hang:

main loop                          worker thread1                           worker thread2
--------------------------------------------------------------------------------------------------------------
qemu_poll_ns                       aio_worker
                                   qemu_bh_schedule(pool->completion_bh)
glib_pollfds_poll
g_main_context_check
aio_ctx_check
atomic_and(&ctx->notify_me, ~1)                                             aio_worker
                                                                            qemu_bh_schedule(pool->completion_bh)
/* do something for event */
qemu_poll_ns
/* hangs !!!*/

As we know, ctx->notify_me is accessed by both the worker threads and the main loop. I think we should protect ctx->notify_me with a lock to avoid this. What do you think?
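The interleaving described in this message can be replayed deterministically in a few lines of C. This is only a model of the hypothesis, not QEMU code: ModelCtx, model_prepare, model_check and model_bh_schedule are hypothetical names, the eventfd is reduced to a counter, and no memory barriers are modelled. It shows the one property the timeline relies on: a notification issued after the main loop cleared bit 0 of notify_me sends no wake-up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint notify_me;    /* bit 0 set: main loop is about to block */
    bool bh_scheduled;        /* a bottom half is pending */
    unsigned eventfd_writes;  /* wake-ups actually sent */
} ModelCtx;

void model_ctx_init(ModelCtx *ctx)
{
    atomic_init(&ctx->notify_me, 0u);
    ctx->bh_scheduled = false;
    ctx->eventfd_writes = 0;
}

/* Main loop, before polling: ask to be notified. */
void model_prepare(ModelCtx *ctx)
{
    atomic_fetch_or(&ctx->notify_me, 1u);
}

/* Main loop, after polling: stop asking (the atomic_and(..., ~1) step). */
void model_check(ModelCtx *ctx)
{
    atomic_fetch_and(&ctx->notify_me, ~1u);
}

/*
 * Worker side: mark the bottom half pending, then write the eventfd only
 * if the main loop asked for it. Returns true when a wake-up was sent.
 */
bool model_bh_schedule(ModelCtx *ctx)
{
    ctx->bh_scheduled = true;
    if (atomic_load(&ctx->notify_me) != 0u) {
        ctx->eventfd_writes++;
        return true;
    }
    return false;
}
```

Whether the skipped wake-up really leads to a hang depends on whether the main loop sees bh_scheduled before it blocks again. The model does not settle that; it only makes the window in the timeline concrete.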
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
frazier, did you find the conditions that reliably make this problem appear?
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Do you have any good ideas about it? Maybe a missing memory barrier somewhere is causing it?
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
No, sorry - this bug still persists w/ latest upstream (@ afccfc0). I found a report of similar symptoms:

https://patchwork.kernel.org/patch/10047341/
https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13

To be clear, ^ is already fixed upstream, so it is not the *same* issue - but perhaps related.

** Bug watch added: Red Hat Bugzilla #1524770
   https://bugzilla.redhat.com/show_bug.cgi?id=1524770
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Changed in: qemu
       Status: New => Confirmed
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Sorry, I made a spelling mistake here ("Hi, I also found a problem that qemu-img convert hands in ARM."). The correct sentence is "I also found a problem that qemu-img convert hangs in ARM".
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Hi, I also found a problem that qemu-img convert hands in ARM.

The convert command line is "qemu-img convert -f qcow2 -O raw disk.qcow2 disk.raw".

The backtrace is below:

Thread 2 (Thread 0x4b776e50 (LWP 27215)):
#0 0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
#1 0x4a39c60c in sigwait () from /lib64/libpthread.so.0
#2 0xaae82610 in sigwait_compat (opaque=0xc5163b00) at util/compatfd.c:37
#3 0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) at util/qemu_thread_posix.c:496
#4 0x4a3918bc in start_thread () from /lib64/libpthread.so.0
#5 0x4a492b2c in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x4b573370 (LWP 27214)):
#0 0x4a489020 in ppoll () from /lib64/libc.so.6
#1 0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=) at qemu_timer.c:391
#3 0xaadae014 in os_host_main_loop_wait (timeout=) at main_loop.c:272
#4 0xaadae190 in main_loop_wait (nonblocking=) at main_loop.c:534
#5 0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
#6 0xaada2d70 in img_convert (argc=, argv=) at qemu-img.c:2414
#7 0xaad99ac4 in main (argc=7, argv=) at qemu-img.c:5305

Have you found the cause of the problem and fixed it? Thanks for your reply!
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Tags added: qemu-img
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
ext4 filesystem, SATA drive:

(gdb) thread apply all bt

Thread 3 (Thread 0x9bffc9a0 (LWP 9015)):
#0 0xaaa462cc in __GI___sigtimedwait (set=, set@entry=0xe725c070, info=info@entry=0x9bffbf18, timeout=0x3ff1, timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1 0xaab7dfac in __sigwait (set=set@entry=0xe725c070, sig=sig@entry=0x9bffbff4) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2 0xd998a628 in sigwait_compat (opaque=0xe725c070) at util/compatfd.c:36
#3 0xd998bce0 in qemu_thread_start (args=) at util/qemu-thread-posix.c:498
#4 0xaab73088 in start_thread (arg=0xc528531f) at pthread_create.c:463
#5 0xaaae34ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xa0e779a0 (LWP 9014)):
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0xd998c9e8 in qemu_futex_wait (val=, f=) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0xd9a091c0 ) at util/qemu-thread-posix.c:442
#3 0xd99a6834 in call_rcu_thread (opaque=) at util/rcu.c:261
#4 0xd998bce0 in qemu_thread_start (args=) at util/qemu-thread-posix.c:498
#5 0xaab73088 in start_thread (arg=0xc528542f) at pthread_create.c:463
#6 0xaaae34ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xa0fa8010 (LWP 9013)):
#0 0xaaada154 in __GI_ppoll (fds=0xe7291dc0, nfds=187650771816320, timeout=, timeout@entry=0x0, sigmask=0xc52852e0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0xd9987f00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0xd9988f80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
#4 main_loop_wait (nonblocking=) at util/main-loop.c:497
#5 0xd98b7a30 in convert_do_copy (s=0xc52854e8) at qemu-img.c:1980
#6 img_convert (argc=, argv=) at qemu-img.c:2456
#7 0xd98b033c in main (argc=7, argv=) at qemu-img.c:4975
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Hi, can you do a `thread apply all bt` instead? If I were to bet, we're probably waiting for some slow call like lseek to return in another thread. What filesystem/blockdevice is involved here?
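
For anyone else reproducing this hang, the all-thread backtrace requested here can also be captured without an interactive gdb session. The following is only a minimal sketch, assuming gdb is installed and the PID of the hung qemu-img process is already known; the helper names `gdb_all_threads_cmd` and `dump_backtraces` are made up for illustration and are not part of QEMU or gdb.

```python
import subprocess


def gdb_all_threads_cmd(pid):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace for every thread, and then exits (batch mode)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid, timeout=60):
    """Run gdb against a live process and return its stdout as text.

    Attaching normally requires being the owner of the process or root,
    depending on the system's ptrace restrictions.
    """
    result = subprocess.run(
        gdb_all_threads_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling `dump_backtraces(pid)` while the conversion is stuck should give output in the same shape as the `thread apply all bt` listings posted in this thread, which makes it easier to attach a full dump to the bug.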