[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-10 Thread Rafael David Tinoco
Alright, here is what is happening:

Whenever program is stuck, thread #2 backtrace is this:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xaabd41b0 in qemu_futex_wait (val=, f=) at ./util/qemu-thread-posix.c:438
#2  qemu_event_wait (ev=ev@entry=0xaac87ce8 ) at 
./util/qemu-thread-posix.c:442
#3  0xaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at 
./util/rcu.c:261
#4  0xaabd34c8 in qemu_thread_start (args=) at 
./util/qemu-thread-posix.c:498
#5  0xbf26a880 in start_thread (arg=0xf5bf) at 
pthread_create.c:486
#6  0xbf1c4b9c in thread_start () at 
../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Meaning that code is waiting for a futex inside kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I
don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init() -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(>value, EV_FREE)
qemu_event_set() -> atomic_xchg(>value, EV_SET)
qemu_event_wait() -> atomic_cmpxchg(>value, EV_FREE, EV_BUSY)'

And there should be no 0x7fff value for "ev->value".

qemu_event_init() is the one initializing the global:

static QemuEvent rcu_call_ready_event;

and it is called by "rcu_init_complete()" which is called by
"rcu_init()":

static void __attribute__((__constructor__)) rcu_init(void)

a constructor function.

So, "fixing" this issue by:

(gdb) print rcu_call_ready_event
$8 = {value = 4294967295, initialized = true}

(gdb) watch rcu_call_ready_event
Hardware watchpoint 1: rcu_call_ready_event

(gdb) set rcu_call_ready_event.initialized = 1

(gdb) set rcu_call_ready_event.value = 0

and note that I added a watchpoint to rcu_call_ready_event global:



Thread 1 "qemu-img" received signal SIGINT, Interrupt.
(gdb) thread 2
[Switching to thread 2 (Thread 0xbec61d90 (LWP 33625))]

(gdb) bt
#0  0xaabd4110 in qemu_event_reset (ev=ev@entry=0xaac87ce8 
)
#1  0xaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at 
./util/rcu.c:255
#2  0xaabd34c8 in qemu_thread_start (args=) at 
./util/qemu-thread-posix.c:498
#3  0xbf26a880 in start_thread (arg=0xf5bf) at 
pthread_create.c:486
#4  0xbf1c4b9c in thread_start () at 
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) print rcu_call_ready_event
$9 = {value = 0, initialized = true}

You can see I advanced in the qemu_event_{reset,set,wait} logic.

(gdb) disassemble /m 0xaabd4110
Dump of assembler code for function qemu_event_reset:
408 in ./util/qemu-thread-posix.c

409 in ./util/qemu-thread-posix.c

410 in ./util/qemu-thread-posix.c
411 in ./util/qemu-thread-posix.c
   0xaabd40f0 <+0>: ldrbw1, [x0, #4]
   0xaabd40f4 <+4>: cbz w1, 0xaabd411c 

   0xaabd411c <+44>:stp x29, x30, [sp, #-16]!
   0xaabd4120 <+48>:adrpx3, 0xaac2
   0xaabd4124 <+52>:add x3, x3, #0x908
   0xaabd4128 <+56>:mov x29, sp
   0xaabd412c <+60>:adrpx1, 0xaac2
   0xaabd4130 <+64>:adrpx0, 0xaac2
   0xaabd4134 <+68>:add x3, x3, #0x290
   0xaabd4138 <+72>:add x1, x1, #0xc00
   0xaabd413c <+76>:add x0, x0, #0xd40
   0xaabd4140 <+80>:mov w2, #0x19b// #411
   0xaabd4144 <+84>:bl  0xaaaff190 <__assert_fail@plt>

412 in ./util/qemu-thread-posix.c
   0xaabd40f8 <+8>: ldr w1, [x0]

413 in ./util/qemu-thread-posix.c
   0xaabd40fc <+12>:dmb ishld

414 in ./util/qemu-thread-posix.c
   0xaabd4100 <+16>:cbz w1, 0xaabd4108 

   0xaabd4104 <+20>:ret
   0xaabd4108 <+24>:ldaxr   w1, [x0]
   0xaabd410c <+28>:orr w1, w1, #0x1
=> 0xaabd4110 <+32>:stlxr   w2, w1, [x0]
   0xaabd4114 <+36>:cbnzw2, 0xaabd4108 

   0xaabd4118 <+40>:ret

And I'm currently inside the STLXR and LDAXR logic. To make sure my program 
counter is advancing, I added a breakpoint at 0xaabd4108, so CBNZ 
instruction would branch indefinitely into LDXAR instruction again, until the
LDAXR<->STLXR logic is satisfied (inside qemu_event_wait()).

(gdb) break *(0xaabd4108)
Breakpoint 2 at 0xaabd4108: file ./util/qemu-thread-posix.c, line 414.

which is basically this:

if (value == EV_SET) {EV_SET == 0
atomic_or(>value, EV_FREE);   EV_FREE = 1
}

and we can see that this logic being called one time after another:

(gdb) c
Thread 2 "qemu-img" hit Breakpoint 3, 0xaabd4108 in qemu_event_reset (

[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-09 Thread Rafael David Tinoco
Alright,

I'm still investigating this but wanted to share some findings... I
haven't got a kernel dump yet after the task is frozen, I have analyzed
only the userland part of it (although I have checked if code was
running inside kernel with perf cycles:u/cycles:k at some point).

The big picture is this: Whenever qemu-img hangs, we have 3 hung tasks
basically with these stacks:



TRHREAD #1
__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39)
ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77)
qemu_poll_ns (./util/qemu-timer.c:322)
os_host_main_loop_wait (./util/main-loop.c:233)
main_loop_wait (./util/main-loop.c:497)
convert_do_copy (./qemu-img.c:1981)
img_convert (./qemu-img.c:2457)
main (./qemu-img.c:4976)

got stack traces:

./33293/stack  ./33293/stack 
[<0>] __switch_to+0xc0/0x218   [<0>] __switch_to+0xc0/0x218  
[<0>] ptrace_stop+0x148/0x2b0  [<0>] do_sys_poll+0x508/0x5c0 
[<0>] get_signal+0x5a4/0x730   [<0>] __arm64_sys_ppoll+0xc0/0x118
[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168   
[<0>] work_pending+0x8/0x10[<0>] el0_svc_handler+0x38/0x78   
   [<0>] el0_svc+0x8/0xc  

root@d06-1:~$ perf record -F  -e cycles:u -p 33293 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object   Symbol
#     ..  ..
#
37.82%  qemu-img  libc-2.29.so[.] 0x000df710
21.81%  qemu-img  [unknown]   [k] 0x10099504
14.23%  qemu-img  [unknown]   [k] 0x10085dc0
 9.13%  qemu-img  [unknown]   [k] 0x1008fff8
 6.47%  qemu-img  libc-2.29.so[.] 0x000df708
 5.69%  qemu-img  qemu-img[.] qemu_event_reset
 2.57%  qemu-img  libc-2.29.so[.] 0x000df678
 0.63%  qemu-img  libc-2.29.so[.] 0x000df700
 0.49%  qemu-img  libc-2.29.so[.] __sigtimedwait
 0.42%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait



TRHREAD #3
__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29)
__sigwait (linux/sigwait.c:28)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)


./33303/stack  ./33303/stack
   
[<0>] __switch_to+0xc0/0x218   [<0>] __switch_to+0xc0/0x218 
   
[<0>] ptrace_stop+0x148/0x2b0  [<0>] do_sigtimedwait.isra.9+0x194/0x288 
   
[<0>] get_signal+0x5a4/0x730   [<0>] 
__arm64_sys_rt_sigtimedwait+0xac/0x110
[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168  
   
[<0>] work_pending+0x8/0x10[<0>] el0_svc_handler+0x38/0x78  
   
   [<0>] el0_svc+0x8/0xc   

root@d06-1:~$ perf record -F  -e cycles:u -p 33303 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.905 MB perf.data (49647 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object   Symbol
#     ..  ..
#
45.37%  qemu-img  libc-2.29.so[.] 0x000df710
23.52%  qemu-img  [unknown]   [k] 0x10099504
 9.08%  qemu-img  [unknown]   [k] 0x1008fff8
 8.89%  qemu-img  [unknown]   [k] 0x10085dc0
 5.56%  qemu-img  libc-2.29.so[.] 0x000df708
 3.66%  qemu-img  libc-2.29.so[.] 0x000df678
 1.01%  qemu-img  libc-2.29.so[.] __sigtimedwait
 0.80%  qemu-img  libc-2.29.so[.] 0x000df700
 0.64%  qemu-img  qemu-img[.] qemu_event_reset
 0.55%  qemu-img  libc-2.29.so[.] 0x000df718
 0.52%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait



TRHREAD #2
syscall (linux/aarch64/syscall.S:38)
qemu_futex_wait (./util/qemu-thread-posix.c:438)
qemu_event_wait (./util/qemu-thread-posix.c:442)
call_rcu_thread (./util/rcu.c:261)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33302/stack  ./33302/stack   
[<0>] __switch_to+0xc0/0x218   [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0  [<0>] ptrace_stop+0x148/0x2b0   
[<0>] get_signal+0x5a4/0x730   [<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x1c4/0x358 [<0>] do_notify_resume+0x1c4/0x358  
[<0>] work_pending+0x8/0x10[<0>] work_pending+0x8/0x10



root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object   Symbol
#     

[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-09 Thread Rafael David Tinoco
Alright, with a d06 aarch64 machine I was able to reproduce it after 8
attempts.I'll debug it today and provide feedback on my findings.

(gdb) bt full
#0  0xb0b2181c in __GI_ppoll (fds=0xce5ab770, nfds=4, 
timeout=, timeout@entry=0x0,
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
_x3tmp = 0
_x0tmp = 187650583213936
_x0 = 187650583213936
_x3 = 0
_x4tmp = 8
_x1tmp = 4
_x1 = 4
_x4 = 8
_x2tmp = 
_x2 = 0
_x8 = 73
_sys_result = 
_sys_result = 
sc_cancel_oldtype = 0
sc_ret = 
tval = {tv_sec = 0, tv_nsec = 187650583137792}
#1  0xcd2a773c in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=)
at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
No locals.
#2  qemu_poll_ns (fds=, nfds=, 
timeout=timeout@entry=-1) at ./util/qemu-timer.c:322
No locals.
#3  0xcd2a8764 in os_host_main_loop_wait (timeout=-1) at 
./util/main-loop.c:233
context = 0xce599d90
ret = 
context = 
ret = 
#4  main_loop_wait (nonblocking=) at ./util/main-loop.c:497
ret = 
timeout = 4294967295
timeout_ns = 
#5  0xcd1df454 in convert_do_copy (s=0xf9b2b1d8) at 
./qemu-img.c:1981
ret = 
i = 
n = 
sector_num = 
ret = 
i = 
n = 
sector_num = 
#6  img_convert (argc=, argv=) at 
./qemu-img.c:2457
c = 
bs_i = 
flags = 16898
src_flags = 0
fmt = 0xf9b2bad1 "qcow2"
out_fmt = 
cache = 0xcd2cb1c8 "unsafe"
src_cache = 0xcd2ca9c0 "writeback"
out_baseimg = 
out_filename = 
out_baseimg_param = 
snapshot_name = 0x0
drv = 
proto_drv = 
bdi = {cluster_size = 65536, vm_state_offset = 32212254720, is_dirty = 
false, unallocated_blocks_are_zero = true,
  needs_compressed_writes = false}
out_bs = 
opts = 0xce5ab390
sn_opts = 0x0
create_opts = 0xce5ab0c0
open_opts = 
options = 0x0
local_err = 0x0
writethrough = false
src_writethrough = false
quiet = 
image_opts = false
skip_create = false
progress = 
tgt_image_opts = false
ret = 
force_share = false
explict_min_sparse = false
s = {src = 0xce577240, src_sectors = 0xce577300, src_num = 1, 
total_sectors = 62914560,allocated_sectors = 9572096, allocated_done = 6541440, 
sector_num = 8863744, wr_offs = 8859776, status = BLK_DATA, sector_next_status 
= 8863744, target = 0xce5bd2a0, has_zero_init = true,compressed = false, 
unallocated_blocks_are_zero = true, target_has_backing = false, 
target_backing_sectors = -1, wr_in_order = true, copy_range = false, min_sparse 
= 8, alignment = 8,cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 
8, running_coroutines = 8, co = {0xce5ceda0,0xce5cef50, 0xce5cf100, 
0xce5cf2b0, 0xce5cf460, 0xce5cf610, 0xce5cf7c0,0xce5cf970, 
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, wait_sector_num = {-1, 8859904, 
8860928, 8863360,8861952, 8862976, 8862592, 8861440, 0, 0, 0, 0, 0, 0, 0, 0}, 
lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = 
{slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, ret = -115}
__PRETTY_FUNCTION__ = "img_convert"
#7  0xcd1d8400 in main (argc=7, argv=) at 
./qemu-img.c:4976
cmd = 0xcd34ad78 
cmdname = 
local_error = 0x0
trace_file = 0x0
c = 
long_options = {{name = 0xcd2cbbb0 "help", has_arg = 0, flag = 0x0, 
val = 104}, {
name = 0xcd2cbc78 "version", has_arg = 0, flag = 0x0, val = 
86}, {name = 0xcd2cbc80 "trace",
has_arg = 1, flag = 0x0, val = 84}, {name = 0x0, has_arg = 0, flag 
= 0x0, val = 0}}

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at 

[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-06 Thread Rafael David Tinoco
Alright, I couldn't reproduce this yet, I'm running same test case in a
24 cores box and causing lots of context switches and CPU migrations in
parallel (trying to exhaust the logic).

Will let this running for sometime to check.

Unfortunately this can be related QEMU AIO BH locking/primitives and
cache coherency in the HW in question (which I got specs from:
https://en.wikichip.org/wiki/hisilicon/kunpeng/hi1616):

l1$ size8 MiB
l1d$ size   4 MiB
l1i$ size   4 MiB
l2$ size32 MiB
l3$ size64 MiB

like for example when having 2 threads in different NUMA domains, or
some other situation.

I can't simulate the same since I have a SOC with:

Cortex-A53 MPCore 24cores,

L1 I/D=32KB/32KB
L2 =256KB
L3 =4MB

and I'm not even close to L1/L2/L3 cache numbers from D06 =o).

Just got a note that I'll be able to reproduce this in the real HW, will
get back soon with real gdb debugging.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-06 Thread Rafael David Tinoco
OOhh nm on the virtual environment test, as I just remembered we don't
have KVM on 2nd level for aarch64 yet (at least in ARMv8 implementing
virt extension). I'll try to reproduce in the real env only.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-06 Thread Rafael David Tinoco
Hello Liz,

I'll try to reproduce this issue in a Cortex-A53 aarch64 real
environment (w/ 24 HW threads) AND in a virtual environment w/ lots of
vCPUs... but, if it's a barrier missing - or the lack of atomicity
and/or ordering in a primitive - then, I'm afraid the context switch in
between vCPUs might not be the same as in real CPUs (IPIs are sent and
handled differently and host kernel delays IPI delivery because of its
own callbacks, before scheduling, etc...) and I could need a qemu dump
from your environment.

Would that be feasible ? Can you reproduce this nowadays ? This bug has
aged a little, so I'm now sure!

Could you provide me the dump caused by latest package available for
your Ubuntu version ? This way I have the debug symbols to work with.

Meanwhile, I'll be trying to reproduce on my side.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-09-05 Thread Rafael David Tinoco
** Changed in: qemu (Ubuntu)
   Status: Confirmed => In Progress

** Changed in: qemu (Ubuntu)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: qemu (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-06-05 Thread dann frazier
** Also affects: qemu (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: qemu (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-22 Thread 贞贵李
I can reproduce this problem with qemu.git/matser. It still exists in 
qemu.git/matser. I found that when an IO return in worker threads and want to 
call aio_notify to wake up main_loop, but it found that ctx->notify_me is 
cleared to 0 by main_loop in aio_ctx_check by calling 
atomic_and(>notify_me, ~1) . So worker thread won't write enventfd to 
notify main_loop.If such a scene happens, the main_loop will hang:
main loopworker thread1  worker 
thread2
---
qemu_poll_nsaio_worker
qemu_bh_schedule(pool->completion_bh)
glib_pollfds_poll
g_main_context_check
aio_ctx_check
atomic_and(>notify_me, ~1)aio_worker
  
qemu_bh_schedule(pool->completion_bh)
/* do something for event */
qemu_poll_ns
/* hangs !!!*/

As we known, ctx->notify_me will be visited by worker thread and main
loop. I thank we should add a lock protection for ctx->notify_me to
avoid this happend.what do you thank so?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-16 Thread 贞贵李
frazier, Do you find the conditions that necessarily make this problem appear?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-16 Thread 贞贵李
Do you have any good ideas about it? Maybe somewhere lack of memeory
barriers that cause it?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread dann frazier
No, sorry - this bugs still persists w/ latest upstream (@ afccfc0). I
found a report of similar symptoms:

  https://patchwork.kernel.org/patch/10047341/
  https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13

To be clear, ^ is already fixed upstream, so it is not the *same* issue
- but perhaps related.


** Bug watch added: Red Hat Bugzilla #1524770
   https://bugzilla.redhat.com/show_bug.cgi?id=1524770

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread dann frazier
** Changed in: qemu
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread 贞贵李
sorry, I make a spelling mistake here("Hi, I also found a problem that
qemu-img convert hands in ARM.").The right is "I also found a problem
that qemu-img convert hangs in ARM".

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread 贞贵李
Hi, I also found a problem that qemu-img convert hands in ARM.

The convert command line is "qemu-img convert -f qcow2 -O raw disk.qcow2
disk.raw ".

The bt is below:

Thread 2 (Thread 0x4b776e50 (LWP 27215)):
#0 0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
#1 0x4a39c60c in sigwait () from /lib64/libpthread.so.0
#2 0xaae82610 in sigwait_compat (opaque=0xc5163b00) at 
util/compatfd.c:37
#3 0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) at 
util/qemu_thread_posix.c:496
#4 0x4a3918bc in start_thread () from /lib64/libpthread.so.0
#5 0x4a492b2c in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x4b573370 (LWP 27214)):
#0 0x4a489020 in ppoll () from /lib64/libc.so.6
#1 0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=) at qemu_timer.c:391
#3 0xaadae014 in os_host_main_loop_wait (timeout=) at 
main_loop.c:272
#4 0xaadae190 in main_loop_wait (nonblocking=) at 
main_loop.c:534
#5 0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
#6 0xaada2d70 in img_convert (argc=, argv=) at qemu-img.c:2414
#7 0xaad99ac4 in main (argc=7, argv=) at qemu-img.c:5305


Do you find the cause of the problem and fix it? Thanks for your reply!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2018-12-05 Thread Alex Bennée
** Tags added: qemu-img

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2018-11-26 Thread dann frazier
ext4 filesystem, SATA drive:

(gdb) thread apply all bt

Thread 3 (Thread 0x9bffc9a0 (LWP 9015)):
#0  0xaaa462cc in __GI___sigtimedwait (set=, 
set@entry=0xe725c070, info=info@entry=0x9bffbf18, 
timeout=0x3ff1, timeout@entry=0x0)
at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0xaab7dfac in __sigwait (set=set@entry=0xe725c070, 
sig=sig@entry=0x9bffbff4) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0xd998a628 in sigwait_compat (opaque=0xe725c070)
at util/compatfd.c:36
#3  0xd998bce0 in qemu_thread_start (args=)
at util/qemu-thread-posix.c:498
#4  0xaab73088 in start_thread (arg=0xc528531f)
at pthread_create.c:463
#5  0xaaae34ec in thread_start ()
at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xa0e779a0 (LWP 9014)):
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xd998c9e8 in qemu_futex_wait (val=, f=)
at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0xd9a091c0 )
at util/qemu-thread-posix.c:442
#3  0xd99a6834 in call_rcu_thread (opaque=)
at util/rcu.c:261
#4  0xd998bce0 in qemu_thread_start (args=)
at util/qemu-thread-posix.c:498
#5  0xaab73088 in start_thread (arg=0xc528542f)
at pthread_create.c:463
#6  0xaaae34ec in thread_start ()
at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xa0fa8010 (LWP 9013)):
#0  0xaaada154 in __GI_ppoll (fds=0xe7291dc0, nfds=187650771816320, 
timeout=, timeout@entry=0x0, sigmask=0xc52852e0)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0xd9987f00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
__fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=, nfds=, 
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0xd9988f80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4  main_loop_wait (nonblocking=) at util/main-loop.c:497
#5  0xd98b7a30 in convert_do_copy (s=0xc52854e8) at qemu-img.c:1980
#6  img_convert (argc=, argv=) at qemu-img.c:2456
#7  0xd98b033c in main (argc=7, argv=) at qemu-img.c:4975

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2018-11-26 Thread John Snow
Hi, can you do a `thread apply all bt` instead? If I were to bet, we're
probably waiting for some slow call like lseek to return in another
thread.

What filesystem/blockdevice is involved here?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions