[Qemu-devel] [Bug 1836855] [NEW] virtio_scsi_ctx_check failed when detach virtio_scsi disk

2019-07-17 Thread
Public bug reported:

I found a problem where virtio_scsi_ctx_check() fails when detaching a
virtio_scsi disk. The backtrace is below:

(gdb) bt
#0  0xb02e1bd0 in raise () from /lib64/libc.so.6
#1  0xb02e2f7c in abort () from /lib64/libc.so.6
#2  0xb02db124 in __assert_fail_base () from /lib64/libc.so.6
#3  0xb02db1a4 in __assert_fail () from /lib64/libc.so.6
#4  0x004eb9a8 in virtio_scsi_ctx_check (d=d@entry=0xc70d790, 
s=<optimized out>, s=<optimized out>)
at /Images/lzg/code/710/qemu-2.8.1/hw/scsi/virtio-scsi.c:243
#5  0x004ec87c in virtio_scsi_handle_cmd_req_prepare 
(s=s@entry=0xd27a7a0, req=req@entry=0xafc4b90)
at /Images/lzg/code/710/qemu-2.8.1/hw/scsi/virtio-scsi.c:553
#6  0x004ecc20 in virtio_scsi_handle_cmd_vq (s=0xd27a7a0, vq=0xd283410)
at /Images/lzg/code/710/qemu-2.8.1/hw/scsi/virtio-scsi.c:588
#7  0x004eda20 in virtio_scsi_data_plane_handle_cmd (vdev=0x0, 
vq=0xae7a6f98)
at /Images/lzg/code/710/qemu-2.8.1/hw/scsi/virtio-scsi-dataplane.c:57
#8  0x00877254 in aio_dispatch (ctx=0xac61010) at util/aio-posix.c:323
#9  0x008773ec in aio_poll (ctx=0xac61010, blocking=true) at 
util/aio-posix.c:472
#10 0x005cd7cc in iothread_run (opaque=0xac5e4b0) at iothread.c:49
#11 0x0087a8b8 in qemu_thread_start (args=0xac61360) at 
util/qemu-thread-posix.c:495
#12 0x008a04e8 in thread_entry_for_hotfix (pthread_cb=0x0) at 
uvp/hotpatch/qemu_hotpatch_helper.c:579
#13 0xb041c8bc in start_thread () from /lib64/libpthread.so.0
#14 0xb0382f8c in thread_start () from /lib64/libc.so.6

assert(blk_get_aio_context(d->conf.blk) == s->ctx)  failed.

I think this patch
(https://git.qemu.org/?p=qemu.git;a=commitdiff;h=a6f230c8d13a7ff3a0c7f1097412f44bfd9eff0b)
introduced this problem.

Commit a6f230c8d13a7ff3a0c7f1097412f44bfd9eff0b moves the BlockBackend back
to the main AioContext on unplug. It sets the AioContext of the SCSIDevice
to the main AioContext, but s->ctx is still the iothread AioContext. Is
this a bug?

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1836855

Title:
  virtio_scsi_ctx_check failed when detach virtio_scsi disk

Status in QEMU:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1836855/+subscriptions



[Qemu-devel] [Bug 1831750] [NEW] virtual machine cpu soft lockup when qemu attach disk

2019-06-05 Thread
Public bug reported:

Hi, I found a problem where the virtual machine's CPUs soft-lockup when I
attach a disk to the VM, in the case that the backend storage network has
a large delay or the I/O pressure is too high.

1) The disk XML which I attached is:

   [disk XML markup stripped by the list archive; the drive options appear
   in the backtrace below: format=raw, cache=none, aio=native, device
   /dev/mapper/36384c4f100630193359db7a8011d]

2) The bt of qemu main thread:

#0  0x9d78402c in pread64 () from /lib64/libpthread.so.0
#1  0xce3357d8 in pread64 (__offset=0, __nbytes=4096, __buf=0xd47a5200, __fd=202) at /usr/include/bits/unistd.h:99
#2  raw_is_io_aligned (fd=fd@entry=202, buf=buf@entry=0xd47a5200, len=len@entry=4096) at block/raw-posix.c:294
#3  0xce33597c in raw_probe_alignment (bs=bs@entry=0xd32ea920, fd=202, errp=errp@entry=0xfef7a330) at block/raw-posix.c:349
#4  0xce335a48 in raw_refresh_limits (bs=0xd32ea920, errp=0xfef7a330) at block/raw-posix.c:811
#5  0xce3404b0 in bdrv_refresh_limits (bs=0xd32ea920, errp=0xfef7a330, errp@entry=0xfef7a360) at block/io.c:122
#6  0xce340504 in bdrv_refresh_limits (bs=bs@entry=0xd09ce800, errp=errp@entry=0xfef7a3b0) at block/io.c:97
#7  0xce2eb9f0 in bdrv_open_common (bs=bs@entry=0xd09ce800, file=file@entry=0xd0e89800, options=<optimized out>, errp=errp@entry=0xfef7a450) at block.c:1194
#8  0xce2eedec in bdrv_open_inherit (filename=<optimized out>, filename@entry=0xd25f92d0 "/dev/mapper/36384c4f100630193359db7a8011d", reference=reference@entry=0x0, options=<optimized out>, options@entry=0xd3d0f4b0, flags=<optimized out>, flags@entry=128, parent=parent@entry=0x0, child_role=child_role@entry=0x0, errp=errp@entry=0xfef7a710) at block.c:1895
#9  0xce2ef510 in bdrv_open (filename=filename@entry=0xd25f92d0 "/dev/mapper/36384c4f100630193359db7a8011d", reference=reference@entry=0x0, options=options@entry=0xd3d0f4b0, flags=flags@entry=128, errp=errp@entry=0xfef7a710) at block.c:1979
#10 0xce331ef0 in blk_new_open (filename=filename@entry=0xd25f92d0 "/dev/mapper/36384c4f100630193359db7a8011d", reference=reference@entry=0x0, options=options@entry=0xd3d0f4b0, flags=128, errp=errp@entry=0xfef7a710) at block/block-backend.c:213
#11 0xce0da1f4 in blockdev_init (file=file@entry=0xd25f92d0 "/dev/mapper/36384c4f100630193359db7a8011d", bs_opts=bs_opts@entry=0xd3d0f4b0, errp=errp@entry=0xfef7a710) at blockdev.c:603
#12 0xce0dc478 in drive_new (all_opts=all_opts@entry=0xd4dc31d0, block_default_type=<optimized out>) at blockdev.c:1116
#13 0xce0e3ee0 in add_init_drive (optstr=optstr@entry=0xd0872ec0 "file=/dev/mapper/36384c4f100630193359db7a8011d,format=raw,if=none,id=drive-scsi0-0-0-3,cache=none,aio=native") at device_hotplug.c:46
#14 0xce0e3f78 in hmp_drive_add (mon=0xfef7a810, qdict=0xd0c8f000) at device_hotplug.c:67
#15 0xcdf7d688 in handle_hmp_command (mon=0xfef7a810, cmdline=<optimized out>) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:3199
#16 0xcdf7d778 in qmp_human_monitor_command (command_line=0xcfc8e3c0 "drive_add dummy file=/dev/mapper/36384c4f100630193359db7a8011d,format=raw,if=none,id=drive-scsi0-0-0-3,cache=none,aio=native", has_cpu_index=false, cpu_index=0, errp=errp@entry=0xfef7a968) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:660
#17 0xce0fdb30 in qmp_marshal_human_monitor_command (args=<optimized out>, ret=0xfef7a9e0, errp=0xfef7a9d8) at qmp-marshal.c:2223
#18 0xce3b6ad0 in do_qmp_dispatch (request=<optimized out>, errp=0xfef7aa20, errp@entry=0xfef7aa40) at qapi/qmp-dispatch.c:115
#19 0xce3b6d58 in qmp_dispatch (request=<optimized out>) at qapi/qmp-dispatch.c:142
#20 0xcdf79398 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:4010
#21 0xce3bd6c0 in json_message_process_token (lexer=0xcf834c80, input=<optimized out>, type=JSON_RCURLY, x=214, y=274) at qobject/json-streamer.c:105
#22 0xce3f3d4c in json_lexer_feed_char (lexer=lexer@entry=0xcf834c80, ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:319
#23 0xce3f3e6c in json_lexer_feed (lexer=0xcf834c80, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:369
#24 0xcdf77c64 in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:4040
#25 0xce0eab18 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0xcf90b280) at qemu-char.c:3260
#26 0x9dadf200 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#27 0xce3c4a00 in glib_pollfds_poll () at util/main-loop.c:230
#28 0xce3c4a88 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:278
#29 0xce3c4bf0 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:534
#30 0xce0f5d08 in main_loop () at vl.c:2120
#31 0xcdf3a770 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5017


From the backtrace we can see that when handling a monitor command such as
drive_add, the QEMU main thread takes the qemu_global_mutex and then does a
blocking pread() in raw_probe_alignment(). While the main thread holds the
mutex for the whole (possibly very slow) read, the vCPU threads cannot
acquire it, which is what makes the guest CPUs soft-lockup.

[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-22 Thread
I can reproduce this problem with qemu.git/master. It still exists in
qemu.git/master. I found that when an IO request returns in a worker thread
and wants to call aio_notify to wake up the main_loop, it can find that
ctx->notify_me has already been cleared to 0 by the main_loop in
aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). So the worker thread
won't write the eventfd to notify the main_loop. If such a scene happens,
the main_loop will hang:

main loop                        worker thread1                  worker thread2
-------------------------------------------------------------------------------
qemu_poll_ns                     aio_worker
                                 qemu_bh_schedule(pool->completion_bh)
glib_pollfds_poll
g_main_context_check
aio_ctx_check
atomic_and(&ctx->notify_me, ~1)                                  aio_worker
                                                                 qemu_bh_schedule(pool->completion_bh)
/* do something for event */
qemu_poll_ns
/* hangs !!! */

As we know, ctx->notify_me is accessed by both the worker threads and the
main loop. I think we should add lock protection for ctx->notify_me to
avoid this happening. What do you think?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, 
      timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc123b950)
      at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, 
      __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
      timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
      at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-19 Thread
** Description changed:

  Hi,  I found a problem that qemu-img convert appears to be stuck on
  aarch64 host with low probability.
  
  The convert command  line is  "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw ".
  
  The bt is below:
  
  Thread 2 (Thread 0x4b776e50 (LWP 27215)):
  #0  0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x4a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0xaae82610 in sigwait_compat (opaque=0xc5163b00) at 
util/compatfd.c:37
  #3  0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) 
at util/qemu_thread_posix.c:496
  #4  0x4a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x4a492b2c in thread_start () from /lib64/libc.so.6
  
  Thread 1 (Thread 0x4b573370 (LWP 27214)):
  #0  0x4a489020 in ppoll () from /lib64/libc.so.6
  #1  0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
timeout=) at qemu_timer.c:391
  #3  0xaadae014 in os_host_main_loop_wait (timeout=) at 
main_loop.c:272
  #4  0xaadae190 in main_loop_wait (nonblocking=) at 
main_loop.c:534
  #5  0xaad97be0 in convert_do_copy (s=0xdc32eb48) at 
qemu-img.c:1923
  #6  0xaada2d70 in img_convert (argc=, argv=) at qemu-img.c:2414
  #7  0xaad99ac4 in main (argc=7, argv=) at 
qemu-img.c:5305
  
- 
- The problem seems to be very similar to the phenomenon described by this 
patch 
(https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
 
+ The problem seems to be very similar to the phenomenon described by this
+ patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025
+ -aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
  
  which force main loop wakeup with SIGIO.  But this patch was reverted by
  the patch (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-Revert-
  aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).
  
- The problem still seems to exist in aarch64 host. The qemu version I used is 
2.8.1. The host version is 4.19.28-1.2.108.aarch64.
-  Do you have any solutions to fix it?  Thanks for your reply !
+ I can reproduce this problem with qemu.git/master. It still exists in
+ qemu.git/master. I found that when an IO request returns in a worker
+ thread and wants to call aio_notify to wake up the main_loop, it can find
+ that ctx->notify_me has already been cleared to 0 by the main_loop in
+ aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). So the worker thread
+ won't write the eventfd to notify the main_loop. If such a scene happens,
+ the main_loop will hang:
+
+ main loop                        worker thread1                  worker thread2
+ -------------------------------------------------------------------------------
+ qemu_poll_ns                     aio_worker
+                                  qemu_bh_schedule(pool->completion_bh)
+ glib_pollfds_poll
+ g_main_context_check
+ aio_ctx_check
+ atomic_and(&ctx->notify_me, ~1)                                  aio_worker
+                                                                  qemu_bh_schedule(pool->completion_bh)
+ /* do something for event */
+ qemu_poll_ns
+ /* hangs !!! */
+
+ As we know, ctx->notify_me is accessed by both the worker threads and the
+ main loop. I think we should add lock protection for ctx->notify_me to
+ avoid this happening.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Confirmed

Bug description:
  Hi,  I found a problem that qemu-img convert appears to be stuck on
  aarch64 host with low probability.

  The convert command  line is  "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw ".

  The bt is below:

  Thread 2 (Thread 0x4b776e50 (LWP 27215)):
  #0  0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x4a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0xaae82610 in sigwait_compat (opaque=0xc5163b00) at 
util/compatfd.c:37
  #3  0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) 
at util/qemu_thread_posix.c:496
  #4  0x4a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x4a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x4b573370 (LWP 27214)):
  #0  0x4a489020 in ppoll () from /lib64/libc.so.6
  #1  0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
  #2  

[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-19 Thread
I can reproduce this problem with qemu.git/master. It still exists in
qemu.git/master. I found that when an IO request returns in a worker thread
and wants to call aio_notify to wake up the main_loop, it can find that
ctx->notify_me has already been cleared to 0 by the main_loop in
aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). So the worker thread
won't write the eventfd to notify the main_loop. If such a scene happens,
the main_loop will hang:

main loop                        worker thread1                  worker thread2
-------------------------------------------------------------------------------
qemu_poll_ns                     aio_worker
                                 qemu_bh_schedule(pool->completion_bh)
glib_pollfds_poll
g_main_context_check
aio_ctx_check
atomic_and(&ctx->notify_me, ~1)                                  aio_worker
                                                                 qemu_bh_schedule(pool->completion_bh)
/* do something for event */
qemu_poll_ns
/* hangs !!! */

As we know, ctx->notify_me is accessed by both the worker threads and the
main loop. I think we should add lock protection for ctx->notify_me to
avoid this happening.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions



[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-19 Thread
** Changed in: qemu
   Status: Fix Released => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-16 Thread
dann frazier, did you find the conditions that reliably make this problem appear?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-16 Thread
Do you have any good ideas about it? Maybe a missing memory barrier
somewhere causes it?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-16 Thread
dann frazier met the same problem as I did in
(https://bugs.launchpad.net/qemu/+bug/1805256).

He said this bug still persists w/ latest upstream (@ afccfc0). His reply
to me is below:

No, sorry - this bug still persists w/ latest upstream (@ afccfc0). I
found a report of similar symptoms:

  https://patchwork.kernel.org/patch/10047341/
  https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13

To be clear, ^ is already fixed upstream, so it is not the *same* issue
- but perhaps related.


** Bug watch added: Red Hat Bugzilla #1524770
   https://bugzilla.redhat.com/show_bug.cgi?id=1524770

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Fix Released


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions



[Qemu-devel] [Bug 1788582] Re: Race condition during shutdown

2019-04-15 Thread
Did you find the cause of the bug, and do you have a fix for it?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1788582

Title:
  Race condition during shutdown

Status in QEMU:
  New

Bug description:
  I ran into a bug when I started several VMs in parallel using
  libvirt. The VMs are using only a kernel and a initrd (which includes a
  minimal OS). The guest OS itself does a 'poweroff -f' as soon as the
  login prompt shows up. So the expectaction is that the VMs will start,
  the shutdown will be initiated, and the QEMU processes will then
  end. But instead some of the QEMU processes get stuck in ppoll().

  A bisect showed that the first bad commit was
  0f12264e7a41458179ad10276a7c33c72024861a ("block: Allow graph changes in
  bdrv_drain_all_begin/end sections").

  I've already tried the current master 
(13b7b188501d419a7d63c016e00065bcc693b7d4) 
  since the problem might be related
  to the commit a1405acddeb0af6625dd9c30e8277b08e0885bd3 ("aio: Do
  aio_notify_accept only during blocking aio_poll"). But the bug is still
  there. I’ve reproduced the bug on x86_64 and on s390x.

  The backtrace of a hanging QEMU process:

  (gdb) bt
  #0  0x7f5d0e251b36 in ppoll () from target:/lib64/libc.so.6
  #1  0x560191052014 in qemu_poll_ns (fds=0x560193b23d60, nfds=5, 
timeout=55774838936000) at /home/user/git/qemu/util/qemu-timer.c:334
  #2  0x5601910531fa in os_host_main_loop_wait (timeout=55774838936000) at 
/home/user/git/qemu/util/main-loop.c:233
  #3  0x560191053119 in main_loop_wait (nonblocking=0) at 
/home/user/git/qemu/util/main-loop.c:497
  #4  0x560190baf454 in main_loop () at /home/user/git/qemu/vl.c:1866
  #5  0x560190baa552 in main (argc=71, argv=0x7ffde10e41c8, 
envp=0x7ffde10e4408) at /home/user/git/qemu/vl.c:4644

  The used domain definition is:

  
  [libvirt domain XML; the element markup was stripped by the list archive.
  Recoverable values: domain name "test", kernel
  /var/lib/libvirt/images/vmlinuz-4.14.13-200.fc26.x86_64, initrd
  /var/lib/libvirt/images/test-image-qemux86_64+modules-4.14.13-200.fc26.x86_64.cpio.gz,
  kernel cmdline "console=hvc0 STARTUP=shutdown.sh", lifecycle actions
  destroy / restart / preserve, emulator
  /usr/local/qemu/master/bin/qemu-system-x86_64]

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1788582/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread
Sorry, I made a spelling mistake here ("Hi, I also found a problem that
qemu-img convert hands in ARM."). It should read: "I also found a problem
that qemu-img convert hangs in ARM".

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760,
      timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc123b950)
      at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
      __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
      timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)
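Since the hang is intermittent (~50% of runs), a small wrapper that runs a command under a deadline makes the failure rate measurable. This is a sketch, not part of the report; the qemu-img invocation in the comment is the one from this bug, and `timeout` is assumed to be GNU coreutils:

```shell
#!/bin/sh
# Repro harness sketch: run a command under a deadline and report whether
# it finished in time. Looping this over the conversion counts hangs.

check_hang() {
    # $1 = deadline in seconds; the rest is the command to run.
    deadline="$1"; shift
    if timeout "$deadline" "$@" >/dev/null 2>&1; then
        echo ok
    else
        status=$?
        # GNU timeout exits with 124 when the deadline expired.
        if [ "$status" -eq 124 ]; then
            echo hung
        else
            echo "failed ($status)"
        fi
    fi
}

# Intended use (command taken from this report):
#   check_hang 300 qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
# Demonstration with stand-in commands:
check_hang 1 sleep 3   # deadline expires -> hung
check_hang 5 true      # finishes in time -> ok
```

Run in a loop (e.g. 20 iterations) and count the "hung" lines to estimate the failure rate.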

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread
Hi, I also found a problem that qemu-img convert hands in ARM.

The convert command line is "qemu-img convert -f qcow2 -O raw disk.qcow2
disk.raw ".

The bt is below:

Thread 2 (Thread 0x4b776e50 (LWP 27215)):
#0  0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
#1  0x4a39c60c in sigwait () from /lib64/libpthread.so.0
#2  0xaae82610 in sigwait_compat (opaque=0xc5163b00) at util/compatfd.c:37
#3  0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) at util/qemu_thread_posix.c:496
#4  0x4a3918bc in start_thread () from /lib64/libpthread.so.0
#5  0x4a492b2c in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x4b573370 (LWP 27214)):
#0  0x4a489020 in ppoll () from /lib64/libc.so.6
#1  0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
#3  0xaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
#4  0xaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
#5  0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
#6  0xaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
#7  0xaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305


Do you find the cause of the problem and fix it? Thanks for your reply!




[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-14 Thread
I can't reproduce this problem with qemu.git/master; it seems to have
been fixed there.

But I haven't found which patch fixed this problem between QEMU 2.8.1
and qemu.git/master.

Could anybody give me some suggestions? Thanks for your reply.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  New

Bug description:
  Hi,  I found a problem that qemu-img convert appears to be stuck on
  aarch64 host with low probability.

  The convert command  line is  "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw ".

  The bt is below:

  Thread 2 (Thread 0x4b776e50 (LWP 27215)):
  #0  0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x4a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0xaae82610 in sigwait_compat (opaque=0xc5163b00) at 
util/compatfd.c:37
  #3  0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) 
at util/qemu_thread_posix.c:496
  #4  0x4a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x4a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x4b573370 (LWP 27214)):
  #0  0x4a489020 in ppoll () from /lib64/libc.so.6
  #1  0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3  0xaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4  0xaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5  0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
  #6  0xaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7  0xaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

  
  The problem seems very similar to the phenomenon described by this patch
  (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
  which forces a main-loop wakeup with SIGIO. But that patch was reverted
  by this one (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-
  Revert-aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

  The problem still seems to exist on aarch64 hosts. The QEMU version I
  used is 2.8.1; the host kernel is 4.19.28-1.2.108.aarch64.
  Do you have any solutions to fix it? Thanks for your reply!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions



[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-12 Thread
Anyone else has a similar problem?




[Qemu-devel] [Bug 1824053] [NEW] Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-04-09 Thread
Public bug reported:

Hi,  I found a problem that qemu-img convert appears to be stuck on
aarch64 host with low probability.

The convert command  line is  "qemu-img convert -f qcow2 -O raw
disk.qcow2 disk.raw ".

The bt is below:

Thread 2 (Thread 0x4b776e50 (LWP 27215)):
#0  0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
#1  0x4a39c60c in sigwait () from /lib64/libpthread.so.0
#2  0xaae82610 in sigwait_compat (opaque=0xc5163b00) at 
util/compatfd.c:37
#3  0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) at 
util/qemu_thread_posix.c:496
#4  0x4a3918bc in start_thread () from /lib64/libpthread.so.0
#5  0x4a492b2c in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x4b573370 (LWP 27214)):
#0  0x4a489020 in ppoll () from /lib64/libc.so.6
#1  0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
#3  0xaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
#4  0xaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
#5  0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
#6  0xaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
#7  0xaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305


The problem seems very similar to the phenomenon described by this patch
(https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
which forces a main-loop wakeup with SIGIO. But that patch was reverted by
this one (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-Revert-
aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

The problem still seems to exist on aarch64 hosts. The QEMU version I used
is 2.8.1; the host kernel is 4.19.28-1.2.108.aarch64.
Do you have any solutions to fix it? Thanks for your reply!

** Affects: qemu
 Importance: Undecided
 Status: New




Re: [Qemu-devel] [Bug 1779120] Re: disk missing in the guest contingently when hotplug several virtio scsi disks consecutively

2018-06-28 Thread
Hi, Stefan.
(host)# rpm -qa | grep qemu-kvm
qemu-kvm-2.8.1-25.142.x86_64
(guest)# uname -r
3.10.0-514.el7.x86_64

I also tried the newest version of qemu-kvm, but it hit this issue as well.
The steps to reproduce this issue are below:

1) Attach four virtio-scsi controllers with dataplane to the VM.
(The four controller XML definitions were stripped of their markup in the
archive; per the description, each was a virtio-scsi controller with an
iothread configured.)

2) Attach 35 virtio-scsi disks (sda - sdai) to the VM consecutively. One
controller has 15 SCSI disks.
(The example disk XML was stripped of its markup in the archive.)

    You can use a shell script like this:
        for ((i=1; i<=35; i++))
        do
            virsh attach-device centos7.3_64_server scsi_disk_$i.xml \
                --config --live
        done
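The per-disk files `scsi_disk_$i.xml` referenced by the loop above are not shown in the report; they could be generated with a sketch like the following. The target names sda..sdai follow the report, but the image paths and the element layout are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: generate scsi_disk_1.xml .. scsi_disk_35.xml for the attach loop.
# Image paths and XML layout are illustrative, not from the original report.

suffix() {
    # Map 1..35 to the disk-name suffix: a..z, then aa..ai.
    i=$1
    letters=abcdefghijklmnopqrstuvwxyz
    if [ "$i" -le 26 ]; then
        echo "$letters" | cut -c"$i"
    else
        echo "a$(echo "$letters" | cut -c"$((i - 26))")"
    fi
}

mkdir -p scsi_xml
for i in $(seq 1 35); do
    dev="sd$(suffix "$i")"
    cat > "scsi_xml/scsi_disk_$i.xml" <<EOF
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/disk_$i.qcow2'/>
  <target dev='$dev' bus='scsi'/>
</disk>
EOF
done
```

Each generated file can then be passed to `virsh attach-device` as in the loop above.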

This issue is a probabilistic event. If it does not appear, repeat the 
above steps several more times.
Thank you!

On 2018/6/28 21:01, Stefan Hajnoczi wrote:
> Please post the following information:
> (host)# rpm -qa | grep qemu-kvm
> (guest)# uname -r
>
> What are the exact steps to reproduce this issue (virsh command-lines
> and XML)?
>

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1779120

Title:
  disk missing in the guest contingently when hotplug several virtio
  scsi disks consecutively

Status in QEMU:
  New

Bug description:
  Hi, I found a bug where some disks (not all) go missing in the guest
  intermittently when several virtio-scsi disks are hotplugged
  consecutively. After rebooting the guest, the missing disks appear
  again.

  The guest is CentOS 7.3 running on a CentOS 7.3 host, and the SCSI
  controllers are configured with an iothread. (The SCSI controller XML
  example was stripped of its markup in the archive.)

  If the SCSI controllers are configured without an iothread, all disks
  can be seen in the guest when hotplugging several virtio-scsi disks
  consecutively.

  I think the biggest difference between them is that SCSI controllers
  with an iothread call virtio_notify_irqfd to notify the guest, while
  SCSI controllers without an iothread call virtio_notify instead. What
  makes the difference? Could interrupts be lost when calling
  virtio_notify_irqfd due to a race condition for some unknown reason?
  Maybe someone more familiar with the SCSI dataplane can help. Thanks
  for your reply!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1779120/+subscriptions



[Qemu-devel] [Bug 1779120] [NEW] disk missing in the guest contingently when hotplug several virtio scsi disks consecutively

2018-06-28 Thread
Public bug reported:

Hi, I found a bug where some disks (not all) go missing in the guest
intermittently when several virtio-scsi disks are hotplugged consecutively.
After rebooting the guest, the missing disks appear again.

The guest is CentOS 7.3 running on a CentOS 7.3 host, and the SCSI
controllers are configured with an iothread. (The SCSI controller XML
example was stripped of its markup in the archive.)

If the SCSI controllers are configured without an iothread, all disks can
be seen in the guest when hotplugging several virtio-scsi disks
consecutively.

I think the biggest difference between them is that SCSI controllers with
an iothread call virtio_notify_irqfd to notify the guest, while SCSI
controllers without an iothread call virtio_notify instead. What makes the
difference? Could interrupts be lost when calling virtio_notify_irqfd due
to a race condition for some unknown reason? Maybe someone more familiar
with the SCSI dataplane can help. Thanks for your reply!
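To catch the missing-disk case without a manual check, one could poll the number of visible SCSI disks inside the guest after the hotplug loop finishes. A minimal sketch; the `lsblk` invocation and the expected count are assumptions, not from the report:

```shell
#!/bin/sh
# Sketch: wait until the expected number of sdX devices is visible in the
# guest, reporting "missing disks" if they never all appear.

count_scsi_disks() {
    # $@ = a command that lists one block-device name per line.
    "$@" | grep -c '^sd[a-z]*$'
}

wait_for_disks() {
    # $1 = expected disk count, $2 = number of 1-second polls,
    # the rest = the listing command.
    expected=$1; tries=$2; shift 2
    n=0
    while [ "$tries" -gt 0 ]; do
        n=$(count_scsi_disks "$@")
        if [ "$n" -ge "$expected" ]; then
            echo "found $n disks"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "missing disks: found $n, expected $expected"
    return 1
}

# Intended use inside the guest (hypothetical invocation):
#   wait_for_disks 35 30 lsblk -dn -o NAME
```

If this reports missing disks while the controllers use an iothread but not without one, that would support the virtio_notify_irqfd theory above.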

** Affects: qemu
 Importance: Undecided
 Status: New

