Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-07-16 Thread Christian Borntraeger
On 10.06.2015 at 11:34, Fam Zheng wrote:
 On Wed, 06/10 11:18, Christian Borntraeger wrote:
 On 10.06.2015 at 04:12, Fam Zheng wrote:
 On Tue, 06/09 11:01, Christian Borntraeger wrote:
 On 09.06.2015 at 04:28, Fam Zheng wrote:
 On Tue, 06/02 16:36, Christian Borntraeger wrote:
 Paolo,

 I bisected 
 commit a0710f7995f914e3044e5899bd8ff6c43c62f916
 Author: Paolo Bonzini pbonz...@redhat.com
 AuthorDate: Fri Feb 20 17:26:52 2015 +0100
 Commit: Kevin Wolf kw...@redhat.com
 CommitDate: Tue Apr 28 15:36:08 2015 +0200

 iothread: release iothread around aio_poll

 as the cause of a problem with hanging guests.

 Starting many guests, each with a kernel/ramdisk (via -kernel) and
 several null block devices, results in hangs. All hanging guests are in
 the partition detection code, waiting for an I/O to return, so this
 happens very early, maybe even on the first I/O.

 Reverting that commit fixes the hangs.
 Any ideas?


For what it's worth, I can no longer reproduce the issue on
current master + cherry-pick of a0710f7995f (iothread: release iothread around 
aio_poll)

bisect tells me that

commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
Author: Fam Zheng f...@redhat.com
AuthorDate: Fri May 29 18:53:14 2015 +0800
Commit: Stefan Hajnoczi stefa...@redhat.com
CommitDate: Tue Jul 7 14:27:14 2015 +0100

block: Use bdrv_drain to replace uncessary bdrv_drain_all

made the problem with blk-null go away. I still don't understand why.

Christian




Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-07-16 Thread Christian Borntraeger
On 16.07.2015 at 13:20, Paolo Bonzini wrote:
 
 
 On 16/07/2015 13:03, Christian Borntraeger wrote:
 For what it's worth, I can no longer reproduce the issue on
 current master + cherry-pick of a0710f7995f (iothread: release iothread 
 around aio_poll)

 bisect tells me that

 commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
 Author: Fam Zheng f...@redhat.com
 AuthorDate: Fri May 29 18:53:14 2015 +0800
 Commit: Stefan Hajnoczi stefa...@redhat.com
 CommitDate: Tue Jul 7 14:27:14 2015 +0100

 block: Use bdrv_drain to replace uncessary bdrv_drain_all

 made the problem with blk-null go away. I still don't understand why.
 
 It could be related to the AioContext problem that I'm fixing these
 days, too.  Good news, we'll requeue the patch for 2.5.

That was also something that I had in mind (in fact I retested this to
check the ctx patch). master + cherry-pick of a0710f7995f + revert of
53ec73e26 + this fix still fails, so it was (is?) a different issue. The
interesting part is that this problem required two or more disks (and we
replace drain_all with single drains), so it somewhat sounds plausible.
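
For reference, the difference between the two drain operations, as a rough
C sketch (simplified and illustrative, not the exact upstream code of either
version; bdrv_requests_pending() and the global bdrv_states list follow the
naming of that period):

/* Sketch only: simplified from the 2.3-era block layer, not verbatim. */
void bdrv_drain(BlockDriverState *bs)
{
    /* Wait for the in-flight requests of this one device only; the
     * caller is expected to hold this device's AioContext. */
    while (bdrv_requests_pending(bs)) {
        aio_poll(bdrv_get_aio_context(bs), true);
    }
}

void bdrv_drain_all(void)
{
    BlockDriverState *bs;

    /* Acquire and poll every device's AioContext in turn.  With
     * several iothreads, the main loop and the iothreads contend for
     * multiple contexts, which fits the observation that the hang
     * needs two or more disks to reproduce. */
    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
        AioContext *aio_context = bdrv_get_aio_context(bs);

        aio_context_acquire(aio_context);
        while (bdrv_requests_pending(bs)) {
            aio_poll(aio_context, true);
        }
        aio_context_release(aio_context);
    }
}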

Christian




Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-07-16 Thread Paolo Bonzini


On 16/07/2015 13:24, Christian Borntraeger wrote:
 The interesting part is that this problem required two or more disks
 (and we replace drain_all with single drains), so it somewhat sounds
 plausible.

Yes, indeed.  It is very plausible.  I wanted to reproduce it these
days, so thanks for saving me a lot of time!  I'll test your exact setup
(master + AioContext fix + cherry-pick of a0710f7995f + revert of
53ec73e26).

Thanks,

Paolo



Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-07-16 Thread Paolo Bonzini


On 16/07/2015 13:03, Christian Borntraeger wrote:
 For what it's worth, I can no longer reproduce the issue on
 current master + cherry-pick of a0710f7995f (iothread: release iothread 
 around aio_poll)
 
 bisect tells me that
 
 commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
 Author: Fam Zheng f...@redhat.com
 AuthorDate: Fri May 29 18:53:14 2015 +0800
 Commit: Stefan Hajnoczi stefa...@redhat.com
 CommitDate: Tue Jul 7 14:27:14 2015 +0100
 
 block: Use bdrv_drain to replace uncessary bdrv_drain_all
 
 made the problem with blk-null go away. I still don't understand why.

It could be related to the AioContext problem that I'm fixing these
days, too.  Good news, we'll requeue the patch for 2.5.

Paolo



Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-10 Thread Christian Borntraeger
On 10.06.2015 at 11:34, Fam Zheng wrote:
 On Wed, 06/10 11:18, Christian Borntraeger wrote:
 On 10.06.2015 at 04:12, Fam Zheng wrote:
 On Tue, 06/09 11:01, Christian Borntraeger wrote:
 On 09.06.2015 at 04:28, Fam Zheng wrote:
 On Tue, 06/02 16:36, Christian Borntraeger wrote:
 Paolo,

 I bisected 
 commit a0710f7995f914e3044e5899bd8ff6c43c62f916
 Author: Paolo Bonzini pbonz...@redhat.com
 AuthorDate: Fri Feb 20 17:26:52 2015 +0100
 Commit: Kevin Wolf kw...@redhat.com
 CommitDate: Tue Apr 28 15:36:08 2015 +0200

 iothread: release iothread around aio_poll

 as the cause of a problem with hanging guests.

 Starting many guests, each with a kernel/ramdisk (via -kernel) and
 several null block devices, results in hangs. All hanging guests are in
 the partition detection code, waiting for an I/O to return, so this
 happens very early, maybe even on the first I/O.

 Reverting that commit fixes the hangs.
 Any ideas?

 Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
 have a reproducer for x86? Or could you collect backtraces for all the
 threads in QEMU when it hangs?

 My long shot is that the main loop is blocked at aio_context_acquire(ctx),
 while the iothread of that ctx is blocked at aio_poll(ctx, blocking).

 Here is a backtrace on s390. I need two or more disks (one is not enough).

 It shows that the iothreads and the main loop are all waiting for events,
 while the vcpu threads are running guest code.

 It could be that requests are being leaked. Do you see this problem with a
 regular file-based image or the null-co driver? Maybe we're missing
 something about the AioContext in block/null.c.

 It seems to run with normal file-based images. As soon as I have two or
 more null-aio devices, it hangs pretty soon when doing a reboot loop.

 
 Ahh! If it's a reboot loop, the device reset path may get fishy. I suspect
 the completion BH used by null-aio may be messed up; that's why I wonder
 whether null-co:// would work for you. Could you test that?

null-co also fails.

 
 Also, could you try the patch below with null-aio://, too?

The same. Guests still get stuck.


 
 Thanks,
 Fam
 
 ---
 
 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index cd539aa..c87b444 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
 @@ -652,15 +652,11 @@ static void virtio_blk_reset(VirtIODevice *vdev)
  {
      VirtIOBlock *s = VIRTIO_BLK(vdev);
 
 -    if (s->dataplane) {
 -        virtio_blk_data_plane_stop(s->dataplane);
 -    }
 -
 -    /*
 -     * This should cancel pending requests, but can't do nicely until there
 -     * are per-device request lists.
 -     */
      blk_drain_all();
 +    if (s->dataplane) {
 +        virtio_blk_data_plane_stop(s->dataplane);
 +    }
 +
      blk_set_enable_write_cache(s->blk, s->original_wce);
  }




Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-10 Thread Fam Zheng
On Wed, 06/10 11:18, Christian Borntraeger wrote:
 On 10.06.2015 at 04:12, Fam Zheng wrote:
  On Tue, 06/09 11:01, Christian Borntraeger wrote:
  On 09.06.2015 at 04:28, Fam Zheng wrote:
  On Tue, 06/02 16:36, Christian Borntraeger wrote:
  Paolo,
 
  I bisected 
  commit a0710f7995f914e3044e5899bd8ff6c43c62f916
  Author: Paolo Bonzini pbonz...@redhat.com
  AuthorDate: Fri Feb 20 17:26:52 2015 +0100
  Commit: Kevin Wolf kw...@redhat.com
  CommitDate: Tue Apr 28 15:36:08 2015 +0200
 
  iothread: release iothread around aio_poll
 
  as the cause of a problem with hanging guests.
 
  Starting many guests, each with a kernel/ramdisk (via -kernel) and
  several null block devices, results in hangs. All hanging guests are in
  the partition detection code, waiting for an I/O to return, so this
  happens very early, maybe even on the first I/O.
 
  Reverting that commit fixes the hangs.
  Any ideas?
 
  Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
  have a reproducer for x86? Or could you collect backtraces for all the
  threads in QEMU when it hangs?
 
  My long shot is that the main loop is blocked at aio_context_acquire(ctx),
  while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
 
  Here is a backtrace on s390. I need two or more disks (one is not enough).
  
  It shows that the iothreads and the main loop are all waiting for events,
  while the vcpu threads are running guest code.
  
  It could be that requests are being leaked. Do you see this problem with a
  regular file-based image or the null-co driver? Maybe we're missing
  something about the AioContext in block/null.c.
 
 It seems to run with normal file-based images. As soon as I have two or
 more null-aio devices, it hangs pretty soon when doing a reboot loop.
 

Ahh! If it's a reboot loop, the device reset path may get fishy. I suspect the
completion BH used by null-aio may be messed up; that's why I wonder whether
null-co:// would work for you. Could you test that?
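
For reference, the completion path in question: the null-aio flavour of
block/null.c completes each request through a bottom half scheduled in the
device's AioContext, roughly as follows (paraphrased from memory of the
2.3-era driver, not a verbatim copy; null_aiocb_info and error handling
omitted):

typedef struct {
    BlockAIOCB common;
    QEMUBH *bh;
} NullAIOCB;

static void null_bh_cb(void *opaque)
{
    NullAIOCB *acb = opaque;

    /* The request only completes when this BH runs in the device's
     * AioContext; if the BH is lost, e.g. across a reset, the guest
     * waits forever for its I/O. */
    acb->common.cb(acb->common.opaque, 0);
    qemu_bh_delete(acb->bh);
    qemu_aio_unref(acb);
}

static BlockAIOCB *null_aio_common(BlockDriverState *bs,
                                   BlockCompletionFunc *cb, void *opaque)
{
    NullAIOCB *acb = qemu_aio_get(&null_aiocb_info, bs, cb, opaque);

    /* Schedule completion as a bottom half in the BDS's AioContext. */
    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
    qemu_bh_schedule(acb->bh);
    return &acb->common;
}

null-co://, by contrast, completes synchronously in coroutine context and
never touches a BH, so comparing the two isolates the BH path as a suspect.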

Also, could you try the patch below with null-aio://, too?

Thanks,
Fam

---

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cd539aa..c87b444 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -652,15 +652,11 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-    if (s->dataplane) {
-        virtio_blk_data_plane_stop(s->dataplane);
-    }
-
-    /*
-     * This should cancel pending requests, but can't do nicely until there
-     * are per-device request lists.
-     */
     blk_drain_all();
+    if (s->dataplane) {
+        virtio_blk_data_plane_stop(s->dataplane);
+    }
+
     blk_set_enable_write_cache(s->blk, s->original_wce);
 }
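
With the patch applied, the reset handler reads as follows; the comments on
the reordering are explanatory additions, not part of the patch:

static void virtio_blk_reset(VirtIODevice *vdev)
{
    VirtIOBlock *s = VIRTIO_BLK(vdev);

    /* Drain first, while the dataplane iothread can still process
     * completions, so that no request is left in flight. */
    blk_drain_all();

    /* Only then stop the dataplane; stopping it before draining could
     * strand requests that nothing will ever complete. */
    if (s->dataplane) {
        virtio_blk_data_plane_stop(s->dataplane);
    }

    blk_set_enable_write_cache(s->blk, s->original_wce);
}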



Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-10 Thread Christian Borntraeger
On 10.06.2015 at 04:12, Fam Zheng wrote:
 On Tue, 06/09 11:01, Christian Borntraeger wrote:
 On 09.06.2015 at 04:28, Fam Zheng wrote:
 On Tue, 06/02 16:36, Christian Borntraeger wrote:
 Paolo,

 I bisected 
 commit a0710f7995f914e3044e5899bd8ff6c43c62f916
 Author: Paolo Bonzini pbonz...@redhat.com
 AuthorDate: Fri Feb 20 17:26:52 2015 +0100
 Commit: Kevin Wolf kw...@redhat.com
 CommitDate: Tue Apr 28 15:36:08 2015 +0200

 iothread: release iothread around aio_poll

 as the cause of a problem with hanging guests.

 Starting many guests, each with a kernel/ramdisk (via -kernel) and
 several null block devices, results in hangs. All hanging guests are in
 the partition detection code, waiting for an I/O to return, so this
 happens very early, maybe even on the first I/O.

 Reverting that commit fixes the hangs.
 Any ideas?

 Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
 have a reproducer for x86? Or could you collect backtraces for all the
 threads in QEMU when it hangs?

 My long shot is that the main loop is blocked at aio_context_acquire(ctx),
 while the iothread of that ctx is blocked at aio_poll(ctx, blocking).

 Here is a backtrace on s390. I need two or more disks (one is not enough).
 
 It shows that the iothreads and the main loop are all waiting for events,
 while the vcpu threads are running guest code.
 
 It could be that requests are being leaked. Do you see this problem with a
 regular file-based image or the null-co driver? Maybe we're missing
 something about the AioContext in block/null.c.

It seems to run with normal file-based images. As soon as I have two or
more null-aio devices, it hangs pretty soon when doing a reboot loop.

Christian




Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-09 Thread Christian Borntraeger
On 09.06.2015 at 04:28, Fam Zheng wrote:
 On Tue, 06/02 16:36, Christian Borntraeger wrote:
 Paolo,

 I bisected 
 commit a0710f7995f914e3044e5899bd8ff6c43c62f916
 Author: Paolo Bonzini pbonz...@redhat.com
 AuthorDate: Fri Feb 20 17:26:52 2015 +0100
 Commit: Kevin Wolf kw...@redhat.com
 CommitDate: Tue Apr 28 15:36:08 2015 +0200

 iothread: release iothread around aio_poll

 as the cause of a problem with hanging guests.

 Starting many guests, each with a kernel/ramdisk (via -kernel) and
 several null block devices, results in hangs. All hanging guests are in
 the partition detection code, waiting for an I/O to return, so this
 happens very early, maybe even on the first I/O.

 Reverting that commit fixes the hangs.
 Any ideas?
 
 Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
 have a reproducer for x86? Or could you collect backtraces for all the threads
 in QEMU when it hangs?
 
 My long shot is that the main loop is blocked at aio_context_acquire(ctx),
 while the iothread of that ctx is blocked at aio_poll(ctx, blocking).

Here is a backtrace on s390. I need two or more disks (one is not enough).

Thread 5 (Thread 0x3fffb406910 (LWP 74602)):
#0  0x03fffc0bde8e in syscall () from /lib64/libc.so.6
#1  0x801dd282 in futex_wait (val=4294967295, ev=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:301
#2  qemu_event_wait (ev=ev@entry=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
#3  0x801ec75c in call_rcu_thread (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/util/rcu.c:233
#4  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#5  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3fffabf7910 (LWP 74604)):
#0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x8016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
#3  0x80170d32 in aio_poll (ctx=0x807d6a70, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
#4  0x800b3758 in iothread_run (opaque=0x807d6690) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#6  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0x3fffa3f7910 (LWP 74605)):
#0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x8016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
#3  0x80170d32 in aio_poll (ctx=0x807d94a0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
#4  0x800b3758 in iothread_run (opaque=0x807d6f60) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#6  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0x3fff8a21910 (LWP 74625)):
#0  0x03fffc0b90a2 in ioctl () from /lib64/libc.so.6
#1  0x80056a46 in kvm_vcpu_ioctl (cpu=cpu@entry=0x81f3b620, type=type@entry=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1916
#2  0x80056b08 in kvm_cpu_exec (cpu=cpu@entry=0x81f3b620) at /home/cborntra/REPOS/qemu/kvm-all.c:1775
#3  0x800445de in qemu_kvm_cpu_thread_fn (arg=0x81f3b620) at /home/cborntra/REPOS/qemu/cpus.c:979
#4  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#5  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3fffb408bc0 (LWP 74580)):
#0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x8016fbb0 in ppoll (__ss=0x0, __timeout=0x3d64438, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=99900) at /home/cborntra/REPOS/qemu/qemu-timer.c:322
#3  0x8016f230 in os_host_main_loop_wait (timeout=99900) at /home/cborntra/REPOS/qemu/main-loop.c:239
#4  main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:494
#5  0x8001346a in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1789
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4391




Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-09 Thread Fam Zheng
On Tue, 06/09 11:01, Christian Borntraeger wrote:
 On 09.06.2015 at 04:28, Fam Zheng wrote:
  On Tue, 06/02 16:36, Christian Borntraeger wrote:
  Paolo,
 
  I bisected 
  commit a0710f7995f914e3044e5899bd8ff6c43c62f916
  Author: Paolo Bonzini pbonz...@redhat.com
  AuthorDate: Fri Feb 20 17:26:52 2015 +0100
  Commit: Kevin Wolf kw...@redhat.com
  CommitDate: Tue Apr 28 15:36:08 2015 +0200
 
  iothread: release iothread around aio_poll
 
  as the cause of a problem with hanging guests.
 
  Starting many guests, each with a kernel/ramdisk (via -kernel) and
  several null block devices, results in hangs. All hanging guests are in
  the partition detection code, waiting for an I/O to return, so this
  happens very early, maybe even on the first I/O.
 
  Reverting that commit fixes the hangs.
  Any ideas?
  
  Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
  have a reproducer for x86? Or could you collect backtraces for all the
  threads in QEMU when it hangs?
  
  My long shot is that the main loop is blocked at aio_context_acquire(ctx),
  while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
 
 Here is a backtrace on s390. I need two or more disks (one is not enough).

It shows that the iothreads and the main loop are all waiting for events,
while the vcpu threads are running guest code.

It could be that requests are being leaked. Do you see this problem with a
regular file-based image or the null-co driver? Maybe we're missing
something about the AioContext in block/null.c.

Fam

 
 Thread 5 (Thread 0x3fffb406910 (LWP 74602)):
 #0  0x03fffc0bde8e in syscall () from /lib64/libc.so.6
 #1  0x801dd282 in futex_wait (val=4294967295, ev=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:301
 #2  qemu_event_wait (ev=ev@entry=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
 #3  0x801ec75c in call_rcu_thread (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/util/rcu.c:233
 #4  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
 #5  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6
 
 Thread 4 (Thread 0x3fffabf7910 (LWP 74604)):
 #0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
 #1  0x8016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
 #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
 #3  0x80170d32 in aio_poll (ctx=0x807d6a70, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
 #4  0x800b3758 in iothread_run (opaque=0x807d6690) at /home/cborntra/REPOS/qemu/iothread.c:41
 #5  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
 #6  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6
 
 Thread 3 (Thread 0x3fffa3f7910 (LWP 74605)):
 #0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
 #1  0x8016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
 #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
 #3  0x80170d32 in aio_poll (ctx=0x807d94a0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
 #4  0x800b3758 in iothread_run (opaque=0x807d6f60) at /home/cborntra/REPOS/qemu/iothread.c:41
 #5  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
 #6  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6
 
 Thread 2 (Thread 0x3fff8a21910 (LWP 74625)):
 #0  0x03fffc0b90a2 in ioctl () from /lib64/libc.so.6
 #1  0x80056a46 in kvm_vcpu_ioctl (cpu=cpu@entry=0x81f3b620, type=type@entry=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1916
 #2  0x80056b08 in kvm_cpu_exec (cpu=cpu@entry=0x81f3b620) at /home/cborntra/REPOS/qemu/kvm-all.c:1775
 #3  0x800445de in qemu_kvm_cpu_thread_fn (arg=0x81f3b620) at /home/cborntra/REPOS/qemu/cpus.c:979
 #4  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
 #5  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6
 
 Thread 1 (Thread 0x3fffb408bc0 (LWP 74580)):
 #0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
 #1  0x8016fbb0 in ppoll (__ss=0x0, __timeout=0x3d64438, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
 #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=99900) at /home/cborntra/REPOS/qemu/qemu-timer.c:322
 #3  0x8016f230 in os_host_main_loop_wait (timeout=99900) at /home/cborntra/REPOS/qemu/main-loop.c:239
 #4  main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:494
 #5  0x8001346a in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1789
 #6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4391

Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-08 Thread Fam Zheng
On Tue, 06/02 16:36, Christian Borntraeger wrote:
 Paolo,
 
 I bisected 
 commit a0710f7995f914e3044e5899bd8ff6c43c62f916
 Author: Paolo Bonzini pbonz...@redhat.com
 AuthorDate: Fri Feb 20 17:26:52 2015 +0100
 Commit: Kevin Wolf kw...@redhat.com
 CommitDate: Tue Apr 28 15:36:08 2015 +0200
 
 iothread: release iothread around aio_poll
 
 as the cause of a problem with hanging guests.
 
 Starting many guests, each with a kernel/ramdisk (via -kernel) and
 several null block devices, results in hangs. All hanging guests are in
 the partition detection code, waiting for an I/O to return, so this
 happens very early, maybe even on the first I/O.
 
 Reverting that commit fixes the hangs.
 Any ideas?

Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
have a reproducer for x86? Or could you collect backtraces for all the threads
in QEMU when it hangs?

My long shot is that the main loop is blocked at aio_context_acquire(ctx),
while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
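
As a minimal sketch of that suspected shape (illustrative only, not QEMU's
literal code; iothread_run here stands in for the post-a0710f7995f loop in
iothread.c, and main_loop_uses_context is a hypothetical caller):

/* Iothread side: after a0710f7995f, the loop is essentially a bare
 * blocking poll, with the AioContext lock dropped around it. */
static void *iothread_run(void *opaque)
{
    IOThread *iothread = opaque;

    while (!iothread->stopping) {
        /* Blocks in ppoll() until an event fires; if the completion it
         * waits for never arrives, the thread sits here forever, as
         * threads 3 and 4 do in Christian's backtrace. */
        aio_poll(iothread->ctx, true);
    }
    return NULL;
}

/* Main-loop side: any operation that needs the device's context. */
static void main_loop_uses_context(AioContext *ctx)
{
    /* If the iothread effectively owns ctx while blocked in aio_poll(),
     * this acquire never returns and the main loop hangs with it. */
    aio_context_acquire(ctx);
    /* ... submit or drain I/O ... */
    aio_context_release(ctx);
}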

Thanks,
Fam

 
 Christian
 
 PS: A guest xml looks like
 
 
 <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
   <name>test2</name>
   <uuid>13bd8253-9abb-4be8-9399-73b899762aaa</uuid>
   <memory unit='KiB'>286720</memory>
   <currentMemory unit='KiB'>286720</currentMemory>
   <vcpu placement='static'>10</vcpu>
   <iothreads>8</iothreads>
   <cputune>
     <shares>12</shares>
   </cputune>
   <os>
     <type arch='s390x' machine='s390-ccw-virtio'>hvm</type>
     <kernel>/boot/vmlinux-4.0.0+</kernel>
     <initrd>/boot/ramdisk.reboot</initrd>
     <cmdline>root=/dev/ram0</cmdline>
     <boot dev='hd'/>
   </os>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>preserve</on_crash>
   <devices>
     <controller type='usb' index='0' model='none'/>
     <console type='pty'>
       <target type='sclp' port='0'/>
     </console>
     <memballoon model='none'/>
   </devices>
   <qemu:commandline>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null1,serial=null1,iothread=iothread1'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null2,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null2,serial=null2,iothread=iothread2'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null3,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null3,serial=null3,iothread=iothread3'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null4,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null4,serial=null4,iothread=iothread4'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null5,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null5,serial=null5,iothread=iothread5'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null6,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null6,serial=null6,iothread=iothread6'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null7,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null7,serial=null7,iothread=iothread7'/>
     <qemu:arg value='-drive'/>
     <qemu:arg value='driver=null-aio,id=null8,if=none,size=100G'/>
     <qemu:arg value='-device'/>
     <qemu:arg value='virtio-blk-ccw,drive=null8,serial=null8,iothread=iothread8'/>
   </qemu:commandline>
 </domain>
 
 



Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-03 Thread Stefan Hajnoczi
On Tue, Jun 02, 2015 at 04:51:46PM +0200, Paolo Bonzini wrote:
 
 
 On 02/06/2015 16:36, Christian Borntraeger wrote:
  commit a0710f7995f914e3044e5899bd8ff6c43c62f916
  Author: Paolo Bonzini pbonz...@redhat.com
  AuthorDate: Fri Feb 20 17:26:52 2015 +0100
  Commit: Kevin Wolf kw...@redhat.com
  CommitDate: Tue Apr 28 15:36:08 2015 +0200
  
  iothread: release iothread around aio_poll
  
  as the cause of a problem with hanging guests.
  
  Starting many guests, each with a kernel/ramdisk (via -kernel) and
  several null block devices, results in hangs. All hanging guests are in
  the partition detection code, waiting for an I/O to return, so this
  happens very early, maybe even on the first I/O.
  
  Reverting that commit fixes the hangs.
  Any ideas?
 
 Stefan, please revert it as I will not have time to look at it until
 well into 2.4 soft freeze.

Ok

Stefan




[Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-02 Thread Christian Borntraeger
Paolo,

I bisected 
commit a0710f7995f914e3044e5899bd8ff6c43c62f916
Author: Paolo Bonzini pbonz...@redhat.com
AuthorDate: Fri Feb 20 17:26:52 2015 +0100
Commit: Kevin Wolf kw...@redhat.com
CommitDate: Tue Apr 28 15:36:08 2015 +0200

iothread: release iothread around aio_poll

as the cause of a problem with hanging guests.

Starting many guests, each with a kernel/ramdisk (via -kernel) and
several null block devices, results in hangs. All hanging guests are in
the partition detection code, waiting for an I/O to return, so this
happens very early, maybe even on the first I/O.

Reverting that commit fixes the hangs.
Any ideas?

Christian

PS: A guest xml looks like


<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>test2</name>
  <uuid>13bd8253-9abb-4be8-9399-73b899762aaa</uuid>
  <memory unit='KiB'>286720</memory>
  <currentMemory unit='KiB'>286720</currentMemory>
  <vcpu placement='static'>10</vcpu>
  <iothreads>8</iothreads>
  <cputune>
    <shares>12</shares>
  </cputune>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio'>hvm</type>
    <kernel>/boot/vmlinux-4.0.0+</kernel>
    <initrd>/boot/ramdisk.reboot</initrd>
    <cmdline>root=/dev/ram0</cmdline>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>preserve</on_crash>
  <devices>
    <controller type='usb' index='0' model='none'/>
    <console type='pty'>
      <target type='sclp' port='0'/>
    </console>
    <memballoon model='none'/>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null1,serial=null1,iothread=iothread1'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null2,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null2,serial=null2,iothread=iothread2'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null3,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null3,serial=null3,iothread=iothread3'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null4,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null4,serial=null4,iothread=iothread4'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null5,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null5,serial=null5,iothread=iothread5'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null6,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null6,serial=null6,iothread=iothread6'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null7,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null7,serial=null7,iothread=iothread7'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null8,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null8,serial=null8,iothread=iothread8'/>
  </qemu:commandline>
</domain>
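
For reference, the qemu:commandline passthrough above corresponds roughly to
the following bare QEMU invocation (a reconstruction, assuming the -object
iothread instances that libvirt creates for <iothreads>8</iothreads>; only
the first of the eight null-aio disks is shown, the rest repeat the pattern):

qemu-system-s390x -machine s390-ccw-virtio \
  -kernel /boot/vmlinux-4.0.0+ -initrd /boot/ramdisk.reboot \
  -append root=/dev/ram0 \
  -object iothread,id=iothread1 \
  -drive driver=null-aio,id=null1,if=none,size=100G \
  -device virtio-blk-ccw,drive=null1,serial=null1,iothread=iothread1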




Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup

2015-06-02 Thread Paolo Bonzini


On 02/06/2015 16:36, Christian Borntraeger wrote:
 commit a0710f7995f914e3044e5899bd8ff6c43c62f916
 Author: Paolo Bonzini pbonz...@redhat.com
 AuthorDate: Fri Feb 20 17:26:52 2015 +0100
 Commit: Kevin Wolf kw...@redhat.com
 CommitDate: Tue Apr 28 15:36:08 2015 +0200
 
 iothread: release iothread around aio_poll
 
 as the cause of a problem with hanging guests.
 
 Starting many guests, each with a kernel/ramdisk (via -kernel) and
 several null block devices, results in hangs. All hanging guests are in
 the partition detection code, waiting for an I/O to return, so this
 happens very early, maybe even on the first I/O.
 
 Reverting that commit fixes the hangs.
 Any ideas?

Stefan, please revert it as I will not have time to look at it until
well into 2.4 soft freeze.

Paolo