Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 10.06.2015 at 11:34, Fam Zheng wrote:
> On Wed, 06/10 11:18, Christian Borntraeger wrote:
> [...]

For what it's worth, I can no longer reproduce the issue on current
master + cherry-pick of a0710f7995f (iothread: release iothread around
aio_poll).

bisect tells me that

commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
Author:     Fam Zheng <f...@redhat.com>
AuthorDate: Fri May 29 18:53:14 2015 +0800
Commit:     Stefan Hajnoczi <stefa...@redhat.com>
CommitDate: Tue Jul 7 14:27:14 2015 +0100

    block: Use bdrv_drain to replace uncessary bdrv_drain_all

made the problem with blk-null go away. I still don't understand why.

Christian
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 16.07.2015 at 13:20, Paolo Bonzini wrote:
> On 16/07/2015 13:03, Christian Borntraeger wrote:
>> For what it's worth, I can no longer reproduce the issue on current
>> master + cherry-pick of a0710f7995f (iothread: release iothread around
>> aio_poll). bisect tells me that commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
>> (block: Use bdrv_drain to replace uncessary bdrv_drain_all) made the
>> problem with blk-null go away. I still don't understand why.
>
> It could be related to the AioContext problem that I'm fixing these
> days, too. Good news, we'll requeue the patch for 2.5.

That was also something that I had in mind (in fact I retested this to
check the ctx patch). master + cherry-pick of a0710f7995f + revert of
53ec73e26 + this fix still fails, so it was (is?) a different issue.

The interesting part is that this problem required 2 or more disks (and
53ec73e26 replaces drain_all with single drains), so it somewhat sounds
plausible.

Christian
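[A hedged sketch of why the bdrv_drain_all() -> bdrv_drain() change could
plausibly interact with two or more disks: drain_all loops over every
BlockDriverState, acquiring and releasing each one's AioContext in turn,
while a per-device drain touches only one context. The shape below is
approximate, reconstructed from memory of QEMU ~2.3; bdrv_drain_all_sketch
is an editorial name, not the real function.]

/* Approximate shape of bdrv_drain_all() circa QEMU 2.3 -- a sketch, not
 * the verbatim source. */
void bdrv_drain_all_sketch(void)
{
    BlockDriverState *bs;
    bool busy = true;

    while (busy) {
        busy = false;
        /* One pass over *all* block devices, taking each context in turn. */
        QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
            AioContext *aio_context = bdrv_get_aio_context(bs);

            aio_context_acquire(aio_context);
            busy |= bdrv_requests_pending(bs);   /* requests still in flight? */
            busy |= aio_poll(aio_context, busy); /* pump that context */
            aio_context_release(aio_context);
        }
    }
}

/* With a0710f7995f applied, each iothread also takes its own context inside
 * aio_poll(), so two or more disks on separate iothreads mean several locks
 * bouncing between the main loop and the iothreads; a single bdrv_drain(bs)
 * only contends on one. That would explain why 53ec73e26 hides the hang
 * (with 2+ disks) without explaining the root cause. */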
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 16/07/2015 13:24, Christian Borntraeger wrote:
> The interesting part is that this problem required 2 or more disks (and
> 53ec73e26 replaces drain_all with single drains), so it somewhat sounds
> plausible.

Yes, indeed. It is very plausible. I wanted to reproduce it these days,
so thanks for saving me a lot of time! I'll test your exact setup
(master + AioContext fix + cherry-pick of a0710f7995f + revert of
53ec73e26).

Thanks,

Paolo
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 16/07/2015 13:03, Christian Borntraeger wrote:
> For what it's worth, I can no longer reproduce the issue on current
> master + cherry-pick of a0710f7995f (iothread: release iothread around
> aio_poll).
>
> bisect tells me that
>
> commit 53ec73e264f481b79b52efcadc9ceb8f8996975c
> Author:     Fam Zheng <f...@redhat.com>
> AuthorDate: Fri May 29 18:53:14 2015 +0800
> Commit:     Stefan Hajnoczi <stefa...@redhat.com>
> CommitDate: Tue Jul 7 14:27:14 2015 +0100
>
>     block: Use bdrv_drain to replace uncessary bdrv_drain_all
>
> made the problem with blk-null go away. I still don't understand why.

It could be related to the AioContext problem that I'm fixing these
days, too. Good news, we'll requeue the patch for 2.5.

Paolo
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 10.06.2015 at 11:34, Fam Zheng wrote:
> On Wed, 06/10 11:18, Christian Borntraeger wrote:
>> [...]
>> It seems to run with normal file based images. As soon as I have two
>> or more null-aio devices it hangs pretty soon when doing a reboot loop.
>
> Ahh! If it's a reboot loop, the device reset thing may get fishy. I
> suspect the completion BH used by null-aio may be messed up, that's why
> I wonder whether null-co:// would work for you. Could you test that?

null-co also fails.

> Also, could you try below patch with null-aio://, too?

The same. Guests still get stuck.

> Thanks,
>
> Fam
>
> ---
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cd539aa..c87b444 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -652,15 +652,11 @@ static void virtio_blk_reset(VirtIODevice *vdev)
>  {
>      VirtIOBlock *s = VIRTIO_BLK(vdev);
>  
> -    if (s->dataplane) {
> -        virtio_blk_data_plane_stop(s->dataplane);
> -    }
> -
> -    /*
> -     * This should cancel pending requests, but can't do nicely until there
> -     * are per-device request lists.
> -     */
>      blk_drain_all();
> +    if (s->dataplane) {
> +        virtio_blk_data_plane_stop(s->dataplane);
> +    }
> +
>      blk_set_enable_write_cache(s->blk, s->original_wce);
>  }
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On Wed, 06/10 11:18, Christian Borntraeger wrote:
> On 10.06.2015 at 04:12, Fam Zheng wrote:
>> On Tue, 06/09 11:01, Christian Borntraeger wrote:
>> [...]
>>
>> It could be the requests being leaked. Do you see this problem with a
>> regular file based image or null-co driver? Maybe we're missing
>> something about the AioContext in block/null.c.
>
> It seems to run with normal file based images. As soon as I have two or
> more null-aio devices it hangs pretty soon when doing a reboot loop.

Ahh! If it's a reboot loop, the device reset thing may get fishy. I
suspect the completion BH used by null-aio may be messed up, that's why
I wonder whether null-co:// would work for you. Could you test that?

Also, could you try below patch with null-aio://, too?

Thanks,

Fam

---
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cd539aa..c87b444 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -652,15 +652,11 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-    if (s->dataplane) {
-        virtio_blk_data_plane_stop(s->dataplane);
-    }
-
-    /*
-     * This should cancel pending requests, but can't do nicely until there
-     * are per-device request lists.
-     */
     blk_drain_all();
+    if (s->dataplane) {
+        virtio_blk_data_plane_stop(s->dataplane);
+    }
+
     blk_set_enable_write_cache(s->blk, s->original_wce);
 }
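[For context on the "completion BH" suspicion above: null-aio completes
requests from a bottom half scheduled on the device's AioContext. The
following is a hedged sketch of that pattern, modeled from memory on
block/null.c of that era; treat the types and names (NullAIOCB,
null_bh_cb, null_aiocb_info) as approximate rather than verbatim.]

/* Sketch of a BH-based completion, modeled on block/null.c circa QEMU 2.3.
 * Names and structure are approximate, not the verbatim driver source. */
typedef struct NullAIOCB {
    BlockAIOCB common;
    QEMUBH *bh;
} NullAIOCB;

static const AIOCBInfo null_aiocb_info = {
    .aiocb_size = sizeof(NullAIOCB),
};

static void null_bh_cb(void *opaque)
{
    NullAIOCB *acb = opaque;

    acb->common.cb(acb->common.opaque, 0);   /* report success to the device */
    qemu_bh_delete(acb->bh);
    qemu_aio_unref(acb);
}

static BlockAIOCB *null_aio_common(BlockDriverState *bs,
                                   BlockCompletionFunc *cb, void *opaque)
{
    NullAIOCB *acb = qemu_aio_get(&null_aiocb_info, bs, cb, opaque);

    /* The BH is bound to the BDS's *current* AioContext. If the device is
     * reset and the dataplane stopped before this BH runs, nothing may
     * poll that context again and the I/O never completes -- which is the
     * suspicion behind trying null-co:// and reordering the reset path. */
    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
    qemu_bh_schedule(acb->bh);
    return &acb->common;
}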
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 10.06.2015 at 04:12, Fam Zheng wrote:
> On Tue, 06/09 11:01, Christian Borntraeger wrote:
>> [...]
>> Here is a backtrace on s390. I need 2 or more disks (one is not enough).
>
> It shows iothreads and main loop are all waiting for events, and the
> vcpu threads are running guest code. It could be the requests being
> leaked. Do you see this problem with a regular file based image or
> null-co driver? Maybe we're missing something about the AioContext in
> block/null.c.

It seems to run with normal file based images. As soon as I have two or
more null-aio devices it hangs pretty soon when doing a reboot loop.

Christian
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 09.06.2015 at 04:28, Fam Zheng wrote:
> On Tue, 06/02 16:36, Christian Borntraeger wrote:
>> Paolo,
>>
>> I bisected
>>
>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>> Author:     Paolo Bonzini <pbonz...@redhat.com>
>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>> Commit:     Kevin Wolf <kw...@redhat.com>
>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>
>>     iothread: release iothread around aio_poll
>>
>> to cause a problem with hanging guests.
>>
>> Having many guests all with a kernel/ramdisk (via -kernel) and several
>> null block devices will result in hangs. All hanging guests are in
>> partition detection code waiting for an I/O to return, so very early,
>> maybe even the first I/O.
>>
>> Reverting that commit fixes the hangs.
>>
>> Any ideas?
>
> Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do
> you have a reproducer for x86? Or could you collect backtraces for all
> the threads in QEMU when it hangs? My long shot is that the main loop
> is blocked at aio_context_acquire(ctx), while the iothread of that ctx
> is blocked at aio_poll(ctx, blocking).

Here is a backtrace on s390. I need 2 or more disks (one is not enough).

Thread 5 (Thread 0x3fffb406910 (LWP 74602)):
#0  0x03fffc0bde8e in syscall () from /lib64/libc.so.6
#1  0x801dd282 in futex_wait (val=4294967295, ev=0x8079c6c4 <rcu_call_ready_event>)
    at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:301
#2  qemu_event_wait (ev=ev@entry=0x8079c6c4 <rcu_call_ready_event>)
    at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
#3  0x801ec75c in call_rcu_thread (opaque=<optimized out>)
    at /home/cborntra/REPOS/qemu/util/rcu.c:233
#4  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#5  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3fffabf7910 (LWP 74604)):
#0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x8016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
    at /home/cborntra/REPOS/qemu/qemu-timer.c:310
#3  0x80170d32 in aio_poll (ctx=0x807d6a70, blocking=blocking@entry=true)
    at /home/cborntra/REPOS/qemu/aio-posix.c:274
#4  0x800b3758 in iothread_run (opaque=0x807d6690)
    at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#6  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0x3fffa3f7910 (LWP 74605)):
#0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x8016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
    at /home/cborntra/REPOS/qemu/qemu-timer.c:310
#3  0x80170d32 in aio_poll (ctx=0x807d94a0, blocking=blocking@entry=true)
    at /home/cborntra/REPOS/qemu/aio-posix.c:274
#4  0x800b3758 in iothread_run (opaque=0x807d6f60)
    at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#6  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0x3fff8a21910 (LWP 74625)):
#0  0x03fffc0b90a2 in ioctl () from /lib64/libc.so.6
#1  0x80056a46 in kvm_vcpu_ioctl (cpu=cpu@entry=0x81f3b620, type=type@entry=44672)
    at /home/cborntra/REPOS/qemu/kvm-all.c:1916
#2  0x80056b08 in kvm_cpu_exec (cpu=cpu@entry=0x81f3b620)
    at /home/cborntra/REPOS/qemu/kvm-all.c:1775
#3  0x800445de in qemu_kvm_cpu_thread_fn (arg=0x81f3b620)
    at /home/cborntra/REPOS/qemu/cpus.c:979
#4  0x03fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
#5  0x03fffc0c30fa in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3fffb408bc0 (LWP 74580)):
#0  0x03fffc0b75d6 in ppoll () from /lib64/libc.so.6
#1  0x8016fbb0 in ppoll (__ss=0x0, __timeout=0x3d64438, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=99900)
    at /home/cborntra/REPOS/qemu/qemu-timer.c:322
#3  0x8016f230 in os_host_main_loop_wait (timeout=99900)
    at /home/cborntra/REPOS/qemu/main-loop.c:239
#4  main_loop_wait (nonblocking=<optimized out>)
    at /home/cborntra/REPOS/qemu/main-loop.c:494
#5  0x8001346a in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1789
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /home/cborntra/REPOS/qemu/vl.c:4391
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On Tue, 06/09 11:01, Christian Borntraeger wrote:
> On 09.06.2015 at 04:28, Fam Zheng wrote:
>> [...]
>> Christian, I can't reproduce this on my x86 box with virtio-blk-pci.
>> Do you have a reproducer for x86? Or could you collect backtraces for
>> all the threads in QEMU when it hangs? My long shot is that the main
>> loop is blocked at aio_context_acquire(ctx), while the iothread of
>> that ctx is blocked at aio_poll(ctx, blocking).
>
> Here is a backtrace on s390. I need 2 or more disks (one is not enough).

It shows iothreads and main loop are all waiting for events, and the
vcpu threads are running guest code. It could be the requests being
leaked. Do you see this problem with a regular file based image or
null-co driver? Maybe we're missing something about the AioContext in
block/null.c.

Fam

> Thread 5 (Thread 0x3fffb406910 (LWP 74602)):
> [...]
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On Tue, 06/02 16:36, Christian Borntraeger wrote:
> Paolo,
>
> I bisected
>
> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> Author:     Paolo Bonzini <pbonz...@redhat.com>
> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> Commit:     Kevin Wolf <kw...@redhat.com>
> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>
>     iothread: release iothread around aio_poll
>
> to cause a problem with hanging guests.
>
> Having many guests all with a kernel/ramdisk (via -kernel) and several
> null block devices will result in hangs. All hanging guests are in
> partition detection code waiting for an I/O to return, so very early,
> maybe even the first I/O.
>
> Reverting that commit fixes the hangs.
>
> Any ideas?

Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do
you have a reproducer for x86? Or could you collect backtraces for all
the threads in QEMU when it hangs?

My long shot is that the main loop is blocked at aio_context_acquire(ctx),
while the iothread of that ctx is blocked at aio_poll(ctx, blocking).

Thanks,

Fam

> Christian
>
> PS: A guest xml looks like
> [...]
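[To make the long shot concrete, here is a hedged sketch of what commit
a0710f7995f changed in iothread_run(). It is simplified from memory of
QEMU's iothread.c of that period, not the verbatim code, and the
_before/_after function names are editorial.]

/* Before a0710f7995f (approximate): the iothread held its AioContext for
 * the whole poll loop, so nothing else could run handlers on it. */
static void *iothread_run_before(void *opaque)
{
    IOThread *iothread = opaque;

    while (!iothread->stopping) {
        aio_context_acquire(iothread->ctx);
        while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
            /* progress was made, keep going */
        }
        aio_context_release(iothread->ctx);
    }
    return NULL;
}

/* After a0710f7995f (approximate): the acquire/release moves inside
 * aio_poll(), which drops the context around the blocking ppoll() so the
 * main loop can take it between polls. Fam's long shot is that under some
 * interleaving the main loop sits in aio_context_acquire(ctx) while the
 * iothread sits in the blocking aio_poll(ctx, true) and neither side makes
 * progress. */
static void *iothread_run_after(void *opaque)
{
    IOThread *iothread = opaque;

    while (!iothread->stopping) {
        aio_poll(iothread->ctx, true);
    }
    return NULL;
}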
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On Tue, Jun 02, 2015 at 04:51:46PM +0200, Paolo Bonzini wrote:
> On 02/06/2015 16:36, Christian Borntraeger wrote:
>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>> Author:     Paolo Bonzini <pbonz...@redhat.com>
>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>> Commit:     Kevin Wolf <kw...@redhat.com>
>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>
>>     iothread: release iothread around aio_poll
>>
>> to cause a problem with hanging guests.
>>
>> Having many guests all with a kernel/ramdisk (via -kernel) and several
>> null block devices will result in hangs. All hanging guests are in
>> partition detection code waiting for an I/O to return, so very early,
>> maybe even the first I/O.
>>
>> Reverting that commit fixes the hangs.
>>
>> Any ideas?
>
> Stefan, please revert it as I will not have time to look at it until
> well into 2.4 soft freeze.

Ok

Stefan
[Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
Paolo,

I bisected

commit a0710f7995f914e3044e5899bd8ff6c43c62f916
Author:     Paolo Bonzini <pbonz...@redhat.com>
AuthorDate: Fri Feb 20 17:26:52 2015 +0100
Commit:     Kevin Wolf <kw...@redhat.com>
CommitDate: Tue Apr 28 15:36:08 2015 +0200

    iothread: release iothread around aio_poll

to cause a problem with hanging guests.

Having many guests all with a kernel/ramdisk (via -kernel) and several
null block devices will result in hangs. All hanging guests are in
partition detection code waiting for an I/O to return, so very early,
maybe even the first I/O.

Reverting that commit fixes the hangs.

Any ideas?

Christian

PS: A guest xml looks like

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>test2</name>
  <uuid>13bd8253-9abb-4be8-9399-73b899762aaa</uuid>
  <memory unit='KiB'>286720</memory>
  <currentMemory unit='KiB'>286720</currentMemory>
  <vcpu placement='static'>10</vcpu>
  <iothreads>8</iothreads>
  <cputune>
    <shares>12</shares>
  </cputune>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio'>hvm</type>
    <kernel>/boot/vmlinux-4.0.0+</kernel>
    <initrd>/boot/ramdisk.reboot</initrd>
    <cmdline>root=/dev/ram0</cmdline>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>preserve</on_crash>
  <devices>
    <controller type='usb' index='0' model='none'/>
    <console type='pty'>
      <target type='sclp' port='0'/>
    </console>
    <memballoon model='none'/>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null1,serial=null1,iothread=iothread1'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null2,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null2,serial=null2,iothread=iothread2'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null3,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null3,serial=null3,iothread=iothread3'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null4,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null4,serial=null4,iothread=iothread4'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null5,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null5,serial=null5,iothread=iothread5'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null6,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null6,serial=null6,iothread=iothread6'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null7,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null7,serial=null7,iothread=iothread7'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null8,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,drive=null8,serial=null8,iothread=iothread8'/>
  </qemu:commandline>
</domain>
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 02/06/2015 16:36, Christian Borntraeger wrote:
> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> Author:     Paolo Bonzini <pbonz...@redhat.com>
> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> Commit:     Kevin Wolf <kw...@redhat.com>
> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>
>     iothread: release iothread around aio_poll
>
> to cause a problem with hanging guests.
>
> Having many guests all with a kernel/ramdisk (via -kernel) and several
> null block devices will result in hangs. All hanging guests are in
> partition detection code waiting for an I/O to return, so very early,
> maybe even the first I/O.
>
> Reverting that commit fixes the hangs.
>
> Any ideas?

Stefan, please revert it as I will not have time to look at it until
well into 2.4 soft freeze.

Paolo