On Mon, 08/11 15:21, Stefan Hajnoczi wrote:
> On Mon, Aug 11, 2014 at 04:33:21PM +0800, Bin Wu wrote:
> > Hi,
> >
> > I tested the reliability of qemu in the IPSAN environment as follows:
> > (1) create one VM on a X86 server which is connected to an IPSAN, and the VM
> > has only one system volume which is on the IPSAN;
> > (2) disconnect the network between the server and the IPSAN. On the server,
> > I have a "multipath" software which can hold the IO for a long time
> > (configurable) when the network is disconnected;
> > (3) about 30 seconds later, the whole VM hangs there, nothing can be done to
> > the VM!
> >
> > Then, I used "gstack" tool to collect the stacks of all qemu threads, it
> > looked like:
> >
> > Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> > #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> > #1 0x00007fd84410ceff in aio_poll ()
> > #2 0x00007fd84429bb05 in qemu_aio_wait ()
> > #3 0x00007fd844120f51 in bdrv_drain_all ()
> > #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> > #5 0x00007fd8441f216e in bmdma_write ()
> > #6 0x00007fd8443a93cf in memory_region_write_accessor ()
> > #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> > #8 0x00007fd8443a9901 in memory_region_iorange_write ()
> > #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> > #10 0x00007fd8443a13a8 in ioport_write ()
> > #11 0x00007fd8443a1f55 in cpu_outb ()
> > #12 0x00007fd8443a5b12 in kvm_handle_io ()
> > #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> > #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> > #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> > #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> > #17 0x0000000000000000 in ?? ()
>
> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
>
> Note that the QEMU monitor commands are typically synchronous so they
> will still block the VM.
>

If some of the requests are dropped by the host and never return to QEMU, I
think bdrv_drain_all() will still cause the hang. Even with virtio-blk, the
reset path has such a call.

Maybe we could add some -ETIMEDOUT mechanism in QEMU's block layer.

A workaround might be to configure the host storage to fail the IO after a
timeout.

Fam
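
To illustrate the -ETIMEDOUT idea, here is a rough sketch. It is not QEMU code: wait_completion_timeout() is a hypothetical helper, and a readable file descriptor merely stands in for "an in-flight request completed". The point is only that a bounded poll() lets the caller see -ETIMEDOUT instead of blocking forever the way the poll() under bdrv_drain_all() does in the trace above.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper, not a QEMU API.
 *
 * Wait until `fd` becomes readable (our stand-in for a request
 * completion) or until `timeout_ms` milliseconds pass.
 *
 * Returns 0 on completion, -ETIMEDOUT if nothing arrived in time,
 * or another negative errno value if poll() itself failed.
 */
static int wait_completion_timeout(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    /* Simplification: a retry after EINTR restarts the full timeout. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        return -ETIMEDOUT;   /* no completion within the deadline */
    }
    if (ret < 0) {
        return -errno;       /* poll() error */
    }
    return 0;                /* fd is readable (or hung up) */
}
```

A caller would then have to decide what to do with the timed-out request (fail it to the guest, retry, or pause the VM), which is the hard part and is not covered by this sketch.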