On Tue, 08/12 10:09, Zhang Haoyu wrote:
> >> > Hi,
> >> >
> >> > I tested the reliability of qemu in the IPSAN environment as follows:
> >> > (1) create one VM on an x86 server which is connected to an IPSAN, and
> >> > the VM has only one system volume which is on the IPSAN;
> >> > (2) disconnect the network between the server and the IPSAN. On the
> >> > server, I have a "multipath" software which can hold the IO for a long
> >> > time (configurable) when the network is disconnected;
> >> > (3) about 30 seconds later, the whole VM hangs there, nothing can be
> >> > done to the VM!
> >> >
> >> > Then, I used "gstack" tool to collect the stacks of all qemu threads, it
> >> > looked like:
> >> >
> >> > Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
> >> > #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
> >> > #1 0x00007fd84410ceff in aio_poll ()
> >> > #2 0x00007fd84429bb05 in qemu_aio_wait ()
> >> > #3 0x00007fd844120f51 in bdrv_drain_all ()
> >> > #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
> >> > #5 0x00007fd8441f216e in bmdma_write ()
> >> > #6 0x00007fd8443a93cf in memory_region_write_accessor ()
> >> > #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
> >> > #8 0x00007fd8443a9901 in memory_region_iorange_write ()
> >> > #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
> >> > #10 0x00007fd8443a13a8 in ioport_write ()
> >> > #11 0x00007fd8443a1f55 in cpu_outb ()
> >> > #12 0x00007fd8443a5b12 in kvm_handle_io ()
> >> > #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
> >> > #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
> >> > #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
> >> > #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
> >> > #17 0x0000000000000000 in ?? ()
> >>
> >> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
> >>
> >> Note that the QEMU monitor commands are typically synchronous so they
> >> will still block the VM.
> >>
> >
> >If some of the requests are dropped by host and never return to QEMU, I think
> >bdrv_drain_all() will still cause the hang. Even with virtio-blk, reset has
> >such a call. Maybe we could add some -ETIMEDOUT mechanism in QEMU's block
> >layer.
> >
> >A workaround might be to configure the host storage to fail the IO after a
> >timeout.
> >
> If -ETIMEDOUT is returned after a short network disconnection, could an
> unpredictable fault happen in the VM?
> E.g., if the VM was reading important data (like system data).
> Does aio replay work for this case?
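
To make the timeout idea above concrete, here is a rough standalone sketch of
a drain loop that cannot wait forever. It is not QEMU code: SketchReq and the
sketch_* functions are hypothetical names made up for illustration, the clock
is simulated rather than read from the system, and a real version would have
to hook into the block layer's request tracking and cancel the underlying IO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an in-flight request; not a QEMU structure. */
typedef struct {
    int64_t deadline_ms;  /* absolute time after which we give up */
    bool    done;         /* completion has already been delivered */
    int     ret;          /* 0 on success, negative errno on failure */
} SketchReq;

/* Complete a request at most once with the given result. */
static void sketch_complete(SketchReq *req, int ret)
{
    if (!req->done) {
        req->done = true;
        req->ret = ret;
    }
}

/*
 * Fail every unfinished request whose deadline has passed with -ETIMEDOUT.
 * Returns the number of requests that are still pending afterwards.
 */
static size_t sketch_expire(SketchReq *reqs, size_t n, int64_t now_ms)
{
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].done) {
            continue;
        }
        if (now_ms >= reqs[i].deadline_ms) {
            sketch_complete(&reqs[i], -ETIMEDOUT);
        } else {
            pending++;
        }
    }
    return pending;
}

/*
 * Drain with a bounded wait: advance a simulated clock by step_ms until
 * nothing is pending. Every request has a finite deadline, so the loop
 * terminates even if the host never answers.
 * Returns the simulated time at which the drain finished.
 */
static int64_t sketch_drain_all(SketchReq *reqs, size_t n,
                                int64_t now_ms, int64_t step_ms)
{
    assert(step_ms > 0);
    while (sketch_expire(reqs, n, now_ms) > 0) {
        now_ms += step_ms;  /* stand-in for one bounded poll() wait */
    }
    return now_ms;
}
```

With two requests that share a 30000 ms deadline, one of which completes
normally, sketch_drain_all() returns once the other has been failed with
-ETIMEDOUT, instead of blocking the vcpu thread as in the backtrace above.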

The guest should handle this error the same way it handles -EIO. Even if the
guest is free to retry, the connection is still down, isn't it?

Fam