We have run into the following deadlock:

Thread 2 (Thread 0x7f1b7edf9700 (LWP 240293)):
#0  0x00007f1bd1f0675f in ppoll () from /lib64/libc.so.6
#1  0x00007f1bd8c1d78b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at qemu-timer.c:310
#3  0x00007f1bd8c1e8bf in aio_poll (ctx=0x7f1bda091780, blocking=blocking@entry=true) at aio-posix.c:451
#4  0x00007f1bd8c119cf in bdrv_drain_one (bs=bs@entry=0x7f1bda0f2000) at block.c:2055
#5  0x00007f1bd8c13244 in bdrv_drain_all () at block.c:2115
#6  0x00007f1bd8a2c5e3 in vm_stop (state=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:685
#7  0x00007f1bd8a2c636 in vm_stop_force_state (state=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:1383
#8  0x00007f1bd8bc798f in migration_completion (start_time=<synthetic pointer>, old_vm_running=<synthetic pointer>, current_active_state=<optimized out>, s=0x7f1bd90e3c20 <current_migration.37255>) at migration/migration.c:1213
#9  migration_thread (opaque=0x7f1bd90e3c20 <current_migration.37255>) at migration/migration.c:1314
#10 0x00007f1bd21e3dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f1bd1f10ced in clone () from /lib64/libc.so.6
The problem was narrowed down to this commit:

commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
Author: Evgeny Yakovlev <eyakov...@virtuozzo.com>
Date:   Mon Jul 18 22:39:52 2016 +0300

    block: ignore flush requests when storage is clean

These patches contain fixes for this situation.

The problem does not occur often: our regression testing hits it about once a week or less.

Signed-off-by: Evgeny Yakovlev <eyakov...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Stefan Hajnoczi <stefa...@redhat.com>
CC: Fam Zheng <f...@redhat.com>
CC: Kevin Wolf <kw...@redhat.com>
CC: Max Reitz <mre...@redhat.com>