On 02/04/2015 18:26, Stefan Hajnoczi wrote:
> John Snow has reported that qemu-io can hang when the host is under
> heavy load.  He made the following observations in gdb:
>
> 1. The program is sitting in aio_poll() (called by bdrv_prwv_co())
>    waiting for request completion.
>
> 2. The thread pool has a ThreadPoolElement with ->state == THREAD_DONE.
>
> The ThreadPoolElement should have been reaped by
> thread_pool_completion_bh() and its callback invoked.  For some reason
> this didn't happen and the program is blocked in poll(2) waiting.
>
> This suggests a race condition in thread-pool.c or qemu_bh_schedule()
> (used to complete ThreadPoolElement from a QEMU event loop).
>
> I don't have a good theory why this happens yet.  Just wanted to share
> in case someone else hits this problem.

Laszlo hit something very similar fairly easily with virtio-scsi (but
not virtio-blk!) on aarch64 hosts.  Any attempt to debug it (ranging
from compilation with -O0 to tracing) made it disappear.

A reliable reproducer with qemu-io would be a dream...

Paolo