Hi,
Recently I hit a problem when a VM with multiple disks takes an external
snapshot while the disks use dataplane with the same iothread. The
AioContext is acquired in external_snapshot_prepare() and is released
only in external_snapshot_clean(), after all bs snapshots have been
committed together. In this scenario, when the same iothread is
configured for multiple disks, the second bs hangs in aio_poll() waiting
for I/O to return, because the iothread cannot acquire the AioContext.
The QEMU stacks are as follows:
The main thread stays in ppoll(), waiting for I/O to complete:
#0 0x00007fdc1128a18f in ppoll () from /lib64/libc.so.6
#1 0x00007fdc19637aea in qemu_poll_ns (fds=0x7fdc1c9a1ea0, nfds=2,
timeout=-1) at qemu-timer.c:313
#2 0x00007fdc19639327 in aio_poll (ctx=0x7fdc1c9555a0, blocking=true)
at aio-posix.c:453
#3 0x00007fdc19690f84 in bdrv_flush (bs=0x7fdc1c9c6250) at block/io.c:2537
#4 0x00007fdc193d9603 in external_snapshot_prepare
(common=0x7fdc1e52ea20, errp=0x7ffdd62a5e28) at blockdev.c:1752
#5 0x00007fdc193da7af in qmp_transaction (dev_list=0x7fdc1d7ce5c0,
has_props=false, props=0x7fdc1e6ca030, errp=0x7ffdd62a5ea0)
Meanwhile, the iothread tries to acquire the AioContext held by the main
thread, which releases it only after all the snapshots are committed:
#0 0x00007fdc14a10f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fdc14a0cd1d in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00007fdc14a0cc3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fdc196fe295 in qemu_mutex_lock (mutex=0x7fdc1c96c080) at
util/qemu-thread-posix.c:73
#4 0x00007fdc1962c13c in aio_context_acquire (ctx=0x7fdc1c96c020) at
async.c:357
#5 0x00007fdc19639351 in aio_poll (ctx=0x7fdc1c96c020, blocking=true)
at aio-posix.c:459
#6 0x00007fdc193df354 in iothread_run (opaque=0x7fdc1c969400) at
iothread.c:53
#7 0x00007fdc14a0adc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fdc1129471d in clone () from /lib64/libc.so.6
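
To make the lock ordering above concrete, here is a minimal standalone
sketch. It is not QEMU code: ctx_lock, iothread_try_once() and
io_can_complete() are hypothetical names, a plain pthread mutex stands
in for the AioContext lock, and the "iothread" uses
pthread_mutex_trylock() instead of blocking so that the example
terminates instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AioContext lock shared by both disks. */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for one iteration of the iothread: it can only process
 * completions if it manages to take ctx_lock. The real iothread blocks
 * in aio_context_acquire(); trylock is used here so the sketch returns.
 */
static void *iothread_try_once(void *opaque)
{
    bool *completed = opaque;

    if (pthread_mutex_trylock(&ctx_lock) == 0) {
        *completed = true;              /* I/O completion could run */
        pthread_mutex_unlock(&ctx_lock);
    } else {
        *completed = false;             /* lock is held by the main thread */
    }
    return NULL;
}

/*
 * Returns true if the "iothread" was able to make progress.
 * main_holds_lock models the main thread keeping the context acquired
 * between the prepare and clean steps of the transaction.
 */
static bool io_can_complete(bool main_holds_lock)
{
    pthread_t tid;
    bool completed = false;
    int rc;

    if (main_holds_lock) {
        pthread_mutex_lock(&ctx_lock);
    }

    rc = pthread_create(&tid, NULL, iothread_try_once, &completed);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, NULL);

    if (main_holds_lock) {
        pthread_mutex_unlock(&ctx_lock);
    }
    return completed;
}
```

With this sketch, io_can_complete(true) returns false and
io_can_complete(false) returns true: as long as the main thread keeps
the lock, the other thread cannot complete anything, so a main thread
that waits for that completion while still holding the lock never makes
progress.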
Is there a good way to solve this problem when using dataplane with the
same iothread in transactions? Thanks!