On 27.09.21 at 23:55, Eric Blake wrote:
> From: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>
> OK, that's a big rewrite of the logic.
>
> Pre-patch we have an always running coroutine - connection_co. It does
> reply receiving and reconnecting. And it leads to a lot of difficult
> and unobvious code around drained sections and context switch. We also
> abuse bs->in_flight counter which is increased for connection_co and
> temporary decreased in points where we want to allow drained section to
> begin. One of these place is in another file: in nbd_read_eof() in
> nbd/client.c.
>
> We also cancel reconnect and requests waiting for reconnect on drained
> begin which is not correct. And this patch fixes that.
>
> Let's finally drop this always running coroutine and go another way:
> do both reconnect and receiving in request coroutines.
>
Hi,

While updating our stack to 6.2, one of our live-migration tests stopped working (backtrace below), and bisecting led me to this patch.

The VM has a single qcow2 disk (converting it to raw makes no difference), and the issue only appears when using an iothread (with both virtio-scsi-pci and virtio-blk-pci).

Reverting 1af7737871fb3b66036f5e520acb0a98fc2605f7 (which sits on top) and 4ddb5d2fde6f22b2cf65f314107e890a7ca14fcf (the commit corresponding to this patch) in v6.2.0 makes the migration work again.

Backtrace:

Thread 1 (Thread 0x7f9d93458fc0 (LWP 56711) "kvm"):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f9d9d6bc537 in __GI_abort () at abort.c:79
#2 0x00007f9d9d6bc40f in __assert_fail_base (fmt=0x7f9d9d825128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5579153763f8 "qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)", file=0x5579153764f9 "../io/channel.c", line=483, function=<optimized out>) at assert.c:92
#3 0x00007f9d9d6cb662 in __GI___assert_fail (assertion=assertion@entry=0x5579153763f8 "qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)", file=file@entry=0x5579153764f9 "../io/channel.c", line=line@entry=483, function=function@entry=0x557915376570 <__PRETTY_FUNCTION__.2> "qio_channel_restart_read") at assert.c:101
#4 0x00005579150c351c in qio_channel_restart_read (opaque=<optimized out>) at ../io/channel.c:483
#5 qio_channel_restart_read (opaque=<optimized out>) at ../io/channel.c:477
#6 0x000055791520182a in aio_dispatch_handler (ctx=ctx@entry=0x557916908c60, node=0x7f9d8400f800) at ../util/aio-posix.c:329
#7 0x0000557915201f62 in aio_dispatch_handlers (ctx=0x557916908c60) at ../util/aio-posix.c:372
#8 aio_dispatch (ctx=0x557916908c60) at ../util/aio-posix.c:382
#9 0x00005579151ea74e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:311
#10 0x00007f9d9e647e6b in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x0000557915203030 in glib_pollfds_poll () at ../util/main-loop.c:232
#12 os_host_main_loop_wait (timeout=992816) at ../util/main-loop.c:255
#13 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:531
#14 0x00005579150539c1 in qemu_main_loop () at ../softmmu/runstate.c:726
#15 0x0000557914ce8ebe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50