29.11.2018 13:42, Kevin Wolf wrote:
> On 28.11.2018 at 21:13, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> We've faced the following mirror bug:
>>
>> Just run mirror on a qcow2 image larger than 1G, and qemu ends up in a
>> deadlock.
>>
>> Note: I've decided to send this as a patch with a reproducer, to make it
>> easier to reproduce. No need to commit this before the mirror fix, but
>> after it; the commit message may then be shortened a bit, maybe even to
>> just "iotests: add simple mirror test".
>>
>> Note 2: if we drop the 'kvm' option from the test, it still reproduces,
>> but with iotests.VM() it does not (maybe because of qtest?). If we add
>> an iothread, it doesn't reproduce either. Also, it doesn't reproduce
>> with raw instead of qcow2, and doesn't reproduce with small images.
>>
>> So, here is the deadlock:
>>
>> (gdb) info thr
>>   Id   Target Id         Frame
>>   3    Thread 0x7fd7fd4ea700 (LWP 145412) "qemu-system-x86" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>>   2    Thread 0x7fd7fc205700 (LWP 145416) "qemu-system-x86" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> * 1    Thread 0x7fd8102cfcc0 (LWP 145411) "qemu-system-x86" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> (gdb) bt
>> #0  0x00007fd80e8864cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007fd80e881dcb in _L_lock_812 () at /lib64/libpthread.so.0
>> #2  0x00007fd80e881c98 in __GI___pthread_mutex_lock (mutex=0x561032dce420 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
>> #3  0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66
>> #4  0x00005610320cb327 in qemu_mutex_lock_iothread_impl (file=0x5610327d8654 "util/main-loop.c", line=236) at /work/src/qemu/up-mirror-dead-lock/cpus.c:1849
>> #5  0x000056103265038f in os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
>> #6  0x0000561032650450 in main_loop_wait (nonblocking=0) at util/main-loop.c:497
>> #7  0x00005610322575c9 in main_loop () at vl.c:1892
>> #8  0x000056103225f3c7 in main (argc=13, argv=0x7ffcc8bb15a8, envp=0x7ffcc8bb1618) at vl.c:4648
>> (gdb) p qemu_global_mutex
>> $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 145416, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
>>     __size = "\002\000\000\000\000\000\000\000\b8\002\000\001", '\000' <repeats 26 times>, __align = 2}, file = 0x56103267bcb0 "/work/src/qemu/up-mirror-dead-lock/exec.c", line = 3197, initialized = true}
>>
>> So, we see that thr1 is the main loop, and now it wants the BQL, which
>> is owned by thr2.
>>
>> (gdb) thr 2
>> [Switching to thread 2 (Thread 0x7fd7fc205700 (LWP 145416))]
>> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> 135     2:      movl    %edx, %eax
>> (gdb) bt
>> #0  0x00007fd80e8864cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007fd80e881de6 in _L_lock_870 () at /lib64/libpthread.so.0
>> #2  0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0) at ../nptl/pthread_mutex_lock.c:114
>> #3  0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561034d25dc0, file=0x5610327d81d1 "util/async.c", line=511) at util/qemu-thread-posix.c:66
>> #4  0x000056103264d840 in aio_context_acquire (ctx=0x561034d25d60) at util/async.c:511
>> #5  0x0000561032254554 in dma_blk_cb (opaque=0x7fd7f41463b0, ret=0) at dma-helpers.c:169
>> #6  0x000056103225479d in dma_blk_io (ctx=0x561034d25d60, sg=0x561035a80210, offset=0, align=512, io_func=0x5610322547a3 <dma_blk_read_io_func>, io_func_opaque=0x561034d40870, cb=0x561032340b35 <ide_dma_cb>, opaque=0x561035a7fee8, dir=DMA_DIRECTION_FROM_DEVICE) at dma-helpers.c:227
>> #7  0x0000561032254855 in dma_blk_read (blk=0x561034d40870, sg=0x561035a80210, offset=0, align=512, cb=0x561032340b35 <ide_dma_cb>, opaque=0x561035a7fee8) at dma-helpers.c:245
>> #8  0x0000561032340e6e in ide_dma_cb (opaque=0x561035a7fee8, ret=0) at hw/ide/core.c:910
>> #9  0x000056103234a912 in bmdma_cmd_writeb (bm=0x561035a81030, val=9) at hw/ide/pci.c:240
>> #10 0x000056103234b5bd in bmdma_write (opaque=0x561035a81030, addr=0, val=9, size=1) at hw/ide/piix.c:76
>> #11 0x00005610320e5763 in memory_region_write_accessor (mr=0x561035a81180, addr=0, value=0x7fd7fc204678, size=1, shift=0, mask=255, attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:504
>> #12 0x00005610320e596d in access_with_adjusted_size (addr=0, value=0x7fd7fc204678, size=1, access_size_min=1, access_size_max=4, access_fn=0x5610320e5683 <memory_region_write_accessor>, mr=0x561035a81180, attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:570
>> #13 0x00005610320e86ce in memory_region_dispatch_write (mr=0x561035a81180, addr=0, data=9, size=1, attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:1452
>> #14 0x0000561032083770 in flatview_write_continue (fv=0x7fd7f4108090, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1, addr1=0, l=1, mr=0x561035a81180) at /work/src/qemu/up-mirror-dead-lock/exec.c:3233
>> #15 0x00005610320838cf in flatview_write (fv=0x7fd7f4108090, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1) at /work/src/qemu/up-mirror-dead-lock/exec.c:3272
>> #16 0x0000561032083bef in address_space_write (as=0x561032dcdc40 <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1) at /work/src/qemu/up-mirror-dead-lock/exec.c:3362
>> #17 0x0000561032083c40 in address_space_rw (as=0x561032dcdc40 <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1, is_write=true) at /work/src/qemu/up-mirror-dead-lock/exec.c:3373
>> #18 0x0000561032104766 in kvm_handle_io (port=49216, attrs=..., data=0x7fd810309000, direction=1, size=1, count=1) at /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:1775
>> #19 0x0000561032104f0e in kvm_cpu_exec (cpu=0x561034d60240) at /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:2021
>> #20 0x00005610320ca396 in qemu_kvm_cpu_thread_fn (arg=0x561034d60240) at /work/src/qemu/up-mirror-dead-lock/cpus.c:1281
>> #21 0x00005610326557ab in qemu_thread_start (args=0x561034d81a70) at util/qemu-thread-posix.c:498
>> #22 0x00007fd80e87fdd5 in start_thread (arg=0x7fd7fc205700) at pthread_create.c:308
>> #23 0x00007fd804c97b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>> (gdb) fr 2
>> #2  0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0) at ../nptl/pthread_mutex_lock.c:114
>> 114       LLL_MUTEX_LOCK (mutex);
>> (gdb) p *mutex
>> $2 = {__data = {__lock = 2, __count = 16, __owner = 145411, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
>>     __size = "\002\000\000\000\020\000\000\000\003\070\002\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 68719476738}
>>
>> And thr2 wants the aio context mutex, which is owned by thr1. A classic
>> deadlock.
>>
>> But who took the aio context? Suppose that it is the mirror job and check:
>>
>> (gdb) p blk_next(0)->root->bs->job
>> $3 = (BlockJob *) 0x5610358e2520
>> (gdb) p (MirrorBlockJob *)$3
>> $4 = (MirrorBlockJob *) 0x5610358e2520
>> [...]
>> (gdb) p $4->target->root->bs->tracked_requests.lh_first->co
>> $7 = (Coroutine *) 0x561035dd0860
>> [...]
>> (gdb) qemu coroutine 0x561035dd0860
>> #0  0x0000561032670ab2 in qemu_coroutine_switch (from_=0x561035dd0860, to_=0x7fd8102cfb80, action=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
>> #1  0x000056103266f0fc in qemu_coroutine_yield () at util/qemu-coroutine.c:194
>> #2  0x000056103266fb76 in qemu_co_mutex_lock_slowpath (ctx=0x561034d25d60, mutex=0x561035c5a4b0) at util/qemu-coroutine-lock.c:235
>> #3  0x000056103266fc82 in qemu_co_mutex_lock (mutex=0x561035c5a4b0) at util/qemu-coroutine-lock.c:273
>> #4  0x0000561032544f4e in qcow2_co_pwritev (bs=0x561035b86890, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/qcow2.c:2063
>> #5  0x000056103258e5df in bdrv_driver_pwritev (bs=0x561035b86890, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/io.c:1092
>> #6  0x000056103259004b in bdrv_aligned_pwritev (child=0x561035b178c0, req=0x7fd7a23eae30, offset=531628032, bytes=1048576, align=1, qiov=0x561034fd08a8, flags=0) at block/io.c:1719
>> #7  0x0000561032590cb2 in bdrv_co_pwritev (child=0x561035b178c0, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/io.c:1961
>> #8  0x0000561032578478 in blk_co_pwritev (blk=0x561035dd0400, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/block-backend.c:1203
>> #9  0x00005610325866a0 in mirror_read_complete (op=0x561034fd08a0, ret=0) at block/mirror.c:232
>> #10 0x0000561032586e1a in mirror_co_read (opaque=0x561034fd08a0) at block/mirror.c:370
>> #11 0x000056103267081d in coroutine_trampoline (i0=903678048, i1=22032) at util/coroutine-ucontext.c:116
>> #12 0x00007fd804be0fc0 in __start_context () at /lib64/libc.so.6
>> #13 0x00007fd7fafff600 in  ()
>> #14 0x0000000000000000 in  ()
>>
>> OK, look at mirror_read_complete:
>>
>> static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
>> {
>>     MirrorBlockJob *s = op->s;
>>
>>     aio_context_acquire(blk_get_aio_context(s->common.blk));
>>     if (ret < 0) {
>>         BlockErrorAction action;
>>
>>         bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
>>         action = mirror_error_action(s, true, -ret);
>>         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>>             s->ret = ret;
>>         }
>>
>>         mirror_iteration_done(op, ret);
>>     } else {
>>         ret = blk_co_pwritev(s->target, op->offset,
>>                              op->qiov.size, &op->qiov, 0);
>>         mirror_write_complete(op, ret);
>>     }
>>     aio_context_release(blk_get_aio_context(s->common.blk));
>> }
>>
>> Line 232 is the blk_co_pwritev() call. So, we acquire the aio context and
>> go on to write, which may yield, which makes the deadlock possible.
>>
>> And bisect points to
>> commit 2e1990b26e5aa1ba1a730aa6281df123bb7a71b6 (refs/bisect/bad)
>> Author: Max Reitz <mre...@redhat.com>
>> Date:   Wed Jun 13 20:18:11 2018 +0200
>>
>>     block/mirror: Convert to coroutines
>>
>> So, yes, 3.0 has the bug.
>>
>> The interesting thing is that this aio_context_acquire / release pair
>> existed even before that patch; it was added in 2.9:
>> commit b9e413dd3756f71abe4e8cafe1d7a459ce74ccf4
>> Author: Paolo Bonzini <pbonz...@redhat.com>
>> Date:   Mon Feb 13 14:52:32 2017 +0100
>>
>>     block: explicitly acquire aiocontext in aio callbacks that need it
>>
>> In 2.9 mirror_read_complete was a callback for blk_aio_preadv. But
>> honestly, I don't understand why it needed to acquire the context: it
>> is inserted into this context as a BH, so why does it need to
>> additionally acquire the context?
>
> BHs don't automatically take the AioContext lock of the AioContext from
> which they are run.
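
If I understand that correctly, a one-shot BH that touches state protected
by the AioContext lock has to take the lock itself, roughly like this (only
a sketch to check my understanding; my_oneshot_cb and schedule_my_cb are
made-up names, not from the tree):

#include "qemu/osdep.h"
#include "block/aio.h"

static void my_oneshot_cb(void *opaque)
{
    AioContext *ctx = opaque;

    /* If BHs don't automatically take the AioContext lock, then this
     * callback has to take it explicitly before touching protected
     * state. (Sketch to illustrate my reading of the statement above.) */
    aio_context_acquire(ctx);
    /* ... access state protected by the AioContext lock ... */
    aio_context_release(ctx);
}

/* Possibly called from another thread. */
static void schedule_my_cb(AioContext *ctx)
{
    aio_bh_schedule_oneshot(ctx, my_oneshot_cb, ctx);
}
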
so, if I call aio_bh_schedule_oneshot(aio_context, cb), I can't be sure
that this cb will run in this aio_context, and I should acquire the
aio_context in the cb code? Then, what does the first parameter of
aio_bh_schedule_oneshot() mean, and is the text in
docs/devel/multiple-iothreads.txt wrong, which says:

  Side note: the best way to schedule a function call across threads is to
  call aio_bh_schedule_oneshot().  No acquire/release or locking is needed.

>
>> Anyway, after 2e1990b26e5 mirror_read/write_complete are not
>> callbacks, so, even if b9e413dd3 was OK, it doesn't relate to this
>> place now.
>
> I guess in this specific instance we can just remove the locking from
> mirror_read/write_complete because it's only locking an AioContext whose
> lock we already hold anyway because we're in a Coroutine owned by the
> AioContext, so yielding will release the lock and avoid the deadlock.
>
> But more generally, does this mean that it is never correct to have an
> explicit aio_context_acquire/release() pair in coroutine context, at
> least for all practical cases where you call things that could yield?
> And if you're inside a coroutine and want to access something in a
> different AioContext, you need to drop out of the coroutine first?
>
> If so, should aio_context_acquire() then assert that it's not called
> from coroutine context?
>
> Paolo, will this restriction of coroutines be solved for good when you
> get your patches merged that remove the AioContext lock?
>
> Kevin

-- 
Best regards,
Vladimir
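
P.S. To check that I read the suggestion above correctly: for this specific
place, dropping the locking from mirror_read_complete would look roughly
like the sketch below (untested, only to illustrate the direction;
mirror_write_complete would get the same treatment):

static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
{
    MirrorBlockJob *s = op->s;

    /* No aio_context_acquire/release pair here: as noted above, it only
     * locked an AioContext whose lock we already hold, because we are in
     * a coroutine owned by that AioContext. Untested sketch, not a final
     * patch. */
    if (ret < 0) {
        BlockErrorAction action;

        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
        action = mirror_error_action(s, true, -ret);
        if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
            s->ret = ret;
        }

        mirror_iteration_done(op, ret);
    } else {
        ret = blk_co_pwritev(s->target, op->offset,
                             op->qiov.size, &op->qiov, 0);
        mirror_write_complete(op, ret);
    }
}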