29.11.2018 13:42, Kevin Wolf wrote:
> On 28.11.2018 at 21:13, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> We've faced the following mirror bug:
>>
>> Just run mirror on a qcow2 image larger than 1G, and qemu ends up in a
>> deadlock.
>>
>> Note: I've decided to send this as a patch with a reproducer, to make it
>> easier to reproduce. No need to commit this before the mirror fix, but
>> after it; the commit message may then be shortened a bit, maybe even to
>> just "iotests: add simple mirror test".
>>
>> Note 2: if we drop the 'kvm' option from the test, it still reproduces,
>> but with iotests.VM() it does not (maybe because of qtest?). If we add
>> an iothread, it doesn't reproduce either. Also, it doesn't reproduce
>> with raw instead of qcow2, and doesn't reproduce with small images.
>>
>> So, here is the deadlock:
>>
>> (gdb) info thr
>>   Id   Target Id         Frame
>>   3    Thread 0x7fd7fd4ea700 (LWP 145412) "qemu-system-x86" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>>   2    Thread 0x7fd7fc205700 (LWP 145416) "qemu-system-x86" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> * 1    Thread 0x7fd8102cfcc0 (LWP 145411) "qemu-system-x86" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> (gdb) bt
>> #0  0x00007fd80e8864cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007fd80e881dcb in _L_lock_812 () at /lib64/libpthread.so.0
>> #2  0x00007fd80e881c98 in __GI___pthread_mutex_lock (mutex=0x561032dce420 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
>> #3  0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66
>> #4  0x00005610320cb327 in qemu_mutex_lock_iothread_impl (file=0x5610327d8654 "util/main-loop.c", line=236) at /work/src/qemu/up-mirror-dead-lock/cpus.c:1849
>> #5  0x000056103265038f in os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
>> #6  0x0000561032650450 in main_loop_wait (nonblocking=0) at util/main-loop.c:497
>> #7  0x00005610322575c9 in main_loop () at vl.c:1892
>> #8  0x000056103225f3c7 in main (argc=13, argv=0x7ffcc8bb15a8, envp=0x7ffcc8bb1618) at vl.c:4648
>> (gdb) p qemu_global_mutex
>> $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 145416, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
>>     __size = "\002\000\000\000\000\000\000\000\b8\002\000\001", '\000' <repeats 26 times>, __align = 2}, file = 0x56103267bcb0 "/work/src/qemu/up-mirror-dead-lock/exec.c", line = 3197, initialized = true}
>>
>> So, we see that thr1 is the main loop, and now it wants the BQL, which
>> is owned by thr2.
>>
>> (gdb) thr 2
>> [Switching to thread 2 (Thread 0x7fd7fc205700 (LWP 145416))]
>> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> 135     2:      movl    %edx, %eax
>> (gdb) bt
>> #0  0x00007fd80e8864cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007fd80e881de6 in _L_lock_870 () at /lib64/libpthread.so.0
>> #2  0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0) at ../nptl/pthread_mutex_lock.c:114
>> #3  0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561034d25dc0, file=0x5610327d81d1 "util/async.c", line=511) at util/qemu-thread-posix.c:66
>> #4  0x000056103264d840 in aio_context_acquire (ctx=0x561034d25d60) at util/async.c:511
>> #5  0x0000561032254554 in dma_blk_cb (opaque=0x7fd7f41463b0, ret=0) at dma-helpers.c:169
>> #6  0x000056103225479d in dma_blk_io (ctx=0x561034d25d60, sg=0x561035a80210, offset=0, align=512, io_func=0x5610322547a3 <dma_blk_read_io_func>, io_func_opaque=0x561034d40870, cb=0x561032340b35 <ide_dma_cb>, opaque=0x561035a7fee8, dir=DMA_DIRECTION_FROM_DEVICE) at dma-helpers.c:227
>> #7  0x0000561032254855 in dma_blk_read (blk=0x561034d40870, sg=0x561035a80210, offset=0, align=512, cb=0x561032340b35 <ide_dma_cb>, opaque=0x561035a7fee8) at dma-helpers.c:245
>> #8  0x0000561032340e6e in ide_dma_cb (opaque=0x561035a7fee8, ret=0) at hw/ide/core.c:910
>> #9  0x000056103234a912 in bmdma_cmd_writeb (bm=0x561035a81030, val=9) at hw/ide/pci.c:240
>> #10 0x000056103234b5bd in bmdma_write (opaque=0x561035a81030, addr=0, val=9, size=1) at hw/ide/piix.c:76
>> #11 0x00005610320e5763 in memory_region_write_accessor (mr=0x561035a81180, addr=0, value=0x7fd7fc204678, size=1, shift=0, mask=255, attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:504
>> #12 0x00005610320e596d in access_with_adjusted_size (addr=0, value=0x7fd7fc204678, size=1, access_size_min=1, access_size_max=4, access_fn=0x5610320e5683 <memory_region_write_accessor>, mr=0x561035a81180, attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:570
>> #13 0x00005610320e86ce in memory_region_dispatch_write (mr=0x561035a81180, addr=0, data=9, size=1, attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:1452
>> #14 0x0000561032083770 in flatview_write_continue (fv=0x7fd7f4108090, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1, addr1=0, l=1, mr=0x561035a81180) at /work/src/qemu/up-mirror-dead-lock/exec.c:3233
>> #15 0x00005610320838cf in flatview_write (fv=0x7fd7f4108090, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1) at /work/src/qemu/up-mirror-dead-lock/exec.c:3272
>> #16 0x0000561032083bef in address_space_write (as=0x561032dcdc40 <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1) at /work/src/qemu/up-mirror-dead-lock/exec.c:3362
>> #17 0x0000561032083c40 in address_space_rw (as=0x561032dcdc40 <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1, is_write=true) at /work/src/qemu/up-mirror-dead-lock/exec.c:3373
>> #18 0x0000561032104766 in kvm_handle_io (port=49216, attrs=..., data=0x7fd810309000, direction=1, size=1, count=1) at /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:1775
>> #19 0x0000561032104f0e in kvm_cpu_exec (cpu=0x561034d60240) at /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:2021
>> #20 0x00005610320ca396 in qemu_kvm_cpu_thread_fn (arg=0x561034d60240) at /work/src/qemu/up-mirror-dead-lock/cpus.c:1281
>> #21 0x00005610326557ab in qemu_thread_start (args=0x561034d81a70) at util/qemu-thread-posix.c:498
>> #22 0x00007fd80e87fdd5 in start_thread (arg=0x7fd7fc205700) at pthread_create.c:308
>> #23 0x00007fd804c97b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>> (gdb) fr 2
>> #2  0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0) at ../nptl/pthread_mutex_lock.c:114
>> 114       LLL_MUTEX_LOCK (mutex);
>> (gdb) p *mutex
>> $2 = {__data = {__lock = 2, __count = 16, __owner = 145411, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
>>     __size = "\002\000\000\000\020\000\000\000\003\070\002\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 68719476738}
>>
>> And thr2 wants the aio context mutex, which is owned by thr1. A classic
>> deadlock.
>>
>> But who took the aio context? Suppose that it is the mirror job and check:
>>
>> (gdb) p blk_next(0)->root->bs->job
>> $3 = (BlockJob *) 0x5610358e2520
>> (gdb) p (MirrorBlockJob *)$3
>> $4 = (MirrorBlockJob *) 0x5610358e2520
>> [...]
>> (gdb) p $4->target->root->bs->tracked_requests.lh_first->co
>> $7 = (Coroutine *) 0x561035dd0860
>> [...]
>> (gdb) qemu coroutine 0x561035dd0860
>> #0  0x0000561032670ab2 in qemu_coroutine_switch (from_=0x561035dd0860, to_=0x7fd8102cfb80, action=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
>> #1  0x000056103266f0fc in qemu_coroutine_yield () at util/qemu-coroutine.c:194
>> #2  0x000056103266fb76 in qemu_co_mutex_lock_slowpath (ctx=0x561034d25d60, mutex=0x561035c5a4b0) at util/qemu-coroutine-lock.c:235
>> #3  0x000056103266fc82 in qemu_co_mutex_lock (mutex=0x561035c5a4b0) at util/qemu-coroutine-lock.c:273
>> #4  0x0000561032544f4e in qcow2_co_pwritev (bs=0x561035b86890, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/qcow2.c:2063
>> #5  0x000056103258e5df in bdrv_driver_pwritev (bs=0x561035b86890, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/io.c:1092
>> #6  0x000056103259004b in bdrv_aligned_pwritev (child=0x561035b178c0, req=0x7fd7a23eae30, offset=531628032, bytes=1048576, align=1, qiov=0x561034fd08a8, flags=0) at block/io.c:1719
>> #7  0x0000561032590cb2 in bdrv_co_pwritev (child=0x561035b178c0, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/io.c:1961
>> #8  0x0000561032578478 in blk_co_pwritev (blk=0x561035dd0400, offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at block/block-backend.c:1203
>> #9  0x00005610325866a0 in mirror_read_complete (op=0x561034fd08a0, ret=0) at block/mirror.c:232
>> #10 0x0000561032586e1a in mirror_co_read (opaque=0x561034fd08a0) at block/mirror.c:370
>> #11 0x000056103267081d in coroutine_trampoline (i0=903678048, i1=22032) at util/coroutine-ucontext.c:116
>> #12 0x00007fd804be0fc0 in __start_context () at /lib64/libc.so.6
>> #13 0x00007fd7fafff600 in  ()
>> #14 0x0000000000000000 in  ()
>>
>> OK, look at mirror_read_complete:
>>
>> static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
>> {
>>     MirrorBlockJob *s = op->s;
>>
>>     aio_context_acquire(blk_get_aio_context(s->common.blk));
>>     if (ret < 0) {
>>         BlockErrorAction action;
>>
>>         bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
>>         action = mirror_error_action(s, true, -ret);
>>         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>>             s->ret = ret;
>>         }
>>
>>         mirror_iteration_done(op, ret);
>>     } else {
>>         ret = blk_co_pwritev(s->target, op->offset,
>>                              op->qiov.size, &op->qiov, 0);
>>         mirror_write_complete(op, ret);
>>     }
>>     aio_context_release(blk_get_aio_context(s->common.blk));
>> }
>>
>> Line 232 is the blk_co_pwritev() call. So, we acquire the aio context and
>> go on to write, which may yield, which makes the deadlock possible.
>>
>> And bisect points to
>> commit 2e1990b26e5aa1ba1a730aa6281df123bb7a71b6 (refs/bisect/bad)
>> Author: Max Reitz <mre...@redhat.com>
>> Date:   Wed Jun 13 20:18:11 2018 +0200
>>
>>     block/mirror: Convert to coroutines
>>
>> So, yes, 3.0 has the bug.
>>
>> The interesting thing is that this aio_context_acquire / release pair
>> existed even before that patch; it was added in 2.9:
>> commit b9e413dd3756f71abe4e8cafe1d7a459ce74ccf4
>> Author: Paolo Bonzini <pbonz...@redhat.com>
>> Date:   Mon Feb 13 14:52:32 2017 +0100
>>
>>     block: explicitly acquire aiocontext in aio callbacks that need it
>>
>> In 2.9 mirror_read_complete was a callback for blk_aio_preadv. But
>> honestly, I don't understand why it needed to acquire the context: it
>> is inserted into this context as a BH, so why does it need to
>> additionally acquire the context?
>
> BHs don't automatically take the AioContext lock of the AioContext from
> which they are run.
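
If I understand that correctly, a one-shot BH that touches state protected
by the AioContext lock has to take the lock itself, roughly like this (only
a sketch to check my understanding; my_oneshot_cb and schedule_my_cb are
made-up names, not from the tree):

#include "qemu/osdep.h"
#include "block/aio.h"

static void my_oneshot_cb(void *opaque)
{
    AioContext *ctx = opaque;

    /* If BHs don't automatically take the AioContext lock, then this
     * callback has to take it explicitly before touching protected
     * state. (Sketch to illustrate my reading of the statement above.) */
    aio_context_acquire(ctx);
    /* ... access state protected by the AioContext lock ... */
    aio_context_release(ctx);
}

/* Possibly called from another thread. */
static void schedule_my_cb(AioContext *ctx)
{
    aio_bh_schedule_oneshot(ctx, my_oneshot_cb, ctx);
}
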
so, if I call aio_bh_schedule_oneshot(aio_context, cb), I can't be sure
that this cb will run in this aio_context, and I should acquire the
aio_context in the cb code? Then, what does the first parameter of
aio_bh_schedule_oneshot() mean, and is the text in
docs/devel/multiple-iothreads.txt wrong, which says:

  Side note: the best way to schedule a function call across threads is to
  call aio_bh_schedule_oneshot().  No acquire/release or locking is needed.

>
>> Anyway, after 2e1990b26e5 mirror_read/write_complete are not
>> callbacks, so, even if b9e413dd3 was OK, it doesn't relate to this
>> place now.
>
> I guess in this specific instance we can just remove the locking from
> mirror_read/write_complete because it's only locking an AioContext whose
> lock we already hold anyway because we're in a Coroutine owned by the
> AioContext, so yielding will release the lock and avoid the deadlock.
>
> But more generally, does this mean that it is never correct to have an
> explicit aio_context_acquire/release() pair in coroutine context, at
> least for all practical cases where you call things that could yield?
> And if you're inside a coroutine and want to access something in a
> different AioContext, you need to drop out of the coroutine first?
>
> If so, should aio_context_acquire() then assert that it's not called
> from coroutine context?
>
> Paolo, will this restriction of coroutines be solved for good when you
> get your patches merged that remove the AioContext lock?
>
> Kevin

-- 
Best regards,
Vladimir
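
P.S. To check that I read the suggestion above correctly: for this specific
place, dropping the locking from mirror_read_complete would look roughly
like the sketch below (untested, only to illustrate the direction;
mirror_write_complete would get the same treatment):

static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
{
    MirrorBlockJob *s = op->s;

    /* No aio_context_acquire/release pair here: as noted above, it only
     * locked an AioContext whose lock we already hold, because we are in
     * a coroutine owned by that AioContext. Untested sketch, not a final
     * patch. */
    if (ret < 0) {
        BlockErrorAction action;

        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
        action = mirror_error_action(s, true, -ret);
        if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
            s->ret = ret;
        }

        mirror_iteration_done(op, ret);
    } else {
        ret = blk_co_pwritev(s->target, op->offset,
                             op->qiov.size, &op->qiov, 0);
        mirror_write_complete(op, ret);
    }
}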