Am 12.09.2025 um 16:23 hat Daniel P. Berrangé geschrieben:
> On Thu, Sep 11, 2025 at 02:13:47PM +0200, Kevin Wolf wrote:
> > Am 11.09.2025 um 13:21 hat Thomas Huth geschrieben:
> > > On 10/09/2025 18.08, Kevin Wolf wrote:
> > > > Am 10.09.2025 um 17:16 hat Thomas Huth geschrieben:
> > > > > 
> > > > >   Hi,
> > > > > 
> > > > > when running "./check -luks" in the qemu-iotests directory,
> > > > > some tests are failing for me:
> > > > > 
> > > > > 295 296 inactive-node-nbd luks-detached-header
> > > > > 
> > > > > Is that a known problem already?
> > > > 
> > > > Not to me anyway.
> > > > 
> > > > > FWIW, 295 is failing with the following output:
> > > > > 
> > > > > 295   fail       [17:03:01] [17:03:17]   15.7s                failed, 
> > > > > exit status 1
> > > > > [...]
> > > > > +EWARNING:qemu.machine.machine:qemu received signal 6; command: 
> > > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga 
> > > > > none -chardev socket,id=mon,fd=5 -mon chardev=mon,mode=control 
> > > > > -chardev socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest 
> > > > > -nodefaults -display none -accel qtest"
> > > > > +EEWARNING:qemu.machine.machine:qemu received signal 6; command: 
> > > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga 
> > > > > none -chardev socket,id=mon,fd=6 -mon chardev=mon,mode=control 
> > > > > -chardev socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest 
> > > > > -nodefaults -display none -accel qtest"
> > > > > +EEWARNING:qemu.machine.machine:qemu received signal 6; command: 
> > > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga 
> > > > > none -chardev socket,id=mon,fd=10 -mon chardev=mon,mode=control 
> > > > > -chardev socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest 
> > > > > -nodefaults -display none -accel qtest"
> > > > > +E
> > > > > [...]
> > > > > 
> > > > > etc.
> > > > > 
> > > > > 296 looks very similar (also a "qemu received signal 6" error),
> > > > > but the others look like this:
> > > > 
> > > > When it gets signal 6 (i.e. SIGABRT), that usually means that you should
> > > > have a look at the coredump.
> > > 
> > > With "-p" I additionally get this error message in the log:
> > > 
> > > qemu-system-x86_64: ../../devel/qemu/block/graph-lock.c:294:
> > >  bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.
> > > 
> > > With -gdb I can get a back trace that looks like this:
> > > 
> > > Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
> > > 0x00007ffff4ba7e9c in __pthread_kill_implementation () from 
> > > target:/lib64/libc.so.6
> > > --Type <RET> for more, q to quit, c to continue without paging--
> > > #0  0x00007ffff4ba7e9c in __pthread_kill_implementation () from 
> > > target:/lib64/libc.so.6
> > > #1  0x00007ffff4b4df3e in raise () from target:/lib64/libc.so.6
> > > #2  0x00007ffff4b356d0 in abort () from target:/lib64/libc.so.6
> > > #3  0x00007ffff4b35639 in __assert_fail_base.cold () from 
> > > target:/lib64/libc.so.6
> > > #4  0x0000555555574eae in bdrv_graph_rdlock_main_loop () at 
> > > ../../devel/qemu/block/graph-lock.c:294
> > > #5  0x0000555555aa2f43 in graph_lockable_auto_lock_mainloop (x=<optimized 
> > > out>) at /home/thuth/devel/qemu/include/block/graph-lock.h:275
> > > #6  block_crypto_read_func (block=<optimized out>, offset=4096, 
> > > buf=0x555558324100 "", buflen=256000, opaque=0x555558a259d0, 
> > > errp=0x555558a8c370)
> > >     at ../../devel/qemu/block/crypto.c:71
> > > #7  0x0000555555a5a308 in qcrypto_block_luks_load_key 
> > > (block=block@entry=0x555558686ec0, slot_idx=slot_idx@entry=0,
> > >     password=password@entry=0x555558626050 "hunter0", 
> > > masterkey=masterkey@entry=0x55555886b2a0 "",
> > >     readfunc=readfunc@entry=0x555555aa2f10 <block_crypto_read_func>, 
> > > opaque=opaque@entry=0x555558a259d0, errp=0x555558a8c370)
> > >     at ../../devel/qemu/crypto/block-luks.c:927
> > > #8  0x0000555555a5ba7e in qcrypto_block_luks_find_key 
> > > (block=0x555558686ec0, password=0x555558626050 "hunter0", 
> > > masterkey=0x55555886b2a0 "",
> > >     readfunc=0x555555aa2f10 <block_crypto_read_func>, 
> > > opaque=0x555558a259d0, errp=0x555558a8c370) at 
> > > ../../devel/qemu/crypto/block-luks.c:1045
> > > #9  qcrypto_block_luks_amend_add_keyslot (block=0x555558686ec0, 
> > > readfunc=0x555555aa2f10 <block_crypto_read_func>,
> > >     writefunc=0x555555aa2e50 <block_crypto_write_func>, 
> > > opaque=0x555558a259d0, opts_luks=0x7fffec5fff38, force=<optimized out>, 
> > > errp=0x555558a8c370)
> > >     at ../../devel/qemu/crypto/block-luks.c:1673
> > > #10 qcrypto_block_luks_amend_options (block=0x555558686ec0, 
> > > readfunc=0x555555aa2f10 <block_crypto_read_func>,
> > >     writefunc=0x555555aa2e50 <block_crypto_write_func>, 
> > > opaque=0x555558a259d0, options=0x7fffec5fff30, force=<optimized out>, 
> > > errp=0x555558a8c370)
> > >     at ../../devel/qemu/crypto/block-luks.c:1865
> > > #11 0x0000555555aa3852 in block_crypto_amend_options_generic_luks 
> > > (bs=<optimized out>, amend_options=<optimized out>, force=<optimized out>,
> > >     errp=<optimized out>) at ../../devel/qemu/block/crypto.c:949
> > > #12 0x0000555555aa38e9 in block_crypto_co_amend_luks (bs=<optimized out>, 
> > > opts=<optimized out>, force=<optimized out>, errp=<optimized out>)
> > >     at ../../devel/qemu/block/crypto.c:1008
> > > #13 0x0000555555a96030 in blockdev_amend_run (job=0x555558a8c2b0, 
> > > errp=0x555558a8c370) at ../../devel/qemu/block/amend.c:52
> > > #14 0x0000555555a874ad in job_co_entry (opaque=0x555558a8c2b0) at 
> > > ../../devel/qemu/job.c:1112
> > > #15 0x0000555555bdc41b in coroutine_trampoline (i0=<optimized out>, 
> > > i1=<optimized out>) at ../../devel/qemu/util/coroutine-ucontext.c:175
> > > #16 0x00007ffff4b68f70 in ?? () from target:/lib64/libc.so.6
> > > #17 0x00007fffffffc310 in ?? ()
> > > #18 0x0000000000000000 in ?? ()
> > 
> > Hm, so block_crypto_read_func() isn't prepared to be called in coroutine
> > context, but block_crypto_co_amend_luks() still calls it from a
> > coroutine. The indirection of going through QCrypto won't make it easier
> > to fix this properly.
> 
> Historically block_crypto_read_func() didn't care/know whether it
> was in a coroutine or not. Bisect tells me the regression was caused
> by
> 
>   commit 1f051dcbdf2e4b6f518db731c84e304b2b9d15ce
>   Author: Kevin Wolf <kw...@redhat.com>
>   Date:   Fri Oct 27 17:53:33 2023 +0200
> 
>     block: Protect bs->file with graph_lock
> 
> which added
>  
>     GLOBAL_STATE_CODE();
>     GRAPH_RDLOCK_GUARD_MAINLOOP();
> 
> > It seems to me that while block_crypto_read/write_func are effectively
> > no_coroutine_fn, qcrypto_block_amend_options() should really take
> > function pointers that can be called from coroutines. It is called from
> > both coroutine and non-coroutine code paths, so should the function
> > pointers be coroutine_mixed_fn or do we want to change the callers?
> >
> > Either way, we should add the appropriate coroutine markers to the
> > QCrypto interfaces to show the intention at least.
> 
> I'm unclear why QCrypto needs to know about coroutines at all ?
> It just wants a function pointer that will send or recv a blob
> of data. In the case of the block layer these functions end up
> doing I/O via the block APIs, but QCrypto doesn't care about
> this impl detail.

Does a case where it's not in the context of the block layer even exist?
The only callers of qcrypto_block_amend_options() are in block/crypto.c
and block/qcow2.c. Apart from a test case, qcrypto_block_open() is the
same.

And even ignoring the block layer, doing synchronous I/O outside of
coroutines is arguably a bug anyway because that's a blocking operation
that stops the mainloop from making progress.

But if we don't want to fix it at the QCrypto level, that makes the
function pointer implicitly coroutine_mixed_fn and the function needs to
be rewritten to check qemu_in_coroutine() and probably take the graph
lock internally before calling bdrv_co_pread() in the coroutine case,
unless we can prove that all callers already hold it. Unfortunately,
function pointers still defeat TSA, so this proof will have to be
manual.

Kevin


Reply via email to