Am 12.09.2025 um 16:23 hat Daniel P. Berrangé geschrieben: > On Thu, Sep 11, 2025 at 02:13:47PM +0200, Kevin Wolf wrote: > > Am 11.09.2025 um 13:21 hat Thomas Huth geschrieben: > > > On 10/09/2025 18.08, Kevin Wolf wrote: > > > > Am 10.09.2025 um 17:16 hat Thomas Huth geschrieben: > > > > > > > > > > Hi, > > > > > > > > > > when running "./check -luks" in the qemu-iotests directory, > > > > > some tests are failing for me: > > > > > > > > > > 295 296 inactive-node-nbd luks-detached-header > > > > > > > > > > Is that a known problem already? > > > > > > > > Not to me anyway. > > > > > > > > > FWIW, 295 is failing with the following output: > > > > > > > > > > 295 fail [17:03:01] [17:03:17] 15.7s failed, > > > > > exit status 1 > > > > > [...] > > > > > +EWARNING:qemu.machine.machine:qemu received signal 6; command: > > > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga > > > > > none -chardev socket,id=mon,fd=5 -mon chardev=mon,mode=control > > > > > -chardev socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest > > > > > -nodefaults -display none -accel qtest" > > > > > +EEWARNING:qemu.machine.machine:qemu received signal 6; command: > > > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga > > > > > none -chardev socket,id=mon,fd=6 -mon chardev=mon,mode=control > > > > > -chardev socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest > > > > > -nodefaults -display none -accel qtest" > > > > > +EEWARNING:qemu.machine.machine:qemu received signal 6; command: > > > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga > > > > > none -chardev socket,id=mon,fd=10 -mon chardev=mon,mode=control > > > > > -chardev socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest > > > > > -nodefaults -display none -accel qtest" > > > > > +E > > > > > [...] > > > > > > > > > > etc. > > > > > > > > > > 296 looks very similar (also a "qemu received signal 6" error), > > > > > but the others look like this: > > > > > > > > When it gets signal 6 (i.e. SIGABRT), that usually means that you should > > > > have a look at the coredump. > > > > > > With "-p" I additionally get this error message in the log: > > > > > > qemu-system-x86_64: ../../devel/qemu/block/graph-lock.c:294: > > > bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed. > > > > > > With -gdb I can get a back trace that looks like this: > > > > > > Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted. > > > 0x00007ffff4ba7e9c in __pthread_kill_implementation () from > > > target:/lib64/libc.so.6 > > > --Type <RET> for more, q to quit, c to continue without paging-- > > > #0 0x00007ffff4ba7e9c in __pthread_kill_implementation () from > > > target:/lib64/libc.so.6 > > > #1 0x00007ffff4b4df3e in raise () from target:/lib64/libc.so.6 > > > #2 0x00007ffff4b356d0 in abort () from target:/lib64/libc.so.6 > > > #3 0x00007ffff4b35639 in __assert_fail_base.cold () from > > > target:/lib64/libc.so.6 > > > #4 0x0000555555574eae in bdrv_graph_rdlock_main_loop () at > > > ../../devel/qemu/block/graph-lock.c:294 > > > #5 0x0000555555aa2f43 in graph_lockable_auto_lock_mainloop (x=<optimized > > > out>) at /home/thuth/devel/qemu/include/block/graph-lock.h:275 > > > #6 block_crypto_read_func (block=<optimized out>, offset=4096, > > > buf=0x555558324100 "", buflen=256000, opaque=0x555558a259d0, > > > errp=0x555558a8c370) > > > at ../../devel/qemu/block/crypto.c:71 > > > #7 0x0000555555a5a308 in qcrypto_block_luks_load_key > > > (block=block@entry=0x555558686ec0, slot_idx=slot_idx@entry=0, > > > password=password@entry=0x555558626050 "hunter0", > > > masterkey=masterkey@entry=0x55555886b2a0 "", > > > readfunc=readfunc@entry=0x555555aa2f10 <block_crypto_read_func>, > > > opaque=opaque@entry=0x555558a259d0, errp=0x555558a8c370) > > > at ../../devel/qemu/crypto/block-luks.c:927 > > > #8 0x0000555555a5ba7e in qcrypto_block_luks_find_key > > > (block=0x555558686ec0, password=0x555558626050 "hunter0", > > > masterkey=0x55555886b2a0 "", > > > readfunc=0x555555aa2f10 <block_crypto_read_func>, > > > opaque=0x555558a259d0, errp=0x555558a8c370) at > > > ../../devel/qemu/crypto/block-luks.c:1045 > > > #9 qcrypto_block_luks_amend_add_keyslot (block=0x555558686ec0, > > > readfunc=0x555555aa2f10 <block_crypto_read_func>, > > > writefunc=0x555555aa2e50 <block_crypto_write_func>, > > > opaque=0x555558a259d0, opts_luks=0x7fffec5fff38, force=<optimized out>, > > > errp=0x555558a8c370) > > > at ../../devel/qemu/crypto/block-luks.c:1673 > > > #10 qcrypto_block_luks_amend_options (block=0x555558686ec0, > > > readfunc=0x555555aa2f10 <block_crypto_read_func>, > > > writefunc=0x555555aa2e50 <block_crypto_write_func>, > > > opaque=0x555558a259d0, options=0x7fffec5fff30, force=<optimized out>, > > > errp=0x555558a8c370) > > > at ../../devel/qemu/crypto/block-luks.c:1865 > > > #11 0x0000555555aa3852 in block_crypto_amend_options_generic_luks > > > (bs=<optimized out>, amend_options=<optimized out>, force=<optimized out>, > > > errp=<optimized out>) at ../../devel/qemu/block/crypto.c:949 > > > #12 0x0000555555aa38e9 in block_crypto_co_amend_luks (bs=<optimized out>, > > > opts=<optimized out>, force=<optimized out>, errp=<optimized out>) > > > at ../../devel/qemu/block/crypto.c:1008 > > > #13 0x0000555555a96030 in blockdev_amend_run (job=0x555558a8c2b0, > > > errp=0x555558a8c370) at ../../devel/qemu/block/amend.c:52 > > > #14 0x0000555555a874ad in job_co_entry (opaque=0x555558a8c2b0) at > > > ../../devel/qemu/job.c:1112 > > > #15 0x0000555555bdc41b in coroutine_trampoline (i0=<optimized out>, > > > i1=<optimized out>) at ../../devel/qemu/util/coroutine-ucontext.c:175 > > > #16 0x00007ffff4b68f70 in ?? () from target:/lib64/libc.so.6 > > > #17 0x00007fffffffc310 in ?? () > > > #18 0x0000000000000000 in ?? () > > > > Hm, so block_crypto_read_func() isn't prepared to be called in coroutine > > context, but block_crypto_co_amend_luks() still calls it from a > > coroutine. The indirection of going through QCrypto won't make it easier > > to fix this properly. > > Historically block_crypto_read_func() didn't care/know whether it > was in a coroutine or not. Bisect tells me the regression was caused > by > > commit 1f051dcbdf2e4b6f518db731c84e304b2b9d15ce > Author: Kevin Wolf <kw...@redhat.com> > Date: Fri Oct 27 17:53:33 2023 +0200 > > block: Protect bs->file with graph_lock > > which added > > GLOBAL_STATE_CODE(); > GRAPH_RDLOCK_GUARD_MAINLOOP(); > > > It seems to me that while block_crypto_read/write_func are effectively > > no_coroutine_fn, qcrypto_block_amend_options() should really take > > function pointers that can be called from coroutines. It is called from > > both coroutine and non-coroutine code paths, so should the function > > pointers be coroutine_mixed_fn or do we want to change the callers? > > > > Either way, we should add the appropriate coroutine markers to the > > QCrypto interfaces to show the intention at least. > > I'm unclear why QCrypto needs to know about coroutines at all ? > It just wants a function pointer that will send or recv a blob > of data. In the case of the block layer these functions end up > doing I/O via the block APIs, but QCrypto doesn't care about > this impl detail.
Does a case where it's not in the context of the block layer even exist? The only callers of qcrypto_block_amend_options() are in block/crypto.c and block/qcow2.c. Apart from a test case, qcrypto_block_open() is the same. And even ignoring the block layer, doing synchronous I/O outside of coroutines is arguably a bug anyway because that's a blocking operation that stops the mainloop from making progress. But if we don't want to fix it at the QCrypto level, that makes the function pointer implicitly coroutine_mixed_fn and the function needs to be rewritten to check qemu_in_coroutine() and probably take the graph lock internally before calling bdrv_co_pread() in the coroutine case, unless we can prove that all callers already hold it. Unfortunately, function pointers still defeat TSA, so this proof will have to be manual. Kevin