On Thu, Sep 11, 2025 at 02:13:47PM +0200, Kevin Wolf wrote:
> Am 11.09.2025 um 13:21 hat Thomas Huth geschrieben:
> > On 10/09/2025 18.08, Kevin Wolf wrote:
> > > Am 10.09.2025 um 17:16 hat Thomas Huth geschrieben:
> > > > 
> > > >   Hi,
> > > > 
> > > > when running "./check -luks" in the qemu-iotests directory,
> > > > some tests are failing for me:
> > > > 
> > > > 295 296 inactive-node-nbd luks-detached-header
> > > > 
> > > > Is that a known problem already?
> > > 
> > > Not to me anyway.
> > > 
> > > > FWIW, 295 is failing with the following output:
> > > > 
> > > > 295   fail       [17:03:01] [17:03:17]   15.7s                failed, 
> > > > exit status 1
> > > > [...]
> > > > +EWARNING:qemu.machine.machine:qemu received signal 6; command: 
> > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga none 
> > > > -chardev socket,id=mon,fd=5 -mon chardev=mon,mode=control -chardev 
> > > > socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest -nodefaults 
> > > > -display none -accel qtest"
> > > > +EEWARNING:qemu.machine.machine:qemu received signal 6; command: 
> > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga none 
> > > > -chardev socket,id=mon,fd=6 -mon chardev=mon,mode=control -chardev 
> > > > socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest -nodefaults 
> > > > -display none -accel qtest"
> > > > +EEWARNING:qemu.machine.machine:qemu received signal 6; command: 
> > > > "/home/thuth/tmp/qemu-build/qemu-system-x86_64 -display none -vga none 
> > > > -chardev socket,id=mon,fd=10 -mon chardev=mon,mode=control -chardev 
> > > > socket,id=qtest,fd=3 -qtest chardev:qtest -accel qtest -nodefaults 
> > > > -display none -accel qtest"
> > > > +E
> > > > [...]
> > > > 
> > > > etc.
> > > > 
> > > > 296 looks very similar (also a "qemu received signal 6" error),
> > > > but the others look like this:
> > > 
> > > When it gets signal 6 (i.e. SIGABRT), that usually means that you should
> > > have a look at the coredump.
> > 
> > With "-p" I additionally get this error message in the log:
> > 
> > qemu-system-x86_64: ../../devel/qemu/block/graph-lock.c:294:
> >  bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.
> > 
> > With -gdb I can get a back trace that looks like this:
> > 
> > Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
> > 0x00007ffff4ba7e9c in __pthread_kill_implementation () from 
> > target:/lib64/libc.so.6
> > --Type <RET> for more, q to quit, c to continue without paging--
> > #0  0x00007ffff4ba7e9c in __pthread_kill_implementation () from 
> > target:/lib64/libc.so.6
> > #1  0x00007ffff4b4df3e in raise () from target:/lib64/libc.so.6
> > #2  0x00007ffff4b356d0 in abort () from target:/lib64/libc.so.6
> > #3  0x00007ffff4b35639 in __assert_fail_base.cold () from 
> > target:/lib64/libc.so.6
> > #4  0x0000555555574eae in bdrv_graph_rdlock_main_loop () at 
> > ../../devel/qemu/block/graph-lock.c:294
> > #5  0x0000555555aa2f43 in graph_lockable_auto_lock_mainloop (x=<optimized 
> > out>) at /home/thuth/devel/qemu/include/block/graph-lock.h:275
> > #6  block_crypto_read_func (block=<optimized out>, offset=4096, 
> > buf=0x555558324100 "", buflen=256000, opaque=0x555558a259d0, 
> > errp=0x555558a8c370)
> >     at ../../devel/qemu/block/crypto.c:71
> > #7  0x0000555555a5a308 in qcrypto_block_luks_load_key 
> > (block=block@entry=0x555558686ec0, slot_idx=slot_idx@entry=0,
> >     password=password@entry=0x555558626050 "hunter0", 
> > masterkey=masterkey@entry=0x55555886b2a0 "",
> >     readfunc=readfunc@entry=0x555555aa2f10 <block_crypto_read_func>, 
> > opaque=opaque@entry=0x555558a259d0, errp=0x555558a8c370)
> >     at ../../devel/qemu/crypto/block-luks.c:927
> > #8  0x0000555555a5ba7e in qcrypto_block_luks_find_key 
> > (block=0x555558686ec0, password=0x555558626050 "hunter0", 
> > masterkey=0x55555886b2a0 "",
> >     readfunc=0x555555aa2f10 <block_crypto_read_func>, 
> > opaque=0x555558a259d0, errp=0x555558a8c370) at 
> > ../../devel/qemu/crypto/block-luks.c:1045
> > #9  qcrypto_block_luks_amend_add_keyslot (block=0x555558686ec0, 
> > readfunc=0x555555aa2f10 <block_crypto_read_func>,
> >     writefunc=0x555555aa2e50 <block_crypto_write_func>, 
> > opaque=0x555558a259d0, opts_luks=0x7fffec5fff38, force=<optimized out>, 
> > errp=0x555558a8c370)
> >     at ../../devel/qemu/crypto/block-luks.c:1673
> > #10 qcrypto_block_luks_amend_options (block=0x555558686ec0, 
> > readfunc=0x555555aa2f10 <block_crypto_read_func>,
> >     writefunc=0x555555aa2e50 <block_crypto_write_func>, 
> > opaque=0x555558a259d0, options=0x7fffec5fff30, force=<optimized out>, 
> > errp=0x555558a8c370)
> >     at ../../devel/qemu/crypto/block-luks.c:1865
> > #11 0x0000555555aa3852 in block_crypto_amend_options_generic_luks 
> > (bs=<optimized out>, amend_options=<optimized out>, force=<optimized out>,
> >     errp=<optimized out>) at ../../devel/qemu/block/crypto.c:949
> > #12 0x0000555555aa38e9 in block_crypto_co_amend_luks (bs=<optimized out>, 
> > opts=<optimized out>, force=<optimized out>, errp=<optimized out>)
> >     at ../../devel/qemu/block/crypto.c:1008
> > #13 0x0000555555a96030 in blockdev_amend_run (job=0x555558a8c2b0, 
> > errp=0x555558a8c370) at ../../devel/qemu/block/amend.c:52
> > #14 0x0000555555a874ad in job_co_entry (opaque=0x555558a8c2b0) at 
> > ../../devel/qemu/job.c:1112
> > #15 0x0000555555bdc41b in coroutine_trampoline (i0=<optimized out>, 
> > i1=<optimized out>) at ../../devel/qemu/util/coroutine-ucontext.c:175
> > #16 0x00007ffff4b68f70 in ?? () from target:/lib64/libc.so.6
> > #17 0x00007fffffffc310 in ?? ()
> > #18 0x0000000000000000 in ?? ()
> 
> Hm, so block_crypto_read_func() isn't prepared to be called in coroutine
> context, but block_crypto_co_amend_luks() still calls it from a
> coroutine. The indirection of going through QCrypto won't make it easier
> to fix this properly.

Historically block_crypto_read_func() didn't care/know whether it
was in a coroutine or not. Bisect tells me the regression was caused
by

  commit 1f051dcbdf2e4b6f518db731c84e304b2b9d15ce
  Author: Kevin Wolf <kw...@redhat.com>
  Date:   Fri Oct 27 17:53:33 2023 +0200

    block: Protect bs->file with graph_lock

which added
 
    GLOBAL_STATE_CODE();
    GRAPH_RDLOCK_GUARD_MAINLOOP();

> It seems to me that while block_crypto_read/write_func are effectively
> no_coroutine_fn, qcrypto_block_amend_options() should really take
> function pointers that can be called from coroutines. It is called from
> both coroutine and non-coroutine code paths, so should the function
> pointers be coroutine_mixed_fn or do we want to change the callers?
>
> Either way, we should add the appropriate coroutine markers to the
> QCrypto interfaces to show the intention at least.

I'm unclear why QCrypto needs to know about coroutines at all ?
It just wants a function pointer that will send or recv a blob
of data. In the case of the block layer these functions end up
doing I/O via the block APIs, but QCrypto doesn't care about
this impl detail.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to