On Wed, Oct 22, 2025 at 11:02:28AM +0200, Kevin Wolf wrote:
> Am 21.10.2025 um 21:10 hat Stefan Hajnoczi geschrieben:
> > On Thu, Oct 09, 2025 at 06:59:20PM +0200, Kevin Wolf wrote:
> > > Am 09.10.2025 um 17:46 hat Kevin Wolf geschrieben:
> > > > Am 10.09.2025 um 19:56 hat Stefan Hajnoczi geschrieben:
> > > > > There is no need for aio_context_use_g_source() now that epoll(7) and
> > > > > io_uring(7) file descriptor monitoring works with the glib event loop.
> > > > > AioContext doesn't need to be notified that GSource is being used.
> > > > > 
> > > > > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > > > > Reviewed-by: Eric Blake <[email protected]>
> > > > 
> > > > We should probably mention in the commit message that this causes the
> > > > default fdmon on Linux to change from poll to io_uring. It's a small
> > > > code change, but it makes QEMU use a completely different code path by
> > > > default.
> > > 
> > > Just to make sure, I ran 'make check' after this patch and it's failing
> > > for me:
> > > 
> > >  10/401 qemu:qtest+qtest-x86_64 / qtest-x86_64/ahci-test                  
> > >   TIMEOUT        150.02s   killed by signal 15 SIGTERM
> > > 133/401 qemu:unit / test-aio                                              
> > >   TIMEOUT         30.01s   killed by signal 15 SIGTERM
> > > 137/401 qemu:unit / test-bdrv-drain                                       
> > >   TIMEOUT         30.01s   killed by signal 15 SIGTERM
> > > 142/401 qemu:unit / test-block-iothread                                   
> > >   TIMEOUT         30.01s   killed by signal 15 SIGTERM
> > > 192/401 qemu:doc+rust / rust-bql-rs-doctests                              
> > >   FAIL             0.84s   exit status 101
> > > 311/401 qemu:block / io-qcow2-267                                         
> > >   ERROR            3.20s   exit status 1
> > > 321/401 qemu:block / io-qcow2-copy-before-write                           
> > >   TIMEOUT        180.01s   killed by signal 15 SIGTERM
> > > 
> > > Some of them look unrelated, but I have confirmed that the three unit
> > > tests still pass before this patch (and still hang after the complete
> > > series).
> > 
> > I can't reproduce these failures, regardless of whether sysctl
> > kernel.io_uring_disabled is 0 or 1.
> > 
> > Can you launch the unit tests from your terminal and post the output?
> > 
> >   $ cd qemu
> >   $ build/tests/unit/test-aio
> 
> TAP version 14
> # random seed: R02S48dcdde28634143f18bad3947c52d334
> 1..27
> # Start of aio tests
> # Start of bh tests
> ok 1 /aio/bh/schedule
> ok 2 /aio/bh/schedule10
> ok 3 /aio/bh/cancel
> ok 4 /aio/bh/delete
> ok 5 /aio/bh/flush
> # Start of callback-delete tests
> ok 6 /aio/bh/callback-delete/one
> ok 7 /aio/bh/callback-delete/many
> # End of callback-delete tests
> # End of bh tests
> # Start of event tests
> ok 8 /aio/event/add-remove
> ok 9 /aio/event/wait
> ok 10 /aio/event/flush
> # Start of wait tests
> ok 11 /aio/event/wait/no-flush-cb
> # End of wait tests
> # End of event tests
> # Start of timer tests
> 
> >   $ build/tests/unit/test-bdrv-drain
> 
> TAP version 14
> # random seed: R02S7d6ba0fc81d5b90d323813d680a30644
> 1..30
> # Start of bdrv-drain tests
> ok 1 /bdrv-drain/nested
> ok 2 /bdrv-drain/set_aio_context
> # Start of driver-cb tests
> 
> 
> >   $ build/tests/unit/test-block-iothread
> 
> TAP version 14
> # random seed: R02Sf81baf68887daa9b86be5c72b99df589
> 1..22
> # Start of sync-op tests
> ok 1 /sync-op/pread
> ok 2 /sync-op/pwrite
> ok 3 /sync-op/preadv
> ok 4 /sync-op/pwritev
> ok 5 /sync-op/preadv_part
> ok 6 /sync-op/pwritev_part
> ok 7 /sync-op/pwrite_compressed
> ok 8 /sync-op/pwrite_zeroes
> ok 9 /sync-op/load_vmstate
> ok 10 /sync-op/save_vmstate
> ok 11 /sync-op/pdiscard
> ok 12 /sync-op/truncate
> ok 13 /sync-op/block_status
> ok 14 /sync-op/flush
> ok 15 /sync-op/check
> ok 16 /sync-op/activate
> # End of sync-op tests
> # Start of attach tests
> 
> > That will show exactly which sub-test case is hanging.
> > Other information that might help: your host kernel version and liburing
> > version.
> 
> This is an F42 system.
> 
> kernel-6.16.12-200.fc42.x86_64
> liburing-2.9-1.fc42.x86_64
> 
> If you can't reproduce it or find a hypothesis for what's happening, I can
> try to debug one of the hanging processes.

Unfortunately I haven't been able to reproduce it on my system. It's
an F42 machine with the same package versions as yours.

The test-aio timer tests look like good candidates for debugging. The
test is most likely either stuck spinning in a do {} while
(!aio_poll(ctx, false)) loop that never makes progress, or blocked in an
aio_poll(ctx, true) call that never returns.
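To make the two hang shapes concrete, here is a minimal, self-contained C
sketch. FakeCtx, fake_poll_nonblocking(), poll_until_progress() and
wait_readable() are hypothetical names for illustration, not QEMU APIs,
and the sketch assumes that a readiness notification which never arrives
is what keeps the loop from making progress. Bounding the busy loop, or
giving the blocking wait a finite timeout, is one way to turn a silent
hang into a reportable failure while debugging.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for an event loop; this is NOT QEMU's AioContext.
 * event_lost models an fd monitor that never reports a ready event.
 */
typedef struct {
    int pending_events;  /* events waiting to be dispatched */
    bool event_lost;     /* true: readiness is never reported */
} FakeCtx;

/* Mirrors the aio_poll(ctx, false) contract: true if a handler ran. */
static bool fake_poll_nonblocking(FakeCtx *ctx)
{
    if (ctx->event_lost || ctx->pending_events == 0) {
        return false;
    }
    ctx->pending_events--;
    return true;
}

/*
 * Hang shape 1: do {} while (!aio_poll(ctx, false)).
 * Bounded version: returns the iteration on which progress was made,
 * or -1 if none was made within max_iters (the real loop would spin forever).
 */
static int poll_until_progress(FakeCtx *ctx, int max_iters)
{
    assert(max_iters > 0);
    for (int i = 1; i <= max_iters; i++) {
        if (fake_poll_nonblocking(ctx)) {
            return i;
        }
    }
    return -1;
}

/*
 * Hang shape 2: a blocking wait such as aio_poll(ctx, true).
 * poll(2) with a finite timeout instead of -1 turns a missing wakeup into
 * a reported timeout. Returns 1 if fd is readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0) {
        return -1;
    }
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

With a context whose event is lost, poll_until_progress() returns -1
where the unbounded loop would spin forever, and wait_readable() on a
pipe nobody writes to returns 0 after the timeout where a blocking wait
would hang.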

Thanks for your help with debugging!

Stefan
