On Thu, Aug 07, 2025 at 09:05:25AM +0000, Bernd Schubert wrote:
> Hi Brian,
>
> sorry for late replies. Totally swamped in work this week and next week
> will be off another week.
>
> On 8/5/25 06:11, Brian Song wrote:
> >
> >
> > On 2025-08-04 7:33 a.m., Bernd Schubert wrote:
> >> Hi Brian,
> >>
> >> sorry for my late reply, just back from vacation and fighting through
> >> my mails.
> >>
> >> On 8/4/25 01:33, Brian Song wrote:
> >>>
> >>>
> >>> On 2025-08-01 12:09 p.m., Brian Song wrote:
> >>>> Hi Bernd,
> >>>>
> >>>> We are currently working on implementing termination support for fuse-
> >>>> over-io_uring in QEMU, and right now we are focusing on how to clean up
> >>>> in-flight SQEs properly. Our main question is about how well the kernel
> >>>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it
> >>>> actually implement cancellation beyond destroying the io_uring queue?
> >>>> [...]
> >>>
> >>
> >> I have to admit that I'm confused why you can't use umount, isn't that
> >> the most graceful way to shutdown a connection?
> >>
> >> If you need another custom way for some reasons, we probably need
> >> to add it.
> >>
> >>
> >> Thanks,
> >> Bernd
> >
> > Hi Bernd,
> >
> > Thanks for your insights!
> >
> > I think umount doesn't cancel any pending SQEs, right? From what I see,
> > the only way to cancel all pending SQEs and transition all entries to
> > the FRRS_USERSPACE state (unavailable for further fuse requests) in the
> > kernel is by calling io_uring_files_cancel in do_exit, or
> > io_uring_task_cancel in begin_new_exec.
>
> There are two umount forms
>
> - Forced umount - immediately cancels the connection and aborts
> requests. That also immediately releases pending SQEs.
>
> - Normal umount, destroys the connection and completed SQEs at the end
> of umount.
>
>
> > From my understanding, QEMU follows an event-driven model. So if we
> > don't cancel the SQEs submitted by a connection when it ends, then
> > before QEMU exits — after the connection is closed and the associated
> > FUSE data structures have been freed — any CQE that comes back will
> > trigger QEMU to invoke a previously deleted CQE handler, leading to a
> > segfault.
> >
> > So if the only way to make all pending entries unavailable in the kernel
> > is calling do_exit or begin_new_exec, I think we should do some
> > workarounds in QEMU.
>
> I guess if we find a good argument why qemu needs to complete SQEs
> before umount is complete a kernel patch would be accepted. Doesn't
> sound that difficult to create patch for that. At least for entries that
> are on state FRRS_AVAILABLE. I can prepare patch, but at best in between
> Saturday and Monday.
Hi Bernd,

QEMU quiesces I/O at certain points, like when the block driver graph is
reconfigured (kind of like changing the device-mapper table in the
kernel) or when threads are reconfigured. This is also used during
termination to stop accepting new I/O and wait until in-flight I/O has
completed.

Ideally io_uring's ASYNC_CANCEL would work on in-flight
FUSE-over-io_uring uring_cmd requests. The REGISTER or COMMIT_AND_FETCH
uring_cmds would complete with -ECANCELED and future FUSE requests would
be queued in the kernel until FUSE-over-io_uring becomes ready again.
When userspace becomes ready again, it submits new REGISTER uring_cmds
and the queued FUSE requests are then delivered to userspace. A rough
sketch of the teardown side of this is appended below.

Thanks for your help!

Stefan
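---

To make that concrete, here is a rough liburing sketch of the shutdown
path we would like to be able to write in QEMU. It is untested and
hypothetical: it assumes the kernel honours IORING_OP_ASYNC_CANCEL for
these uring_cmds (the open question above), struct fuse_ring_ent and
fuse_queue_quiesce() are made-up names rather than actual QEMU code, and
the real uring_cmd tags are assumed to be nonzero so that 0 can tag the
cancel SQEs themselves:

/*
 * Untested, hypothetical sketch; not QEMU code.  Assumes the kernel
 * supports IORING_OP_ASYNC_CANCEL against FUSE-over-io_uring uring_cmds,
 * which is exactly what this thread is asking about.
 */
#include <liburing.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fuse_ring_ent {   /* made-up per-entry bookkeeping */
    uint64_t tag;        /* user_data of the REGISTER/COMMIT_AND_FETCH SQE */
    bool in_flight;
};

static int fuse_queue_quiesce(struct io_uring *ring,
                              struct fuse_ring_ent *ents, unsigned nr)
{
    unsigned cancels = 0, cmds = 0;
    int ret;

    /* One ASYNC_CANCEL per in-flight uring_cmd, matched by user_data. */
    for (unsigned i = 0; i < nr; i++) {
        struct io_uring_sqe *sqe;

        if (!ents[i].in_flight)
            continue;
        sqe = io_uring_get_sqe(ring);
        if (!sqe)
            return -EBUSY;   /* SQ full; real code would submit and retry */
        io_uring_prep_cancel64(sqe, ents[i].tag, 0);
        io_uring_sqe_set_data64(sqe, 0);   /* 0 tags the cancel itself */
        cancels++;
        cmds++;
    }

    ret = io_uring_submit(ring);
    if (ret < 0)
        return ret;

    /*
     * Reap until every cancel has completed (res 0, -ENOENT, or -EALREADY)
     * and every in-flight uring_cmd has produced its own CQE, ideally with
     * -ECANCELED.  Only then is it safe to free the FUSE queue state that
     * the CQE handlers reference.
     */
    while (cancels || cmds) {
        struct io_uring_cqe *cqe;

        ret = io_uring_wait_cqe(ring, &cqe);
        if (ret < 0)
            return ret;
        if (cqe->user_data == 0) {
            cancels--;                    /* a cancel SQE completed */
        } else {
            for (unsigned i = 0; i < nr; i++) {
                if (ents[i].in_flight && ents[i].tag == cqe->user_data) {
                    ents[i].in_flight = false;
                    cmds--;
                    if (cqe->res != -ECANCELED)
                        fprintf(stderr, "tag %llu: unexpected res %d\n",
                                (unsigned long long)cqe->user_data,
                                cqe->res);
                    break;
                }
            }
        }
        io_uring_cqe_seen(ring, cqe);
    }
    return 0;
}

The point of the reap loop is that every in-flight uring_cmd must produce
a CQE before we can free the per-queue state its CQE handler
dereferences. Once the connection comes back up, userspace would submit
fresh REGISTER uring_cmds and the kernel-queued FUSE requests would flow
again.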