On 15.09.25 07:43, Brian Song wrote:
> Hi Hanna,

Hi Brian!

(Thanks for your heads-up!)

> Stefan raised the above issue and proposed a preliminary solution: keep
> closing the file descriptor in the delete section, but perform
> umount separately for FUSE uring and traditional FUSE in the shutdown
> and delete sections respectively. This approach avoids the race
> condition on the file descriptor.

> In the case of FUSE uring, umount must be performed in the shutdown
> section. The reason is that the kernel currently lacks an interface to
> explicitly cancel submitted SQEs. Performing umount forces the kernel to
> flush all pending SQEs and return their CQEs. Without this step, CQEs
> may arrive after the export has already been deleted, and invoking the
> CQE handler at that point would dereference freed memory and trigger a
> segmentation fault.

The commit message says that incrementing the BB reference would be enough to solve the problem (i.e. deleting is delayed until all requests are done).  Why isn’t it?
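(For context, this is what I would have expected to suffice, sketched
with made-up function names around the request/CQE path; blk_exp_ref()
and blk_exp_unref() are the existing export refcount helpers:)

    static void fuse_uring_start_request(FuseExport *exp)
    {
        /* Pin the export for the lifetime of the request */
        blk_exp_ref(&exp->common);
        /* ... prepare and submit the SQE ... */
    }

    static void fuse_uring_cqe_handler(FuseExport *exp)
    {
        /* ... process the completion ... */

        /*
         * Unpin; actual deletion is deferred until the refcount
         * drops to zero, i.e. until all requests are done.
         */
        blk_exp_unref(&exp->common);
    }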

> I’m curious about traditional FUSE: is it strictly necessary to perform
> umount in the delete section, or could it also be done in shutdown?

Looking into libfuse, fuse_session_unmount() (in fuse_kern_unmount()) closes the FUSE FD.  I can imagine that might result in the potential problems Stefan described.

> Additionally, what is the correct ordering between close(fd) and
> umount? Does one need to precede the other?

fuse_kern_unmount() closes the (queue 0) FD first before actually unmounting, with a comment: “Need to close file descriptor, otherwise synchronous umount would recurse into filesystem, and deadlock.”

Given that, I assume the FDs should all be closed before unmounting.

(Though to be fair, before looking into it now, I don’t think I’ve ever given it much thought…)
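Transferred to our side, I would expect the order to be roughly the
following (a sketch; that exp->fuse_session is the right field name
here is an assumption on my part):

    /*
     * Close the per-queue FDs first; queue 0's FD belongs to the
     * session and is closed by fuse_session_unmount() itself.
     */
    for (size_t i = 1; i < exp->num_queues; i++) {
        if (exp->queues[i].fuse_fd >= 0) {
            close(exp->queues[i].fuse_fd);
            exp->queues[i].fuse_fd = -1;
        }
    }

    /* Closes queue 0's FD, then performs the actual unmount */
    fuse_session_unmount(exp->fuse_session);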

Hanna

> Thanks,
> Brian

> On 9/9/25 3:33 PM, Stefan Hajnoczi wrote:
>> On Fri, Aug 29, 2025 at 10:50:24PM -0400, Brian Song wrote:
>>> @@ -901,24 +941,15 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
>>>            */
>>>           g_hash_table_remove(exports, exp->mountpoint);
>>>       }
>>> -}
>>> -
>>> -static void fuse_export_delete(BlockExport *blk_exp)
>>> -{
>>> -    FuseExport *exp = container_of(blk_exp, FuseExport, common);
>>>
>>> -    for (int i = 0; i < exp->num_queues; i++) {
>>> +    for (size_t i = 0; i < exp->num_queues; i++) {
>>>           FuseQueue *q = &exp->queues[i];
>>>
>>>           /* Queue 0's FD belongs to the FUSE session */
>>>           if (i > 0 && q->fuse_fd >= 0) {
>>>               close(q->fuse_fd);
>>
>> This changes the behavior of the non-io_uring code. Now all fuse fds and
>> fuse_session are closed while requests are potentially still being
>> processed.
>>
>> There is a race condition: if an IOThread is processing a request here
>> then it may invoke a system call on q->fuse_fd just after it has been
>> closed but not set to -1. If another thread has also opened a new file
>> then the fd could be reused, resulting in an accidental write(2) to the
>> new file. I'm not sure whether there is a way to trigger this in
>> practice, but it looks like a problem waiting to happen.
>>
>> Simply setting q->fuse_fd to -1 here doesn't fix the race. It would be
>> necessary to stop processing fuse_fd in the thread before closing it
>> here or to schedule a BH in each thread so that fuse_fd can be closed
>> in the thread that uses the fd.
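For illustration, the second variant Stefan mentions (closing each FD
via a BH in the thread that uses it) might look roughly like this; the
q->ctx field holding the queue's AioContext is an assumption:

    static void fuse_queue_close_bh(void *opaque)
    {
        FuseQueue *q = opaque;

        /*
         * Runs in the queue's own AioContext, so it cannot race with
         * request processing on q->fuse_fd in that thread.
         */
        if (q->fuse_fd >= 0) {
            close(q->fuse_fd);
            q->fuse_fd = -1;
        }
    }

    /* In fuse_export_shutdown(): */
    for (size_t i = 1; i < exp->num_queues; i++) {
        aio_bh_schedule_oneshot(exp->queues[i].ctx,
                                fuse_queue_close_bh, &exp->queues[i]);
    }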
