On 9/8/25 3:45 PM, Bernd Schubert wrote:
On 9/8/25 21:09, Brian Song wrote:
On 9/3/25 7:51 AM, Stefan Hajnoczi wrote:
On Fri, Aug 29, 2025 at 10:50:23PM -0400, Brian Song wrote:
https://docs.kernel.org/filesystems/fuse-io-uring.html
As described in the kernel documentation, after FUSE-over-io_uring
initialization and handshake, FUSE interacts with the kernel using
SQE/CQE to send requests and receive responses. This corresponds to
the "Sending requests with CQEs" section in the docs.
This patch implements three key parts: registering the CQE handler
(fuse_uring_cqe_handler), processing FUSE requests
(fuse_uring_co_process_request), and sending response results
(fuse_uring_send_response). It also merges the traditional /dev/fuse
request handling with the FUSE-over-io_uring handling functions.
Suggested-by: Kevin Wolf <kw...@redhat.com>
Suggested-by: Stefan Hajnoczi <stefa...@redhat.com>
Signed-off-by: Brian Song <hibrians...@gmail.com>
---
block/export/fuse.c | 457 ++++++++++++++++++++++++++++++--------------
1 file changed, 309 insertions(+), 148 deletions(-)
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 19bf9e5f74..07f74fc8ec 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -310,6 +310,47 @@ static const BlockDevOps fuse_export_blk_dev_ops = {
};
#ifdef CONFIG_LINUX_IO_URING
+static void coroutine_fn fuse_uring_co_process_request(FuseRingEnt *ent);
+
+static void coroutine_fn co_fuse_uring_queue_handle_cqes(void *opaque)
This function appears to handle exactly one cqe. A singular function
name would be clearer than a plural: co_fuse_uring_queue_handle_cqe().
+{
+ FuseRingEnt *ent = opaque;
+ FuseExport *exp = ent->rq->q->exp;
+
+ /* Going to process requests */
+ fuse_inc_in_flight(exp);
What is the rationale for taking a reference here? Normally something
already holds a reference (e.g. the request itself) and it will be
dropped somewhere inside a function we're about to call, but we still
need to access exp afterwards, so we temporarily take a reference.
Please document the specifics in a comment.
I think blk_exp_ref()/blk_exp_unref() are appropriate instead of
fuse_inc_in_flight()/fuse_dec_in_flight() since we only need to hold
onto the export and don't care about drain behavior.
Stefan:
When handling FUSE requests, we don’t want the FuseExport to be
accidentally deleted. Therefore, we use fuse_inc_in_flight in the CQE
handler to increment the in_flight counter, and when a request is
completed, we call fuse_dec_in_flight to decrement it. Once the last
request has been processed, fuse_dec_in_flight brings the in_flight
counter down to 0, indicating that the export can safely be deleted. The
usage of in_flight follows the same logic as in traditional FUSE request
handling.
Since submitted SQEs for FUSE cannot be canceled, once we register or
commit them we must wait for the kernel to return a CQE. Otherwise, the
kernel may deliver a CQE and invoke its handler after the export has
already been deleted. For this reason, we call blk_exp_ref directly when
submitting an SQE and blk_exp_unref when receiving its CQE, to
explicitly control the export reference and prevent accidental deletion.
The doc/comment for co_fuse_uring_queue_handle_cqe:
Protect FuseExport from premature deletion while handling FUSE requests.
CQE handlers inc/dec the in_flight counter; when it reaches 0, the
export can be freed. This follows the same logic as traditional FUSE.
Since FUSE SQEs cannot be canceled, a CQE may arrive after commit even
if the export is deleted. To prevent this, we ref/unref the export
explicitly at SQE submission and CQE completion.
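To make the two counters concrete, here is a minimal standalone sketch of the scheme described above. It is not the QEMU code: SketchExport and every sketch_* function are hypothetical stand-ins for FuseExport, blk_exp_ref()/blk_exp_unref() and fuse_inc_in_flight()/fuse_dec_in_flight(), and it assumes a single thread and one outstanding SQE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for FuseExport; not the QEMU type. */
typedef struct SketchExport {
    int refcnt;     /* plays the role of blk_exp_ref()/blk_exp_unref() */
    int in_flight;  /* plays the role of fuse_inc/dec_in_flight() */
    bool freed;     /* set when the last reference is dropped */
} SketchExport;

static void sketch_exp_ref(SketchExport *exp)
{
    exp->refcnt++;
}

static void sketch_exp_unref(SketchExport *exp)
{
    assert(exp->refcnt > 0);
    if (--exp->refcnt == 0) {
        exp->freed = true;  /* a real export would be released here */
    }
}

/* Submitting an SQE: the kernel now owes us a CQE, so pin the export. */
static void sketch_submit_sqe(SketchExport *exp)
{
    sketch_exp_ref(exp);
}

/* CQE arrived: count the request as in flight while it is processed,
 * then drop the reference taken at submission time. */
static void sketch_handle_cqe(SketchExport *exp)
{
    exp->in_flight++;
    /* ... process the FUSE request here ... */
    exp->in_flight--;
    sketch_exp_unref(exp);
}
```

In this sketch, dropping the owner's reference while an SQE is outstanding does not free the export; only the matching CQE does.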
+
+ /* A ring entry returned */
+ fuse_uring_co_process_request(ent);
+
+ /* Finished processing requests */
+ fuse_dec_in_flight(exp);
+}
+
+static void fuse_uring_cqe_handler(CqeHandler *cqe_handler)
+{
+ FuseRingEnt *ent = container_of(cqe_handler, FuseRingEnt, fuse_cqe_handler);
+ Coroutine *co;
+ FuseExport *exp = ent->rq->q->exp;
+
+ if (unlikely(exp->halted)) {
+ return;
+ }
+
+ int err = cqe_handler->cqe.res;
+
+ if (err != 0) {
+ /* -ENOTCONN is ok on umount */
+ if (err != -EINTR && err != -EAGAIN &&
+ err != -ENOTCONN) {
+ fuse_export_halt(exp);
+ }
How are EINTR and EAGAIN handled if they are silently ignored? When did
you encounter these error codes?
Bernd:
I have the same question. I haven't yet studied how the kernel returns
errors in each case. libfuse implements it the same way. Could you
briefly explain why these two errors are ignored, and under what
circumstances we might encounter them?
I think I remember why I had added these. Initially the ring threads
didn't inherit the signal handlers libfuse worker threads have. I
fixed that later, and these error conditions are a leftover.
In libfuse the idea is that the main thread gets all signals and then
sets se->exited. Worker threads, including ring threads, are not
supposed to get or handle signals at all, but have to monitor se->exited.
Good catch Stefan, I think I can remove these conditions in libfuse.
Thanks,
Bernd
In libfuse:
static int fuse_uring_queue_handle_cqes(struct fuse_ring_queue *queue)
{
    struct fuse_ring_pool *ring_pool = queue->ring_pool;
    struct fuse_session *se = ring_pool->se;
    size_t num_completed = 0;
    struct io_uring_cqe *cqe;
    unsigned int head;
    int ret = 0;
    io_uring_for_each_cqe(&queue->ring, head, cqe) {
        int err = 0;
        num_completed++;
        err = cqe->res;
        if (err != 0) {
            if (err > 0 && ((uintptr_t)io_uring_cqe_get_data(cqe) ==
                            (unsigned int)queue->eventfd)) {
                /* teardown from eventfd */
                return -ENOTCONN;
            }
            // XXX: Needs rate limited logs, otherwise log spam
            //fuse_log(FUSE_LOG_ERR, "cqe res: %d\n", cqe->res);
            /* -ENOTCONN is ok on umount */
            if (err != -EINTR &&
                err != -EAGAIN && err != -ENOTCONN) {
                se->error = cqe->res;
                /* return first error */
                if (ret == 0)
                    ret = err;
            }
        } else {
            fuse_uring_handle_cqe(queue, cqe);
        }
    }
    if (num_completed)
        io_uring_cq_advance(&queue->ring, num_completed);
    return ret == 0 ? 0 : num_completed;
}
If err > 0 and (uintptr_t)io_uring_cqe_get_data(cqe) == (unsigned
int)queue->eventfd, the function returns -ENOTCONN so that the caller
sets se->exited = 1. Under what circumstances is err > 0, and when is
err < 0? Also, the current code doesn't seem to handle the case where
err is negative?