Hello everyone,

This patch series introduces native io_uring support for the FUSE storage export to overcome the scalability limits of the /dev/fuse interface. By using shared-memory ring buffers and per-core queues, FUSE-over-io_uring reduces context-switch overhead and lock contention. This lets FUSE export daemons achieve higher throughput and lower latency; the measurements below show bandwidth gains of roughly 40-60% and latency reductions of roughly 20-37% across the tested configurations.
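To make the "shared-memory ring buffer" idea concrete, here is a minimal, hypothetical C sketch of a single-producer/single-consumer ring in the style of io_uring's submission and completion rings. The producer fills a slot and publishes it by advancing a tail index; the consumer takes it by advancing a head index. The entries themselves move through shared memory rather than through one read()/write() per request as with /dev/fuse. The names (struct ring, ring_push, ring_pop) and the 8-entry size are illustrative assumptions. This is not kernel, liburing, or QEMU code, and real io_uring rings additionally involve a kernel-side peer and io_uring_enter() (or SQ polling) for notification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative SPSC ring modeled loosely on io_uring's SQ/CQ rings.
 * Hypothetical names; not kernel, liburing, or QEMU code. */
#define RING_ENTRIES 8u
#define RING_MASK    (RING_ENTRIES - 1u)
static_assert((RING_ENTRIES & RING_MASK) == 0,
              "RING_ENTRIES must be a power of two");

struct ring {
    _Atomic uint32_t head;           /* next slot to consume */
    _Atomic uint32_t tail;           /* next slot to produce */
    uint64_t slots[RING_ENTRIES];    /* e.g. request tags */
};

/* Producer: returns false when the ring is full. Indices are free-running
 * and wrap modulo 2^32; the slot is selected with the mask. */
static bool ring_push(struct ring *r, uint64_t val)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_ENTRIES) {
        return false;
    }
    r->slots[tail & RING_MASK] = val;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer: returns false when the ring is empty. */
static bool ring_pop(struct ring *r, uint64_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail) {
        return false;
    }
    *out = r->slots[head & RING_MASK];
    /* Release: tell the producer this slot may be reused. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Because each side only writes its own index, producer and consumer need no lock, which is the property that per-core queues exploit: each queue has its own ring, so cores do not contend with each other.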
More details on FUSE-over-io_uring:
https://docs.kernel.org/filesystems/fuse/fuse-io-uring.html

Changes in this version:
- Reorganized the patch structure.
- Unified the naming of io_uring-related data structures (e.g. FuseRing -> FuseUring).
- Refactored FUSE_IN/OUT_OP_STRUCT_LEGACY.
- Code cleanup and logic simplification:
  - Used the io_uring flag to indicate the intention to enable FUSE-over-io_uring.
  - Used uring_started to track the active state.
  - Removed unnecessary #ifdef CONFIG_LINUX_IO_URING guards.
- Moved fuse_fd closing to a bottom half (BH) in io_uring mode to prevent data races.
- Updated tests: they now use mount to verify that the test image mount is fully gone.

More details are in the earlier cover letters:
v3: https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00325.html
v2: https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
v1: https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html

We used fio to test a 1 GB file under both legacy FUSE and FUSE-over-io_uring modes. The experiments used four iodepth-numjobs configurations (1-1, 64-1, 1-4, and 64-4) with a 70% read / 30% write mix. Running each configuration in both modes gives 8 test cases, for each of which we measured latency and throughput.
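The test matrix described above can be sketched as a small, hypothetical C helper that prints one fio command line per case: 4 iodepth/numjobs pairs, each run once per export mode, for 8 cases. Several fio options here are assumptions not stated in this cover letter: --rw=randrw (random vs. sequential access is not specified) and --ioengine=libaio (chosen because fio's iodepth has no effect with synchronous engines). Block size, runtime, and direct-I/O settings are not given and are left at fio defaults. The job names, helper names, and file path are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The four (iodepth, numjobs) pairs listed in the cover letter. */
struct fio_config { int iodepth; int numjobs; };

static const struct fio_config configs[] = {
    {1, 1}, {64, 1}, {1, 4}, {64, 4},
};

/* Export modes under test; the same fio job runs against each. */
static const char *const modes[] = { "legacy", "io_uring" };

#define N_CONFIGS (sizeof(configs) / sizeof(configs[0]))
#define N_MODES   (sizeof(modes) / sizeof(modes[0]))

/* 4 configurations x 2 modes = the 8 test cases from the cover letter. */
static_assert(N_CONFIGS * N_MODES == 8, "expected 8 test cases");

/* Hypothetical helper: format one fio command line into buf.
 * --rw=randrw and --ioengine=libaio are assumptions (see lead-in). */
static int fio_cmd(char *buf, size_t len, const char *file,
                   const struct fio_config *c)
{
    return snprintf(buf, len,
                    "fio --name=qd%d-j%d --filename=%s --size=1G "
                    "--rw=randrw --rwmixread=70 --ioengine=libaio "
                    "--iodepth=%d --numjobs=%d",
                    c->iodepth, c->numjobs, file, c->iodepth, c->numjobs);
}

/* Print every configuration under every export mode; returns the count. */
static size_t print_matrix(const char *file)
{
    char cmd[256];
    size_t cases = 0;

    for (size_t m = 0; m < N_MODES; m++) {
        for (size_t i = 0; i < N_CONFIGS; i++) {
            fio_cmd(cmd, sizeof(cmd), file, &configs[i]);
            printf("[%s] %s\n", modes[m], cmd);
            cases++;
        }
    }
    return cases;
}
```

For example, print_matrix("/path/to/export/mountpoint") prints eight lines, four tagged [legacy] and four tagged [io_uring]. Switching the export between the two modes happens outside fio, via the export configuration.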
Performance results:

[Bandwidth (MiB/s)]
| Config (Jobs/QD) | Read (Legacy -> io_uring) | Write (Legacy -> io_uring) |
|------------------|---------------------------|----------------------------|
| 1 Job, QD=1      | 72.2 -> 104               | 30.9 -> 44.7               |
| 1 Job, QD=64     | 114 -> 181                | 48.8 -> 77.7               |
| 4 Jobs, QD=1     | 109 -> 159                | 47.0 -> 68.5               |
| 4 Jobs, QD=64    | 106 -> 160                | 45.7 -> 68.9               |

[Latency (usec)]
| Config (Jobs/QD) | Read (Legacy -> io_uring) | Write (Legacy -> io_uring) |
|------------------|---------------------------|----------------------------|
| 1 Job, QD=1      | 37.0 -> 23.7              | 36.9 -> 29.5               |
| 1 Job, QD=64     | 1537 -> 964               | 1535 -> 967                |
| 4 Jobs, QD=1     | 96.6 -> 66.4              | 114.2 -> 71.9              |
| 4 Jobs, QD=64    | 6560 -> 4234              | 6600 -> 4280               |

Brian Song (7):
  [Patch v4 1/7] aio-posix: enable 128-byte SQEs
  [Patch v4 2/7] fuse: io_uring mode init
  [Patch v4 3/7] fuse: uring support for write ops
  [Patch v4 4/7] fuse: refactor FUSE request handler
  [Patch v4 5/7] fuse: safe termination for io_uring
  [Patch v4 6/7] fuse: add 'io-uring' option
  [Patch v4 7/7] fuse: add io_uring test support

 block/export/fuse.c                  | 958 +++++++++++++++++++++++----
 docs/tools/qemu-storage-daemon.rst   |   7 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 tests/qemu-iotests/check             |   2 +
 tests/qemu-iotests/common.rc         |  47 +-
 util/fdmon-io_uring.c                |   7 +-
 7 files changed, 879 insertions(+), 148 deletions(-)

-- 
2.43.0
