On Tue, Jul 01, 2025 at 01:44:34PM +0200, Hanna Czenczek wrote:
> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs
> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)).
>
> We can use this to implement multi-threading.
>
> For configuration, we don't need any more information beyond the simple
> array provided by the core block export interface: The FUSE kernel
> driver feeds these FDs in a round-robin fashion, so all of them are
> equivalent and we want to have exactly one per thread.
>
> These are the benchmark results when using four threads (compared to a
> single thread); note that fio still only uses a single job, but
> performance can still be improved because of said round-robin usage for
> the queues.  (Not in the sync case, though, in which case I guess it
> just adds overhead.)
>
> file:
>   read:
>     seq aio:   264.8k ±0.8k (+120 %)
>     rand aio:  143.8k ±0.4k (+ 27 %)
>     seq sync:   49.9k ±0.5k (-  5 %)
>     rand sync:  10.3k ±0.1k (-  1 %)
>   write:
>     seq aio:   226.6k ±2.1k (+184 %)
>     rand aio:  225.9k ±1.8k (+186 %)
>     seq sync:   36.9k ±0.6k (- 11 %)
>     rand sync:  36.9k ±0.2k (- 11 %)
> null:
>   read:
>     seq aio:   315.2k ±11.0k (+18 %)
>     rand aio:  300.5k ±10.8k (+14 %)
>     seq sync:  114.2k ± 3.6k (-16 %)
>     rand sync: 112.5k ± 2.8k (-16 %)
>   write:
>     seq aio:   222.6k ±6.8k (-21 %)
>     rand aio:  220.5k ±6.8k (-23 %)
>     seq sync:  117.2k ±3.7k (-18 %)
>     rand sync: 116.3k ±4.4k (-18 %)
>
> (I don't know what's going on in the null-write AIO case, sorry.)
>
> Here's results for numjobs=4:
>
> "Before", i.e. without multithreading in QSD/FUSE (results compared to
> numjobs=1):
>
> file:
>   read:
>     seq aio:   104.7k ± 0.4k (- 13 %)
>     rand aio:  111.5k ± 0.4k (-  2 %)
>     seq sync:   71.0k ±13.8k (+ 36 %)
>     rand sync:  41.4k ± 0.1k (+297 %)
>   write:
>     seq aio:    79.4k ±0.1k (-  1 %)
>     rand aio:   78.6k ±0.1k (±  0 %)
>     seq sync:   83.3k ±0.1k (+101 %)
>     rand sync:  82.0k ±0.2k (+ 98 %)
> null:
>   read:
>     seq aio:   260.5k ±1.5k (-  2 %)
>     rand aio:  260.1k ±1.4k (-  2 %)
>     seq sync:  291.8k ±1.3k (+115 %)
>     rand sync: 280.1k ±1.7k (+115 %)
>   write:
>     seq aio:   280.1k ±1.7k (±  0 %)
>     rand aio:  279.5k ±1.4k (-  3 %)
>     seq sync:  306.7k ±2.2k (+116 %)
>     rand sync: 305.9k ±1.8k (+117 %)
>
> (As probably expected, little difference in the AIO case, but great
> improvements in the sync case because it kind of gives it an artificial
> iodepth of 4.)
>
> "After", i.e. with four threads in QSD/FUSE (now results compared to the
> above):
>
> file:
>   read:
>     seq aio:   193.3k ± 1.8k (+ 85 %)
>     rand aio:  329.3k ± 0.3k (+195 %)
>     seq sync:   66.2k ±13.0k (-  7 %)
>     rand sync:  40.1k ± 0.0k (-  3 %)
>   write:
>     seq aio:   219.7k ±0.8k (+177 %)
>     rand aio:  217.2k ±1.5k (+176 %)
>     seq sync:   92.5k ±0.2k (+ 11 %)
>     rand sync:  91.9k ±0.2k (+ 12 %)
> null:
>   read:
>     seq aio:   706.7k ±2.1k (+171 %)
>     rand aio:  714.7k ±3.2k (+175 %)
>     seq sync:  431.7k ±3.0k (+ 48 %)
>     rand sync: 435.4k ±2.8k (+ 50 %)
>   write:
>     seq aio:   746.9k ±2.8k (+167 %)
>     rand aio:  749.0k ±4.9k (+168 %)
>     seq sync:  420.7k ±3.1k (+ 37 %)
>     rand sync: 419.1k ±2.5k (+ 37 %)
>
> So this helps mainly for the AIO cases, but also in the null sync cases,
> because null is always CPU-bound, so more threads help.
>
> Signed-off-by: Hanna Czenczek <hre...@redhat.com>
> ---
>  block/export/fuse.c | 205 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 159 insertions(+), 46 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>