On Tue, Jul 01, 2025 at 01:44:34PM +0200, Hanna Czenczek wrote:
> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs
> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)).
> 
> We can use this to implement multi-threading.
> 
> For configuration, we don't need any more information beyond the simple
> array provided by the core block export interface: The FUSE kernel
> driver feeds these FDs in a round-robin fashion, so all of them are
> equivalent and we want to have exactly one per thread.
> 
> These are the benchmark results when using four threads (compared to a
> single thread); note that fio still only uses a single job, but
> performance can still improve thanks to the round-robin queue usage
> described above.  (Not in the sync case, though, where it presumably
> just adds overhead.)
> 
> file:
>   read:
>     seq aio:   264.8k ±0.8k (+120 %)
>     rand aio:  143.8k ±0.4k (+ 27 %)
>     seq sync:   49.9k ±0.5k (-  5 %)
>     rand sync:  10.3k ±0.1k (-  1 %)
>   write:
>     seq aio:   226.6k ±2.1k (+184 %)
>     rand aio:  225.9k ±1.8k (+186 %)
>     seq sync:   36.9k ±0.6k (- 11 %)
>     rand sync:  36.9k ±0.2k (- 11 %)
> null:
>   read:
>     seq aio:   315.2k ±11.0k (+18 %)
>     rand aio:  300.5k ±10.8k (+14 %)
>     seq sync:  114.2k ± 3.6k (-16 %)
>     rand sync: 112.5k ± 2.8k (-16 %)
>   write:
>     seq aio:   222.6k ±6.8k (-21 %)
>     rand aio:  220.5k ±6.8k (-23 %)
>     seq sync:  117.2k ±3.7k (-18 %)
>     rand sync: 116.3k ±4.4k (-18 %)
> 
> (I don't know what's going on in the null-write AIO case, sorry.)
> 
> Here are the results for numjobs=4:
> 
> "Before", i.e. without multithreading in QSD/FUSE (results compared to
> numjobs=1):
> 
> file:
>   read:
>     seq aio:   104.7k ± 0.4k (- 13 %)
>     rand aio:  111.5k ± 0.4k (-  2 %)
>     seq sync:   71.0k ±13.8k (+ 36 %)
>     rand sync:  41.4k ± 0.1k (+297 %)
>   write:
>     seq aio:    79.4k ±0.1k (-  1 %)
>     rand aio:   78.6k ±0.1k (±  0 %)
>     seq sync:   83.3k ±0.1k (+101 %)
>     rand sync:  82.0k ±0.2k (+ 98 %)
> null:
>   read:
>     seq aio:   260.5k ±1.5k (-  2 %)
>     rand aio:  260.1k ±1.4k (-  2 %)
>     seq sync:  291.8k ±1.3k (+115 %)
>     rand sync: 280.1k ±1.7k (+115 %)
>   write:
>     seq aio:   280.1k ±1.7k (±  0 %)
>     rand aio:  279.5k ±1.4k (-  3 %)
>     seq sync:  306.7k ±2.2k (+116 %)
>     rand sync: 305.9k ±1.8k (+117 %)
> 
> (As probably expected, little difference in the AIO case, but great
> improvements in the sync case, because it effectively gives the sync
> workload an artificial iodepth of 4.)
> 
> "After", i.e. with four threads in QSD/FUSE (now results compared to the
> above):
> 
> file:
>   read:
>     seq aio:   193.3k ± 1.8k (+ 85 %)
>     rand aio:  329.3k ± 0.3k (+195 %)
>     seq sync:   66.2k ±13.0k (-  7 %)
>     rand sync:  40.1k ± 0.0k (-  3 %)
>   write:
>     seq aio:   219.7k ±0.8k (+177 %)
>     rand aio:  217.2k ±1.5k (+176 %)
>     seq sync:   92.5k ±0.2k (+ 11 %)
>     rand sync:  91.9k ±0.2k (+ 12 %)
> null:
>   read:
>     seq aio:   706.7k ±2.1k (+171 %)
>     rand aio:  714.7k ±3.2k (+175 %)
>     seq sync:  431.7k ±3.0k (+ 48 %)
>     rand sync: 435.4k ±2.8k (+ 50 %)
>   write:
>     seq aio:   746.9k ±2.8k (+167 %)
>     rand aio:  749.0k ±4.9k (+168 %)
>     seq sync:  420.7k ±3.1k (+ 37 %)
>     rand sync: 419.1k ±2.5k (+ 37 %)
> 
> So this helps mainly in the AIO cases, but also in the null sync
> cases: null is always CPU-bound, so more threads help.
> 
> Signed-off-by: Hanna Czenczek <hre...@redhat.com>
> ---
>  block/export/fuse.c | 205 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 159 insertions(+), 46 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
