On Wed, Jun 04, 2025 at 03:28:05PM +0200, Hanna Czenczek wrote:
> Manually read requests from the /dev/fuse FD and process them, without
> using libfuse.  This allows us to safely add parallel request processing
> in coroutines later, without having to worry about libfuse internals.
> (Technically, we already have exactly that problem with
> read_from_fuse_export()/read_from_fuse_fd() nesting.)
> 
> We will continue to use libfuse for mounting the filesystem; fusermount3
> is a effectively a helper program of libfuse, so it should know best how
> to interact with it.  (Doing it manually without libfuse, while doable,
> is a bit of a pain, and it is not clear to me how stable the "protocol"
> actually is.)
> 
> Take this opportunity of quite a major rewrite to update the Copyright
> line with corrected information that has surfaced in the meantime.
> 
> Here are some benchmarks from before this patch (4k, iodepth=16, libaio;
> except 'sync', which are iodepth=1 and pvsync2):
> 
> file:
>   read:
>     seq aio:   78.6k ±1.3k IOPS
>     rand aio:  39.3k ±2.9k
>     seq sync:  32.5k ±0.7k
>     rand sync:  9.9k ±0.1k
>   write:
>     seq aio:   61.9k ±0.5k
>     rand aio:  61.2k ±0.6k
>     seq sync:  27.9k ±0.2k
>     rand sync: 27.6k ±0.4k
> null:
>   read:
>     seq aio:   214.0k ±5.9k
>     rand aio:  212.7k ±4.5k
>     seq sync:   90.3k ±6.5k
>     rand sync:  89.7k ±5.1k
>   write:
>     seq aio:   203.9k ±1.5k
>     rand aio:  201.4k ±3.6k
>     seq sync:   86.1k ±6.2k
>     rand sync:  84.9k ±5.3k
> 
> And with this patch applied:
> 
> file:
>   read:
>     seq aio:   76.6k ±1.8k (- 3 %)
>     rand aio:  26.7k ±0.4k (-32 %)
>     seq sync:  47.7k ±1.2k (+47 %)
>     rand sync: 10.1k ±0.2k (+ 2 %)
>   write:
>     seq aio:   58.1k ±0.5k (- 6 %)
>     rand aio:  58.1k ±0.5k (- 5 %)
>     seq sync:  36.3k ±0.3k (+30 %)
>     rand sync: 36.1k ±0.4k (+31 %)
> null:
>   read:
>     seq aio:   268.4k ±3.4k (+25 %)
>     rand aio:  265.3k ±2.1k (+25 %)
>     seq sync:  134.3k ±2.7k (+49 %)
>     rand sync: 132.4k ±1.4k (+48 %)
>   write:
>     seq aio:   275.3k ±1.7k (+35 %)
>     rand aio:  272.3k ±1.9k (+35 %)
>     seq sync:  130.7k ±1.6k (+52 %)
>     rand sync: 127.4k ±2.4k (+50 %)
> 
> So clearly the AIO file results are actually not good, and random reads
> are indeed quite terrible.  On the other hand, we can see from the sync
> and null results that request handling should in theory be quicker.  How
> does this fit together?
> 
> I believe the bad AIO results are an artifact of the accidental parallel
> request processing we have due to nested polling: Depending on how the
> actual request processing is structured and how long request processing
> takes, more or less requests will be submitted in parallel.  So because
> of the restructuring, I think this patch accidentally changes how many
> requests end up being submitted in parallel, which decreases
> performance.
> 
> (I have seen something like this before: In RSD, without having
> implemented a polling mode, the debug build tended to have better
> performance than the more optimized release build, because the debug
> build, taking longer to submit requests, ended up processing more
> requests in parallel.)
> 
> In any case, once we use coroutines throughout the code, performance
> will improve again across the board.
> 
> Signed-off-by: Hanna Czenczek <hre...@redhat.com>
> ---
>  block/export/fuse.c | 754 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 535 insertions(+), 219 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>

Attachment: signature.asc
Description: PGP signature

Reply via email to