On Wed, Jun 04, 2025 at 03:28:05PM +0200, Hanna Czenczek wrote:
> Manually read requests from the /dev/fuse FD and process them, without
> using libfuse.  This allows us to safely add parallel request processing
> in coroutines later, without having to worry about libfuse internals.
> (Technically, we already have exactly that problem with
> read_from_fuse_export()/read_from_fuse_fd() nesting.)
>
> We will continue to use libfuse for mounting the filesystem; fusermount3
> is effectively a helper program of libfuse, so it should know best how
> to interact with it.  (Doing it manually without libfuse, while doable,
> is a bit of a pain, and it is not clear to me how stable the "protocol"
> actually is.)
>
> Take this opportunity of quite a major rewrite to update the Copyright
> line with corrected information that has surfaced in the meantime.
>
> Here are some benchmarks from before this patch (4k, iodepth=16, libaio;
> except 'sync', which are iodepth=1 and pvsync2):
>
> file:
>   read:
>     seq aio:   78.6k ±1.3k IOPS
>     rand aio:  39.3k ±2.9k
>     seq sync:  32.5k ±0.7k
>     rand sync:  9.9k ±0.1k
>   write:
>     seq aio:   61.9k ±0.5k
>     rand aio:  61.2k ±0.6k
>     seq sync:  27.9k ±0.2k
>     rand sync: 27.6k ±0.4k
> null:
>   read:
>     seq aio:   214.0k ±5.9k
>     rand aio:  212.7k ±4.5k
>     seq sync:   90.3k ±6.5k
>     rand sync:  89.7k ±5.1k
>   write:
>     seq aio:   203.9k ±1.5k
>     rand aio:  201.4k ±3.6k
>     seq sync:   86.1k ±6.2k
>     rand sync:  84.9k ±5.3k
>
> And with this patch applied:
>
> file:
>   read:
>     seq aio:   76.6k ±1.8k (- 3 %)
>     rand aio:  26.7k ±0.4k (-32 %)
>     seq sync:  47.7k ±1.2k (+47 %)
>     rand sync: 10.1k ±0.2k (+ 2 %)
>   write:
>     seq aio:   58.1k ±0.5k (- 6 %)
>     rand aio:  58.1k ±0.5k (- 5 %)
>     seq sync:  36.3k ±0.3k (+30 %)
>     rand sync: 36.1k ±0.4k (+31 %)
> null:
>   read:
>     seq aio:   268.4k ±3.4k (+25 %)
>     rand aio:  265.3k ±2.1k (+25 %)
>     seq sync:  134.3k ±2.7k (+49 %)
>     rand sync: 132.4k ±1.4k (+48 %)
>   write:
>     seq aio:   275.3k ±1.7k (+35 %)
>     rand aio:  272.3k ±1.9k (+35 %)
>     seq sync:  130.7k ±1.6k (+52 %)
>     rand sync: 127.4k ±2.4k (+50 %)
>
> So clearly the AIO file results are actually not good, and random reads
> are indeed quite terrible.  On the other hand, we can see from the sync
> and null results that request handling should in theory be quicker.  How
> does this fit together?
>
> I believe the bad AIO results are an artifact of the accidental parallel
> request processing we have due to nested polling: Depending on how the
> actual request processing is structured and how long request processing
> takes, more or fewer requests will be submitted in parallel.  So because
> of the restructuring, I think this patch accidentally changes how many
> requests end up being submitted in parallel, which decreases
> performance.
>
> (I have seen something like this before: In RSD, without having
> implemented a polling mode, the debug build tended to have better
> performance than the more optimized release build, because the debug
> build, taking longer to submit requests, ended up processing more
> requests in parallel.)
>
> In any case, once we use coroutines throughout the code, performance
> will improve again across the board.
>
> Signed-off-by: Hanna Czenczek <hre...@redhat.com>
> ---
>  block/export/fuse.c | 754 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 535 insertions(+), 219 deletions(-)

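The bracketed percentages in the quoted results can be re-derived from the before/after means. A minimal sketch, assuming they are relative changes rounded to the nearest integer (the helper name percent_change is hypothetical and not part of the patch):

```c
/*
 * Relative change from `before` to `after` in percent, rounded to the
 * nearest integer (half away from zero).  Hypothetical helper, used only
 * to cross-check the figures quoted in the commit message.
 */
int percent_change(double before, double after)
{
    double pct = (after - before) / before * 100.0;

    /* Casting truncates toward zero, so shift by 0.5 in the sign's direction */
    return (int)(pct < 0.0 ? pct - 0.5 : pct + 0.5);
}
```

For example, percent_change(39.3, 26.7) covers the file "rand aio" read case and percent_change(32.5, 47.7) the file "seq sync" read case.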
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
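For readers unfamiliar with what "manually read requests from the /dev/fuse FD" involves, here is a rough sketch of validating one request buffer after read(). It is not taken from the patch. It assumes that each read() returns exactly one complete request and that the request starts with a 40-byte header laid out like the kernel's struct fuse_in_header. The struct req_header and the function parse_request_header() are hypothetical local names; real code would use the definitions from <linux/fuse.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical local mirror of the kernel's struct fuse_in_header.
 * The last 32 bits are treated as opaque padding in this sketch.
 */
struct req_header {
    uint32_t len;     /* total request length, header included */
    uint32_t opcode;  /* which operation is requested */
    uint64_t unique;  /* request ID, echoed back in the reply */
    uint64_t nodeid;
    uint32_t uid;
    uint32_t gid;
    uint32_t pid;
    uint32_t padding;
};

/*
 * Check that `buf` (holding `nread` bytes as returned by read()) contains
 * one complete request and copy its header into *hdr.
 * Returns 0 on success, -1 on a short or inconsistent read.
 */
int parse_request_header(const void *buf, size_t nread,
                         struct req_header *hdr)
{
    if (nread < sizeof(*hdr)) {
        return -1; /* not even a full header */
    }

    /* memcpy avoids alignment assumptions about the read buffer */
    memcpy(hdr, buf, sizeof(*hdr));

    if (hdr->len != nread) {
        return -1; /* header length disagrees with what was read */
    }
    return 0;
}
```

Once the header passes these checks, a dispatcher would typically switch on hdr->opcode and hand the rest of the buffer to the matching handler; that part is left out here.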