On 18.03.2026 at 16:32, Hanna Czenczek wrote:
> Short writes can happen, too, not just short reads.  The difference to
> aio=native is that the kernel will actually retry the tail of short
> requests internally already -- so it is harder to reproduce.  But if the
> tail of a short request returns an error to the kernel, we will see it
> in userspace still.  To reproduce this, apply the following patch on top
> of the one shown in HEAD^ (again %s/escaped // to apply):
> 
> escaped diff --git a/block/export/fuse.c b/block/export/fuse.c
> escaped index 67dc50a412..2b98489a32 100644
> escaped --- a/block/export/fuse.c
> escaped +++ b/block/export/fuse.c
> @@ -1059,8 +1059,15 @@ fuse_co_read(FuseExport *exp, void **bufptr, uint64_t offset, uint32_t size)
>      int64_t blk_len;
>      void *buf;
>      int ret;
> +    static uint32_t error_size;
> 
> -    size = MIN(size, 4096);
> +    if (error_size == size) {
> +        error_size = 0;
> +        return -EIO;
> +    } else if (size > 4096) {
> +        error_size = size - 4096;
> +        size = 4096;
> +    }
> 
>      /* Limited by max_read, should not happen */
>      if (size > FUSE_MAX_READ_BYTES) {
> @@ -1111,8 +1118,15 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out *out,
>  {
>      int64_t blk_len;
>      int ret;
> +    static uint32_t error_size;
> 
> -    size = MIN(size, 4096);
> +    if (error_size == size) {
> +        error_size = 0;
> +        return -EIO;
> +    } else if (size > 4096) {
> +        error_size = size - 4096;
> +        size = 4096;
> +    }
> 
>      QEMU_BUILD_BUG_ON(FUSE_MAX_WRITE_BYTES > BDRV_REQUEST_MAX_BYTES);
>      /* Limited by max_write, should not happen */
> 
> I know this is a bit artificial because to produce this, there must be
> an I/O error somewhere anyway, but if it does happen, qemu will
> understand it to mean ENOSPC for short writes, which is incorrect.  So I
> believe we need to resubmit the tail to maybe have it succeed now, or at
> least get the correct error code.
> 
> Reproducer as before:
> $ ./qemu-img create -f raw test.raw 8k
> Formatting 'test.raw', fmt=raw size=8192
> $ ./qemu-io -f raw -c 'write -P 42 0 8k' test.raw
> wrote 8192/8192 bytes at offset 0
> 8 KiB, 1 ops; 00.00 sec (64.804 MiB/sec and 8294.9003 ops/sec)
> $ hexdump -C test.raw
> 00000000  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 2a 2a  |****************|
> *
> 00002000
> $ storage-daemon/qemu-storage-daemon \
>     --blockdev file,node-name=test,filename=test.raw \
>     --export fuse,id=exp,node-name=test,mountpoint=test.raw,writable=true
> 
> $ ./qemu-io --image-opts -c 'read -P 23 0 8k' \
>     driver=file,filename=test.raw,cache.direct=on,aio=io_uring
> read 8192/8192 bytes at offset 0
> 8 KiB, 1 ops; 00.00 sec (58.481 MiB/sec and 7485.5342 ops/sec)
> $ ./qemu-io --image-opts -c 'write -P 23 0 8k' \
>     driver=file,filename=test.raw,cache.direct=on,aio=io_uring
> write failed: No space left on device
> $ hexdump -C test.raw
> 00000000  17 17 17 17 17 17 17 17  17 17 17 17 17 17 17 17  |................|
> *
> 00001000  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 2a 2a  |****************|
> *
> 00002000
> 
> So short reads already work (because there is code for that), but short
> writes incorrectly produce ENOSPC.  This patch fixes that by
> resubmitting the tail not only of short reads but also of short writes.
> 
> Signed-off-by: Hanna Czenczek <[email protected]>
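
For illustration only, the general idea of resubmitting the tail of a
short write can be sketched in plain POSIX C. This is not the io_uring
code from the patch: write_full_at() is a hypothetical helper, and
returning -ENOSPC when a resubmission makes no progress at all is an
assumption of this sketch. The point is that a failing tail reports its
own errno instead of being read as "out of space".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): write all of buf at offset,
 * resubmitting the remaining tail after every short write.
 * Returns 0 on success or a negative errno value.
 */
static int write_full_at(int fd, const void *buf, size_t len, off_t offset)
{
    const uint8_t *p = buf;
    size_t total_done = 0;

    while (total_done < len) {
        ssize_t ret = pwrite(fd, p + total_done, len - total_done,
                             offset + (off_t)total_done);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before writing anything: retry */
            }
            return -errno;  /* the real error of the tail, e.g. -EIO */
        }
        if (ret == 0) {
            /* Assumption of this sketch: no progress means no space */
            return -ENOSPC;
        }
        total_done += (size_t)ret;  /* short write: loop resubmits the tail */
    }

    return 0;
}
```

With the fault-injection patch above, the second submission would fail
with EIO, and a loop of this shape would return -EIO rather than
ENOSPC.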

> @@ -44,6 +44,10 @@ static void luring_prep_sqe(struct io_uring_sqe *sqe, void *opaque)
>      uint64_t offset = req->offset;
>      int fd = req->fd;
>      BdrvRequestFlags flags = req->flags;
>  
> +    if (req->resubmit_qiov.iov != NULL) {
> +        qiov = &req->resubmit_qiov;
> +    }
> +

We could have offset = req->offset + req->total_done again instead of
adding them in each case below, like I already commented on linux-aio.c.
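
A rough sketch of what I mean, using a stand-in struct rather than the
real request type. Only offset and total_done come from the patch; the
struct name, the size field and the helper name are made up for the
example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the request state, not the real struct */
struct sketch_req {
    uint64_t offset;     /* start of the original request */
    size_t size;         /* total length (assumed field for the sketch) */
    size_t total_done;   /* bytes completed by earlier submissions */
};

/*
 * Compute the effective offset once, so that each request type can use
 * it directly instead of adding total_done in every case.
 */
static uint64_t sketch_effective_offset(const struct sketch_req *req)
{
    assert(req->total_done <= req->size);
    return req->offset + req->total_done;
}
```

Each case in the switch would then pass this one value instead of
repeating the addition.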

Reviewed-by: Kevin Wolf <[email protected]>

