On 18.03.2026 at 16:32, Hanna Czenczek wrote:
> Short writes can happen, too, not just short reads. The difference to
> aio=native is that the kernel will actually retry the tail of short
> requests internally already -- so it is harder to reproduce. But if the
> tail of a short request returns an error to the kernel, we will see it
> in userspace still. To reproduce this, apply the following patch on top
> of the one shown in HEAD^ (again %s/escaped // to apply):
>
> escaped diff --git a/block/export/fuse.c b/block/export/fuse.c
> escaped index 67dc50a412..2b98489a32 100644
> escaped --- a/block/export/fuse.c
> escaped +++ b/block/export/fuse.c
> @@ -1059,8 +1059,15 @@ fuse_co_read(FuseExport *exp, void **bufptr, uint64_t
> offset, uint32_t size)
> int64_t blk_len;
> void *buf;
> int ret;
> + static uint32_t error_size;
>
> - size = MIN(size, 4096);
> + if (error_size == size) {
> + error_size = 0;
> + return -EIO;
> + } else if (size > 4096) {
> + error_size = size - 4096;
> + size = 4096;
> + }
>
> /* Limited by max_read, should not happen */
> if (size > FUSE_MAX_READ_BYTES) {
> @@ -1111,8 +1118,15 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out
> *out,
> {
> int64_t blk_len;
> int ret;
> + static uint32_t error_size;
>
> - size = MIN(size, 4096);
> + if (error_size == size) {
> + error_size = 0;
> + return -EIO;
> + } else if (size > 4096) {
> + error_size = size - 4096;
> + size = 4096;
> + }
>
> QEMU_BUILD_BUG_ON(FUSE_MAX_WRITE_BYTES > BDRV_REQUEST_MAX_BYTES);
> /* Limited by max_write, should not happen */
>
> I know this is a bit artificial because to produce this, there must be
> an I/O error somewhere anyway, but if it does happen, qemu will
> understand it to mean ENOSPC for short writes, which is incorrect. So I
> believe we need to resubmit the tail to maybe have it succeed now, or at
> least get the correct error code.
>
> Reproducer as before:
> $ ./qemu-img create -f raw test.raw 8k
> Formatting 'test.raw', fmt=raw size=8192
> $ ./qemu-io -f raw -c 'write -P 42 0 8k' test.raw
> wrote 8192/8192 bytes at offset 0
> 8 KiB, 1 ops; 00.00 sec (64.804 MiB/sec and 8294.9003 ops/sec)
> $ hexdump -C test.raw
> 00000000 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a |****************|
> *
> 00002000
> $ storage-daemon/qemu-storage-daemon \
> --blockdev file,node-name=test,filename=test.raw \
> --export fuse,id=exp,node-name=test,mountpoint=test.raw,writable=true
>
> $ ./qemu-io --image-opts -c 'read -P 23 0 8k' \
> driver=file,filename=test.raw,cache.direct=on,aio=io_uring
> read 8192/8192 bytes at offset 0
> 8 KiB, 1 ops; 00.00 sec (58.481 MiB/sec and 7485.5342 ops/sec)
> $ ./qemu-io --image-opts -c 'write -P 23 0 8k' \
> driver=file,filename=test.raw,cache.direct=on,aio=io_uring
> write failed: No space left on device
> $ hexdump -C test.raw
> 00000000 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 |................|
> *
> 00001000 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a |****************|
> *
> 00002000
>
> So short reads already work (because there is code for that), but short
> writes incorrectly produce ENOSPC. This patch fixes that by
> resubmitting not only the tail of short reads but short writes also.
>
> Signed-off-by: Hanna Czenczek <[email protected]>
> @@ -44,6 +44,10 @@ static void luring_prep_sqe(struct io_uring_sqe *sqe, void
> *opaque)
> uint64_t offset = req->offset;
> int fd = req->fd;
> BdrvRequestFlags flags = req->flags;
>
> + if (req->resubmit_qiov.iov != NULL) {
> + qiov = &req->resubmit_qiov;
> + }
> +
We could set offset = req->offset + req->total_done once here again,
instead of adding the two in each case below, as I already commented on
linux-aio.c.
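
A minimal, untested sketch of that idea, outside of QEMU. Only the
offset and total_done field names come from the patch; SketchRequest,
its bytes field and both helper functions are hypothetical stand-ins,
not the real LuringRequest or any QEMU API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the request state (not the real LuringRequest) */
typedef struct {
    uint64_t offset;     /* start offset of the original request */
    size_t bytes;        /* total length of the original request (assumed field) */
    size_t total_done;   /* bytes completed by earlier submissions */
} SketchRequest;

/*
 * Compute the effective offset once, so that every opcode case can use
 * the same value instead of adding total_done separately.
 */
static uint64_t sketch_next_offset(const SketchRequest *req)
{
    assert(req->total_done <= req->bytes);
    return req->offset + req->total_done;
}

/*
 * Account for a short completion of 'done' bytes and return how many
 * bytes of the original request are still outstanding.
 */
static size_t sketch_account_short(SketchRequest *req, size_t done)
{
    assert(done <= req->bytes - req->total_done);
    req->total_done += done;
    return req->bytes - req->total_done;
}
```

With the 8k request from the reproducer, a first completion of 4096
bytes leaves 4096 bytes outstanding, and the resubmission would start at
offset 4096.
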
Reviewed-by: Kevin Wolf <[email protected]>