On 17/01/14 15:14, Kevin Wolf wrote:
> This patch series adds code to the block layer that allows performing
> I/O requests in smaller granularities than required by the host backend
> (most importantly, O_DIRECT restrictions). It achieves this for reads
> by rounding the request to host-side block boundary, and for writes by
> performing a read-modify-write cycle (and serialising requests
> touching the same block so that the RMW doesn't write back stale data).
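
For anyone following along, here is a minimal sketch of the rounding and
read-modify-write idea described in the quoted paragraph. It is not code
from the series: align_down(), align_up() and rmw_write() are hypothetical
names, a plain memory buffer stands in for the host device, the device size
is assumed to be a multiple of the alignment, and the serialisation of
overlapping requests is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not taken from the patch series. */

/* Round offset down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t offset, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return offset & ~(align - 1);
}

/* Round offset up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Write an unaligned request to a "device" that is only touched in
 * whole blocks of size align. The device is modelled as the memory
 * buffer disk of disk_size bytes (assumed to be a multiple of align).
 * Returns 0 on success, -1 on error.
 */
static int rmw_write(uint8_t *disk, uint64_t disk_size, uint64_t align,
                     uint64_t offset, const uint8_t *buf, uint64_t bytes)
{
    uint64_t start, end;
    uint8_t *bounce;

    if (bytes == 0) {
        return 0;
    }

    /* Widen the request to the enclosing aligned range. */
    start = align_down(offset, align);
    end = align_up(offset + bytes, align);
    if (end > disk_size) {
        return -1;
    }

    bounce = malloc((size_t)(end - start));
    if (bounce == NULL) {
        return -1;
    }

    /* Read: fetch the whole aligned range. */
    memcpy(bounce, disk + start, (size_t)(end - start));
    /* Modify: patch in the caller's unaligned data. */
    memcpy(bounce + (offset - start), buf, (size_t)bytes);
    /* Write: store the whole aligned range back. */
    memcpy(disk + start, bounce, (size_t)(end - start));

    free(bounce);
    return 0;
}
```

A 3-byte write at offset 4095 with 4k alignment touches the aligned range
0..8192 here. Two such requests hitting the same block concurrently could
each write back stale head or tail data, which is presumably why the series
serialises them.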

Nice, this might really help on s390 (also for KVM), since DASD disks
usually have a 4k sector size. We also have flash systems with a 4k block
size. Both disk systems cause lots of trouble with cache=none and friends.

Do you have a tree with these patches, so that I can test them on s390?

>
> Originally I intended to reuse a lot of code from Paolo's previous
> patch series, however as I tried to integrate pread/pwrite, which
> already do a very similar thing (except for considering concurrency),
> and because I wanted to implement zero-copy, most of this series ended
> up being new code.
>
> Zero-copy is possible in a common case because while XFS defaults to a
> 4k sector size and therefore 4k on-disk O_DIRECT alignment for 512E
> disks, it still only has a 512 byte memory alignment requirement.
> (Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
> we know that the value is wrong and can probe it.)
>
>
> Changes in v2 -> v3:
> - Fixed I/O throttling bypass by converting to byte granularity [Wenchao]
> - Made 'bytes' argument to tracked_request_overlaps() unsigned [Max]
> - Fixed a corruption bug that came from using outdated RMW buffers after
>   waiting for another request and added some assertions to check the
>   assumptions [Peter]
> - Fixed bytes vs. sectors error in zero-after-EOF code of
>   bdrv_co_do_preadv [Max]
> - Removed orphaned prototype in block.h [Max]
> - A qemu-iotests case and some infrastructure to support it
>
> Changes in v1 -> v2:
> - Fixed overlap_bytes calculation in mark_request_serialising()
> - Fixed wait_serialising_requests() deadlock
> - iscsi: Set bs->request_alignment [Peter]
> - iscsi: Query block limits only in iscsi_open() when no other requests
>   are in flight, and in iscsi_refresh_limits() copy the stored values
>   into bs->bl [Peter]
>
> Changes in RFC -> v1:
> - Moved opt_mem_alignment into BlockLimits [Paolo]
> - Changed BlockLimits in turn to work a bit more like the
>   .bdrv_opt_mem_align() callback of the RFC; allows updating the
>   BlockLimits later when the chain changes or bdrv_reopen() toggles
>   O_DIRECT
> - Fixed a typo in a commit message [Eric]
>
>
> Kevin Wolf (26):
>   block: Move initialisation of BlockLimits to bdrv_refresh_limits()
>   block: Inherit opt_transfer_length
>   block: Update BlockLimits when they might have changed
>   qemu_memalign: Allow small alignments
>   block: Detect unaligned length in bdrv_qiov_is_aligned()
>   block: Don't use guest sector size for qemu_blockalign()
>   block: Introduce bdrv_aligned_preadv()
>   block: Introduce bdrv_co_do_preadv()
>   block: Introduce bdrv_aligned_pwritev()
>   block: write: Handle COR dependency after I/O throttling
>   block: Introduce bdrv_co_do_pwritev()
>   block: Switch BdrvTrackedRequest to byte granularity
>   block: Allow waiting for overlapping requests between begin/end
>   block: Make zero-after-EOF work with larger alignment
>   block: Generalise and optimise COR serialisation
>   block: Make overlap range for serialisation dynamic
>   block: Allow wait_serialising_requests() at any point
>   block: Align requests in bdrv_co_do_pwritev()
>   block: Assert serialisation assumptions in pwritev
>   block: Change coroutine wrapper to byte granularity
>   block: Make bdrv_pread() a bdrv_prwv_co() wrapper
>   block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
>   blkdebug: Make required alignment configurable
>   qemu-io: New command 'sleep'
>   qemu-iotests: Test pwritev RMW logic
>   block: Switch bdrv_io_limits_intercept() to byte granularity
>
> Paolo Bonzini (3):
>   block: rename buffer_alignment to guest_block_size
>   raw: Probe required direct I/O alignment
>   iscsi: Set bs->request_alignment
>
>  block.c                    | 644 +++++++++++++++++++++++++++++++--------------
>  block/backup.c             |   7 +-
>  block/blkdebug.c           |  24 ++
>  block/iscsi.c              |  47 ++--
>  block/qcow2.c              |  11 +-
>  block/qed.c                |  11 +-
>  block/raw-posix.c          | 102 +++++--
>  block/raw-win32.c          |  41 +++
>  block/stream.c             |   2 +
>  block/vmdk.c               |  22 +-
>  hw/block/virtio-blk.c      |   2 +-
>  hw/ide/core.c              |   2 +-
>  hw/scsi/scsi-disk.c        |   2 +-
>  hw/scsi/scsi-generic.c     |   2 +-
>  include/block/block.h      |  15 +-
>  include/block/block_int.h  |  27 +-
>  qemu-io-cmds.c             |  42 +++
>  tests/qemu-iotests/077     | 278 +++++++++++++++++++
>  tests/qemu-iotests/077.out | 202 ++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  util/oslib-posix.c         |   5 +
>  21 files changed, 1234 insertions(+), 255 deletions(-)
>  create mode 100755 tests/qemu-iotests/077
>  create mode 100644 tests/qemu-iotests/077.out
>