Re: [Qemu-devel] [PATCH v3 00/29] block: Support for 512b-on-4k emulation
On 22.01.2014 at 21:30, Christian Borntraeger wrote:
> On 17/01/14 15:14, Kevin Wolf wrote:
> > This patch series adds code to the block layer that allows performing
> > I/O requests in smaller granularities than required by the host backend
> > (most importantly, O_DIRECT restrictions). It achieves this for reads
> > by rounding the request to host-side block boundary, and for writes by
> > performing a read-modify-write cycle (and serialising requests touching
> > the same block so that the RMW doesn't write back stale data).
>
> Nice, this might really help on s390 (also for KVM) since dasd disks
> usually have a 4k sector size. We also have flash systems with 4k block
> size. Both disk systems cause lots of trouble with cache=none and
> friends.

Can't you configure guests to use 4k sector size as well? With these
patches, it should work, but there is an obvious performance penalty, so
you want the guest to make as little use of it as possible.

For PCs, it's essentially just the boot process that needs 512 byte
accesses because that's the BIOS interface, and the OS generally won't
send misaligned requests later. What's the situation on s390?

> Do you have a tree with these patches, so that I can test those on s390?

Sure, and any testing is appreciated:

    git://repo.or.cz/qemu/kevin.git align

Kevin
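To make the "rounding the request to host-side block boundary" step concrete, here is a minimal sketch of the arithmetic. It is not code from the series: struct aligned_range and expand_request() are hypothetical names chosen for illustration, and the sketch assumes a non-zero alignment and no integer overflow in offset + bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the aligned byte range covering a request. */
struct aligned_range {
    uint64_t offset;   /* start, rounded down to the alignment */
    uint64_t bytes;    /* length, a multiple of the alignment */
    uint64_t head;     /* padding bytes in front of the requested data */
    uint64_t tail;     /* padding bytes behind the requested data */
};

/*
 * Expand [offset, offset + bytes) so that both ends fall on a multiple
 * of 'align'. Assumes align != 0 and that offset + bytes does not wrap.
 */
static struct aligned_range expand_request(uint64_t offset, uint64_t bytes,
                                           uint64_t align)
{
    struct aligned_range r;
    uint64_t end = offset + bytes;
    uint64_t aligned_end;

    assert(align != 0);
    aligned_end = (end + align - 1) / align * align;   /* round up */

    r.offset = offset / align * align;                 /* round down */
    r.bytes = aligned_end - r.offset;
    r.head = offset - r.offset;
    r.tail = aligned_end - end;
    return r;
}
```

With a 4k backend, a 512 byte guest read at offset 512 becomes one 4096 byte host read at offset 0; the caller then skips 'head' bytes and ignores the 'tail'.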
Re: [Qemu-devel] [PATCH v3 00/29] block: Support for 512b-on-4k emulation
On 23/01/14 11:29, Kevin Wolf wrote:
> On 22.01.2014 at 21:30, Christian Borntraeger wrote:
> > On 17/01/14 15:14, Kevin Wolf wrote:
> > > This patch series adds code to the block layer that allows performing
> > > I/O requests in smaller granularities than required by the host
> > > backend (most importantly, O_DIRECT restrictions). It achieves this
> > > for reads by rounding the request to host-side block boundary, and
> > > for writes by performing a read-modify-write cycle (and serialising
> > > requests touching the same block so that the RMW doesn't write back
> > > stale data).
> >
> > Nice, this might really help on s390 (also for KVM) since dasd disks
> > usually have a 4k sector size. We also have flash systems with 4k block
> > size. Both disk systems cause lots of trouble with cache=none and
> > friends.
>
> Can't you configure guests to use 4k sector size as well? With these
> patches, it should work, but there is an obvious performance penalty, so
> you want the guest to make as little use of it as possible.

Yes, that's what we do now, but it requires a command line parameter.
About a year ago we posted patches for block size detection (as well as
geometry pass-through), but the rework needed after the review feedback
is still pending. It got lost among a bunch of other todos, but we will
restart that.

> For PCs, it's essentially just the boot process that needs 512 byte
> accesses because that's the BIOS interface, and the OS generally won't
> send misaligned requests later. What's the situation on s390?

Yes, it's mostly the boot process. But there is also a problem if you
have an image file on a file system that is on a 4k disk. Then the guest
might send non-aligned requests, IIRC.

> > Do you have a tree with these patches, so that I can test those on
> > s390?
>
> Sure, and any testing is appreciated:
>
>     git://repo.or.cz/qemu/kevin.git align
>
> Kevin
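For the image-file case above, where a guest sends a write smaller than the host block, the read-modify-write cycle from the cover letter can be sketched roughly as follows. This is only an illustration against an in-memory toy backend: struct toy_backend, backend_read(), backend_write() and rmw_write() are made-up names, not QEMU functions. The sketch reads the whole aligned span rather than just the first and last block, and it ignores concurrency entirely, which is the part the series handles by serialising overlapping requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy backend: a flat buffer that only accepts block-aligned I/O. */
struct toy_backend {
    uint8_t *data;
    size_t size;    /* total size, assumed to be a multiple of align */
    size_t align;   /* required request alignment, e.g. 4096 */
};

static void backend_read(const struct toy_backend *b, size_t off,
                         uint8_t *buf, size_t len)
{
    /* Stand-in for the O_DIRECT restriction: reject unaligned I/O. */
    assert(off % b->align == 0 && len % b->align == 0);
    assert(off + len <= b->size);
    memcpy(buf, b->data + off, len);
}

static void backend_write(struct toy_backend *b, size_t off,
                          const uint8_t *buf, size_t len)
{
    assert(off % b->align == 0 && len % b->align == 0);
    assert(off + len <= b->size);
    memcpy(b->data + off, buf, len);
}

/* Returns 0 on success, -1 if out of range or out of memory. */
static int rmw_write(struct toy_backend *b, size_t off,
                     const uint8_t *buf, size_t len)
{
    size_t start, end, span;
    uint8_t *bounce;

    if (len == 0) {
        return 0;
    }
    if (off > b->size || len > b->size - off) {
        return -1;
    }

    start = off / b->align * b->align;
    end = (off + len + b->align - 1) / b->align * b->align;
    span = end - start;

    bounce = malloc(span);
    if (!bounce) {
        return -1;
    }

    backend_read(b, start, bounce, span);       /* read the old blocks */
    memcpy(bounce + (off - start), buf, len);   /* modify the payload  */
    backend_write(b, start, bounce, span);      /* write blocks back   */

    free(bounce);
    return 0;
}
```

If two such writes to the same block ran concurrently, the later write-back could overwrite the other request's data with the stale copy in its bounce buffer, which is why the RMW path needs the serialisation.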
Re: [Qemu-devel] [PATCH v3 00/29] block: Support for 512b-on-4k emulation
On 17/01/14 15:14, Kevin Wolf wrote:
> This patch series adds code to the block layer that allows performing
> I/O requests in smaller granularities than required by the host backend
> (most importantly, O_DIRECT restrictions). It achieves this for reads
> by rounding the request to host-side block boundary, and for writes by
> performing a read-modify-write cycle (and serialising requests touching
> the same block so that the RMW doesn't write back stale data).

Nice, this might really help on s390 (also for KVM) since dasd disks
usually have a 4k sector size. We also have flash systems with 4k block
size. Both disk systems cause lots of trouble with cache=none and
friends.

Do you have a tree with these patches, so that I can test those on s390?

> Originally I intended to reuse a lot of code from Paolo's previous patch
> series, however, as I tried to integrate pread/pwrite, which already do
> a very similar thing (except for considering concurrency), and because
> I wanted to implement zero-copy, most of this series ended up being new
> code.
>
> Zero-copy is possible in a common case because while XFS defaults to a
> 4k sector size and therefore 4k on-disk O_DIRECT alignment for 512E
> disks, it still only has a 512 byte memory alignment requirement.
> (Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
> we know that the value is wrong and can probe it.)
>
> Changes in v2 -> v3:
> - Fixed I/O throttling bypass by converting to byte granularity [Wenchao]
> - Made 'bytes' argument to tracked_request_overlaps() unsigned [Max]
> - Fixed a corruption bug that came from using outdated RMW buffers after
>   waiting for another request and added some assertions to check the
>   assumptions [Peter]
> - Fixed bytes vs. sectors error in zero-after-EOF code of
>   bdrv_co_do_preadv [Max]
> - Removed orphaned prototype in block.h [Max]
> - A qemu-iotests case and some infrastructure to support it
>
> Changes in v1 -> v2:
> - Fixed overlap_bytes calculation in mark_request_serialising()
> - Fixed wait_serialising_requests() deadlock
> - iscsi: Set bs->request_alignment [Peter]
> - iscsi: Query block limits only in iscsi_open() when no other requests
>   are in flight, and in iscsi_refresh_limits() copy the stored values
>   into bs->bl [Peter]
>
> Changes in RFC -> v1:
> - Moved opt_mem_alignment into BlockLimits [Paolo]
> - Changed BlockLimits in turn to work a bit more like the
>   .bdrv_opt_mem_align() callback of the RFC; allows updating the
>   BlockLimits later when the chain changes or bdrv_reopen() toggles
>   O_DIRECT
> - Fixed a typo in a commit message [Eric]
>
> Kevin Wolf (26):
>   block: Move initialisation of BlockLimits to bdrv_refresh_limits()
>   block: Inherit opt_transfer_length
>   block: Update BlockLimits when they might have changed
>   qemu_memalign: Allow small alignments
>   block: Detect unaligned length in bdrv_qiov_is_aligned()
>   block: Don't use guest sector size for qemu_blockalign()
>   block: Introduce bdrv_aligned_preadv()
>   block: Introduce bdrv_co_do_preadv()
>   block: Introduce bdrv_aligned_pwritev()
>   block: write: Handle COR dependency after I/O throttling
>   block: Introduce bdrv_co_do_pwritev()
>   block: Switch BdrvTrackedRequest to byte granularity
>   block: Allow waiting for overlapping requests between begin/end
>   block: Make zero-after-EOF work with larger alignment
>   block: Generalise and optimise COR serialisation
>   block: Make overlap range for serialisation dynamic
>   block: Allow wait_serialising_requests() at any point
>   block: Align requests in bdrv_co_do_pwritev()
>   block: Assert serialisation assumptions in pwritev
>   block: Change coroutine wrapper to byte granularity
>   block: Make bdrv_pread() a bdrv_prwv_co() wrapper
>   block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
>   blkdebug: Make required alignment configurable
>   qemu-io: New command 'sleep'
>   qemu-iotests: Test pwritev RMW logic
>   block: Switch bdrv_io_limits_intercept() to byte granularity
>
> Paolo Bonzini (3):
>   block: rename buffer_alignment to guest_block_size
>   raw: Probe required direct I/O alignment
>   iscsi: Set bs->request_alignment
>
>  block.c                    | 644 +++--
>  block/backup.c             |   7 +-
>  block/blkdebug.c           |  24 ++
>  block/iscsi.c              |  47 ++--
>  block/qcow2.c              |  11 +-
>  block/qed.c                |  11 +-
>  block/raw-posix.c          | 102 +--
>  block/raw-win32.c          |  41 +++
>  block/stream.c             |   2 +
>  block/vmdk.c               |  22 +-
>  hw/block/virtio-blk.c      |   2 +-
>  hw/ide/core.c              |   2 +-
>  hw/scsi/scsi-disk.c        |   2 +-
>  hw/scsi/scsi-generic.c     |   2 +-
>  include/block/block.h      |  15 +-
>  include/block/block_int.h  |  27 +-
>  qemu-io-cmds.c             |  42 +++
>  tests/qemu-iotests/077     | 278 +++
>  tests/qemu-iotests/077.out | 202 ++
>  tests/qemu-iotests/group   |   1 +
>  util/oslib-posix.c         |   5 +
>  21
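As a rough illustration of why the serialisation works on byte ranges widened to the request alignment, the overlap test could look like the sketch below. The names (struct toy_request, widen(), ranges_overlap(), must_serialise()) are hypothetical and are not the series' tracked_request_overlaps() or mark_request_serialising(); the sketch also assumes a single fixed alignment, whereas the series makes the overlap range dynamic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracked request at byte granularity. */
struct toy_request {
    uint64_t offset;
    uint64_t bytes;
};

/* Widen a request so that both ends fall on a multiple of 'align'. */
static struct toy_request widen(struct toy_request req, uint64_t align)
{
    uint64_t end;

    assert(align != 0);
    end = (req.offset + req.bytes + align - 1) / align * align;
    req.offset = req.offset / align * align;
    req.bytes = end - req.offset;
    return req;
}

/* Half-open interval test: [a, a + len) intersects [b, b + len). */
static bool ranges_overlap(struct toy_request a, struct toy_request b)
{
    return a.offset < b.offset + b.bytes &&
           b.offset < a.offset + a.bytes;
}

/*
 * Two writes that do not overlap byte-wise can still share a host
 * block, so the check has to be done on the widened ranges.
 */
static bool must_serialise(struct toy_request a, struct toy_request b,
                           uint64_t align)
{
    return ranges_overlap(widen(a, align), widen(b, align));
}
```

Two 512 byte writes at offsets 0 and 512 do not overlap byte-wise, but on a 4k backend both RMW cycles rewrite block 0, so they have to wait for each other.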