Re: [Qemu-devel] [PATCH v3 00/29] block: Support for 512b-on-4k emulation

2014-01-23 Thread Kevin Wolf
On 22.01.2014 at 21:30, Christian Borntraeger wrote:
 On 17/01/14 15:14, Kevin Wolf wrote:
  This patch series adds code to the block layer that allows performing
  I/O requests in smaller granularities than required by the host backend
  (most importantly, O_DIRECT restrictions). It achieves this for reads
  by rounding the request to host-side block boundary, and for writes by
  performing a read-modify-write cycle (and serialising requests
  touching the same block so that the RMW doesn't write back stale data).
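
The rounding described above can be illustrated with a small sketch. It is not code from the series: `AlignedRange` and `align_request()` are hypothetical names, and the alignment is assumed to be a power of two (for example 4096 for a 4k host block). It widens a guest request to host block boundaries and reports how many padding bytes sit before and after the guest data, which is the part a read discards and a read-modify-write cycle has to preserve.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a guest byte range widened to the host alignment. */
typedef struct AlignedRange {
    uint64_t offset; /* start of the widened range, rounded down */
    uint64_t bytes;  /* length of the widened range, a multiple of align */
    uint64_t head;   /* padding bytes in front of the guest data */
    uint64_t tail;   /* padding bytes behind the guest data */
} AlignedRange;

/* Widen [offset, offset + bytes) to boundaries of align (a power of two). */
AlignedRange align_request(uint64_t offset, uint64_t bytes, uint64_t align)
{
    AlignedRange r;
    uint64_t end;

    assert(align != 0 && (align & (align - 1)) == 0);

    r.offset = offset & ~(align - 1);                   /* round down */
    end = (offset + bytes + align - 1) & ~(align - 1);  /* round up */
    r.bytes = end - r.offset;
    r.head = offset - r.offset;
    r.tail = end - (offset + bytes);
    return r;
}
```

For a 512 byte guest write at offset 512 on a 4k host block, this yields one 4096 byte block at offset 0 with 512 bytes of head and 3072 bytes of tail padding, so the whole block would have to be read, patched and written back.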
 
 Nice, this might really help on s390 (also for KVM) since dasd disks usually
 have a 4k sector size. We also have flash systems with 4k block size. Both
 disk systems cause lots of trouble with cache=none and friends.

Can't you configure guests to use 4k sector size as well? With these
patches, it should work, but there is an obvious performance penalty, so
you want the guest to make as little use of it as possible.

For PCs, it's essentially just the boot process that needs 512 byte
accesses because that's the BIOS interface, and the OS generally won't
send misaligned requests later.

What's the situation on s390?

 Do you have a tree with these patches, so that I can test those on s390?

Sure, and any testing is appreciated:

git://repo.or.cz/qemu/kevin.git align

Kevin



Re: [Qemu-devel] [PATCH v3 00/29] block: Support for 512b-on-4k emulation

2014-01-23 Thread Christian Borntraeger
On 23/01/14 11:29, Kevin Wolf wrote:
 On 22.01.2014 at 21:30, Christian Borntraeger wrote:
 On 17/01/14 15:14, Kevin Wolf wrote:
 This patch series adds code to the block layer that allows performing
 I/O requests in smaller granularities than required by the host backend
 (most importantly, O_DIRECT restrictions). It achieves this for reads
 by rounding the request to host-side block boundary, and for writes by
 performing a read-modify-write cycle (and serialising requests
 touching the same block so that the RMW doesn't write back stale data).

 Nice, this might really help on s390 (also for KVM) since dasd disks usually
 have a 4k sector size. We also have flash systems with 4k block size. Both
 disk systems cause lots of trouble with cache=none and friends.
 
 Can't you configure guests to use 4k sector size as well? With these
 patches, it should work, but there is an obvious performance penalty, so
 you want the guest to make as little use of it as possible.

Yes, that's what we do now, but it requires a command line parameter.
A year or so ago we posted patches for block size detection (as well as geometry
pass-through), but the necessary rework after feedback is still pending
(it got lost among a bunch of other to-dos, but we will restart that).

 For PCs, it's essentially just the boot process that needs 512 byte
 accesses because that's the BIOS interface, and the OS generally won't
 send misaligned requests later.
 
 What's the situation on s390?

Yes, it's mostly the boot process. But there is also a problem if you have
an image file on a file system that is on a 4k disk. Then the guest might send
non-aligned requests, IIRC.
 
 Do you have a tree with these patches, so that I can test those on s390?
 
 Sure, and any testing is appreciated:
 
 git://repo.or.cz/qemu/kevin.git align
 
 Kevin
 




Re: [Qemu-devel] [PATCH v3 00/29] block: Support for 512b-on-4k emulation

2014-01-22 Thread Christian Borntraeger
On 17/01/14 15:14, Kevin Wolf wrote:
 This patch series adds code to the block layer that allows performing
 I/O requests in smaller granularities than required by the host backend
 (most importantly, O_DIRECT restrictions). It achieves this for reads
 by rounding the request to host-side block boundary, and for writes by
 performing a read-modify-write cycle (and serialising requests
 touching the same block so that the RMW doesn't write back stale data).
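
Why serialisation is needed can be shown with a short sketch. It is an illustration only, not the series' tracked_request_overlaps(): `ranges_overlap()` is a hypothetical helper, and the alignment is assumed to be a power of two. Two requests conflict when their ranges, widened to host block boundaries, intersect, even if the guest-visible byte ranges are disjoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: do two byte ranges touch a common host block once
 * both are widened to boundaries of align (a power of two)?
 */
bool ranges_overlap(uint64_t off_a, uint64_t bytes_a,
                    uint64_t off_b, uint64_t bytes_b, uint64_t align)
{
    uint64_t start_a, end_a, start_b, end_b;

    assert(align != 0 && (align & (align - 1)) == 0);

    start_a = off_a & ~(align - 1);
    end_a = (off_a + bytes_a + align - 1) & ~(align - 1);
    start_b = off_b & ~(align - 1);
    end_b = (off_b + bytes_b + align - 1) & ~(align - 1);

    /* Half-open intervals intersect if each starts before the other ends. */
    return start_a < end_b && start_b < end_a;
}
```

With a 4k alignment, two 512 byte writes at offsets 0 and 512 share one host block and would have to be serialised; with a 512 byte alignment they are independent.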

Nice, this might really help on s390 (also for KVM) since dasd disks usually
have a 4k sector size. We also have flash systems with 4k block size. Both
disk systems cause lots of trouble with cache=none and friends.

Do you have a tree with these patches, so that I can test those on s390?



 
 Originally I intended to reuse a lot of code from Paolo's previous
 patch series, however as I tried to integrate pread/pwrite, which
 already do a very similar thing (except for considering concurrency),
 and because I wanted to implement zero-copy, most of this series ended
 up being new code.
 
 Zero-copy is possible in a common case because while XFS defaults to a
 4k sector size and therefore 4k on-disk O_DIRECT alignment for 512E
 disks, it still only has a 512 byte memory alignment requirement.
 (Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
 we know that the value is wrong and can probe it.)
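
One possible way to probe the memory alignment is sketched below. This is an assumption about how such a probe could work, not necessarily what the series does: `probe_mem_align()` is a hypothetical helper, and `fd` is assumed to have been opened with O_DIRECT by the caller. It tries reads into buffers that are deliberately aligned to only 512, 1024, ... bytes and returns the first alignment the kernel accepts, falling back to `max_align` if none does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: smallest power-of-two buffer alignment (from 512 up)
 * for which a read from fd succeeds, or max_align if no smaller one works.
 */
size_t probe_mem_align(int fd, size_t max_align)
{
    void *base = NULL;
    size_t align;
    size_t result = max_align;

    assert(max_align >= 512 && (max_align & (max_align - 1)) == 0);

    /* base is aligned to max_align, so base + align is aligned to exactly
     * align and to nothing larger, as long as align < max_align. */
    if (posix_memalign(&base, max_align, 2 * max_align) != 0) {
        return max_align;
    }

    for (align = 512; align < max_align; align <<= 1) {
        /* The length and file offset stay block-aligned; only the buffer
         * address varies between attempts. */
        if (pread(fd, (char *)base + align, max_align, 0) >= 0) {
            result = align;
            break;
        }
    }

    free(base);
    return result;
}
```

If every attempt fails (for instance because the file descriptor is invalid), the function returns `max_align`, which is the conservative choice.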
 
 
 Changes in v2 -> v3:
 - Fixed I/O throttling bypass by converting to byte granularity [Wenchao]
 - Made 'bytes' argument to tracked_request_overlaps() unsigned [Max]
 - Fixed a corruption bug that came from using outdated RMW buffers after
   waiting for another request and added some assertions to check the
   assumptions [Peter]
 - Fixed bytes vs. sectors error in zero-after-EOF code of
   bdrv_co_do_preadv [Max]
 - Removed orphaned prototype in block.h [Max]
 - A qemu-iotests case and some infrastructure to support it
 
 Changes in v1 -> v2:
 - Fixed overlap_bytes calculation in mark_request_serialising()
 - Fixed wait_serialising_requests() deadlock
 - iscsi: Set bs->request_alignment [Peter]
 - iscsi: Query block limits only in iscsi_open() when no other requests
   are in flight, and in iscsi_refresh_limits() copy the stored values
   into bs->bl [Peter]
 
 Changes in RFC -> v1:
 - Moved opt_mem_alignment into BlockLimits [Paolo]
 - Changed BlockLimits in turn to work a bit more like the
   .bdrv_opt_mem_align() callback of the RFC; allows updating the
   BlockLimits later when the chain changes or bdrv_reopen() toggles
   O_DIRECT
 - Fixed a typo in a commit message [Eric]
 
 
 Kevin Wolf (26):
   block: Move initialisation of BlockLimits to bdrv_refresh_limits()
   block: Inherit opt_transfer_length
   block: Update BlockLimits when they might have changed
   qemu_memalign: Allow small alignments
   block: Detect unaligned length in bdrv_qiov_is_aligned()
   block: Don't use guest sector size for qemu_blockalign()
   block: Introduce bdrv_aligned_preadv()
   block: Introduce bdrv_co_do_preadv()
   block: Introduce bdrv_aligned_pwritev()
   block: write: Handle COR dependency after I/O throttling
   block: Introduce bdrv_co_do_pwritev()
   block: Switch BdrvTrackedRequest to byte granularity
   block: Allow waiting for overlapping requests between begin/end
   block: Make zero-after-EOF work with larger alignment
   block: Generalise and optimise COR serialisation
   block: Make overlap range for serialisation dynamic
   block: Allow wait_serialising_requests() at any point
   block: Align requests in bdrv_co_do_pwritev()
   block: Assert serialisation assumptions in pwritev
   block: Change coroutine wrapper to byte granularity
   block: Make bdrv_pread() a bdrv_prwv_co() wrapper
   block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
   blkdebug: Make required alignment configurable
   qemu-io: New command 'sleep'
   qemu-iotests: Test pwritev RMW logic
   block: Switch bdrv_io_limits_intercept() to byte granularity
 
 Paolo Bonzini (3):
   block: rename buffer_alignment to guest_block_size
   raw: Probe required direct I/O alignment
   iscsi: Set bs->request_alignment
 
  block.c                    | 644 +++--
  block/backup.c             |   7 +-
  block/blkdebug.c           |  24 ++
  block/iscsi.c              |  47 ++--
  block/qcow2.c              |  11 +-
  block/qed.c                |  11 +-
  block/raw-posix.c          | 102 +--
  block/raw-win32.c          |  41 +++
  block/stream.c             |   2 +
  block/vmdk.c               |  22 +-
  hw/block/virtio-blk.c      |   2 +-
  hw/ide/core.c              |   2 +-
  hw/scsi/scsi-disk.c        |   2 +-
  hw/scsi/scsi-generic.c     |   2 +-
  include/block/block.h      |  15 +-
  include/block/block_int.h  |  27 +-
  qemu-io-cmds.c             |  42 +++
  tests/qemu-iotests/077     | 278 +++
  tests/qemu-iotests/077.out | 202 ++
  tests/qemu-iotests/group   |   1 +
  util/oslib-posix.c         |   5 +
  21