Re: [RFC v3 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag

2022-08-17 Thread Stefan Hajnoczi
On Thu, Jul 14, 2022 at 10:54:12AM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > Block drivers may optimize I/O requests accessing buffers previously
> > registered with bdrv_register_buf(). Checking whether all elements of a
> > request's QEMUIOVector are within previously registered buffers is
> > expensive, so we need a hint from the user to avoid costly checks.
> > 
> > Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
> > QEMUIOVector elements in an I/O request are known to be within
> > previously registered buffers.
> > 
> > bdrv_aligned_preadv() is strict in validating supported read flags and
> > its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no
> > harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not
> > support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF.
> > 
> > Care must be taken to clear the flag when the block layer or filter
> > drivers replace QEMUIOVector elements with bounce buffers since these
> > have not been registered with bdrv_register_buf(). A lot of the changes
> > in this commit deal with clearing the flag in those cases.
> > 
> > Ensuring that the flag is cleared properly is somewhat invasive to
> > implement across the block layer and it's hard to spot when future code
> > changes accidentally break it. Another option might be to add a flag to
> > QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify
> > elements. That is more robust but somewhat of a layering violation, so I
> > haven't attempted that.
> 
> Yeah...  I will say that most read code already looks quite reasonable in
> that it’ll pass @flags to lower layers basically only if it’s an unmodified
> request, so it seems like in the past most people have already adhered to
> “don’t pass on any flags if you’re reading to a local bounce buffer”.
> 
> > Signed-off-by: Stefan Hajnoczi
> > ---
> >   include/block/block-common.h |  9 +
> >   block/blkverify.c|  4 ++--
> >   block/crypto.c   |  2 ++
> >   block/io.c   | 30 +++---
> >   block/mirror.c   |  2 ++
> >   block/raw-format.c   |  2 ++
> >   6 files changed, 40 insertions(+), 9 deletions(-)
> 
> Some things not covered here that look a bit wrong:
> 
> While bdrv_driver_preadv() asserts that the flags don’t contain anything the
> driver couldn’t handle (and this new flag is made exempt from that assertion
> here in this patch), bdrv_driver_pwritev() just hides those flags from
> drivers silently. I think just like we exempt the new flag from the
> assertion in bdrv_driver_preadv(), we should have bdrv_driver_pwritev()
> always pass it to drivers.
> 
> The following driver read/write functions assert that @flags is 0, which is
> probably no longer ideal:
> - bdrv_qed_co_writev()
> - block_crypto_co_preadv()
> - nbd_client_co_preadv()
> - parallels_co_writev()
> - qcow_co_preadv()
> - qcow_co_pwritev()
> - qemu_gluster_co_writev()
> - raw_co_pwritev() (block/file-posix.c)
> - replication_co_writev()
> - ssh_co_writev()
> - vhdx_co_writev()
> 
> snapshot_access_co_preadv_part() returns an error when any flags are set,
> but should probably ignore BDRV_REQ_REGISTERED_BUF for this check.

The assert(!flags) checks can be removed without losing much safety
since bdrv_driver_preadv/pwritev() prepare the flags bits appropriately
and calls from other locations are rare.

> 
> 
> While looking around, I spotted a couple of places that look like they could
> pass the flag on but currently don’t (just FYI, not asking for anything
> here):
> 
> bdrv_co_do_copy_on_readv() never passes the flags through to its calls, but
> I think it could pass this flag on in the one bdrv_driver_preadv() call
> where it doesn’t use a bounce buffer (“Read directly into the destination”).
> 
> qcow2’s qcow2_co_preadv_task() and qcow2_co_pwritev_task() (besides the
> encryption part) also look like they should pass this flag on, but, well,
> the functions themselves currently don’t get the flag, so they can’t.
> 
> qcow1’s qcow_co_preadv() and qcow_co_pwritev() are so-so, sometimes using a
> bounce buffer, and sometimes not, but those function could use optimization
> in general if anyone cared.
> 
> vpc_co_preadv()’s and vpc_co_pwritev()’s first
> bdrv_co_preadv()/bdrv_co_pwritev() invocations look straightforward, but as
> with qcow1, not sure if anyone cares.
> 
> I’m too lazy to thoroughly check what’s going on with qed_aio_write_main(). 
> Passing 0 is safe, and it doesn’t get the original request flags, so I guess
> doing anything about this would be difficult.
> 
> quorum’s read_fifo_child() probably could pass acb->flags. Probably. 
> Perhaps not.  Difficult to say it is.
> 
> block/replication.c also looks like a candidate for passing flags, but
> personally, I’d like to refrain from touching it.  (Well, besides the fact
> that replication_co_writev() asserts that @flags is 0.)
> 
> 
> 

Re: [RFC v3 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag

2022-07-14 Thread Hanna Reitz

On 08.07.22 06:17, Stefan Hajnoczi wrote:

Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.

Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.

bdrv_aligned_preadv() is strict in validating supported read flags and
its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no
harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not
support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF.

Care must be taken to clear the flag when the block layer or filter
drivers replace QEMUIOVector elements with bounce buffers since these
have not been registered with bdrv_register_buf(). A lot of the changes
in this commit deal with clearing the flag in those cases.

Ensuring that the flag is cleared properly is somewhat invasive to
implement across the block layer and it's hard to spot when future code
changes accidentally break it. Another option might be to add a flag to
QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify
elements. That is more robust but somewhat of a layering violation, so I
haven't attempted that.


Yeah...  I will say that most read code already looks quite reasonable 
in that it’ll pass @flags to lower layers basically only if it’s an 
unmodified request, so it seems like in the past most people have 
already adhered to “don’t pass on any flags if you’re reading to a local 
bounce buffer”.



Signed-off-by: Stefan Hajnoczi
---
  include/block/block-common.h |  9 +
  block/blkverify.c|  4 ++--
  block/crypto.c   |  2 ++
  block/io.c   | 30 +++---
  block/mirror.c   |  2 ++
  block/raw-format.c   |  2 ++
  6 files changed, 40 insertions(+), 9 deletions(-)


Some things not covered here that look a bit wrong:

While bdrv_driver_preadv() asserts that the flags don’t contain anything 
the driver couldn’t handle (and this new flag is made exempt from that 
assertion here in this patch), bdrv_driver_pwritev() just hides those 
flags from drivers silently. I think just like we exempt the new flag 
from the assertion in bdrv_driver_preadv(), we should have 
bdrv_driver_pwritev() always pass it to drivers.


The following driver read/write functions assert that @flags is 0, which 
is probably no longer ideal:

- bdrv_qed_co_writev()
- block_crypto_co_preadv()
- nbd_client_co_preadv()
- parallels_co_writev()
- qcow_co_preadv()
- qcow_co_pwritev()
- qemu_gluster_co_writev()
- raw_co_pwritev() (block/file-posix.c)
- replication_co_writev()
- ssh_co_writev()
- vhdx_co_writev()

snapshot_access_co_preadv_part() returns an error when any flags are 
set, but should probably ignore BDRV_REQ_REGISTERED_BUF for this check.



While looking around, I spotted a couple of places that look like they 
could pass the flag on but currently don’t (just FYI, not asking for 
anything here):


bdrv_co_do_copy_on_readv() never passes the flags through to its calls, 
but I think it could pass this flag on in the one bdrv_driver_preadv() 
call where it doesn’t use a bounce buffer (“Read directly into the 
destination”).


qcow2’s qcow2_co_preadv_task() and qcow2_co_pwritev_task() (besides the 
encryption part) also look like they should pass this flag on, but, 
well, the functions themselves currently don’t get the flag, so they can’t.


qcow1’s qcow_co_preadv() and qcow_co_pwritev() are so-so, sometimes 
using a bounce buffer, and sometimes not, but those function could use 
optimization in general if anyone cared.


vpc_co_preadv()’s and vpc_co_pwritev()’s first 
bdrv_co_preadv()/bdrv_co_pwritev() invocations look straightforward, but 
as with qcow1, not sure if anyone cares.


I’m too lazy to thoroughly check what’s going on with 
qed_aio_write_main().  Passing 0 is safe, and it doesn’t get the 
original request flags, so I guess doing anything about this would be 
difficult.


quorum’s read_fifo_child() probably could pass acb->flags. Probably.  
Perhaps not.  Difficult to say it is.


block/replication.c also looks like a candidate for passing flags, but 
personally, I’d like to refrain from touching it.  (Well, besides the 
fact that replication_co_writev() asserts that @flags is 0.)



(And finally, I found that block/parallels.c invokes bdrv_co_pwritev() 
with a buffer instead of an I/O vector, which looks really wrong, but 
has nothing to do with this patch.)


[...]


diff --git a/block/io.c b/block/io.c
index e7f4117fe7..83b8259227 100644
--- a/block/io.c
+++ b/block/io.c


[...]


@@ -1902,6 +1910,11 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
  return -ENOTSUP;
  }
  
+/* By definition there is 

[RFC v3 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag

2022-07-07 Thread Stefan Hajnoczi
Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.

Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.

bdrv_aligned_preadv() is strict in validating supported read flags and
its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no
harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not
support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF.

Care must be taken to clear the flag when the block layer or filter
drivers replace QEMUIOVector elements with bounce buffers since these
have not been registered with bdrv_register_buf(). A lot of the changes
in this commit deal with clearing the flag in those cases.

Ensuring that the flag is cleared properly is somewhat invasive to
implement across the block layer and it's hard to spot when future code
changes accidentally break it. Another option might be to add a flag to
QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify
elements. That is more robust but somewhat of a layering violation, so I
haven't attempted that.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-common.h |  9 +
 block/blkverify.c|  4 ++--
 block/crypto.c   |  2 ++
 block/io.c   | 30 +++---
 block/mirror.c   |  2 ++
 block/raw-format.c   |  2 ++
 6 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..061606e867 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -80,6 +80,15 @@ typedef enum {
  */
 BDRV_REQ_MAY_UNMAP  = 0x4,
 
+/*
+ * An optimization hint when all QEMUIOVector elements are within
+ * previously registered bdrv_register_buf() memory ranges.
+ *
+ * Code that replaces the user's QEMUIOVector elements with bounce buffers
+ * must take care to clear this flag.
+ */
+BDRV_REQ_REGISTERED_BUF = 0x8,
+
 BDRV_REQ_FUA= 0x10,
 BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
diff --git a/block/blkverify.c b/block/blkverify.c
index e4a37af3b2..d624f4fd05 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -235,8 +235,8 @@ blkverify_co_preadv(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 qemu_iovec_init(_qiov, qiov->niov);
 qemu_iovec_clone(_qiov, qiov, buf);
 
-ret = blkverify_co_prwv(bs, , offset, bytes, qiov, _qiov, flags,
-false);
+ret = blkverify_co_prwv(bs, , offset, bytes, qiov, _qiov,
+flags & ~BDRV_REQ_REGISTERED_BUF, false);
 
 cmp_offset = qemu_iovec_compare(qiov, _qiov);
 if (cmp_offset != -1) {
diff --git a/block/crypto.c b/block/crypto.c
index 1ba82984ef..c900355adb 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -473,6 +473,8 @@ block_crypto_co_pwritev(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
 uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
 
+flags &= ~BDRV_REQ_REGISTERED_BUF;
+
 assert(!(flags & ~BDRV_REQ_FUA));
 assert(payload_offset < INT64_MAX);
 assert(QEMU_IS_ALIGNED(offset, sector_size));
diff --git a/block/io.c b/block/io.c
index e7f4117fe7..83b8259227 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1541,11 +1541,14 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
align);
 
-/* TODO: We would need a per-BDS .supported_read_flags and
+/*
+ * TODO: We would need a per-BDS .supported_read_flags and
  * potential fallback support, if we ever implement any read flags
  * to pass through to drivers.  For now, there aren't any
- * passthrough flags.  */
-assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH)));
+ * passthrough flags except the BDRV_REQ_REGISTERED_BUF optimization hint.
+ */
+assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH |
+   BDRV_REQ_REGISTERED_BUF)));
 
 /* Handle Copy on Read and associated serialisation */
 if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1586,7 +1589,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 goto out;
 }
 
-assert(!(flags & ~bs->supported_read_flags));
+assert(!(flags & ~(bs->supported_read_flags | BDRV_REQ_REGISTERED_BUF)));
 
 max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
 if (bytes <= max_bytes && bytes <= max_transfer) {
@@ -1775,7 +1778,8 @@ static void