On Thu, Jul 14, 2022 at 10:54:12AM +0200, Hanna Reitz wrote: > On 08.07.22 06:17, Stefan Hajnoczi wrote: > > Block drivers may optimize I/O requests accessing buffers previously > > registered with bdrv_register_buf(). Checking whether all elements of a > > request's QEMUIOVector are within previously registered buffers is > > expensive, so we need a hint from the user to avoid costly checks. > > > > Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all > > QEMUIOVector elements in an I/O request are known to be within > > previously registered buffers. > > > > bdrv_aligned_preadv() is strict in validating supported read flags and > > its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no > > harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not > > support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF. > > > > Care must be taken to clear the flag when the block layer or filter > > drivers replace QEMUIOVector elements with bounce buffers since these > > have not been registered with bdrv_register_buf(). A lot of the changes > > in this commit deal with clearing the flag in those cases. > > > > Ensuring that the flag is cleared properly is somewhat invasive to > > implement across the block layer and it's hard to spot when future code > > changes accidentally break it. Another option might be to add a flag to > > QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify > > elements. That is more robust but somewhat of a layering violation, so I > > haven't attempted that. > > Yeah... I will say that most read code already looks quite reasonable in > that it’ll pass @flags to lower layers basically only if it’s an unmodified > request, so it seems like in the past most people have already adhered to > “don’t pass on any flags if you’re reading to a local bounce buffer”. > > > Signed-off-by: Stefan Hajnoczi<stefa...@redhat.com> > > --- > > include/block/block-common.h | 9 +++++++++ > > block/blkverify.c | 4 ++-- > > block/crypto.c | 2 ++ > > block/io.c | 30 +++++++++++++++++++++++------- > > block/mirror.c | 2 ++ > > block/raw-format.c | 2 ++ > > 6 files changed, 40 insertions(+), 9 deletions(-) > > Some things not covered here that look a bit wrong: > > While bdrv_driver_preadv() asserts that the flags don’t contain anything the > driver couldn’t handle (and this new flag is made exempt from that assertion > here in this patch), bdrv_driver_pwritev() just hides those flags from > drivers silently. I think just like we exempt the new flag from the > assertion in bdrv_driver_preadv(), we should have bdrv_driver_pwritev() > always pass it to drivers. > > The following driver read/write functions assert that @flags is 0, which is > probably no longer ideal: > - bdrv_qed_co_writev() > - block_crypto_co_preadv() > - nbd_client_co_preadv() > - parallels_co_writev() > - qcow_co_preadv() > - qcow_co_pwritev() > - qemu_gluster_co_writev() > - raw_co_pwritev() (block/file-posix.c) > - replication_co_writev() > - ssh_co_writev() > - vhdx_co_writev() > > snapshot_access_co_preadv_part() returns an error when any flags are set, > but should probably ignore BDRV_REQ_REGISTERED_BUF for this check.
The assert(!flags) checks can be removed without losing much safety since bdrv_driver_preadv/pwritev() prepare the flags bits appropriately and calls from other locations are rare. > > > While looking around, I spotted a couple of places that look like they could > pass the flag on but currently don’t (just FYI, not asking for anything > here): > > bdrv_co_do_copy_on_readv() never passes the flags through to its calls, but > I think it could pass this flag on in the one bdrv_driver_preadv() call > where it doesn’t use a bounce buffer (“Read directly into the destination”). > > qcow2’s qcow2_co_preadv_task() and qcow2_co_pwritev_task() (besides the > encryption part) also look like they should pass this flag on, but, well, > the functions themselves currently don’t get the flag, so they can’t. > > qcow1’s qcow_co_preadv() and qcow_co_pwritev() are so-so, sometimes using a > bounce buffer, and sometimes not, but those function could use optimization > in general if anyone cared. > > vpc_co_preadv()’s and vpc_co_pwritev()’s first > bdrv_co_preadv()/bdrv_co_pwritev() invocations look straightforward, but as > with qcow1, not sure if anyone cares. > > I’m too lazy to thoroughly check what’s going on with qed_aio_write_main(). > Passing 0 is safe, and it doesn’t get the original request flags, so I guess > doing anything about this would be difficult. > > quorum’s read_fifo_child() probably could pass acb->flags. Probably. > Perhaps not. Difficult to say it is. > > block/replication.c also looks like a candidate for passing flags, but > personally, I’d like to refrain from touching it. (Well, besides the fact > that replication_co_writev() asserts that @flags is 0.) > > > (And finally, I found that block/parallels.c invokes bdrv_co_pwritev() with > a buffer instead of an I/O vector, which looks really wrong, but has nothing > to do with this patch.) Thanks for looking at these. I haven't attempted to propagate BDRV_REQ_REGISTERED_BUF through image format drivers yet so there are optimization opportunities here. > [...] > > > diff --git a/block/io.c b/block/io.c > > index e7f4117fe7..83b8259227 100644 > > --- a/block/io.c > > +++ b/block/io.c > > [...] > > > @@ -1902,6 +1910,11 @@ static int coroutine_fn > > bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > > return -ENOTSUP; > > } > > + /* By definition there is no user buffer so this flag doesn't make > > sense */ > > + if (flags & BDRV_REQ_REGISTERED_BUF) { > > + return -EINVAL; > > + } > > + > > Here we return an error when the flag is met... > > > /* Invalidate the cached block-status data range if this write > > overlaps */ > > bdrv_bsc_invalidate_range(bs, offset, bytes); > > @@ -2187,6 +2200,9 @@ static int coroutine_fn > > bdrv_co_do_zero_pwritev(BdrvChild *child, > > bool padding; > > BdrvRequestPadding pad; > > + /* This flag doesn't make sense for padding or zero writes */ > > + flags &= ~BDRV_REQ_REGISTERED_BUF; > > + > > ...and here we just ignore it. Why don’t we handle this the same in both of > these functions? (And what about bdrv_co_pwrite_zeroes()?) > > Besides that, if we do make it an error, I wonder if it shouldn’t be an > assertion instead so the duty of clearing the flag falls on the caller. (I > personally like just silently clearing it in the zero-write functions, > though.) Thanks for catching this. Let's consistently clear BDRV_REQ_REGISTERED_BUF silently for zero writes. Stefan
signature.asc
Description: PGP signature