On 3/26/19 10:51 AM, Kevin Wolf wrote: > We know that the kernel implements a slow fallback code path for > BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it. > The other operations we call in the context of .bdrv_co_pwrite_zeroes > should usually be quick, so no modification should be needed for them. > If we ever notice that there are additional problematic cases, we can > still make these conditional as well.
Are there cases where fallocate(FALLOC_FL_ZERO_RANGE) falls back to slow writes? It may be fast on some file systems, but when used on a block device, that may equally trigger slow fallbacks. The man page is not clear on that fact; I suspect that there may be cases in there that need to be made conditional (it would be awesome if the kernel folks would give us another FALLOC_ flag when we want to guarantee no fallback). By the way, is there an easy setup to prove (maybe some qemu-img convert command on a specially-prepared source image) whether the no fallback flag makes a difference? I'm about to cross-post a series of patches to nbd/qemu/nbdkit/libnbd that adds a new NBD_CMD_FLAG_FAST_ZERO which fits the bill of BDRV_REQ_NO_FALLBACK, but would like to include some benchmark numbers in my cover letter if I can reproduce a setup where it matters. And this patch has a bug: > +++ b/block/file-posix.c > @@ -652,7 +652,7 @@ static int raw_open_common(BlockDriverState *bs, QDict > *options, > } > #endif > > - bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP; > + bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK; > ret = 0; > fail: > if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) { > @@ -1500,14 +1500,19 @@ static ssize_t > handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb) { int ret = -ENOTSUP; BDRVRawState *s = aiocb->bs->opaque; if (!s->has_write_zeroes) { return -ENOTSUP; > } At this point, ret is -ENOTSUP. > > #ifdef BLKZEROOUT > - do { > - uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes }; > - if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) { > - return 0; > - } > - } while (errno == EINTR); > + /* The BLKZEROOUT implementation in the kernel doesn't set > + * BLKDEV_ZERO_NOFALLBACK, so we can't call this if we have to avoid slow > + * fallbacks. */ > + if (!(aiocb->aio_type & QEMU_AIO_NO_FALLBACK)) { > + do { > + uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes }; > + if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) { > + return 0; > + } > + } while (errno == EINTR); > > - ret = translate_err(-errno); > + ret = translate_err(-errno); > + } If the very first call to this function is with NO_FALLBACK, then this 'if' is skipped, > #endif > > if (ret == -ENOTSUP) { s->has_write_zeroes = false; } and we set s->has_write_zeroes to false, permanently disabling any BLKZEROOUT attempts in future calls, even if the future calls no longer pass the NO_FALLBACK flag. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature