Am 26.07.2018 um 18:22 hat Nir Soffer geschrieben: > On Thu, Jul 26, 2018 at 2:33 PM Kevin Wolf <kw...@redhat.com> wrote: > > > The BLKDISCARD ioctl doesn't guarantee that the discarded blocks read as > > all-zero afterwards, so don't try to abuse it for zero writing. We try > > to only use this if BLKDISCARDZEROES tells us that it is safe, but this > > is unreliable on older kernels and a constant 0 in newer kernels. In > > other words, this code path is never actually used with newer kernels, > > so we don't even try to unmap while writing zeros. > > > > This patch removes the abuse of discard for writing zeroes from > > file-posix and instead adds a new function that uses interfaces that are > > actually meant to deallocate and zero out at the same time. Only if > > those fail, it falls back to zeroing out without unmap. We never fall > > back to a discard operation any more that may or may not result in > > zeros. > > > > Signed-off-by: Kevin Wolf <kw...@redhat.com> > > --- > > block/file-posix.c | 62 > > +++++++++++++++++++++++++++++++++++++++++------------- > > 1 file changed, 47 insertions(+), 15 deletions(-) > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > index 60af4b3d51..9c66cd87d1 100644 > > --- a/block/file-posix.c > > +++ b/block/file-posix.c > > @@ -648,7 +648,7 @@ static int raw_open_common(BlockDriverState *bs, QDict > > *options, > > } > > #endif > > > > - bs->supported_zero_flags = s->discard_zeroes ? BDRV_REQ_MAY_UNMAP : 0; > > + bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP; > > ret = 0; > > fail: > > if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) { > > @@ -1487,6 +1487,38 @@ static ssize_t > > handle_aiocb_write_zeroes(RawPosixAIOData *aiocb) > > return -ENOTSUP; > > } > > > > +static ssize_t handle_aiocb_write_zeroes_unmap(RawPosixAIOData *aiocb) > > +{ > > + BDRVRawState *s = aiocb->bs->opaque; > > + int ret; > > + > > + /* First try to write zeros and unmap at the same time */ > > + > > +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE > > + ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, > > + aiocb->aio_offset, aiocb->aio_nbytes); > > + if (ret != -ENOTSUP) { > > > > This fails with ENODEV on RHEL 7.5 when fd is a block device. > > The manual says: > > ENODEV fd does not refer to a regular file or a directory. (If fd > is a pipe or FIFO, a different error results.)
do_fallocate() is a wrapper around fallocate() that handles EINTR and translates a few errno values (including ENODEV) into a -ENOTSUP return code. Kevin