On 2018-01-30 15:23, Anton Nefedov wrote: > > > On 29/1/2018 11:28 PM, Max Reitz wrote: >> On 2018-01-18 18:49, Anton Nefedov wrote: >>> If COW areas of the newly allocated clusters are zeroes on the >>> backing image, >>> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on >>> the whole >>> cluster instead of writing explicit zero buffers later in perform_cow(). >>> >>> iotest 060: >>> write to the discarded cluster does not trigger COW anymore. >>> Use a backing image instead. >>> >>> iotest 066: >>> cluster-alignment areas that were not really COWed are now detected >>> as zeroes, hence the initial write has to be exactly the same size for >>> the maps to match >>> >>> Signed-off-by: Anton Nefedov <anton.nefe...@virtuozzo.com> >>> --- >>> qapi/block-core.json | 4 ++- >>> block/qcow2.h | 6 +++++ >>> block/qcow2-cluster.c | 2 +- >>> block/qcow2.c | 66 >>> ++++++++++++++++++++++++++++++++++++++++++++-- >>> block/trace-events | 1 + >>> tests/qemu-iotests/060 | 26 +++++++++++------- >>> tests/qemu-iotests/060.out | 5 +++- >>> tests/qemu-iotests/066 | 2 +- >>> tests/qemu-iotests/066.out | 4 +-- >>> 9 files changed, 98 insertions(+), 18 deletions(-) >> >> [...] >> >>> @@ -1875,6 +1880,52 @@ static bool is_zero(BlockDriverState *bs, >>> int64_t offset, int64_t bytes) >>> return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes; >>> } >>> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m) >>> +{ >>> + return is_zero(bs, m->offset + m->cow_start.offset, >>> + m->cow_start.nb_bytes) && >>> + is_zero(bs, m->offset + m->cow_end.offset, >>> m->cow_end.nb_bytes); >>> +} >>> + >>> +static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta) >>> +{ >>> + BDRVQcow2State *s = bs->opaque; >>> + QCowL2Meta *m; >>> + >>> + for (m = l2meta; m != NULL; m = m->next) { >>> + int ret; >>> + >>> + if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) { >>> + continue; >>> + } >>> + >>> + if (bs->encrypted) { >>> + continue; >>> + } >> >> Not sure if the compiler optimizes this anyway, but I'd pull this out of >> the loop. >> > > An imprint of the following patches (which were dropped from this > series) - preallocation ahead of image EOF, which takes action > regardless of image encryption. > > But I'll leave the check outside the loop until it's needed > there. > > >> Maybe you could put all of the conditions under which this function can >> actually do something at its beginning: That is, we can't do anything if >> the BDS is encrypted or if bs->file does not support BDRV_REQ_ALLOCATE >> (and then you just call this function unconditionally in >> qcow2_co_pwritev()). >> > > Done. > >>> + if (!is_zero_cow(bs, m)) { >>> + continue; >>> + } >> >> Is this really efficient? I remember someone complaining about >> bdrv_co_block_status() being kind of slow on some filesystems, so >> there'd be a tradeoff depending on how it compares to just reading up to >> two clusters from the backing file -- especially considering that the OS >> can query the same information just as quickly, and thus the only >> overhead the read should have is a memset() (assuming the OS is clever). >> >> So basically my question is whether it would be better to just skip this >> if we have any backing file at all and only do this optimization if >> there is none. >> > > So what we trade between is > (read+write) vs (lseek+fallocate or lseek+read+write). > > Indeed if it comes to lseek the profit is smaller, and we're probably > unlikely to find a hole anyway. > > Maybe it's good enough to cover these cases: > 1. no backing > 2. beyond bdrv_getlength() in backing > 3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength() > in backing->file') > > 1 & 2 are easy to check;
Not sure how useful 2 is, though. (I don't know. I always hear about people wanting to optimize for such a case where a backing file is shorter than the overlay, but I can't imagine a real use case for that.) > 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check > for qcow2 exclusively and if there is raw (or any other format) backing > image - do the COW Yes, well, it seems useful in principle... But it would require general block layer work indeed. Maybe a new flag for bdrv_block_status*() that says only to look into the format layer? Max
signature.asc
Description: OpenPGP digital signature