Re: [Qemu-devel] [PATCH v1 03/13] qcow2: do not COW the empty areas
On 05/22/2017 10:24 PM, Eric Blake wrote:
> On 05/19/2017 04:34 AM, Anton Nefedov wrote:
>> If COW area of the newly allocated cluster is zeroes, there is no reason
>> to write zero sectors in perform_cow() again now as whole clusters are
>> zeroed out in single chunks by handle_alloc_space().
>
> But that's only true if you can guarantee that handle_alloc_space()
> succeeded at ensuring the cluster reads as zeroes. If you silently
> ignore errors (which is what patch 1/13 does), you risk assuming that
> the cluster reads as zeroes when in reality it does not, and then you
> have corrupted data.
>
> The idea of avoiding a COW of areas that read as zero at the source when
> the destination also already reads as zeroes makes sense, but I'm not
> convinced that this patch is safe as written.

OK, we will recheck the error path.

>> Introduce QCowL2Meta field "reduced", since the existing fields
>> (offset and nb_bytes) still have to keep other write requests from
>> simultaneously writing in the area
>>
>> iotest 060:
>> write to the discarded cluster does not trigger COW anymore;
>> so, break on the write_aio event instead, which still works for the test
>> (but the write won't fail anymore, so update the reference output)
>>
>> iotest 066:
>> cluster-alignment areas that were not really COWed are now detected
>> as zeroes, hence the initial write has to be exactly the same size for
>> the maps to match
>>
>> === performance tests ===
>>
>> qemu-io, results in seconds to complete (less is better)
>> random write 4k to empty image, no backing
>>
>> HDD
>> 64k cluster
>>   128M over 128M image:  160 -> 160 ( x1 )
>>   128M over 2G image:     86 -> 84  ( x1 )
>>   128M over 8G image:     40 -> 29  ( x1.4 )
>> 1M cluster
>>   32M over 8G image:      58 -> 23  ( x2.5 )
>>
>> SSD
>> 64k cluster
>>   2G over 2G image:       71 -> 38  ( x1.9 )
>>   512M over 8G image:     85 -> 8   ( x10.6 )
>> 1M cluster
>>   128M over 32G image:   314 -> 2   ( x157 )
>
> At any rate, the benchmark numbers show that there is merit to pursuing
> the idea of reducing I/O when partial cluster writes can avoid writing
> COW'd zeroes on either side of the data.

Yes! This is exactly the point; with this approach we would also allow
sequential writes that are not cluster-aligned, which is also very good.

Den
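[Editorial note for context: the "non-aligned" writes Den mentions are expensive precisely because a write that does not cover a whole cluster forces qcow2 to fill the head (cow_start) and tail (cow_end) of the freshly allocated cluster. A minimal sketch of that geometry, with illustrative names rather than the actual qcow2 internals, and assuming the write fits in a single cluster:]

```c
#include <stdint.h>

/* Illustrative only (not the qcow2 code): compute the head (cow_start)
 * and tail (cow_end) byte counts that must be filled around a write of
 * 'len' bytes at 'offset' landing in a freshly allocated cluster.
 * Assumes the write does not cross a cluster boundary. */
static void cow_regions(uint64_t offset, uint64_t len, uint64_t cluster_size,
                        uint64_t *head, uint64_t *tail)
{
    uint64_t end = (offset + len) % cluster_size;

    *head = offset % cluster_size;          /* zeroes before the data */
    *tail = end ? cluster_size - end : 0;   /* zeroes after the data  */
}
```

For example, a 4k write at a 4k offset into a 64k cluster leaves a 4k head and a 56k tail; with this patch, both can be skipped whenever the cluster was already zeroed in one chunk by handle_alloc_space().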
Re: [Qemu-devel] [PATCH v1 03/13] qcow2: do not COW the empty areas
On 05/22/2017 10:24 PM, Eric Blake wrote:
> On 05/19/2017 04:34 AM, Anton Nefedov wrote:
>> If COW area of the newly allocated cluster is zeroes, there is no reason
>> to write zero sectors in perform_cow() again now as whole clusters are
>> zeroed out in single chunks by handle_alloc_space().
>
> But that's only true if you can guarantee that handle_alloc_space()
> succeeded at ensuring the cluster reads as zeroes. If you silently
> ignore errors (which is what patch 1/13 does), you risk assuming that
> the cluster reads as zeroes when in reality it does not, and then you
> have corrupted data.

Sure; COW is only skipped if pwrite_zeroes() from patch 1/13 succeeds.

> The idea of avoiding a COW of areas that read as zero at the source when
> the destination also already reads as zeroes makes sense, but I'm not
> convinced that this patch is safe as written.

/Anton
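[Editorial note: the invariant Anton states — skip the COW only when the zero-write actually succeeded — can be sketched as follows. This is a hedged illustration, not the patch's code; zero_write_stub() is a hypothetical stand-in for the real bdrv_pwrite_zeroes() call, made to fail for a negative offset so the fallback path can be exercised:]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for bdrv_pwrite_zeroes(): returns 0 on success,
 * a negative errno-style value on failure (here: for a negative offset). */
static int zero_write_stub(int64_t offset, int64_t count)
{
    (void)count;
    return offset >= 0 ? 0 : -1;
}

/* COW regions of a cluster may be marked skippable ("reduced") only if
 * zeroing the whole cluster succeeded; on error we must fall back to the
 * normal perform_cow() path instead of assuming the cluster reads as
 * zeroes -- otherwise data is silently corrupted. */
static bool may_skip_cow(int64_t cluster_offset, int64_t cluster_size)
{
    return zero_write_stub(cluster_offset, cluster_size) == 0;
}
```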
Re: [Qemu-devel] [PATCH v1 03/13] qcow2: do not COW the empty areas
On 05/19/2017 04:34 AM, Anton Nefedov wrote:
> If COW area of the newly allocated cluster is zeroes, there is no reason
> to write zero sectors in perform_cow() again now as whole clusters are
> zeroed out in single chunks by handle_alloc_space().

But that's only true if you can guarantee that handle_alloc_space()
succeeded at ensuring the cluster reads as zeroes. If you silently
ignore errors (which is what patch 1/13 does), you risk assuming that
the cluster reads as zeroes when in reality it does not, and then you
have corrupted data.

The idea of avoiding a COW of areas that read as zero at the source when
the destination also already reads as zeroes makes sense, but I'm not
convinced that this patch is safe as written.

> Introduce QCowL2Meta field "reduced", since the existing fields
> (offset and nb_bytes) still have to keep other write requests from
> simultaneously writing in the area
>
> iotest 060:
> write to the discarded cluster does not trigger COW anymore;
> so, break on the write_aio event instead, which still works for the test
> (but the write won't fail anymore, so update the reference output)
>
> iotest 066:
> cluster-alignment areas that were not really COWed are now detected
> as zeroes, hence the initial write has to be exactly the same size for
> the maps to match
>
> === performance tests ===
>
> qemu-io, results in seconds to complete (less is better)
> random write 4k to empty image, no backing
>
> HDD
> 64k cluster
>   128M over 128M image:  160 -> 160 ( x1 )
>   128M over 2G image:     86 -> 84  ( x1 )
>   128M over 8G image:     40 -> 29  ( x1.4 )
> 1M cluster
>   32M over 8G image:      58 -> 23  ( x2.5 )
>
> SSD
> 64k cluster
>   2G over 2G image:       71 -> 38  ( x1.9 )
>   512M over 8G image:     85 -> 8   ( x10.6 )
> 1M cluster
>   128M over 32G image:   314 -> 2   ( x157 )

At any rate, the benchmark numbers show that there is merit to pursuing
the idea of reducing I/O when partial cluster writes can avoid writing
COW'd zeroes on either side of the data.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
[Qemu-devel] [PATCH v1 03/13] qcow2: do not COW the empty areas
If COW area of the newly allocated cluster is zeroes, there is no reason
to write zero sectors in perform_cow() again now as whole clusters are
zeroed out in single chunks by handle_alloc_space().

Introduce QCowL2Meta field "reduced", since the existing fields
(offset and nb_bytes) still have to keep other write requests from
simultaneously writing in the area.

iotest 060:
write to the discarded cluster does not trigger COW anymore;
so, break on the write_aio event instead, which still works for the test
(but the write won't fail anymore, so update the reference output)

iotest 066:
cluster-alignment areas that were not really COWed are now detected
as zeroes, hence the initial write has to be exactly the same size for
the maps to match

=== performance tests ===

qemu-io, results in seconds to complete (less is better)
random write 4k to empty image, no backing

HDD
64k cluster
  128M over 128M image:  160 -> 160 ( x1 )
  128M over 2G image:     86 -> 84  ( x1 )
  128M over 8G image:     40 -> 29  ( x1.4 )
1M cluster
  32M over 8G image:      58 -> 23  ( x2.5 )

SSD
64k cluster
  2G over 2G image:       71 -> 38  ( x1.9 )
  512M over 8G image:     85 -> 8   ( x10.6 )
1M cluster
  128M over 32G image:   314 -> 2   ( x157 )

- the improvement grows with the cluster size
- the first data portions written to a fresh image benefit the most
  (more chance to hit an unallocated cluster)
- the SSD improvement is close to the I/O length reduction rate
  (e.g. writing only 4k instead of 64k gives a theoretical x16
  and a practical x10 improvement)

fio tests over xfs, empty image (cluster 64k), no backing,
first megabytes of random writes:

randwrite 4k, size=8g:
  HDD (io_size=128m):  730 -> 1050 IOPS ( x1.45)
  SSD (io_size=512m): 1500 -> 7000 IOPS ( x4.7 )

random writes with io_size == image_size:
randwrite 4k, size=2g, io_size=2g:
  HDD:  200 IOPS (no difference)
  SSD: 7500 -> 9500 IOPS ( x1.3 )

sequential write:
seqwrite 4k, size=4g, iodepth=4:
  SSD: 7000 -> 18000 IOPS ( x2.6 )

- numbers are similar to the qemu-io tests, with slightly less
  improvement (damped by the fs?)
Signed-off-by: Anton Nefedov
Signed-off-by: Denis V. Lunev
---
 block/qcow2-cluster.c      |  4 +++-
 block/qcow2.c              | 23 +++++++++++++++++++++++
 block/qcow2.h              |  4 ++++
 tests/qemu-iotests/060     |  2 +-
 tests/qemu-iotests/060.out |  3 ++-
 tests/qemu-iotests/066     |  2 +-
 tests/qemu-iotests/066.out |  4 ++--
 7 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 347d94b..cf18dee 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -758,7 +758,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
     BDRVQcow2State *s = bs->opaque;
     int ret;

-    if (r->nb_bytes == 0) {
+    if (r->nb_bytes == 0 || r->reduced) {
         return 0;
     }

@@ -1267,10 +1267,12 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         .cow_start = {
             .offset     = 0,
             .nb_bytes   = offset_into_cluster(s, guest_offset),
+            .reduced    = false,
         },
         .cow_end = {
             .offset     = nb_bytes,
             .nb_bytes   = avail_bytes - nb_bytes,
+            .reduced    = false,
         },
     };
     qemu_co_queue_init(&(*m)->dependent_requests);
diff --git a/block/qcow2.c b/block/qcow2.c
index b885dfc..b438f22 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -64,6 +64,9 @@ typedef struct {
 #define QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857

+static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
+                            uint32_t count);
+
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
     const QCowHeader *cow_header = (const void *)buf;
@@ -1575,6 +1578,25 @@ fail:
     return ret;
 }

+static void handle_cow_reduce(BlockDriverState *bs, QCowL2Meta *m)
+{
+    if (bs->encrypted) {
+        return;
+    }
+    if (!m->cow_start.reduced && m->cow_start.nb_bytes != 0 &&
+        is_zero_sectors(bs,
+                        (m->offset + m->cow_start.offset) >> BDRV_SECTOR_BITS,
+                        m->cow_start.nb_bytes >> BDRV_SECTOR_BITS)) {
+        m->cow_start.reduced = true;
+    }
+    if (!m->cow_end.reduced && m->cow_end.nb_bytes != 0 &&
+        is_zero_sectors(bs,
+                        (m->offset + m->cow_end.offset) >> BDRV_SECTOR_BITS,
+                        m->cow_end.nb_bytes >> BDRV_SECTOR_BITS)) {
+        m->cow_end.reduced = true;
+    }
+}
+
 static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 {
     BDRVQcow2State *s = bs->opaque;
@@ -1598,6 +1620,7 @@ static void handle_alloc_space(BlockDriverState
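[Editorial note: the "theoretical x16" cited in the commit message above is plain I/O-length arithmetic — before the patch, a partial write into an empty cluster cost a full cluster of I/O (data plus COW'd zeroes); after it, only the data itself is written. A sketch of that calculation, names illustrative:]

```c
/* Illustrative arithmetic behind the "theoretical x16": when the head and
 * tail COW of an empty cluster is skipped, only 'write_size' bytes are
 * written instead of the full 'cluster_size'. */
static double io_reduction(double cluster_size, double write_size)
{
    return cluster_size / write_size;
}
```

For the 64k-cluster, 4k-write case this gives 65536 / 4096 = 16, matching the theoretical x16; the observed SSD gain (x10.6 in the table above) falls somewhat short of that bound, as the commit message notes.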