date:20200428

Re: [PATCH v4 00/18] nvme: factor out cmb/pmr setup

2020-04-28 Thread Klaus Jensen

On Apr 22 13:01, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Changes since v3
> 
> * Remove the addition of a new PROPERTIES macro in "nvme: move device
>   parameters to separate struct" (Philippe)
> 
> * Add NVME_PMR_BIR constant and use it in PMR setup.
> 
> * Split "nvme: factor out cmb/pmr setup" into
>   - "nvme: factor out cmb setup",
>   - "nvme: factor out pmr setup" and
>   - "nvme: do cmb/pmr init as part of pci init"
>   (Philippe)
> 
> 
> Klaus Jensen (18):
>   nvme: fix pci doorbell size calculation
>   nvme: rename trace events to pci_nvme
>   nvme: remove superfluous breaks
>   nvme: move device parameters to separate struct
>   nvme: use constants in identify
>   nvme: refactor nvme_addr_read
>   nvme: add max_ioqpairs device parameter
>   nvme: remove redundant cmbloc/cmbsz members
>   nvme: factor out property/constraint checks
>   nvme: factor out device state setup
>   nvme: factor out block backend setup
>   nvme: add namespace helpers
>   nvme: factor out namespace setup
>   nvme: factor out pci setup
>   nvme: factor out cmb setup
>   nvme: factor out pmr setup
>   nvme: do cmb/pmr init as part of pci init
>   nvme: factor out controller identify setup
> 
>  hw/block/nvme.c   | 543 --
>  hw/block/nvme.h   |  31 ++-
>  hw/block/trace-events | 180 +++---
>  include/block/nvme.h  |   8 +
>  4 files changed, 429 insertions(+), 333 deletions(-)
> 
> -- 
> 2.26.2
> 
> 

Gentle bump on this.

I apparently managed to screw up the git send-email this time, loosing a
bunch of CCs in the process. Sorry about that.

Re: [PATCH v2 00/17] 64bit block-layer

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


29.04.2020 0:33, Eric Blake wrote:

On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

v1 was "[RFC 0/3] 64bit block-layer part I", please refer to initial
cover-letter
  https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
for motivation.

v2:
patch 02 is unchanged, add Stefan's r-b. Everything other is changed a
lot. What's new:



You'll also want to check my (now-abandoned?) posting from a while back:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg02769.html

to see what (if anything) from that attempt can be salvaged.



Hmm, looks close :) will keep it in mind, thanks. Or, may be you want to 
resend? First 4 patches are not needed now, as bdrv_read/bdrv_write already 
dropped.

--
Best regards,
Vladimir

Re: [PATCH v2 01/17] block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


29.04.2020 1:09, Eric Blake wrote:

On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.


My audit for this patch:

Caller has 32-bit, this patch now causes widening which is safe:
block/block-backend.c: blk_do_preadv() passes 'unsigned int'
block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
block/throttle.c: throttle_co_pdiscard() passes 'int'

Caller has 64-bit, this patch fixes potential bug where pre-patch could narrow, 
except it's easy enough to trace that callers are still capped at 2G actions:
block/throttle.c: throttle_co_preadv() passes 'uint64_t'
block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

Implementation in question: block/throttle-groups.c 
throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and uses it:
argument to util/throttle.c throttle_account(uint64_t)

All safe: it patches a latent bug, and does not introduce any 64-bit gotchas 
once throttle_co_p{read,write}v are relaxed, and assuming throttle_account() is 
not buggy.


Should I add this all to commit message?





Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/throttle-groups.h | 2 +-
  block/throttle-groups.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Eric Blake 



Thanks for careful review!

--
Best regards,
Vladimir

Re: [PATCH 0/9] More truncate improvements

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428202905.770727-1-ebl...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/file-posix.o
  CC  block/linux-aio.o
/tmp/qemu-test/src/block/parallels.c: In function 'parallels_co_writev':
/tmp/qemu-test/src/block/parallels.c:218:12: error: 'ret' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 if (ret < 0) {
^
/tmp/qemu-test/src/block/parallels.c:169:9: note: 'ret' was declared here
 int ret;
 ^
cc1: all warnings being treated as errors
make: *** [block/parallels.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=17b76be9fd0a432985a07807c0fce033', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-_1eoo1y7/src/docker-src.2020-04-28-22.22.17.10859:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=17b76be9fd0a432985a07807c0fce033
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-_1eoo1y7/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m2.980s
user0m8.950s


The full log is available at
http://patchew.org/logs/20200428202905.770727-1-ebl...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/9] More truncate improvements

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428202905.770727-1-ebl...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/io.o
  CC  block/create.o
/tmp/qemu-test/src/block/parallels.c: In function 'parallels_co_writev':
/tmp/qemu-test/src/block/parallels.c:218:12: error: 'ret' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 if (ret < 0) {
^
/tmp/qemu-test/src/block/parallels.c:169:9: note: 'ret' was declared here
 int ret;
 ^~~
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: block/parallels.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=b6fe059fceb24ee6af44e5ec70624428', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-3enfstzu/src/docker-src.2020-04-28-22.21.44.9392:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=b6fe059fceb24ee6af44e5ec70624428
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-3enfstzu/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real2m45.121s
user0m8.436s


The full log is available at
http://patchew.org/logs/20200428202905.770727-1-ebl...@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] block: Comment cleanups

2020-04-28 Thread Alberto Garcia

On Tue 28 Apr 2020 11:38:07 PM CEST, Eric Blake wrote:
> It's been a while since we got rid of the sector-based bdrv_read and
> bdrv_write (commit 2e11d756); let's finish the job on a few remaining
> comments.
>
> Signed-off-by: Eric Blake 

Reviewed-by: Alberto Garcia 

Berto

Re: [PATCH v2 01/17] block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes

2020-04-28 Thread Eric Blake


On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.


My audit for this patch:

Caller has 32-bit, this patch now causes widening which is safe:
block/block-backend.c: blk_do_preadv() passes 'unsigned int'
block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
block/throttle.c: throttle_co_pdiscard() passes 'int'

Caller has 64-bit, this patch fixes potential bug where pre-patch could 
narrow, except it's easy enough to trace that callers are still capped 
at 2G actions:

block/throttle.c: throttle_co_preadv() passes 'uint64_t'
block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

Implementation in question: block/throttle-groups.c 
throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and 
uses it:

argument to util/throttle.c throttle_account(uint64_t)

All safe: it patches a latent bug, and does not introduce any 64-bit 
gotchas once throttle_co_p{read,write}v are relaxed, and assuming 
throttle_account() is not buggy.




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/throttle-groups.h | 2 +-
  block/throttle-groups.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH] block: Comment cleanups

2020-04-28 Thread Eric Blake

It's been a while since we got rid of the sector-based bdrv_read and
bdrv_write (commit 2e11d756); let's finish the job on a few remaining
comments.

Signed-off-by: Eric Blake 
---

Hmm - I started this in Nov 2018, and just barely noticed that it has
been sitting in a stale tree on my disk for a while now...

 block/io.c |  3 ++-
 block/qcow2-refcount.c |  2 +-
 block/vvfat.c  | 10 +-
 tests/qemu-iotests/001 |  2 +-
 tests/qemu-iotests/052 |  2 +-
 tests/qemu-iotests/134 |  2 +-
 tests/qemu-iotests/188 |  2 +-
 7 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index a4f971423093..7d30e61edc6c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -960,7 +960,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
  * flags are passed through to bdrv_pwrite_zeroes (e.g. BDRV_REQ_MAY_UNMAP,
  * BDRV_REQ_FUA).
  *
- * Returns < 0 on error, 0 on success. For error codes see bdrv_write().
+ * Returns < 0 on error, 0 on success. For error codes see bdrv_pwrite().
  */
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
@@ -994,6 +994,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 }
 }

+/* return < 0 if error. See bdrv_pwrite() for the return codes */
 int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d9650b9b6c50..0457a6060d11 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2660,7 +2660,7 @@ fail:
  * - 0 if writing to this offset will not affect the mentioned metadata
  * - a positive QCow2MetadataOverlap value indicating one overlapping section
  * - a negative value (-errno) indicating an error while performing a check,
- *   e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
+ *   e.g. when bdrv_pread failed on QCOW2_OL_INACTIVE_L2
  */
 int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
  int64_t size)
diff --git a/block/vvfat.c b/block/vvfat.c
index ab800c4887a2..6d5c090dec4d 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2148,7 +2148,7 @@ DLOG(checkpoint());
  * - get modified FAT
  * - compare the two FATs (TODO)
  * - get buffer for marking used clusters
- * - recurse direntries from root (using bs->bdrv_read to make
+ * - recurse direntries from root (using bs->bdrv_pread to make
  *sure to get the new data)
  *   - check that the FAT agrees with the size
  *   - count the number of clusters occupied by this directory and
@@ -2913,9 +2913,9 @@ static int handle_deletes(BDRVVVFATState* s)
 /*
  * synchronize mapping with new state:
  *
- * - copy FAT (with bdrv_read)
+ * - copy FAT (with bdrv_pread)
  * - mark all filenames corresponding to mappings as deleted
- * - recurse direntries from root (using bs->bdrv_read)
+ * - recurse direntries from root (using bs->bdrv_pread)
  * - delete files corresponding to mappings marked as deleted
  */
 static int do_commit(BDRVVVFATState* s)
@@ -2935,10 +2935,10 @@ static int do_commit(BDRVVVFATState* s)
 return ret;
 }

-/* copy FAT (with bdrv_read) */
+/* copy FAT (with bdrv_pread) */
 memcpy(s->fat.pointer, s->fat2, 0x200 * s->sectors_per_fat);

-/* recurse direntries from root (using bs->bdrv_read) */
+/* recurse direntries from root (using bs->bdrv_pread) */
 ret = commit_direntries(s, 0, -1);
 if (ret) {
 fprintf(stderr, "Fatal: error while committing (%d)\n", ret);
diff --git a/tests/qemu-iotests/001 b/tests/qemu-iotests/001
index d87a535c3391..696726e45f56 100755
--- a/tests/qemu-iotests/001
+++ b/tests/qemu-iotests/001
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test simple read/write using plain bdrv_read/bdrv_write
+# Test simple read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2009 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/052 b/tests/qemu-iotests/052
index 45a140910da1..8d5c10601fe9 100755
--- a/tests/qemu-iotests/052
+++ b/tests/qemu-iotests/052
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test bdrv_read/bdrv_write using BDRV_O_SNAPSHOT
+# Test bdrv_pread/bdrv_pwrite using BDRV_O_SNAPSHOT
 #
 # Copyright (C) 2013 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 5f0fb86211e3..5162d2166248 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test encrypted read/write using plain bdrv_read/bdrv_write
+# Test encrypted read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2015 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/188 b/tests/qemu-iotests/188
index afca44df5427..09b9b6083ab3 100755
--- a/tests/qemu-iotests/188
+++ b/tests/qemu-iotests/188
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test encrypted read/write using plain bdrv_read/bdrv_write
+# Test encrypted read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2017 Red Hat, Inc.
 #
-- 
2.26.2

Re: [PATCH v2 00/17] 64bit block-layer

2020-04-28 Thread Eric Blake


On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

v1 was "[RFC 0/3] 64bit block-layer part I", please refer to initial
cover-letter
  https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
for motivation.

v2:
patch 02 is unchanged, add Stefan's r-b. Everything other is changed a
lot. What's new:



You'll also want to check my (now-abandoned?) posting from a while back:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg02769.html

to see what (if anything) from that attempt can be salvaged.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v22 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Eric Blake


On 4/28/20 3:00 PM, Denis Plotnikov wrote:

The test checks fulfilling qcow2 requirements for the compression
type feature and zstd compression type operability.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 
---


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v22 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Eric Blake


On 4/28/20 3:00 PM, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

compression   decompression
  zlib   zstd   zlib zstd

real 65.5   16.3 (-75 %)1.9  1.6 (-16 %)
user 65.0   15.85.3  2.5
sys   3.30.22.0  2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---



+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+size_t zstd_ret;
+ZSTD_outBuffer output = {
+.dst = dest,
+.size = dest_size,
+.pos = 0
+};



+
+/* make sure we can safely return compressed buffer size with ssize_t */
+assert(output.pos <= SSIZE_MAX);


Seems rather vague, since we know that .pos won't exceed .size which was 
initialized by dest_size which is <= 2M.  A tighter assertion:


assert(output.pos <= dest_size)

seems like it is more realistic to your real constraint (namely, that 
zstd did not overflow dest).



+ret = output.pos;
+out:
+ZSTD_freeCCtx(cctx);
+return ret;
+}
+



+++ b/slirp
@@ -1 +1 @@
-Subproject commit 2faae0f778f818fadc873308f983289df697eb93
+Subproject commit 55ab21c9a36852915b81f1b41ebaf3b6509dd8ba


Umm, you definitely don't want that.

A maintainer could touch up both of those, but I'm not sure which block 
maintainer will be accepting this series.  Maybe wait for that question 
to be answered before trying to post a v23.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH 7/9] parallels: Rework truncation logic

2020-04-28 Thread Eric Blake

The parallels driver tries to use truncation for image growth, but can
only do so when reads are guaranteed as zero.  Now that we have a way
to request zero contents from truncation, we can defer the decision to
actual allocation attempts rather than up front, reducing the number
of places that still use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake 
---
 block/parallels.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 2be92cf41708..9dadaa3217b9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -196,14 +196,24 @@ static int64_t allocate_clusters(BlockDriverState *bs, 
int64_t sector_num,
 }
 if (s->data_end + space > (len >> BDRV_SECTOR_BITS)) {
 space += s->prealloc_size;
+/*
+ * We require the expanded size to read back as zero. If the
+ * user permitted truncation, we try that; but if it fails, we
+ * force the safer-but-slower fallocate.
+ */
+if (s->prealloc_mode == PRL_PREALLOC_MODE_TRUNCATE) {
+ret = bdrv_truncate(bs->file,
+(s->data_end + space) << BDRV_SECTOR_BITS,
+false, PREALLOC_MODE_OFF, BDRV_REQ_ZERO_WRITE,
+NULL);
+if (ret == -ENOTSUP) {
+s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
+}
+}
 if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
 ret = bdrv_pwrite_zeroes(bs->file,
  s->data_end << BDRV_SECTOR_BITS,
  space << BDRV_SECTOR_BITS, 0);
-} else {
-ret = bdrv_truncate(bs->file,
-(s->data_end + space) << BDRV_SECTOR_BITS,
-false, PREALLOC_MODE_OFF, 0, NULL);
 }
 if (ret < 0) {
 return ret;
@@ -828,6 +838,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 qemu_opt_get_size_del(opts, PARALLELS_OPT_PREALLOC_SIZE, 0);
 s->prealloc_size = MAX(s->tracks, s->prealloc_size >> BDRV_SECTOR_BITS);
 buf = qemu_opt_get_del(opts, PARALLELS_OPT_PREALLOC_MODE);
+/* prealloc_mode can be downgraded later during allocate_clusters */
 s->prealloc_mode = qapi_enum_parse(_mode_lookup, buf,
PRL_PREALLOC_MODE_FALLOCATE,
_err);
@@ -836,10 +847,6 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail_options;
 }

-if (!bdrv_has_zero_init_truncate(bs->file->bs)) {
-s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
-}
-
 if ((flags & BDRV_O_RDWR) && !(flags & BDRV_O_INACTIVE)) {
 s->header->inuse = cpu_to_le32(HEADER_INUSE_MAGIC);
 ret = parallels_update_header(bs);
-- 
2.26.2

[PATCH 8/9] vhdx: Rework truncation logic

2020-04-28 Thread Eric Blake

The vhdx driver uses truncation for image growth, with a special case
for blocks that already read as zero but which are only being
partially written.  But with a bit of rearranging, it's just as easy
to defer the decision on whether truncation resulted in zeroes to the
actual allocation attempt, reducing the number of places that still
use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake 
---
 block/vhdx.c | 89 ++--
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 21497f731878..fe544abaf52a 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1241,12 +1241,16 @@ exit:
 /*
  * Allocate a new payload block at the end of the file.
  *
- * Allocation will happen at 1MB alignment inside the file
+ * Allocation will happen at 1MB alignment inside the file.
+ *
+ * If @need_zero is set on entry but not cleared on return, then truncation
+ * could not guarantee that the new portion reads as zero, and the caller
+ * will take care of it instead.
  *
  * Returns the file offset start of the new payload block
  */
 static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
-uint64_t *new_offset)
+   uint64_t *new_offset, bool *need_zero)
 {
 int64_t current_len;

@@ -1263,6 +1267,17 @@ static int vhdx_allocate_block(BlockDriverState *bs, 
BDRVVHDXState *s,
 return -EINVAL;
 }

+if (*need_zero) {
+int ret;
+
+ret = bdrv_truncate(bs->file, *new_offset + s->block_size, false,
+PREALLOC_MODE_OFF, BDRV_REQ_ZERO_WRITE, NULL);
+if (ret != -ENOTSUP) {
+*need_zero = false;
+return ret;
+}
+}
+
 return bdrv_truncate(bs->file, *new_offset + s->block_size, false,
  PREALLOC_MODE_OFF, 0, NULL);
 }
@@ -1356,18 +1371,38 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
*bs, int64_t sector_num,
 /* in this case, we need to preserve zero writes for
  * data that is not part of this write, so we must pad
  * the rest of the buffer to zeroes */
-
-/* if we are on a posix system with ftruncate() that extends
- * a file, then it is zero-filled for us.  On Win32, the raw
- * layer uses SetFilePointer and SetFileEnd, which does not
- * zero fill AFAIK */
-
-/* Queue another write of zero buffers if the underlying file
- * does not zero-fill on file extension */
-
-if (bdrv_has_zero_init_truncate(bs->file->bs) == 0) {
-use_zero_buffers = true;
-
+use_zero_buffers = true;
+/* fall through */
+case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */
+case PAYLOAD_BLOCK_UNMAPPED:
+case PAYLOAD_BLOCK_UNMAPPED_v095:
+case PAYLOAD_BLOCK_UNDEFINED:
+bat_prior_offset = sinfo.file_offset;
+ret = vhdx_allocate_block(bs, s, _offset,
+  _zero_buffers);
+if (ret < 0) {
+goto exit;
+}
+/*
+ * once we support differencing files, this may also be
+ * partially present
+ */
+/* update block state to the newly specified state */
+vhdx_update_bat_table_entry(bs, s, , _entry,
+_entry_offset,
+PAYLOAD_BLOCK_FULLY_PRESENT);
+bat_update = true;
+/*
+ * Since we just allocated a block, file_offset is the
+ * beginning of the payload block. It needs to be the
+ * write address, which includes the offset into the
+ * block, unless the entire block needs to read as
+ * zeroes but truncation was not able to provide them,
+ * in which case we need to fill in the rest.
+ */
+if (!use_zero_buffers) {
+sinfo.file_offset += sinfo.block_offset;
+} else {
 /* zero fill the front, if any */
 if (sinfo.block_offset) {
 iov1.iov_len = sinfo.block_offset;
@@ -1379,7 +1414,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
*bs, int64_t sector_num,
 }

 /* our actual data */
-qemu_iovec_concat(_qiov, qiov,  bytes_done,
+qemu_iovec_concat(_qiov, qiov, bytes_done,
   sinfo.bytes_avail);

 /* zero fill the back, if any */
@@ -1394,29 +1429,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
*bs, int64_t

[PATCH 9/9] block: Drop unused .bdrv_has_zero_init_truncate

2020-04-28 Thread Eric Blake

Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.

What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.

Signed-off-by: Eric Blake 
---
 include/block/block.h |  1 -
 include/block/block_int.h |  7 ---
 block.c   | 21 -
 block/file-posix.c|  1 -
 block/file-win32.c|  1 -
 block/nfs.c   |  1 -
 block/qcow2.c |  1 -
 block/qed.c   |  1 -
 block/raw-format.c|  6 --
 block/rbd.c   |  1 -
 block/sheepdog.c  |  3 ---
 block/ssh.c   |  1 -
 12 files changed, 45 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b62429aa4a9..4de8d8f8a6b2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -430,7 +430,6 @@ int bdrv_pdiscard(BdrvChild *child, int64_t offset, int64_t 
bytes);
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
-int bdrv_has_zero_init_truncate(BlockDriverState *bs);
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 92335f33c750..df6d0273d679 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -449,16 +449,9 @@ struct BlockDriver {
 /*
  * Returns 1 if newly created images are guaranteed to contain only
  * zeros, 0 otherwise.
- * Must return 0 if .bdrv_has_zero_init_truncate() returns 0.
  */
 int (*bdrv_has_zero_init)(BlockDriverState *bs);

-/*
- * Returns 1 if new areas added by growing the image with
- * PREALLOC_MODE_OFF contain only zeros, 0 otherwise.
- */
-int (*bdrv_has_zero_init_truncate)(BlockDriverState *bs);
-
 /* Remove fd handlers, timers, and other event loop callbacks so the event
  * loop is no longer in use.  Called with no in-flight requests and in
  * depth-first traversal order with parents before child nodes.
diff --git a/block.c b/block.c
index 03cc5813a292..fea646d33dc3 100644
--- a/block.c
+++ b/block.c
@@ -5291,27 +5291,6 @@ int bdrv_has_zero_init(BlockDriverState *bs)
 return 0;
 }

-int bdrv_has_zero_init_truncate(BlockDriverState *bs)
-{
-if (!bs->drv) {
-return 0;
-}
-
-if (bs->backing) {
-/* Depends on the backing image length, but better safe than sorry */
-return 0;
-}
-if (bs->drv->bdrv_has_zero_init_truncate) {
-return bs->drv->bdrv_has_zero_init_truncate(bs);
-}
-if (bs->file && bs->drv->is_filter) {
-return bdrv_has_zero_init_truncate(bs->file->bs);
-}
-
-/* safe default */
-return 0;
-}
-
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
 {
 BlockDriverInfo bdi;
diff --git a/block/file-posix.c b/block/file-posix.c
index 1dca220a81ba..84012be18f4d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3099,7 +3099,6 @@ BlockDriver bdrv_file = {
 .bdrv_co_create = raw_co_create,
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
 .bdrv_co_block_status = raw_co_block_status,
 .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
 .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
diff --git a/block/file-win32.c b/block/file-win32.c
index fa569685d8bc..221aaf713e24 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -641,7 +641,6 @@ BlockDriver bdrv_file = {
 .bdrv_close = raw_close,
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,

 .bdrv_aio_preadv= raw_aio_preadv,
 .bdrv_aio_pwritev   = raw_aio_pwritev,
diff --git a/block/nfs.c b/block/nfs.c
index b93989265630..2d3474c1e051 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -876,7 +876,6 @@ static BlockDriver bdrv_nfs = {
 .create_opts= _create_opts,

 .bdrv_has_zero_init = nfs_has_zero_init,
-.bdrv_has_zero_init_truncate= nfs_has_zero_init,
 .bdrv_get_allocated_file_size   = nfs_get_allocated_file_size,
 .bdrv_co_truncate   = nfs_file_co_truncate,

diff --git a/block/qcow2.c b/block/qcow2.c
index 2ba0b17c391c..9acdbaeb3ab8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5596,7 +5596,6 @@

[PATCH 5/9] sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake

Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake 
---
 block/sheepdog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index ef0a6e743e27..26fd22c7f07d 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1654,6 +1654,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags,
 memcpy(>inode, buf, sizeof(s->inode));

 bs->total_sectors = s->inode.vdi_size / BDRV_SECTOR_SIZE;
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 pstrcpy(s->name, sizeof(s->name), vdi);
 qemu_co_mutex_init(>lock);
 qemu_co_mutex_init(>queue_lock);
-- 
2.26.2

[PATCH 4/9] rbd: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake

Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake 
---
 block/rbd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index f2d52091c702..331c45adb2b2 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -817,6 +817,9 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }

+/* When extending regular files, we get zeros from the OS */
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+
 r = 0;
 goto out;

-- 
2.26.2

[PATCH 6/9] ssh: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake

Our .bdrv_has_zero_init_truncate can detect when the remote side
always zero fills; we can reuse that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for
free.

Signed-off-by: Eric Blake 
---
 block/ssh.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index 9eb33df8598c..f9e08a490069 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -883,6 +883,10 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
 /* Go non-blocking. */
 ssh_set_blocking(s->session, 0);

+if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+}
+
 qapi_free_BlockdevOptionsSsh(opts);

 return 0;
-- 
2.26.2

[PATCH 1/9] gluster: Drop useless has_zero_init callback

2020-04-28 Thread Eric Blake

block.c already defaults to 0 if we don't provide a callback; there's
no need to write a callback that always fails.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/gluster.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index d06df900f692..31233cac696a 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1359,12 +1359,6 @@ static int64_t 
qemu_gluster_allocated_file_size(BlockDriverState *bs)
 }
 }

-static int qemu_gluster_has_zero_init(BlockDriverState *bs)
-{
-/* GlusterFS volume could be backed by a block device */
-return 0;
-}
-
 /*
  * Find allocation range in @bs around offset @start.
  * May change underlying file descriptor's file offset.
@@ -1569,8 +1563,6 @@ static BlockDriver bdrv_gluster = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
@@ -1601,8 +1593,6 @@ static BlockDriver bdrv_gluster_tcp = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
@@ -1633,8 +1623,6 @@ static BlockDriver bdrv_gluster_unix = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
@@ -1671,8 +1659,6 @@ static BlockDriver bdrv_gluster_rdma = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
-- 
2.26.2

[PATCH 0/9] More truncate improvements

2020-04-28 Thread Eric Blake

Based-on: <20200424125448.63318-1-kw...@redhat.com>
[PATCH v7 00/10] block: Fix resize (extending) of short overlays

After reviewing Kevin's work, I questioned if we had a redundancy with
bdrv_has_zero_init_truncate.  It turns out we do, and this is the result.

Patch 1 has been previously posted [1] and reviewed, the rest is new.
I did not address Neils' comment that modern gluster also always
0-initializes [2], as I am not set up to verify it (my changes to the
other drivers are semantic no-ops, so I don't feel as bad about
posting them with less rigourous testing).

[1] https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg08070.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg04266.html

Eric Blake (9):
  gluster: Drop useless has_zero_init callback
  file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
  nfs: Support BDRV_REQ_ZERO_WRITE for truncate
  rbd: Support BDRV_REQ_ZERO_WRITE for truncate
  sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
  ssh: Support BDRV_REQ_ZERO_WRITE for truncate
  parallels: Rework truncation logic
  vhdx: Rework truncation logic
  block: Drop unused .bdrv_has_zero_init_truncate

 include/block/block.h |  1 -
 include/block/block_int.h |  7 ---
 block.c   | 21 -
 block/file-posix.c|  1 -
 block/file-win32.c|  4 +-
 block/gluster.c   | 14 --
 block/nfs.c   |  4 +-
 block/parallels.c | 23 ++
 block/qcow2.c |  1 -
 block/qed.c   |  1 -
 block/raw-format.c|  6 ---
 block/rbd.c   |  4 +-
 block/sheepdog.c  |  4 +-
 block/ssh.c   |  5 ++-
 block/vhdx.c  | 89 ++-
 15 files changed, 80 insertions(+), 105 deletions(-)

-- 
2.26.2

[PATCH 2/9] file-win32: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake

When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1;
therefore, we can behave just like file-posix, and always implement
BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for
free (note that file-posix.c had to use an 'if' because it shared code
between regular files and block devices, but in file-win32.c,
bdrv_host_device uses a separate .bdrv_file_open).

Signed-off-by: Eric Blake 
---
 block/file-win32.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/file-win32.c b/block/file-win32.c
index a6b0dda5c302..fa569685d8bc 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -408,6 +408,9 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 win32_aio_attach_aio_context(s->aio, bdrv_get_aio_context(bs));
 }

+/* When extending regular files, we get zeros from the OS */
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+
 ret = 0;
 fail:
 qemu_opts_del(opts);
-- 
2.26.2

[PATCH 3/9] nfs: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake

Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for
free.

Signed-off-by: Eric Blake 
---
 block/nfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 2393fbfe6bcc..b93989265630 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -623,6 +623,9 @@ static int nfs_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 }

 bs->total_sectors = ret;
+if (client->has_zero_init) {
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+}
 ret = 0;
 return ret;
 }
-- 
2.26.2

[PATCH v22 1/4] qcow2: introduce compression type feature

2020-04-28 Thread Denis Plotnikov

The patch adds some preparation parts for incompatible compression type
feature to qcow2 allowing the use different compression methods for
image clusters (de)compressing.

It is implied that the compression type is set on the image creation and
can be changed only later by image conversion, thus compression type
defines the only compression algorithm used for the image, and thus,
for all image clusters.

The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.

The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding of the compression type breaks a number of tests because now the
compression type is reported on image creation and there are some changes
in the qcow2 header in size and offsets.

The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size +=8: 1 byte compression type
   7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 144, 182, 242, 255

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
QAPI part:
Acked-by: Markus Armbruster 
---
 qapi/block-core.json |  22 +-
 block/qcow2.h|  20 +-
 include/block/block_int.h|   1 +
 block/qcow2.c| 113 +++
 tests/qemu-iotests/031.out   |  14 ++--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 ++
 tests/qemu-iotests/065   |  28 +---
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   3 +-
 16 files changed, 267 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..1522e2983f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -78,6 +78,8 @@
 #
 # @bitmaps: A list of qcow2 bitmap details (since 4.0)
 #
+# @compression-type: the image cluster compression method (since 5.1)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -89,7 +91,8 @@
   '*corrupt': 'bool',
   'refcount-bits': 'int',
   '*encrypt': 'ImageInfoSpecificQCow2Encryption',
-  '*bitmaps': ['Qcow2BitmapInfo']
+  '*bitmaps': ['Qcow2BitmapInfo'],
+  'compression-type': 'Qcow2CompressionType'
   } }
 
 ##
@@ -4284,6 +4287,18 @@
   'data': [ 'v2', 'v3' ] }
 
 
+##
+# @Qcow2CompressionType:
+#
+# Compression type used in qcow2 image file
+#
+# @zlib: zlib compression, see 
+#
+# Since: 5.1
+##
+{ 'enum': 'Qcow2CompressionType',
+  'data': [ 'zlib' ] }
+
 ##
 # @BlockdevCreateOptionsQcow2:
 #
@@ -4307,6 +4322,8 @@
 # allowed values: off, falloc, full, metadata)
 # @lazy-refcounts: True if refcounts may be updated lazily (default: off)
 # @refcount-bits: Width of reference counts in bits (default: 16)
+# @compression-type: The image cluster compression method
+#(default: zlib, since 5.1)
 #
 # Since: 2.12
 ##
@@ -4322,7 +4339,8 @@
 '*cluster-size':'size',
 '*preallocation':   'PreallocMode',
 '*lazy-refcounts':  'bool',
-'*refcount-bits':   'int' } }
+'*refcount-bits':   'int',
+'*compression-type':'Qcow2CompressionType' } }
 
 ##
 # @BlockdevCreateOptionsQed:
diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5..6a8b82e6cc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
 
 uint32_t refcount_order;
 uint32_t header_length;
+
+/* Additional fields */
+uint8_t compression_type;
+
+/* header must be a multiple of 8 */
+uint8_t padding[7];
 } QEMU_PACKED QCowHeader;
 
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(sizeof(QCowHeader), 8));
+
 typedef struct QEMU_PACKED QCowSnapshotHeader {
 /* header is 8 byte aligned */
 uint64_t l1_table_offset;
@@ -216,13 +224,16 @@ enum {
 QCOW2_INCOMPAT_DIRTY_BITNR  = 0,
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 <<

[PATCH v22 2/4] qcow2: rework the cluster compression routine

2020-04-28 Thread Denis Plotnikov

The patch enables processing the image compression type defined
for the image and chooses an appropriate method for image clusters
(de)compression.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/qcow2-threads.c | 71 ---
 1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index a68126f291..7dbaf53489 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -74,7 +74,9 @@ typedef struct Qcow2CompressData {
 } Qcow2CompressData;
 
 /*
- * qcow2_compress()
+ * qcow2_zlib_compress()
+ *
+ * Compress @src_size bytes of data using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -83,8 +85,8 @@ typedef struct Qcow2CompressData {
  *  -ENOMEM destination buffer is not enough to store compressed data
  *  -EIOon any other error
  */
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  const void *src, size_t src_size)
+static ssize_t qcow2_zlib_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
 {
 ssize_t ret;
 z_stream strm;
@@ -119,10 +121,10 @@ static ssize_t qcow2_compress(void *dest, size_t 
dest_size,
 }
 
 /*
- * qcow2_decompress()
+ * qcow2_zlib_decompress()
  *
  * Decompress some data (not more than @src_size bytes) to produce exactly
- * @dest_size bytes.
+ * @dest_size bytes using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -130,8 +132,8 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * Returns: 0 on success
  *  -EIO on fail
  */
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-const void *src, size_t src_size)
+static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
 {
 int ret;
 z_stream strm;
@@ -191,20 +193,67 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, 
size_t dest_size,
 return arg.ret;
 }
 
+/*
+ * qcow2_co_compress()
+ *
+ * Compress @src_size bytes of data using the compression
+ * method defined by the image compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
   const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_compress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_compress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
+/*
+ * qcow2_co_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using the compression method defined by the image
+ * compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
 const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_decompress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_decompress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
 
-- 
2.17.0

[PATCH v22 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Denis Plotnikov

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
  src.img [zlib|zstd]_compressed.img
decompress cmd
  time ./qemu-img convert -O qcow2
  [zlib|zstd]_compressed.img uncompressed.img

   compression   decompression
 zlib   zstd   zlib zstd

real 65.5   16.3 (-75 %)1.9  1.6 (-16 %)
user 65.0   15.85.3  2.5
sys   3.30.22.0  2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---
 docs/interop/qcow2.txt |   1 +
 configure  |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 169 +
 block/qcow2.c  |   7 ++
 slirp  |   2 +-
 6 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
 
 Available compression type values:
 0: zlib 
+1: zstd 
 
 
 === Header padding ===
diff --git a/configure b/configure
index 23b5e93752..4e3a1690ea 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
   zstdsupport for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
 #
 # Since: 5.1
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..a0b12e1b15 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include 
 
+#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,160 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t 
dest_size,
 return ret;
 }
 
+#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIOon any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+size_t zstd_ret;
+ZSTD_outBuffer output = {
+.dst = dest,
+.size = dest_size,
+.pos = 0
+};
+ZSTD_inBuffer input = {
+.src = src,
+.size = src_size,
+.pos = 0
+};
+ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+if (!cctx) {
+return -EIO;
+}
+/*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * ZSTD_compressStream2() tries to compress everything it could
+ * with a single call. Although, ZSTD docs says that:
+ * "You must continue calling ZSTD_compressStream2() with ZSTD_e_end
+ * until it returns 0, at which point you are free to start a new frame",
+ * in out tests we saw the only case when it returned with >0 -
+ * when the output

[PATCH v22 0/4] implement zstd cluster compression method

2020-04-28 Thread Denis Plotnikov

v22:
   03: remove assignemnt in if condition

v21:
   03:
   * remove the loop on compression [Max]
   * use designated initializers [Max]
   04:
   * don't erase user's options [Max]
   * use _rm_test_img [Max]
   * add unsupported qcow2 options [Max]

v20:
   04: fix a number of flaws [Vladimir]
   * don't use $RAND_FILE passing to qemu-io,
 so check $TEST_DIR is redundant
   * re-arrage $RAND_FILE writing
   * fix a typo

v19:
   04: fix a number of flaws [Eric]
   * remove rudundant test case descriptions
   * fix stdout redirect
   * don't use (())
   * use peek_file_be instead of od
   * check $TEST_DIR for spaces and other before using
   * use $RAND_FILE safer

v18:
   * 04: add quotes to all file name variables [Vladimir] 
   * 04: add Vladimir's comment according to "qemu-io write -s"
 option issue.

v17:
   * 03: remove incorrect comment in zstd decompress [Vladimir]
   * 03: remove "paraniod" and rewrite the comment on decompress [Vladimir]
   * 03: fix dead assignment [Vladimir]
   * 04: add and remove quotes [Vladimir]
   * 04: replace long offset form with the short one [Vladimir]

v16:
   * 03: ssize_t for ret, size_t for zstd_ret [Vladimir]
   * 04: small fixes according to the comments [Vladimir] 

v15:
   * 01: aiming qemu 5.1 [Eric]
   * 03: change zstd_res definition place [Vladimir]
   * 04: add two new test cases [Eric]
 1. test adjacent cluster compression with zstd
 2. test incompressible cluster processing
   * 03, 04: many rewording and gramma fixing [Eric]

v14:
   * fix bug on compression - looping until compress == 0 [Me]
   * apply reworked Vladimir's suggestions:
  1. not mixing ssize_t with size_t
  2. safe check for ENOMEM in compression part - avoid overflow
  3. tolerate sanity check allow zstd to make progress only
 on one of the buffers
v13:
   * 03: add progress sanity check to decompression loop [Vladimir]
 03: add successful decompression check [Me]

v12:
   * 03: again, rework compression and decompression loops
 to make them more correct [Vladimir]
 03: move assert in compression to more appropriate place
 [Vladimir]
v11:
   * 03: the loops don't need "do{}while" form anymore and
 the they were buggy (missed "do" in the beginning)
 replace them with usual "while(){}" loops [Vladimir]
v10:
   * 03: fix zstd (de)compressed loops for multi-frame
 cases [Vladimir]
v9:
   * 01: fix error checking and reporting in qcow2_amend compression type part 
[Vladimir]
   * 03: replace asserts with -EIO in qcow2_zstd_decompression [Vladimir, 
Alberto]
   * 03: reword/amend/add comments, fix typos [Vladimir]

v8:
   * 03: switch zstd API from simple to stream [Eric]
 No need to state a special cluster layout for zstd
 compressed clusters.
v7:
   * use qapi_enum_parse instead of the open-coding [Eric]
   * fix wording, typos and spelling [Eric]

v6:
   * "block/qcow2-threads: fix qcow2_decompress" is removed from the series
  since it has been accepted by Max already
   * add compile time checking for Qcow2Header to be a multiple of 8 [Max, 
Alberto]
   * report error on qcow2 amending when the compression type is actually 
chnged [Max]
   * remove the extra space and the extra new line [Max]
   * re-arrange acks and signed-off-s [Vladimir]

v5:
   * replace -ENOTSUP with abort in qcow2_co_decompress [Vladimir]
   * set cluster size for all test cases in the beginning of the 287 test

v4:
   * the series is rebased on top of 01 "block/qcow2-threads: fix 
qcow2_decompress"
   * 01 is just a no-change resend to avoid extra dependencies. Still, it may 
be merged in separate

v3:
   * remove redundant max compression type value check [Vladimir, Eric]
 (the switch below checks everything)
   * prevent compression type changing on "qemu-img amend" [Vladimir]
   * remove zstd config setting, since it has been added already by
 "migration" patches [Vladimir]
   * change the compression type error message [Vladimir] 
   * fix alignment and 80-chars exceeding [Vladimir]

v2:
   * rework compression type setting [Vladimir]
   * squash iotest changes to the compression type introduction patch 
[Vladimir, Eric]
   * fix zstd availability checking in zstd iotest [Vladimir]
   * remove unnecessry casting [Eric]
   * remove rudundant checks [Eric]
   * fix compressed cluster layout in qcow2 spec [Vladimir]
   * fix wording [Eric, Vladimir]
   * fix compression type filtering in iotests [Eric]

v1:
   the initial series

Denis Plotnikov (4):
  qcow2: introduce compression type feature
  qcow2: rework the cluster compression routine
  qcow2: add zstd cluster compression
  iotests: 287: add qcow2 compression type test

 docs/interop/qcow2.txt   |   1 +
 configure|   2 +-
 qapi/block-core.json |  23 ++-
 block/qcow2.h|  20 ++-

[PATCH v22 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov

The test checks fulfilling qcow2 requirements for the compression
type feature and zstd compression type operability.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/287 | 152 +
 tests/qemu-iotests/287.out |  67 
 tests/qemu-iotests/group   |   1 +
 3 files changed, 220 insertions(+)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..21fe1f19f5
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,152 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_unsupported_imgopts 'compat=0.10' data_file
+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _rm_test_img
+   rm -f "$COMPR_IMG"
+   rm -f "$RAND_FILE"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing compression type incompatible bit setting for zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing zlib with incompatible bit set ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 3
+# to make sure the bit was actually set
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing zstd with incompatible bit unset ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" set-header incompatible_features 0
+# to make sure the bit was actually unset
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing compression type values ==="
+echo
+# zlib=0
+_make_test_img -o compression_type=zlib 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+# zstd=1
+_make_test_img -o compression_type=zstd 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+echo
+echo "=== Testing simple reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+# read on the cluster boundaries
+$QEMU_IO -c "read -v 131070 8 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 65534 8" "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Testing adjacent clusters reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAB 0 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAD 128K 64K " "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IO -c "read -P 0xAB 0 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAD 128K 64k " "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Testing incompressible cluster processing with zstd ==="
+echo
+# create a 2M image and fill it with 1M likely incompressible data
+# and 1M compressible data
+dd if=/dev/urandom of="$RAND_FILE" bs=1M count=1 seek=1

Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.


Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.



Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.



As a first step, I've don brief analysis of .bdrv_co_block_status of drivers 
(attached)

--
Best regards,
Vladimir
= Filter functions =
mirror,commit,backup filters: bdrv_co_block_status_from_backing
blklogwrites,compress,COR,throttle: bdrv_co_block_status_from_file

return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW is unclean, behavior is broken

blkdebug: blkdebug_co_block_status
- actually, uses bdrv_co_block_status_from_file,
after additional blkdebug-related things not influincing the result

= raw =
raw: raw_co_block_status
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW is unclean, behavior is broken

Summary: we need to fix BDRV_BLOCK_RAW-recursion semantics to not interfere 
with block_status_above/is_allocated_above loops.

= Format drivers with supports_backing = true =

qed: bdrv_qed_co_block_status
bdi->unallocated_blocks_are_zero = true;

0 - go to backing
ZERO - metadata-zero
DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data
 
parallels: parallels_co_block_status
unallocated_blocks_are_zero is unset, but they are actually read as zero if 
no backing

0 - go to backing
DATA | OFFSET_VALID with @map set and @file = file-child : 
metadat-allocated-data

qcow2: qcow2_co_block_status
bdi->unallocated_blocks_are_zero = true;

ZERO | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-zero
DATA | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data (hint, that

[PATCH v4 2/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Eric Blake

We originally refused to allow resize of images with internal
snapshots because the v2 image format did not require the tracking of
snapshot size, making it impossible to safely revert to a snapshot
with a different size than the current view of the image.  But the
snapshot size tracking was rectified in v3, and our recent fixes to
qemu-img amend (see 0a85af35) guarantee that we always have a valid
snapshot size.  Thus, we no longer need to artificially limit image
resizes, but it does become one more thing that would prevent a
downgrade back to v2.  And now that we support different-sized
snapshots, it's also easy to fix reverting to a snapshot to apply the
new size.

Upgrade iotest 61 to cover this (we previously had NO coverage of
refusal to resize while snapshots exist).  Note that the amend process
can fail but still have effects: in particular, since we break things
into upgrade, resize, downgrade, a failure during resize does not roll
back changes made during upgrade, nor does failure in downgrade roll
back a resize.  But this situation is pre-existing even without this
patch; and without journaling, the best we could do is minimize the
chance of partial failure by collecting all changes prior to doing any
writes - which adds a lot of complexity but could still fail with EIO.
On the other hand, we are careful that even if we have partial
modification but then fail, the image is left viable (that is, we are
careful to sequence things so that after each successful cluster
write, there may be transient leaked clusters but no corrupt
metadata).  And complicating the code to make it more transaction-like
is not worth the effort: a user can always request multiple 'qemu-img
amend' changing one thing each, if they need finer-grained control
over detecting the first failure than what they get by letting qemu
decide how to sequence multiple changes.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-snapshot.c | 20 
 block/qcow2.c  | 25 ++---
 tests/qemu-iotests/061 | 35 +++
 tests/qemu-iotests/061.out | 28 
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 82c32d4c9b08..2756b37d2427 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -23,6 +23,7 @@
  */

 #include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qcow2.h"
 #include "qemu/bswap.h"
@@ -775,10 +776,21 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 }

 if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) {
-error_report("qcow2: Loading snapshots with different disk "
-"size is not implemented");
-ret = -ENOTSUP;
-goto fail;
+BlockBackend *blk = blk_new_with_bs(bs, BLK_PERM_RESIZE, BLK_PERM_ALL,
+_err);
+if (!blk) {
+error_report_err(local_err);
+ret = -ENOTSUP;
+goto fail;
+}
+
+ret = blk_truncate(blk, sn->disk_size, true, PREALLOC_MODE_OFF, 0,
+   _err);
+blk_unref(blk);
+if (ret < 0) {
+error_report_err(local_err);
+goto fail;
+}
 }

 /*
diff --git a/block/qcow2.c b/block/qcow2.c
index 0edc7f4643f8..3e8b3d022b80 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3989,9 +3989,12 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,

 qemu_co_mutex_lock(>lock);

-/* cannot proceed if image has snapshots */
-if (s->nb_snapshots) {
-error_setg(errp, "Can't resize an image which has snapshots");
+/*
+ * Even though we store snapshot size for all images, it was not
+ * required until v3, so it is not safe to proceed for v2.
+ */
+if (s->nb_snapshots && s->qcow_version < 3) {
+error_setg(errp, "Can't resize a v2 image which has snapshots");
 ret = -ENOTSUP;
 goto fail;
 }
@@ -5005,6 +5008,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 BDRVQcow2State *s = bs->opaque;
 int current_version = s->qcow_version;
 int ret;
+int i;

 /* This is qcow2_downgrade(), not qcow2_upgrade() */
 assert(target_version < current_version);
@@ -5022,6 +5026,21 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 return -ENOTSUP;
 }

+/*
+ * If any internal snapshot has a different size than the current
+ * image size, or VM state size that exceeds 32 bits, downgrading
+ * is unsafe.  Even though we would still use v3-compliant output
+ * to preserve that data, other v2 programs might not realize
+ * those optional fields are important.
+ */
+for (i = 0; i < s->nb_snapshots; i++) {
+if (s->snapshots[i].vm_state_size > UINT32_MAX

[PATCH v4 3/3] qcow2: Tweak comment about bitmaps vs. resize

2020-04-28 Thread Eric Blake

Our comment did not actually match the code.  Rewrite the comment to
be less sensitive to any future changes to qcow2-bitmap.c that might
implement scenarios that we currently reject.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3e8b3d022b80..ad934109a813 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3999,7 +3999,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 goto fail;
 }

-/* cannot proceed if image has bitmaps */
+/* See qcow2-bitmap.c for which bitmap scenarios prevent a resize. */
 if (qcow2_truncate_bitmaps_check(bs, errp)) {
 ret = -ENOTSUP;
 goto fail;
-- 
2.26.2

[PATCH v4 1/3] block: Add blk_new_with_bs() helper

2020-04-28 Thread Eric Blake

There are several callers that need to create a new block backend from
an existing BDS; make the task slightly easier with a common helper
routine.

Suggested-by: Max Reitz 
Signed-off-by: Eric Blake 
Message-Id: <20200424190903.522087-2-ebl...@redhat.com>
[mreitz: Set @ret only in error paths, see
 https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html]
Signed-off-by: Max Reitz 
---
 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +++
 block/crypto.c |  9 -
 block/parallels.c  |  8 
 block/qcow.c   |  8 
 block/qcow2.c  | 18 --
 block/qed.c|  8 
 block/sheepdog.c   | 10 +-
 block/vdi.c|  8 
 block/vhdx.c   |  8 
 block/vmdk.c   |  9 -
 block/vpc.c|  8 
 blockdev.c |  8 +++-
 blockjob.c |  7 ++-
 14 files changed, 75 insertions(+), 59 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 34de7faa81de..0917663d89f4 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -77,6 +77,8 @@ typedef struct BlockBackendPublic {
 } BlockBackendPublic;

 BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm);
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp);
 int blk_get_refcnt(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 17ed6d8c5b27..f4944861fa4e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -355,6 +355,29 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 return blk;
 }

+/*
+ * Create a new BlockBackend connected to an existing BlockDriverState.
+ *
+ * @perm is a bitmasks of BLK_PERM_* constants which describes the
+ * permissions to request for @bs that is attached to this
+ * BlockBackend.  @shared_perm is a bitmask which describes which
+ * permissions may be granted to other users of the attached node.
+ * Both sets of permissions can be changed later using blk_set_perm().
+ *
+ * Return the new BlockBackend on success, null on failure.
+ */
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp)
+{
+BlockBackend *blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm);
+
+if (blk_insert_bs(blk, bs, errp) < 0) {
+blk_unref(blk);
+return NULL;
+}
+return blk;
+}
+
 /*
  * Creates a new BlockBackend, opens a new BlockDriverState, and connects both.
  * The new BlockBackend is in the main AioContext.
diff --git a/block/crypto.c b/block/crypto.c
index e02f34359019..ca44dae4f7e8 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -261,11 +261,10 @@ static int 
block_crypto_co_create_generic(BlockDriverState *bs,
 QCryptoBlock *crypto = NULL;
 struct BlockCryptoCreateData data;

-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto cleanup;
 }

diff --git a/block/parallels.c b/block/parallels.c
index 2be92cf41708..8db64a55e3ae 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -559,10 +559,10 @@ static int coroutine_fn 
parallels_co_create(BlockdevCreateOptions* opts,
 return -EIO;
 }

-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto out;
 }
 blk_set_allow_write_beyond_eof(blk, true);
diff --git a/block/qcow.c b/block/qcow.c
index 6b5f2269f0ba..b0475b73a551 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -849,10 +849,10 @@ static int coroutine_fn 
qcow_co_create(BlockdevCreateOptions *opts,
 return -EIO;
 }

-qcow_blk = blk_new(bdrv_get_aio_context(bs),
-   BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(qcow_blk, bs, errp);
-if (ret < 0) {
+qcow_blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+   BLK_PERM_ALL, errp);
+if (!qcow_blk) {
+ret = -EPERM;
 goto exit;
 }

[PATCH v4 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Eric Blake

Re-posting this to make Max' life easier when rebasing on top of Kevin's work.

Based-on: <20200424125448.63318-1-kw...@redhat.com>
[PATCH v7 00/10] block: Fix resize (extending) of short overlays

In v4:
- patch 1: fold in Max's touchups to my v3
- patch 1: resolve merge conflict on top of Kevin's block-next branch (truncate 
signature change)
- patch 2: resolve semantic conflict (truncate signature change)

001/3:[0004] [FC] 'block: Add blk_new_with_bs() helper'
002/3:[0002] [FC] 'qcow2: Allow resize of images with internal snapshots'
003/3:[] [--] 'qcow2: Tweak comment about bitmaps vs. resize'

Eric Blake (3):
  block: Add blk_new_with_bs() helper
  qcow2: Allow resize of images with internal snapshots
  qcow2: Tweak comment about bitmaps vs. resize

 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +
 block/crypto.c |  9 +++
 block/parallels.c  |  8 +++---
 block/qcow.c   |  8 +++---
 block/qcow2-snapshot.c | 20 ---
 block/qcow2.c  | 45 +++---
 block/qed.c|  8 +++---
 block/sheepdog.c   | 10 
 block/vdi.c|  8 +++---
 block/vhdx.c   |  8 +++---
 block/vmdk.c   |  9 +++
 block/vpc.c|  8 +++---
 blockdev.c |  8 +++---
 blockjob.c |  7 ++
 tests/qemu-iotests/061 | 35 ++
 tests/qemu-iotests/061.out | 28 +
 17 files changed, 177 insertions(+), 67 deletions(-)

-- 
2.26.2

Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake


On 4/28/20 1:45 PM, Kevin Wolf wrote:

Am 28.04.2020 um 18:28 hat Eric Blake geschrieben:

On 4/24/20 7:54 AM, Kevin Wolf wrote:

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf 
---
   block/qcow2-cluster.c |  2 +-
   block/qcow2.c | 34 ++
   2 files changed, 35 insertions(+), 1 deletion(-)




+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
   bs->supported_zero_flags = header.version >= 3 ?
  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;


Is this really what we want for encrypted files, or would it be better as:

 if (bs->encrypted) {
 bs->supported_truncate_flags = 0;
 } else {
 bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 }

At the qcow2 level, we can guarantee a read of 0 even for an encrypted
image, but is that really what we want?  Is setting the qcow2 zero flag on
the cluster done at the decrypted level (at which point we may be leaking
information about guest contents via anyone that can read the qcow2
metadata) or at the encrypted level (at which point it's useless
information, because knowing the underlying file reads as zero still
decrypts into garbage)?


The zero flag means that the guest reads zeros, even with encrypted
files. I'm not sure if it's worse than exposing the information which
clusters are allocated and which are unallocated, which we have always
been doing and which is hard to avoid without encrypting all the
metadata, too. But it does reveal some information.

If we think that exposing zero flags is worse than exposing the
allocation status, I would still not use your solution above. In that
case, the full fix would be returning -ENOTSUP from
.bdrv_co_pwrite_zeroes() to cover all other callers, too.


Indeed, it also makes me wonder if we should support 
truncate(BDRV_REQ_ZERO_WRITE|BDRV_REQ_NO_FALLBACK), to differentiate 
whether a truncation request is aiming more to be fast (NO_FALLBACK set, 
fail immediately with -ENOTSUP on encryption) or complete (NO_FALLBACK 
clear, go ahead and write guest-visible zeroes, which populates the 
format layer).  In other words, maybe we want a knob that the user can 
set on encrypted volumes on whether to allow zero flags in the qcow2 image.




If we think that allocation status and zero flags are of comparable
importance, then we need to fix either both or nothing. Hiding all of
this information probably means encrypting at least the L2 tables and
potentially all of the metadata apart from the header. This would
obviously require an incompatible feature flag (and some effort to
implement it).


Indeed, my question is broad enough that it does not hold up _this_ 
series, so much as providing food for thought on what else we may need 
to add for encrypted qcow2 images as a future series, to make it easier 
to adjust the slider between the extremes of performance vs. minimal 
data leaks when using encryption.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 18:28 hat Eric Blake geschrieben:
> On 4/24/20 7:54 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c | 34 ++
> >   2 files changed, 35 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn 
> > qcow2_do_open(BlockDriverState *bs, QDict *options,
> >   bs->supported_zero_flags = header.version >= 3 ?
> >  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK 
> > : 0;
> > +bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> 
> Is this really what we want for encrypted files, or would it be better as:
> 
> if (bs->encrypted) {
> bs->supported_truncate_flags = 0;
> } else {
> bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> }
> 
> At the qcow2 level, we can guarantee a read of 0 even for an encrypted
> image, but is that really what we want?  Is setting the qcow2 zero flag on
> the cluster done at the decrypted level (at which point we may be leaking
> information about guest contents via anyone that can read the qcow2
> metadata) or at the encrypted level (at which point it's useless
> information, because knowing the underlying file reads as zero still
> decrypts into garbage)?

The zero flag means that the guest reads zeros, even with encrypted
files. I'm not sure if it's worse than exposing the information which
clusters are allocated and which are unallocated, which we have always
been doing and which is hard to avoid without encrypting all the
metadata, too. But it does reveal some information.

If we think that exposing zero flags is worse than exposing the
allocation status, I would still not use your solution above. In that
case, the full fix would be returning -ENOTSUP from
.bdrv_co_pwrite_zeroes() to cover all other callers, too.

If we think that allocation status and zero flags are of comparable
importance, then we need to fix either both or nothing. Hiding all of
this information probably means encrypting at least the L2 tables and
potentially all of the metadata apart from the header. This would
obviously require an incompatible feature flag (and some effort to
implement it).

Kevin

Re: backing chain & block status & filters

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 18:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 28.04.2020 19:18, Eric Blake wrote:
> > On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:
> > 
> > > > > Hm.  I could imagine that there are formats that have non-zero holes
> > > > > (e.g. 0xff or just garbage).  It would be a bit wrong for them to 
> > > > > return
> > > > > ZERO or DATA then.
> > > > > 
> > > > > But OTOH we don’t care about such cases, do we?  We need to know 
> > > > > whether
> > > > > ranges are zero, data, or unallocated.  If they aren’t zero, we only
> > > > > care about whether reading from it will return data from this layer 
> > > > > or not.
> > > > > 
> > > > > So I suppose that anything that doesn’t support backing files (or
> > > > > filtered children) should always return ZERO and/or DATA.
> > > > 
> > > > I'm not sure I agree with the notion that everything should be
> > > > BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
> > > > at least. If we want to change this, we will have to check all callers
> > > > of bdrv_is_allocated() and friends who might use this to find holes in
> > > > the file.
> > > 
> > > Yes. Because they are doing incorrect (or at least undocumented and 
> > > unreliable) thing.
> > 
> > Here's some previous mails discussing the same question about what 
> > block_status should actually mean.  At the time, I was so scared of the 
> > prospect of something breaking if I changed things that I ended up keeping 
> > status quo, so here we are revisiting the topic several years later, still 
> > asking the same questions.
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
> > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html
> > 
> > > 
> > > > 
> > > > Basically, the way bdrv_is_allocated() works today is that we assume an
> > > > implicit zeroed backing layer even for block drivers that don't support
> > > > backing files.
> > > 
> > > But read doesn't work so: it will read data from the bottom layer, not 
> > > from
> > > this implicit zeroed backing layer. And it is inconsistent. On read data
> > > comes exactly from this layer, not from its implicit backing. So it should
> > > return BDRV_BLOCK_ALLOCATED, accordingly to its definition..
> > > 
> > > Or, we should at least document current behavior:
> > > 
> > >    BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> > >    layer rather than any backing, set by block. Attention: it may not be 
> > > set
> > >    for drivers without backing support, still data is of course read from
> > >    this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
> > >    allocation on fs level, which occupies real space on disk.. So, for 
> > > such drivers
> > > 
> > >    ZERO | ALLOCATED means that, read as zero, data may be allocated on 
> > > fs, or
> > >    (most probably) not,
> > >    don't look at ALLOCATED flag, as it is added by generic layer for 
> > > another logic,
> > >    not related to fs-allocation.
> > > 
> > >    0 means that, most probably, data doesn't occupy space on fs, 
> > > zero-status is
> > >    unknown (most probably non-zero)
> > > 
> > 
> > That may be right in describing the current situation, but again,
> > needs a GOOD audit of what we are actually using it for, and whether
> > it is what we really WANT to be using it for.  If we're going to
> > audit/refactor the code, we might as well get semantics that are
> > actually useful, rather than painfully contorted to documentation
> > that happens to match our current contorted code.
> > 
> 
> Honest enough:) I'll try to make a table.
> 
> I don't think that reporting fs-allocation status is a bad thing. But
> I'm sure that it should be separated from backing-chain-allocated
> concept.

I think we could easily agree on what would be a good concept.

My concern is just that existing code probably uses existing semantics
and not what we consider more logical now. So if we change it, we must
make sure that we change all places that expect the old semantics.

Kevin

Re: [PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Eric Blake


On 4/28/20 7:59 AM, Max Reitz wrote:

On 24.04.20 21:09, Eric Blake wrote:

In v3:
- patch 1: fix error returns [patchew, Max], R-b dropped
- patch 2,3: unchanged, so add R-b

Eric Blake (3):
   block: Add blk_new_with_bs() helper
   qcow2: Allow resize of images with internal snapshots
   qcow2: Tweak comment about bitmaps vs. resize


Thanks, I’ve squashed the diff into patch 1 and applied the series to my
block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next


This series has not only a merge conflict, but a semantic conflict, with 
the current state of Kevin's block-next branch.  I think I'll go ahead 
and post a v4 based on Kevin's branch to spare you the efforts of having 
to repeat my merge resolution.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v10 00/14] iotests: use python logging

2020-04-28 Thread John Snow




On 4/28/20 8:21 AM, Kevin Wolf wrote:
> Am 28.04.2020 um 13:46 hat Max Reitz geschrieben:
>> On 31.03.20 02:00, John Snow wrote:
>>> This series uses python logging to enable output conditionally on
>>> iotests.log(). We unify an initialization call (which also enables
>>> debugging output for those tests with -d) and then make the switch
>>> inside of iotests.
>>>
>>> It will help alleviate the need to create logged/unlogged versions
>>> of all the various helpers we have made.
>>>
>>> Also, I got lost and accidentally delinted iotests while I was here.
>>> Sorry about that. By version 9, it's now the overwhelming focus of
>>> this series. No good deed, etc.
>>
>> Seems like nobody else wants it, so I thank you and let you know that
>> I’ve applied this series to my block-next branch:
>>
>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
> 
> John said he wanted to address my comment on patch 14, so I expected him
> to send another version. This need not stop this series (we can still
> fix that on top), but just as an explanation why I didn't take it yet.
> 
> Kevin
> 

Sorry, foggy memory. I will send a follow-up and we can try to squash it in.

Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.


Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.



Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.

--
Best regards,
Vladimir

Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake


On 4/24/20 7:54 AM, Kevin Wolf wrote:

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf 
---
  block/qcow2-cluster.c |  2 +-
  block/qcow2.c | 34 ++
  2 files changed, 35 insertions(+), 1 deletion(-)




+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
  
  bs->supported_zero_flags = header.version >= 3 ?

 BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;


Is this really what we want for encrypted files, or would it be better as:

if (bs->encrypted) {
bs->supported_truncate_flags = 0;
} else {
bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
}

At the qcow2 level, we can guarantee a read of 0 even for an encrypted 
image, but is that really what we want?  Is setting the qcow2 zero flag 
on the cluster done at the decrypted level (at which point we may be 
leaking information about guest contents via anyone that can read the 
qcow2 metadata) or at the encrypted level (at which point it's useless 
information, because knowing the underlying file reads as zero still 
decrypts into garbage)?



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 11/14] block/core: add generic infrastructure for x-blockdev-amend qmp command

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:19:00PM +0200, Maxim Levitsky wrote:
> blockdev-amend will be used similiar to blockdev-create
> to allow on the fly changes of the structure of the format based block 
> devices.
> 
> Current plan is to first support encryption keyslot management for luks
> based formats (raw and embedded in qcow2)
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/Makefile.objs   |   2 +-
>  block/amend.c | 108 ++
>  include/block/block_int.h |  21 +---
>  qapi/block-core.json  |  42 +++
>  qapi/job.json |   4 +-
>  5 files changed, 169 insertions(+), 8 deletions(-)
>  create mode 100644 block/amend.c

> diff --git a/block/amend.c b/block/amend.c
> new file mode 100644
> index 00..2db7b1eafc
> --- /dev/null
> +++ b/block/amend.c
> @@ -0,0 +1,108 @@
> +/*
> + * Block layer code related to image options amend
> + *
> + * Copyright (c) 2018 Kevin Wolf 
> + * Copyright (c) 2019 Maxim Levitsky 

I would have expected these to be  "Copyright (C) 2019 Red Hat, Inc"
since employees usually have to assign copyright, unless they were
done outside of work context.

Aside from that 

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 14/14] iotests: add tests for blockdev-amend

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:19:03PM +0200, Maxim Levitsky wrote:
> This commit adds two tests that cover the
> new blockdev-amend functionality of luks and qcow2 driver
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  tests/qemu-iotests/302 | 278 +
>  tests/qemu-iotests/302.out |  40 ++
>  tests/qemu-iotests/303 | 233 +++
>  tests/qemu-iotests/303.out |  33 +
>  tests/qemu-iotests/group   |   3 +
>  5 files changed, 587 insertions(+)
>  create mode 100755 tests/qemu-iotests/302
>  create mode 100644 tests/qemu-iotests/302.out
>  create mode 100755 tests/qemu-iotests/303
>  create mode 100644 tests/qemu-iotests/303.out

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: backing chain & block status & filters

2020-04-28 Thread Eric Blake


On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer 
or not.


So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and 
unreliable) thing.


Here's some previous mails discussing the same question about what 
block_status should actually mean.  At the time, I was so scared of the 
prospect of something breaking if I changed things that I ended up 
keeping status quo, so here we are revisiting the topic several years 
later, still asking the same questions.


https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be 
set

   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for 
such drivers


   ZERO | ALLOCATED means that, read as zero, data may be allocated on 
fs, or

   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for 
another logic,

   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, 
zero-status is

   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs 
a GOOD audit of what we are actually using it for, and whether it is 
what we really WANT to be using it for.  If we're going to 
audit/refactor the code, we might as well get semantics that are 
actually useful, rather than painfully contorted to documentation that 
happens to match our current contorted code.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 09/14] iotests: filter few more luks specific create options

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:58PM +0200, Maxim Levitsky wrote:
> This allows more tests to be able to have same output on both qcow2 luks 
> encrypted images
> and raw luks images
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  tests/qemu-iotests/087.out   | 6 +++---
>  tests/qemu-iotests/134.out   | 2 +-
>  tests/qemu-iotests/158.out   | 4 ++--
>  tests/qemu-iotests/188.out   | 2 +-
>  tests/qemu-iotests/189.out   | 4 ++--
>  tests/qemu-iotests/198.out   | 4 ++--
>  tests/qemu-iotests/263.out   | 4 ++--
>  tests/qemu-iotests/284.out   | 6 +++---
>  tests/qemu-iotests/common.filter | 6 --
>  9 files changed, 20 insertions(+), 18 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 10/14] iotests: qemu-img tests for luks key management

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:59PM +0200, Maxim Levitsky wrote:
> This commit adds two tests, which test the new amend interface
> of both luks raw images and qcow2 luks encrypted images.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  tests/qemu-iotests/300 | 207 +
>  tests/qemu-iotests/300.out |  99 ++
>  tests/qemu-iotests/301 |  90 
>  tests/qemu-iotests/301.out |  30 ++
>  tests/qemu-iotests/group   |   3 +
>  5 files changed, 429 insertions(+)
>  create mode 100755 tests/qemu-iotests/300
>  create mode 100644 tests/qemu-iotests/300.out
>  create mode 100755 tests/qemu-iotests/301
>  create mode 100644 tests/qemu-iotests/301.out

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 08/14] block/qcow2: extend qemu-img amend interface with crypto options

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:57PM +0200, Maxim Levitsky wrote:
> Now that we have all the infrastructure in place,
> wire it in the qcow2 driver and expose this to the user.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/qcow2.c  | 80 --
>  tests/qemu-iotests/082.out | 45 +
>  2 files changed, 114 insertions(+), 11 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 07/14] block/crypto: implement the encryption key management

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:56PM +0200, Maxim Levitsky wrote:
> This implements the encryption key management using the generic code in
> qcrypto layer and exposes it to the user via qemu-img
> 
> This code adds another 'write_func' because the initialization
> write_func works directly on the underlying file, and amend
> works on instance of luks device.
> 
> This commit also adds a 'hack/workaround' I and Kevin Wolf (thanks)
> made to make the driver both support write sharing (to avoid breaking the 
> users),
> and be safe against concurrent  metadata update (the keyslots)
> 
> Eventually the write sharing for luks driver will be deprecated
> and removed together with this hack.
> 
> The hack is that we ask (as a format driver) for BLK_PERM_CONSISTENT_READ
> and then when we want to update the keys, we unshare that permission.
> So if someone else has the image open, even readonly, encryption
> key update will fail gracefully.
> 
> Also thanks to Daniel Berrange for the idea of
> unsharing read, rather that write permission which allows
> to avoid cases when the other user had opened the image read-only.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/crypto.c | 127 +++--
>  block/crypto.h |  44 +++--
>  2 files changed, 163 insertions(+), 8 deletions(-)

Reviewed-by: Daniel P. Berrangé 


> @@ -661,6 +693,95 @@ block_crypto_get_specific_info_luks(BlockDriverState 
> *bs, Error **errp)
>  return spec_info;
>  }
>  
> +static int
> +block_crypto_amend_options_luks(BlockDriverState *bs,
> +   QemuOpts *opts,
> +   BlockDriverAmendStatusCB *status_cb,
> +   void *cb_opaque,
> +   bool force,
> +   Error **errp)

Nitpick - align options after the "("


> @@ -81,11 +86,40 @@
>  .help = "Name of encryption hash algorithm", \
>  }
>  
> -#define BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(prefix)   \
> -{ \
> -.name = prefix BLOCK_CRYPTO_OPT_LUKS_ITER_TIME,   \
> -.type = QEMU_OPT_NUMBER,  \
> -.help = "Time to spend in PBKDF in milliseconds", \
> +#define BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(prefix)\
> +{  \
> +.name = prefix BLOCK_CRYPTO_OPT_LUKS_ITER_TIME,\
> +.type = QEMU_OPT_NUMBER,   \
> +.help = "Time to spend in PBKDF in milliseconds",  \
> +}

Nitpick - no need to change whitespace here

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v21 0/4] implement zstd cluster compression method

2020-04-28 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200428133407.10657-1-dplotni...@virtuozzo.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v21 0/4] implement zstd cluster compression method
Message-id: 20200428133407.10657-1-dplotni...@virtuozzo.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20200428132629.796753-1-mre...@redhat.com -> 
patchew/20200428132629.796753-1-mre...@redhat.com
Switched to a new branch 'test'
7201c7e iotests: 287: add qcow2 compression type test
7f44ce7 qcow2: add zstd cluster compression
0504310 qcow2: rework the cluster compression routine
01049b7 qcow2: introduce compression type feature

=== OUTPUT BEGIN ===
1/4 Checking commit 01049b7e28e9 (qcow2: introduce compression type feature)
2/4 Checking commit 0504310611bc (qcow2: rework the cluster compression routine)
3/4 Checking commit 7f44ce7d732b (qcow2: add zstd cluster compression)
ERROR: do not use assignment in if condition
#115: FILE: block/qcow2-threads.c:225:
+if ((zstd_ret = ZSTD_compressStream2(cctx, , , ZSTD_e_end))) {

total: 1 errors, 0 warnings, 238 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/4 Checking commit 7201c7e73ce4 (iotests: 287: add qcow2 compression type test)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#23: 
new file mode 100755

total: 0 errors, 1 warnings, 228 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200428133407.10657-1-dplotni...@virtuozzo.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 05/14] block/amend: refactor qcow2 amend options

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:54PM +0200, Maxim Levitsky wrote:
> Some qcow2 create options can't be used for amend.
> Remove them from the qcow2 create options and add generic logic to detect
> such options in qemu-img
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/qcow2.c  | 108 ++---
>  qemu-img.c |  18 +++-
>  tests/qemu-iotests/049.out | 102 ++--
>  tests/qemu-iotests/061.out |  12 ++-
>  tests/qemu-iotests/079.out |  18 ++--
>  tests/qemu-iotests/082.out | 149 
>  tests/qemu-iotests/085.out |  38 
>  tests/qemu-iotests/087.out |   6 +-
>  tests/qemu-iotests/115.out |   2 +-
>  tests/qemu-iotests/121.out |   4 +-
>  tests/qemu-iotests/125.out | 192 ++---
>  tests/qemu-iotests/134.out |   2 +-
>  tests/qemu-iotests/144.out |   4 +-
>  tests/qemu-iotests/158.out |   4 +-
>  tests/qemu-iotests/182.out |   2 +-
>  tests/qemu-iotests/185.out |   8 +-
>  tests/qemu-iotests/188.out |   2 +-
>  tests/qemu-iotests/189.out |   4 +-
>  tests/qemu-iotests/198.out |   4 +-
>  tests/qemu-iotests/243.out |  16 ++--
>  tests/qemu-iotests/250.out |   2 +-
>  tests/qemu-iotests/255.out |   8 +-
>  tests/qemu-iotests/263.out |   4 +-
>  tests/qemu-iotests/280.out |   2 +-
>  24 files changed, 283 insertions(+), 428 deletions(-)

Kind of annoying how the option order changes, but that's not
a functional problem and there's no easy way to avoid it.
Perhaps the original code should have alphabetically ordered
them but that's not your problem to solve really.

Reviewed-by: Daniel P. Berrangé 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 04/14] block/amend: separate amend and create options for qemu-img

2020-04-28 Thread Daniel P . Berrangé

On Tue, Apr 28, 2020 at 04:03:33PM +0100, Daniel P. Berrangé wrote:
> On Sun, Mar 08, 2020 at 05:18:53PM +0200, Maxim Levitsky wrote:
> > Some options are only useful for creation
> > (or hard to be amended, like cluster size for qcow2), while some other
> > options are only useful for amend, like upcoming keyslot management
> > options for luks
> > 
> > Since currently only qcow2 supports amend, move all its options
> > to a common macro and then include it in each action option list.
> > 
> > In future it might be useful to remove some options which are
> > not supported anyway from amend list, which currently
> > cause an error message if amended.
> 
> In the v1 posting I had suggested changing this patch, so that it
> only adds things to the amend list that actually can be amended. 
> 
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg07570.html

Never mind, I should have read ahead in the series to see the next
patch

So for this patch

Reviewed-by: Daniel P. Berrangé 


> 
> > 
> > Signed-off-by: Maxim Levitsky 
> > ---
> >  block/qcow2.c | 160 +-
> >  include/block/block_int.h |   4 +
> >  qemu-img.c|  18 ++---
> >  3 files changed, 100 insertions(+), 82 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index b55e5b7c1f..9574085772 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -5440,83 +5440,96 @@ void qcow2_signal_corruption(BlockDriverState *bs, 
> > bool fatal, int64_t offset,
> >  s->signaled_corruption = true;
> >  }
> >  
> > +#define QCOW_COMMON_OPTIONS \
> > +{   \
> > +.name = BLOCK_OPT_SIZE, \
> > +.type = QEMU_OPT_SIZE,  \
> > +.help = "Virtual disk size" \
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_COMPAT_LEVEL, \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "Compatibility level (v2 [0.10] or v3 [1.1])"   \
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_BACKING_FILE, \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "File name of a base image" \
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_BACKING_FMT,  \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "Image format of the base image"\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_DATA_FILE,\
> > +.type = QEMU_OPT_STRING,\
> > +.help = "File name of an external data file"\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_DATA_FILE_RAW,\
> > +.type = QEMU_OPT_BOOL,  \
> > +.help = "The external data file must stay valid "   \
> > +"as a raw image"\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_ENCRYPT,  \
> > +.type = QEMU_OPT_BOOL,  \
> > +.help = "Encrypt the image with format 'aes'. (Deprecated " \
> > +"in favor of " BLOCK_OPT_ENCRYPT_FORMAT "=aes)",\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_ENCRYPT_FORMAT,   \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "Encrypt the image, format choices: 'aes', 'luks'", \
> > +},  \
> > +BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.", \
> > +"ID of secret providing qcow AES key or LUKS passphrase"),  \
> > +

Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 14:28, Kevin Wolf wrote:

Am 28.04.2020 um 13:08 hat Max Reitz geschrieben:

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
is_allocated_above", and returned to all the inconsistencies about
block-status. I keep in mind Max's series about child-access functions,
and Andrey's work about using COR filter in block-stream, which depends
on Max's series (because, without them COR fitler with file child breaks
backing chains).. And, it seems that it's better to discuss some
questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
    * BDRV_BLOCK_ZERO: offset reads as zero
    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
raw data
    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    *   layer rather than any backing, set by block
layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
formats which doesn't support backing. So, all such format drivers must
return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
example, iscsi - doesn't.


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.



Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

  BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
  layer rather than any backing, set by block. Attention: it may not be set
  for drivers without backing support, still data is of course read from
  this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
  allocation on fs level, which occupies real space on disk.. So, for such 
drivers

  ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
  (most probably) not,
  don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
  not related to fs-allocation.

  0 means that, most probably, data doesn't occupy space on fs, zero-status is
  unknown (most probably non-zero)

 


--
Best regards,
Vladimir

Re: [PATCH v2 04/14] block/amend: separate amend and create options for qemu-img

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:53PM +0200, Maxim Levitsky wrote:
> Some options are only useful for creation
> (or hard to be amended, like cluster size for qcow2), while some other
> options are only useful for amend, like upcoming keyslot management
> options for luks
> 
> Since currently only qcow2 supports amend, move all its options
> to a common macro and then include it in each action option list.
> 
> In future it might be useful to remove some options which are
> not supported anyway from amend list, which currently
> cause an error message if amended.

In the v1 posting I had suggested changing this patch, so that it
only adds things to the amend list that actually can be amended. 

https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg07570.html

> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/qcow2.c | 160 +-
>  include/block/block_int.h |   4 +
>  qemu-img.c|  18 ++---
>  3 files changed, 100 insertions(+), 82 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b55e5b7c1f..9574085772 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -5440,83 +5440,96 @@ void qcow2_signal_corruption(BlockDriverState *bs, 
> bool fatal, int64_t offset,
>  s->signaled_corruption = true;
>  }
>  
> +#define QCOW_COMMON_OPTIONS \
> +{   \
> +.name = BLOCK_OPT_SIZE, \
> +.type = QEMU_OPT_SIZE,  \
> +.help = "Virtual disk size" \
> +},  \
> +{   \
> +.name = BLOCK_OPT_COMPAT_LEVEL, \
> +.type = QEMU_OPT_STRING,\
> +.help = "Compatibility level (v2 [0.10] or v3 [1.1])"   \
> +},  \
> +{   \
> +.name = BLOCK_OPT_BACKING_FILE, \
> +.type = QEMU_OPT_STRING,\
> +.help = "File name of a base image" \
> +},  \
> +{   \
> +.name = BLOCK_OPT_BACKING_FMT,  \
> +.type = QEMU_OPT_STRING,\
> +.help = "Image format of the base image"\
> +},  \
> +{   \
> +.name = BLOCK_OPT_DATA_FILE,\
> +.type = QEMU_OPT_STRING,\
> +.help = "File name of an external data file"\
> +},  \
> +{   \
> +.name = BLOCK_OPT_DATA_FILE_RAW,\
> +.type = QEMU_OPT_BOOL,  \
> +.help = "The external data file must stay valid "   \
> +"as a raw image"\
> +},  \
> +{   \
> +.name = BLOCK_OPT_ENCRYPT,  \
> +.type = QEMU_OPT_BOOL,  \
> +.help = "Encrypt the image with format 'aes'. (Deprecated " \
> +"in favor of " BLOCK_OPT_ENCRYPT_FORMAT "=aes)",\
> +},  \
> +{   \
> +.name = BLOCK_OPT_ENCRYPT_FORMAT,   \
> +.type = QEMU_OPT_STRING,\
> +.help = "Encrypt the image, format choices: 'aes', 'luks'", \
> +},  \
> +BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.", \
> +"ID of secret providing qcow AES key or LUKS passphrase"),  \
> +BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG("encrypt."),   \
> +BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE("encrypt."),  \
> +BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG("encrypt."),\
> +BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG("encrypt."),   \
> +BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG("encrypt."), \
> +

Re: [PATCH 4/4] block: Use blk_make_empty() after commits

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:26 hat Max Reitz geschrieben:
> bdrv_commit() already has a BlockBackend pointing to the BDS that we
> want to empty, it just has the wrong permissions.
> 
> qemu-img commit has no BlockBackend pointing to the old backing file
> yet, but introducing one is simple.
> 
> After this commit, bdrv_make_empty() is the only remaining caller of
> BlockDriver.bdrv_make_empty().
> 
> Signed-off-by: Max Reitz 
> ---
>  block/commit.c |  8 +++-
>  qemu-img.c | 19 ++-
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/block/commit.c b/block/commit.c
> index 8e672799af..24720ba67d 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -493,10 +493,16 @@ int bdrv_commit(BlockDriverState *bs)
>  }
>  
>  if (drv->bdrv_make_empty) {
> -ret = drv->bdrv_make_empty(bs);
> +ret = blk_set_perm(src, BLK_PERM_WRITE, BLK_PERM_ALL, NULL);

This is very likely to fail because the common case is that the source
node is attached to a guest device that doesn't share writes.
(qemu-iotests 131 and 274 catch this.)

So I think after my theoretical comment in patch 1, this is the
practical reason why we need WRITE_UNCHANGED rather than WRITE.

Also, why don't you take this permission from the start so that we would
error out right away rather than failing after waiting for the all the
data to be copied?

>  if (ret < 0) {
>  goto ro_cleanup;
>  }
> +
> +ret = blk_make_empty(src, NULL);
> +if (ret < 0) {
> +goto ro_cleanup;
> +}
> +
>  blk_flush(src);
>  }
>  
> diff --git a/qemu-img.c b/qemu-img.c
> index 821cbf610e..a5e8659867 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1065,11 +1065,20 @@ static int img_commit(int argc, char **argv)
>  goto unref_backing;
>  }
>  
> -if (!drop && bs->drv->bdrv_make_empty) {
> -ret = bs->drv->bdrv_make_empty(bs);
> -if (ret) {
> -error_setg_errno(_err, -ret, "Could not empty %s",
> - filename);
> +if (!drop) {
> +BlockBackend *old_backing_blk;
> +
> +old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
> +  _err);

Oh, you actually depend on another series that you didn't mention in
the cover letter.

> +if (!old_backing_blk) {
> +goto unref_backing;
> +}
> +ret = blk_make_empty(old_backing_blk, _err);
> +blk_unref(old_backing_blk);
> +if (ret == -ENOTSUP) {
> +error_free(local_err);
> +local_err = NULL;
> +} else if (ret < 0) {
>  goto unref_backing;
>  }
>  }

Kevin

Re: [PATCH 2/4] block: Use bdrv_make_empty() where possible

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:26 hat Max Reitz geschrieben:
> Signed-off-by: Max Reitz 

Reviewed-by: Kevin Wolf

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  SIGNpc-bios/optionrom/linuxboot.bin
  SIGNpc-bios/optionrom/kvmvapic.bin
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:27: error: implicit declaration of function 
'blk_new_with_bs'; did you mean 'blk_get_stats'? 
[-Werror=implicit-function-declaration]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
   ^~~
   blk_get_stats
/tmp/qemu-test/src/qemu-img.c:1071:27: error: nested extern declaration of 
'blk_new_with_bs' [-Werror=nested-externs]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: assignment to 'BlockBackend *' 
{aka 'struct BlockBackend *'} from 'int' makes pointer from integer without a 
cast [-Werror=int-conversion]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ce8305e22d9c45ed90ec2dc43d7f1cc3', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-kzo016wj/src/docker-src.2020-04-28-10.58.45.27385:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ce8305e22d9c45ed90ec2dc43d7f1cc3
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-kzo016wj/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real3m31.450s
user0m8.978s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  SIGNpc-bios/optionrom/linuxboot.bin
  SIGNpc-bios/optionrom/kvmvapic.bin
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:9: error: implicit declaration of function 
'blk_new_with_bs' [-Werror=implicit-function-declaration]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
/tmp/qemu-test/src/qemu-img.c:1071:9: error: nested extern declaration of 
'blk_new_with_bs' [-Werror=nested-externs]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: assignment makes pointer from 
integer without a cast [-Werror]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
cc1: all warnings being treated as errors
make: *** [qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=46a96d39761c47298a2d1eaffd870dde', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-5y1a3roy/src/docker-src.2020-04-28-10.55.01.17538:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=46a96d39761c47298a2d1eaffd870dde
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-5y1a3roy/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m49.941s
user0m8.502s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  BUILD   pc-bios/optionrom/pvh.img
  BUILD   pc-bios/optionrom/pvh.raw
  SIGNpc-bios/optionrom/pvh.bin
/tmp/qemu-test/src/qemu-img.c:1071:27: error: implicit declaration of function 
'blk_new_with_bs' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
  ^
/tmp/qemu-test/src/qemu-img.c:1071:27: error: this function declaration is not 
a prototype [-Werror,-Wstrict-prototypes]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: incompatible integer to pointer 
conversion assigning to 'BlockBackend *' (aka 'struct BlockBackend *') from 
'int' [-Werror,-Wint-conversion]
old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
^ ~
3 errors generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=c01c8a7b80794cb2a4458fd17b18ff83', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-7kkbz7sd/src/docker-src.2020-04-28-10.48.39.6384:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=c01c8a7b80794cb2a4458fd17b18ff83
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-7kkbz7sd/src'
make: *** [docker-run-test-debug@fedora] Error 2

real4m22.457s
user0m8.829s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy


28.04.2020 14:08, Max Reitz wrote:

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
is_allocated_above", and returned to all the inconsistencies about
block-status. I keep in mind Max's series about child-access functions,
and Andrey's work about using COR filter in block-stream, which depends
on Max's series (because, without them COR fitler with file child breaks
backing chains).. And, it seems that it's better to discuss some
questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
    * BDRV_BLOCK_ZERO: offset reads as zero
    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
raw data
    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    *   layer rather than any backing, set by block
layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
formats which doesn't support backing. So, all such format drivers must
return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
example, iscsi - doesn't.


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


2. ZERO. The meaning differs a bit for generic block_status and for
drivers.. I think, we at least should document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
this layer and read as ZERO. If generic block_status returns ZERO, it
only mean that it reads as zero, but the region may be allocated on
underlying level.


Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)


3. bdi.unallocated_blocks_are_zero

I think it's very bad, that we have formats, that supports backing, but
doesn't report bdi.unallocated_blocks_are_zero as true. It means that
UNALLOCATED region reads as zero if we have short backing file, and not
as zero if we remove this short backing file.


What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream them, i.e. shouldn’t it stream zeros after the end of the
backing file?


I can live with it but
this is weird logic. These bad drivers are qcow (not qcow2), parallels
and vmdk. I hope, they actually just forget to set
unallocated_blocks_are_zero to true.


qcow definitely sounds like it.


Next. But what about drivers which doesn't support backing? As we
considered above, they should always return ZERO or DATA, as everything
is allocated in this backing-chain level (last level, of course).. So
again unallocated_blocks_are_zero is meaningless. So, I think, that
driver which doesn't support backings, should be fixed to return always
ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all.


Agreed.


3.


The second 3.? :)


Short backing files in allocated_above: we must consider space after
EOF as ALLOCATED, if short backing file is inside requested
backing-chain part, as it produced exactly because of this short file
(and we never go to backing).


Sounds correct.


(current realization of allocated_above is
buggy, see my outdated series "[PATCH 0/4] fix & merge
block_status_above and is_allocated_above")

4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
a backing chain of non-backing child.. I just remember that we didn't
reach the consensus.


Possible? :)


5. Filters.. OK we have two functions for them:
bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
think both are wrong:

bdrv_co_block_status_from_file leads to problem [4], when we can report
UNALLOCATED, which refers not to the current backing chain, but to sub
backing chain of file child, which is inconsistent with
block_status_above and is_allocated_above iteration.

bdrv_co_block_status_from_backing is also is not consistent with
block_status_above iteration.. At least at leads to querying the same
node twice.


Well, yes.


So,

Re: [PATCH 3/4] block: Add blk_make_empty()

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:26 hat Max Reitz geschrieben:
> Two callers of BlockDriver.bdrv_make_empty() remain that should not call
> this method directly.  Both do not have access to a BdrvChild, but they
> can use a BlockBackend, so we add this function that lets them use it.
> 
> Signed-off-by: Max Reitz 
> ---
>  include/sysemu/block-backend.h | 2 ++
>  block/block-backend.c  | 5 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
> index d37c1244dd..14338b76dc 100644
> --- a/include/sysemu/block-backend.h
> +++ b/include/sysemu/block-backend.h
> @@ -266,4 +266,6 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
> int64_t off_in,
>  
>  const BdrvChild *blk_root(BlockBackend *blk);
>  
> +int blk_make_empty(BlockBackend *blk, Error **errp);
> +
>  #endif
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 3592066b42..5d36efd32f 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2402,3 +2402,8 @@ const BdrvChild *blk_root(BlockBackend *blk)
>  {
>  return blk->root;
>  }
> +
> +int blk_make_empty(BlockBackend *blk, Error **errp)
> +{
> +return bdrv_make_empty(blk->root, errp);
> +}

Should we check that blk->root != NULL? Most other functions do that
through blk_is_available().

Kevin

Re: [PATCH v21 0/4] implement zstd cluster compression method

2020-04-28 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200428133407.10657-1-dplotni...@virtuozzo.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v21 0/4] implement zstd cluster compression method
Message-id: 20200428133407.10657-1-dplotni...@virtuozzo.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
146ff2a iotests: 287: add qcow2 compression type test
694bbc7 qcow2: add zstd cluster compression
954ac0d qcow2: rework the cluster compression routine
3294c21 qcow2: introduce compression type feature

=== OUTPUT BEGIN ===
1/4 Checking commit 3294c21f8a66 (qcow2: introduce compression type feature)
2/4 Checking commit 954ac0d0a541 (qcow2: rework the cluster compression routine)
3/4 Checking commit 694bbc7bf00f (qcow2: add zstd cluster compression)
ERROR: do not use assignment in if condition
#115: FILE: block/qcow2-threads.c:225:
+if ((zstd_ret = ZSTD_compressStream2(cctx, , , ZSTD_e_end))) {

total: 1 errors, 0 warnings, 238 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/4 Checking commit 146ff2a078a9 (iotests: 287: add qcow2 compression type test)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#23: 
new file mode 100755

total: 0 errors, 1 warnings, 228 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200428133407.10657-1-dplotni...@virtuozzo.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 3/4] block: Add blk_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 8:55 AM, Eric Blake wrote:


+++ b/include/sysemu/block-backend.h
@@ -266,4 +266,6 @@ int coroutine_fn blk_co_copy_range(BlockBackend 
*blk_in, int64_t off_in,

  const BdrvChild *blk_root(BlockBackend *blk);
+int blk_make_empty(BlockBackend *blk, Error **errp);
+


Again, a flag parameter to allow use of BDRV_REQ_NO_FALLBACK would be nice.


Or maybe not, after reading Kevin's responses.  Making an image empty is 
not the same as making it read as zero.  If we can't come up with a use 
for a flag, then deferring the addition of a flag until later is a 
perfectly reasonable approach (rather than adding a flag now that will 
never get set to anything other than 0).  This isn't quite the same as a 
public API where we would regret being locked out of a flag down the road.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 9:16 AM, Kevin Wolf wrote:



Yes.  Although now I'm wondering if the two should remain separate or should
just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE
to distinguish whether exposing the backing file vs. reading as all zeroes
is intended, or if that is merging too much.


I don't think the implementations for both things are too similar, so
you might just end up having two if branches and then two separate
implementations in the block drivers.



Yeah, the more I think about it, the more two callbacks still make 
sense.  .bdrv_make_empty may or may not need a flag, but .bdrv_make_zero 
definitely does (because that's where we want a difference between 
making the entire image zero no matter the delay, vs. only making it all 
zero if it is is fast).



If anything, bdrv_make_empty() is more related to discard than
write_zeroes. But we use the discard code for it in qcow2 only as a
fallback because in the most common cases, making an image completely
empty means effectively just creating an entirely new L1 and refcount
table, which is much faster than individually discarding all clusters.

For bdrv_make_zero() I don't see an opportunity for such optimisations,
so I don't really see a reason to have a separate callback. Unless you
do know one?


The optimization I have in mind is adding a qcow2 autoclear bit to track 
when an image is known to read as all zero - at which point 
.bdrv_make_zero instantly returns success.  For raw files, a possible 
optimization is to truncate to size 0 and then back to the full size, 
when it is known that truncation forces read-as-zero.  For NBD, I'm 
still playing with whether adding new 64-bit transactions for a bulk 
zero will be useful, and even if not, maybe special-casing 
NBD_CMD_WRITE_ZEROES with a size of 0 to do a bulk zero, if both server 
and client negotiated that particular meaning.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 3/4] block: Add blk_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Two callers of BlockDriver.bdrv_make_empty() remain that should not call
this method directly.  Both do not have access to a BdrvChild, but they
can use a BlockBackend, so we add this function that lets them use it.

Signed-off-by: Max Reitz 
---
  include/sysemu/block-backend.h | 2 ++
  block/block-backend.c  | 5 +
  2 files changed, 7 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d37c1244dd..14338b76dc 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -266,4 +266,6 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
int64_t off_in,
  
  const BdrvChild *blk_root(BlockBackend *blk);
  
+int blk_make_empty(BlockBackend *blk, Error **errp);

+


Again, a flag parameter to allow use of BDRV_REQ_NO_FALLBACK would be nice.


  #endif
diff --git a/block/block-backend.c b/block/block-backend.c
index 3592066b42..5d36efd32f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2402,3 +2402,8 @@ const BdrvChild *blk_root(BlockBackend *blk)
  {
  return blk->root;
  }
+
+int blk_make_empty(BlockBackend *blk, Error **errp)
+{
+return bdrv_make_empty(blk->root, errp);
+}



Otherwise looks fine.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:26 hat Max Reitz geschrieben:
> Right now, all users of bdrv_make_empty() call the BlockDriver method
> directly.  That is not only bad style, it is also wrong, unless the
> caller has a BdrvChild with a WRITE permission.
> 
> Introduce bdrv_make_empty() that verifies that it does.
> 
> Signed-off-by: Max Reitz 
> ---
>  include/block/block.h |  1 +
>  block.c   | 23 +++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index b05995fe9c..d947fb4080 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts 
> *opts,
>  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>  int bdrv_commit(BlockDriverState *bs);
> +int bdrv_make_empty(BdrvChild *c, Error **errp);
>  int bdrv_change_backing_file(BlockDriverState *bs,
>  const char *backing_file, const char *backing_fmt);
>  void bdrv_register(BlockDriver *bdrv);
> diff --git a/block.c b/block.c
> index 2e3905c99e..b0d5b98617 100644
> --- a/block.c
> +++ b/block.c
> @@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, 
> BdrvChild *child, Error **errp)
>  
>  parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
>  }
> +
> +int bdrv_make_empty(BdrvChild *c, Error **errp)
> +{
> +BlockDriver *drv = c->bs->drv;
> +int ret;
> +
> +assert(c->perm & BLK_PERM_WRITE);

If I understand correctly, bdrv_make_empty() is called to drop an
overlay whose content is identical to what it would read from its
backing file (in particular after a commit operation). This means that
the caller promises that the visible content doesn't change.

So should we check BLK_PERM_WRITE_UNCHANGED instead?

Kevin

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 16:07 hat Eric Blake geschrieben:
> On 4/28/20 9:01 AM, Kevin Wolf wrote:
> 
> > > Can we please fix this to take a flags parameter?  I want to make it 
> > > easier
> > > for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
> > > callers where the image must be made empty (read as all zeroes) regardless
> > > of time spent, vs. made empty quickly (including if it is already all 
> > > zero)
> > > but where the caller is prepared for the operation to fail and will write
> > > zeroes itself if fast bulk zeroing was not possible.
> > 
> > bdrv_make_empty() is not for making an image read as all zeroes, but to
> > make it fully unallocated so that the backing file becomes visible.
> > 
> > Are you confusing it with bdrv_make_zero(), which is just a wrapper
> > around bdrv_pwrite_zeroes() and does take flags?
> 
> Yes.  Although now I'm wondering if the two should remain separate or should
> just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE
> to distinguish whether exposing the backing file vs. reading as all zeroes
> is intended, or if that is merging too much.

I don't think the implementations for both things are too similar, so
you might just end up having two if branches and then two separate
implementations in the block drivers.

If anything, bdrv_make_empty() is more related to discard than
write_zeroes. But we use the discard code for it in qcow2 only as a
fallback because in the most common cases, making an image completely
empty means effectively just creating an entirely new L1 and refcount
table, which is much faster than individually discarding all clusters.

For bdrv_make_zero() I don't see an opportunity for such optimisations,
so I don't really see a reason to have a separate callback. Unless you
do know one?

Kevin

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Eric Blake


On 4/28/20 8:43 AM, no-re...@patchew.org wrote:

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/





   SIGNpc-bios/optionrom/pvh.bin
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:9: error: implicit declaration of function 
'blk_new_with_bs' [-Werror=implicit-function-declaration]
  old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,


Ah, you forgot to tell patchew:

Based-on: <20200424190903.522087-1-ebl...@redhat.com>
[PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Eric Blake


On 4/28/20 8:49 AM, Eric Blake wrote:

On 4/28/20 8:26 AM, Max Reitz wrote:

Branch: https://github.com/XanClic/qemu.git fix-bdrv_make_empty-v1
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-bdrv_make_empty-v1

Hi,

Right now, there is no centralized bdrv_make_empty() function.  Not only
is it bad style to call BlockDriver methods directly, it is also wrong,
unless the caller has a BdrvChild with BLK_PERM_WRITE taken.


I'm also in the middle of writing a patch series that adds a 
corresponding .bdrv_make_empty driver callback.  I'll rebase that work 
on top of this, as part of my efforts at fixing more code to rely on 
bdrv_make_empty rather than directly querying 
bdrv_has_zero_init[_truncate].


Correction - I'm working on adding .bdrv_make_zero, not .bdrv_make empty 
(which already exists), although maybe it really only needs to be one 
callback instead of two if we have decent flags.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: fdatasync semantics and block device backup

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:58 hat Bryan S Rosenburg geschrieben:
> Kevin Wolf  wrote on 04/28/2020 07:11:24 AM:
> > I think "don't do that" is a good answer actually.
> > 
> > You may want to put an NBD indirection between QEMU and your object
> > store, so that the close() syscall will just block a qemu-nbd process
> > that has already closed its connection to QEMU instead of blocking all
> > of QEMU.
> > 
> > It is possible to disable fdatasync() by specifying cache=unsafe for
> > the block device, so you could avoid the penalty of repeated syncs on
> > s3fs.
> > 
> > Of course, if s3fs requires an fsync before data is actually stable, in
> > this case you couldn't consider your backup completed when the backup
> > block job finishes successfully, but you would have to issue an fsync
> > manually and wait for its result before you can consider the backup
> > successful.
> > 
> > Kevin
> 
> Thanks, Kevin.
> 
> It sounds like we should be specifying cache=unsafe when using 
> rclone-mount, at least, so qemu won't think the file system is 
> implementing fdatasyncs when it's not.

I don't think it makes a difference in this case. In the end, whether
QEMU throws the fdatasync away or a lower layer does so, the result is
the same.

cache=unsafe would only be to disable fdatasync in cases where it is
very expensive, and with the idea that you later do a (single) fdatasync
manually.

Kevin

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 9:01 AM, Kevin Wolf wrote:


Can we please fix this to take a flags parameter?  I want to make it easier
for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
callers where the image must be made empty (read as all zeroes) regardless
of time spent, vs. made empty quickly (including if it is already all zero)
but where the caller is prepared for the operation to fail and will write
zeroes itself if fast bulk zeroing was not possible.


bdrv_make_empty() is not for making an image read as all zeroes, but to
make it fully unallocated so that the backing file becomes visible.

Are you confusing it with bdrv_make_zero(), which is just a wrapper
around bdrv_pwrite_zeroes() and does take flags?


Yes.  Although now I'm wondering if the two should remain separate or 
should just be a single driver callback where flags can include 
BDRV_REQ_ZERO_WRITE to distinguish whether exposing the backing file vs. 
reading as all zeroes is intended, or if that is merging too much.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 4/4] block: Use blk_make_empty() after commits

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

bdrv_commit() already has a BlockBackend pointing to the BDS that we
want to empty, it just has the wrong permissions.

qemu-img commit has no BlockBackend pointing to the old backing file
yet, but introducing one is simple.

After this commit, bdrv_make_empty() is the only remaining caller of
BlockDriver.bdrv_make_empty().

Signed-off-by: Max Reitz 
---
  block/commit.c |  8 +++-
  qemu-img.c | 19 ++-
  2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 8e672799af..24720ba67d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -493,10 +493,16 @@ int bdrv_commit(BlockDriverState *bs)
  }
  
  if (drv->bdrv_make_empty) {


This 'if' is still a bit awkward. Do all filter drivers set this 
function, or will bdrv_make_empty() automatically take care of looking 
through filters?  Should this be a check of a supported_ variable 
instead (similar to how Kevin just added supported_truncate_flags)?



-ret = drv->bdrv_make_empty(bs);
+ret = blk_set_perm(src, BLK_PERM_WRITE, BLK_PERM_ALL, NULL);
  if (ret < 0) {
  goto ro_cleanup;
  }
+
+ret = blk_make_empty(src, NULL);


So, if the driver lacks the callback, you miss calling blk_make_empty() 
even if it would have done something.



+if (ret < 0) {
+goto ro_cleanup;
+}
+
  blk_flush(src);
  }
  
diff --git a/qemu-img.c b/qemu-img.c

index 821cbf610e..a5e8659867 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1065,11 +1065,20 @@ static int img_commit(int argc, char **argv)
  goto unref_backing;
  }
  
-if (!drop && bs->drv->bdrv_make_empty) {

-ret = bs->drv->bdrv_make_empty(bs);


Old code: if the driver had the callback, use it.


-if (ret) {
-error_setg_errno(_err, -ret, "Could not empty %s",
- filename);
+if (!drop) {
+BlockBackend *old_backing_blk;
+
+old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
+  _err);
+if (!old_backing_blk) {
+goto unref_backing;
+}
+ret = blk_make_empty(old_backing_blk, _err);


New code: Call blk_make_empty() whether or not driver has the callback, 
then deal with the fallout.



+blk_unref(old_backing_blk);
+if (ret == -ENOTSUP) {
+error_free(local_err);
+local_err = NULL;
+} else if (ret < 0) {
  goto unref_backing;
  }


But this actually looks smarter than the commit case.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/exec-vary.o
  CC  aarch64-softmmu/tcg/tcg.o
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:27: error: implicit declaration of function 
'blk_new_with_bs'; did you mean 'blk_get_stats'? 
[-Werror=implicit-function-declaration]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
   ^~~
   blk_get_stats
/tmp/qemu-test/src/qemu-img.c:1071:27: error: nested extern declaration of 
'blk_new_with_bs' [-Werror=nested-externs]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: assignment to 'BlockBackend *' 
{aka 'struct BlockBackend *'} from 'int' makes pointer from integer without a 
cast [-Werror=int-conversion]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/tcg/tcg-op.o
  CC  aarch64-softmmu/tcg/tcg-op-vec.o
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=7e33797e28514c48a1cce7021d8b04fd', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-dh8ik377/src/docker-src.2020-04-28-09.44.17.22581:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=7e33797e28514c48a1cce7021d8b04fd
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-dh8ik377/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real4m4.967s
user0m8.349s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Right now, all users of bdrv_make_empty() call the BlockDriver method
directly.  That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE permission.

Introduce bdrv_make_empty() that verifies that it does.

Signed-off-by: Max Reitz 
---
  include/block/block.h |  1 +
  block.c   | 23 +++
  2 files changed, 24 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index b05995fe9c..d947fb4080 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts 
*opts,
  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
  int bdrv_commit(BlockDriverState *bs);
+int bdrv_make_empty(BdrvChild *c, Error **errp);


Can we please fix this to take a flags parameter?  I want to make it 
easier for callers to request BDRV_REQ_NO_FALLBACK for distinguishing 
between callers where the image must be made empty (read as all zeroes) 
regardless of time spent, vs. made empty quickly (including if it is 
already all zero) but where the caller is prepared for the operation to 
fail and will write zeroes itself if fast bulk zeroing was not possible.




+int bdrv_make_empty(BdrvChild *c, Error **errp)
+{
+BlockDriver *drv = c->bs->drv;
+int ret;
+
+assert(c->perm & BLK_PERM_WRITE);
+
+if (!drv->bdrv_make_empty) {
+error_setg(errp, "%s does not support emptying nodes",
+   drv->format_name);
+return -ENOTSUP;
+}


And here's where we can add some automatic fallbacks, such as 
recognizing if the image already reads as all zeroes.  But those 
optimizations can come in separate patches; for YOUR patch, just getting 
the proper API in place is fine.



+
+ret = drv->bdrv_make_empty(c->bs);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to empty %s",
+ c->bs->filename);
+return ret;
+}
+
+return 0;
+}



Other than a missing flag parameter, this looks fine.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 15:53 hat Eric Blake geschrieben:
> On 4/28/20 8:26 AM, Max Reitz wrote:
> > Right now, all users of bdrv_make_empty() call the BlockDriver method
> > directly.  That is not only bad style, it is also wrong, unless the
> > caller has a BdrvChild with a WRITE permission.
> > 
> > Introduce bdrv_make_empty() that verifies that it does.
> > 
> > Signed-off-by: Max Reitz 
> > ---
> >   include/block/block.h |  1 +
> >   block.c   | 23 +++
> >   2 files changed, 24 insertions(+)
> > 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index b05995fe9c..d947fb4080 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, 
> > QemuOpts *opts,
> >   void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
> >   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
> >   int bdrv_commit(BlockDriverState *bs);
> > +int bdrv_make_empty(BdrvChild *c, Error **errp);
> 
> Can we please fix this to take a flags parameter?  I want to make it easier
> for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
> callers where the image must be made empty (read as all zeroes) regardless
> of time spent, vs. made empty quickly (including if it is already all zero)
> but where the caller is prepared for the operation to fail and will write
> zeroes itself if fast bulk zeroing was not possible.

bdrv_make_empty() is not for making an image read as all zeroes, but to
make it fully unallocated so that the backing file becomes visible.

Are you confusing it with bdrv_make_zero(), which is just a wrapper
around bdrv_pwrite_zeroes() and does take flags?

Kevin

RE: fdatasync semantics and block device backup

2020-04-28 Thread Bryan S Rosenburg

Kevin Wolf  wrote on 04/28/2020 07:11:24 AM:
> 
> Am 27.04.2020 um 21:49 hat Bryan S Rosenburg geschrieben:
> > Blockdev community,
> > 
> > Our group would like to write block device backups directly to an 
object 
> > store, using an interface such as s3fs or rclone-mount. We've run into 

> > problems with both interfaces, and in both cases the problems revolve 
> > around fdatasync system calls. With s3fs, fdatasync calls are 
painfully 
> > slow. With rclone-mount, the calls are very fast but don't do 
anything.
> > 
> > Syncing files to an object store is inherently problematic, as a 
proper 
> > sync requires finalizing the object that holds the file. After 
> > finalization, additional writes to the file require a new object to be 

> > created and the old object to be copied and destroyed. This process 
> > results in an N-squared performance problem for files that are synced 
> > periodically as they are written, as is the case for qemu backups.
> > 
> > Empirically, s3fs implements fdatasync, and hence backups written to 
s3fs 
> > take an untenably long time. I can provide data and straces, if 
needed.
> > 
> > Backups written to rclone-mount are much faster, but there are obvious 

> > semantic problems. The backup job completes successfully before the 
file 
> > is actually stable in the object store. And in fact, a lot of the work 
of 
> > finalizing the file occurs during the "close" system call that is 
invoked 
> > as part of the qmp_blockdev_del operation.The syscall causes that 
> > operation to take so long that other commands time out waiting to 
"acquire 
> > state change lock (held by monitor qemuProcessEventHandler)".
> > 
> > My questions for the group are: Has anyone else tried writing backups 
to 
> > file systems that don't have good support for fdatasync, and do you 
have 
> > any advice other than "Don't do that." ?
> 
> I think "don't do that" is a good answer actually.
> 
> You may want to put an NBD indirection between QEMU and your object
> store, so that the close() syscall will just block a qemu-nbd process
> that has already closed its connection to QEMU instead of blocking all
> of QEMU.
> 
> It is possible to disable fdatasync() by specifying cache=unsafe for
> the block device, so you could avoid the penalty of repeated syncs on
> s3fs.
> 
> Of course, if s3fs requires an fsync before data is actually stable, in
> this case you couldn't consider your backup completed when the backup
> block job finishes successfully, but you would have to issue an fsync
> manually and wait for its result before you can consider the backup
> successful.
> 
> Kevin

Thanks, Kevin.

It sounds like we should be specifying cache=unsafe when using 
rclone-mount, at least, so qemu won't think the file system is 
implementing fdatasyncs when it's not.

- Bryan

Re: [PATCH 2/4] block: Use bdrv_make_empty() where possible

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Signed-off-by: Max Reitz 
---
  block/replication.c | 6 ++
  block/vvfat.c   | 4 +---
  2 files changed, 3 insertions(+), 7 deletions(-)



Yes, definitely nicer :)  May have some obvious fallout to add a 0 flag 
parameter, per my request on 1/4, but that doesn't stop me from giving:


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Eric Blake


On 4/28/20 8:26 AM, Max Reitz wrote:

Branch: https://github.com/XanClic/qemu.git fix-bdrv_make_empty-v1
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-bdrv_make_empty-v1

Hi,

Right now, there is no centralized bdrv_make_empty() function.  Not only
is it bad style to call BlockDriver methods directly, it is also wrong,
unless the caller has a BdrvChild with BLK_PERM_WRITE taken.


I'm also in the middle of writing a patch series that adds a 
corresponding .bdrv_make_empty driver callback.  I'll rebase that work 
on top of this, as part of my efforts at fixing more code to rely on 
bdrv_make_empty rather than directly querying bdrv_has_zero_init[_truncate].




This series fixes that.

Note that as far as I’m aware this series shouldn’t visibly fix anything
at this point; but “block: Introduce real BdrvChildRole”
(https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00737.html)
makes the iotest break when run with -o data_file=$SOMETHING, without
this series applied beforehand.  (That is because without that series,
external data files are treated much like metadata children, so the
format driver always takes the WRITE permission if the file is writable;
but after that series, it only does so when it itself has a parent
requestion the WRITE permission.)



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  BUILD   pc-bios/optionrom/pvh.raw
  SIGNpc-bios/optionrom/pvh.bin
/tmp/qemu-test/src/qemu-img.c: In function 'img_commit':
/tmp/qemu-test/src/qemu-img.c:1071:9: error: implicit declaration of function 
'blk_new_with_bs' [-Werror=implicit-function-declaration]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
/tmp/qemu-test/src/qemu-img.c:1071:9: error: nested extern declaration of 
'blk_new_with_bs' [-Werror=nested-externs]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: assignment makes pointer from 
integer without a cast [-Werror]
 old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
 ^
cc1: all warnings being treated as errors
make: *** [qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=2d32da85331c4d51b4632262369586d1', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-mtvq5xk5/src/docker-src.2020-04-28-09.40.27.12753:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=2d32da85331c4d51b4632262369586d1
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-mtvq5xk5/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m58.028s
user0m8.393s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200428132629.796753-1-mre...@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  SIGNpc-bios/optionrom/kvmvapic.bin
  BUILD   pc-bios/optionrom/pvh.raw
  SIGNpc-bios/optionrom/pvh.bin
/tmp/qemu-test/src/qemu-img.c:1071:27: error: implicit declaration of function 
'blk_new_with_bs' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
  ^
/tmp/qemu-test/src/qemu-img.c:1071:27: error: this function declaration is not 
a prototype [-Werror,-Wstrict-prototypes]
/tmp/qemu-test/src/qemu-img.c:1071:25: error: incompatible integer to pointer 
conversion assigning to 'BlockBackend *' (aka 'struct BlockBackend *') from 
'int' [-Werror,-Wint-conversion]
old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
^ ~
3 errors generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=1896d2b1f1044091b832be313d66ac6e', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-fql9ti5h/src/docker-src.2020-04-28-09.34.00.5332:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=1896d2b1f1044091b832be313d66ac6e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fql9ti5h/src'
make: *** [docker-run-test-debug@fedora] Error 2

real4m44.194s
user0m8.084s


The full log is available at
http://patchew.org/logs/20200428132629.796753-1-mre...@redhat.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v21 1/4] qcow2: introduce compression type feature

2020-04-28 Thread Denis Plotnikov

The patch adds some preparation parts for incompatible compression type
feature to qcow2 allowing the use different compression methods for
image clusters (de)compressing.

It is implied that the compression type is set on the image creation and
can be changed only later by image conversion, thus compression type
defines the only compression algorithm used for the image, and thus,
for all image clusters.

The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.

The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding of the compression type breaks a number of tests because now the
compression type is reported on image creation and there are some changes
in the qcow2 header in size and offsets.

The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size +=8: 1 byte compression type
   7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 144, 182, 242, 255

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
QAPI part:
Acked-by: Markus Armbruster 
---
 qapi/block-core.json |  22 +-
 block/qcow2.h|  20 +-
 include/block/block_int.h|   1 +
 block/qcow2.c| 113 +++
 tests/qemu-iotests/031.out   |  14 ++--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 ++
 tests/qemu-iotests/065   |  28 +---
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   3 +-
 16 files changed, 267 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..1522e2983f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -78,6 +78,8 @@
 #
 # @bitmaps: A list of qcow2 bitmap details (since 4.0)
 #
+# @compression-type: the image cluster compression method (since 5.1)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -89,7 +91,8 @@
   '*corrupt': 'bool',
   'refcount-bits': 'int',
   '*encrypt': 'ImageInfoSpecificQCow2Encryption',
-  '*bitmaps': ['Qcow2BitmapInfo']
+  '*bitmaps': ['Qcow2BitmapInfo'],
+  'compression-type': 'Qcow2CompressionType'
   } }
 
 ##
@@ -4284,6 +4287,18 @@
   'data': [ 'v2', 'v3' ] }
 
 
+##
+# @Qcow2CompressionType:
+#
+# Compression type used in qcow2 image file
+#
+# @zlib: zlib compression, see 
+#
+# Since: 5.1
+##
+{ 'enum': 'Qcow2CompressionType',
+  'data': [ 'zlib' ] }
+
 ##
 # @BlockdevCreateOptionsQcow2:
 #
@@ -4307,6 +4322,8 @@
 # allowed values: off, falloc, full, metadata)
 # @lazy-refcounts: True if refcounts may be updated lazily (default: off)
 # @refcount-bits: Width of reference counts in bits (default: 16)
+# @compression-type: The image cluster compression method
+#(default: zlib, since 5.1)
 #
 # Since: 2.12
 ##
@@ -4322,7 +4339,8 @@
 '*cluster-size':'size',
 '*preallocation':   'PreallocMode',
 '*lazy-refcounts':  'bool',
-'*refcount-bits':   'int' } }
+'*refcount-bits':   'int',
+'*compression-type':'Qcow2CompressionType' } }
 
 ##
 # @BlockdevCreateOptionsQed:
diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5..6a8b82e6cc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
 
 uint32_t refcount_order;
 uint32_t header_length;
+
+/* Additional fields */
+uint8_t compression_type;
+
+/* header must be a multiple of 8 */
+uint8_t padding[7];
 } QEMU_PACKED QCowHeader;
 
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(sizeof(QCowHeader), 8));
+
 typedef struct QEMU_PACKED QCowSnapshotHeader {
 /* header is 8 byte aligned */
 uint64_t l1_table_offset;
@@ -216,13 +224,16 @@ enum {
 QCOW2_INCOMPAT_DIRTY_BITNR  = 0,
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,

[PATCH v21 0/4] implement zstd cluster compression method

2020-04-28 Thread Denis Plotnikov

v21:
   03:
   * remove the loop on compression [Max]
   * use designated initializers [Max]
   04:
   * don't erase user's options [Max]
   * use _rm_test_img [Max]
   * add unsupported qcow2 options [Max]

v20:
   04: fix a number of flaws [Vladimir]
   * don't use $RAND_FILE passing to qemu-io,
 so check $TEST_DIR is redundant
   * re-arrage $RAND_FILE writing
   * fix a typo

v19:
   04: fix a number of flaws [Eric]
   * remove rudundant test case descriptions
   * fix stdout redirect
   * don't use (())
   * use peek_file_be instead of od
   * check $TEST_DIR for spaces and other before using
   * use $RAND_FILE safer

v18:
   * 04: add quotes to all file name variables [Vladimir] 
   * 04: add Vladimir's comment according to "qemu-io write -s"
 option issue.

v17:
   * 03: remove incorrect comment in zstd decompress [Vladimir]
   * 03: remove "paraniod" and rewrite the comment on decompress [Vladimir]
   * 03: fix dead assignment [Vladimir]
   * 04: add and remove quotes [Vladimir]
   * 04: replace long offset form with the short one [Vladimir]

v16:
   * 03: ssize_t for ret, size_t for zstd_ret [Vladimir]
   * 04: small fixes according to the comments [Vladimir] 

v15:
   * 01: aiming qemu 5.1 [Eric]
   * 03: change zstd_res definition place [Vladimir]
   * 04: add two new test cases [Eric]
 1. test adjacent cluster compression with zstd
 2. test incompressible cluster processing
   * 03, 04: many rewording and gramma fixing [Eric]

v14:
   * fix bug on compression - looping until compress == 0 [Me]
   * apply reworked Vladimir's suggestions:
  1. not mixing ssize_t with size_t
  2. safe check for ENOMEM in compression part - avoid overflow
  3. tolerate sanity check allow zstd to make progress only
 on one of the buffers
v13:
   * 03: add progress sanity check to decompression loop [Vladimir]
 03: add successful decompression check [Me]

v12:
   * 03: again, rework compression and decompression loops
 to make them more correct [Vladimir]
 03: move assert in compression to more appropriate place
 [Vladimir]
v11:
   * 03: the loops don't need "do{}while" form anymore and
 the they were buggy (missed "do" in the beginning)
 replace them with usual "while(){}" loops [Vladimir]
v10:
   * 03: fix zstd (de)compressed loops for multi-frame
 cases [Vladimir]
v9:
   * 01: fix error checking and reporting in qcow2_amend compression type part 
[Vladimir]
   * 03: replace asserts with -EIO in qcow2_zstd_decompression [Vladimir, 
Alberto]
   * 03: reword/amend/add comments, fix typos [Vladimir]

v8:
   * 03: switch zstd API from simple to stream [Eric]
 No need to state a special cluster layout for zstd
 compressed clusters.
v7:
   * use qapi_enum_parse instead of the open-coding [Eric]
   * fix wording, typos and spelling [Eric]

v6:
   * "block/qcow2-threads: fix qcow2_decompress" is removed from the series
  since it has been accepted by Max already
   * add compile time checking for Qcow2Header to be a multiple of 8 [Max, 
Alberto]
   * report error on qcow2 amending when the compression type is actually 
chnged [Max]
   * remove the extra space and the extra new line [Max]
   * re-arrange acks and signed-off-s [Vladimir]

v5:
   * replace -ENOTSUP with abort in qcow2_co_decompress [Vladimir]
   * set cluster size for all test cases in the beginning of the 287 test

v4:
   * the series is rebased on top of 01 "block/qcow2-threads: fix 
qcow2_decompress"
   * 01 is just a no-change resend to avoid extra dependencies. Still, it may 
be merged in separate

v3:
   * remove redundant max compression type value check [Vladimir, Eric]
 (the switch below checks everything)
   * prevent compression type changing on "qemu-img amend" [Vladimir]
   * remove zstd config setting, since it has been added already by
 "migration" patches [Vladimir]
   * change the compression type error message [Vladimir] 
   * fix alignment and 80-chars exceeding [Vladimir]

v2:
   * rework compression type setting [Vladimir]
   * squash iotest changes to the compression type introduction patch 
[Vladimir, Eric]
   * fix zstd availability checking in zstd iotest [Vladimir]
   * remove unnecessry casting [Eric]
   * remove rudundant checks [Eric]
   * fix compressed cluster layout in qcow2 spec [Vladimir]
   * fix wording [Eric, Vladimir]
   * fix compression type filtering in iotests [Eric]

v1:
   the initial series

Denis Plotnikov (4):
  qcow2: introduce compression type feature
  qcow2: rework the cluster compression routine
  qcow2: add zstd cluster compression
  iotests: 287: add qcow2 compression type test

 docs/interop/qcow2.txt   |   1 +
 configure|   2 +-
 qapi/block-core.json |  23 ++-
 block/qcow2.h|  20 ++-
 include/block/block_int.h|   1 +
 block/qcow2-threads.c

[PATCH v21 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Denis Plotnikov

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
  src.img [zlib|zstd]_compressed.img
decompress cmd
  time ./qemu-img convert -O qcow2
  [zlib|zstd]_compressed.img uncompressed.img

   compression   decompression
 zlib   zstd   zlib zstd

real 65.5   16.3 (-75 %)1.9  1.6 (-16 %)
user 65.0   15.85.3  2.5
sys   3.30.22.0  2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---
 docs/interop/qcow2.txt |   1 +
 configure  |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 167 +
 block/qcow2.c  |   7 ++
 5 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
 
 Available compression type values:
 0: zlib 
+1: zstd 
 
 
 === Header padding ===
diff --git a/configure b/configure
index 23b5e93752..4e3a1690ea 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
   zstdsupport for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
 #
 # Since: 5.1
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0591dafbc8 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include 
 
+#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,158 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t 
dest_size,
 return ret;
 }
 
+#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIOon any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+size_t zstd_ret;
+ZSTD_outBuffer output = {
+.dst = dest,
+.size = dest_size,
+.pos = 0
+};
+ZSTD_inBuffer input = {
+.src = src,
+.size = src_size,
+.pos = 0
+};
+ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+if (!cctx) {
+return -EIO;
+}
+/*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * ZSTD_compressStream2() tries to compress everything it could
+ * with a single call. Although, ZSTD docs says that:
+ * "You must continue calling ZSTD_compressStream2() with ZSTD_e_end
+ * until it returns 0, at which point you are free to start a new frame",
+ * in out tests we saw the only case when it returned with >0 -
+ * when the output buffer was too small. In that case,
+

[PATCH v21 2/4] qcow2: rework the cluster compression routine

2020-04-28 Thread Denis Plotnikov

The patch enables processing the image compression type defined
for the image and chooses an appropriate method for image clusters
(de)compression.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-threads.c | 71 ---
 1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index a68126f291..7dbaf53489 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -74,7 +74,9 @@ typedef struct Qcow2CompressData {
 } Qcow2CompressData;
 
 /*
- * qcow2_compress()
+ * qcow2_zlib_compress()
+ *
+ * Compress @src_size bytes of data using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -83,8 +85,8 @@ typedef struct Qcow2CompressData {
  *  -ENOMEM destination buffer is not enough to store compressed data
  *  -EIOon any other error
  */
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  const void *src, size_t src_size)
+static ssize_t qcow2_zlib_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
 {
 ssize_t ret;
 z_stream strm;
@@ -119,10 +121,10 @@ static ssize_t qcow2_compress(void *dest, size_t 
dest_size,
 }
 
 /*
- * qcow2_decompress()
+ * qcow2_zlib_decompress()
  *
  * Decompress some data (not more than @src_size bytes) to produce exactly
- * @dest_size bytes.
+ * @dest_size bytes using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -130,8 +132,8 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * Returns: 0 on success
  *  -EIO on fail
  */
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-const void *src, size_t src_size)
+static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
 {
 int ret;
 z_stream strm;
@@ -191,20 +193,67 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, 
size_t dest_size,
 return arg.ret;
 }
 
+/*
+ * qcow2_co_compress()
+ *
+ * Compress @src_size bytes of data using the compression
+ * method defined by the image compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
   const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_compress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_compress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
+/*
+ * qcow2_co_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using the compression method defined by the image
+ * compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
 const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_decompress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_decompress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
 
-- 
2.17.0

[PATCH v21 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov

The test checks fulfilling qcow2 requirements for the compression
type feature and zstd compression type operability.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 
---
 slirp  |   2 +-
 tests/qemu-iotests/287 | 152 +
 tests/qemu-iotests/287.out |  67 
 tests/qemu-iotests/group   |   1 +
 4 files changed, 221 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

diff --git a/slirp b/slirp
index 2faae0f778..55ab21c9a3 16
--- a/slirp
+++ b/slirp
@@ -1 +1 @@
-Subproject commit 2faae0f778f818fadc873308f983289df697eb93
+Subproject commit 55ab21c9a36852915b81f1b41ebaf3b6509dd8ba
diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..21fe1f19f5
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,152 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_unsupported_imgopts 'compat=0.10' data_file
+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _rm_test_img
+   rm -f "$COMPR_IMG"
+   rm -f "$RAND_FILE"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing compression type incompatible bit setting for zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing zlib with incompatible bit set ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 3
+# to make sure the bit was actually set
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing zstd with incompatible bit unset ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" set-header incompatible_features 0
+# to make sure the bit was actually unset
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing compression type values ==="
+echo
+# zlib=0
+_make_test_img -o compression_type=zlib 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+# zstd=1
+_make_test_img -o compression_type=zstd 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+echo
+echo "=== Testing simple reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+# read on the cluster boundaries
+$QEMU_IO -c "read -v 131070 8 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 65534 8" "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Testing adjacent clusters reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAB 0 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAD 128K 64K " "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IO -c "read -P 0xAB 0 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAD 128K 64k "

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov





On 28.04.2020 16:01, Eric Blake wrote:

On 4/28/20 7:55 AM, Max Reitz wrote:


+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

This test doesn’t work with compat=0.10 (because we can’t store a
non-default compression type there) or data_file (because those don’t
support compression), so those options should be marked as 
unsupported.


(It does seem to work with any refcount_bits, though.)


Could I ask how to achieve that?
I can't find any _supported_* related.



It’s _unsupported_imgopts.


Test 036 is an example of this.

Max, Eric

Thanks!

Denis

[PATCH 0/4] block: Do not call BlockDriver.bdrv_make_empty() directly

2020-04-28 Thread Max Reitz

Branch: https://github.com/XanClic/qemu.git fix-bdrv_make_empty-v1
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-bdrv_make_empty-v1

Hi,

Right now, there is no centralized bdrv_make_empty() function.  Not only
is it bad style to call BlockDriver methods directly, it is also wrong,
unless the caller has a BdrvChild with BLK_PERM_WRITE taken.

This series fixes that.

Note that as far as I’m aware this series shouldn’t visibly fix anything
at this point; but “block: Introduce real BdrvChildRole”
(https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00737.html)
makes the iotest break when run with -o data_file=$SOMETHING, without
this series applied beforehand.  (That is because without that series,
external data files are treated much like metadata children, so the
format driver always takes the WRITE permission if the file is writable;
but after that series, it only does so when it itself has a parent
requestion the WRITE permission.)


Max Reitz (4):
  block: Add bdrv_make_empty()
  block: Use bdrv_make_empty() where possible
  block: Add blk_make_empty()
  block: Use blk_make_empty() after commits

 include/block/block.h  |  1 +
 include/sysemu/block-backend.h |  2 ++
 block.c| 23 +++
 block/block-backend.c  |  5 +
 block/commit.c |  8 +++-
 block/replication.c|  6 ++
 block/vvfat.c  |  4 +---
 qemu-img.c | 19 ++-
 8 files changed, 55 insertions(+), 13 deletions(-)

-- 
2.25.4

[PATCH 3/4] block: Add blk_make_empty()

2020-04-28 Thread Max Reitz

Two callers of BlockDriver.bdrv_make_empty() remain that should not call
this method directly.  Both do not have access to a BdrvChild, but they
can use a BlockBackend, so we add this function that lets them use it.

Signed-off-by: Max Reitz 
---
 include/sysemu/block-backend.h | 2 ++
 block/block-backend.c  | 5 +
 2 files changed, 7 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d37c1244dd..14338b76dc 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -266,4 +266,6 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
int64_t off_in,
 
 const BdrvChild *blk_root(BlockBackend *blk);
 
+int blk_make_empty(BlockBackend *blk, Error **errp);
+
 #endif
diff --git a/block/block-backend.c b/block/block-backend.c
index 3592066b42..5d36efd32f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2402,3 +2402,8 @@ const BdrvChild *blk_root(BlockBackend *blk)
 {
 return blk->root;
 }
+
+int blk_make_empty(BlockBackend *blk, Error **errp)
+{
+return bdrv_make_empty(blk->root, errp);
+}
-- 
2.25.4

[PATCH 1/4] block: Add bdrv_make_empty()

2020-04-28 Thread Max Reitz

Right now, all users of bdrv_make_empty() call the BlockDriver method
directly.  That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE permission.

Introduce bdrv_make_empty() that verifies that it does.

Signed-off-by: Max Reitz 
---
 include/block/block.h |  1 +
 block.c   | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index b05995fe9c..d947fb4080 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts 
*opts,
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
+int bdrv_make_empty(BdrvChild *c, Error **errp);
 int bdrv_change_backing_file(BlockDriverState *bs,
 const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
diff --git a/block.c b/block.c
index 2e3905c99e..b0d5b98617 100644
--- a/block.c
+++ b/block.c
@@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, 
BdrvChild *child, Error **errp)
 
 parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
 }
+
+int bdrv_make_empty(BdrvChild *c, Error **errp)
+{
+BlockDriver *drv = c->bs->drv;
+int ret;
+
+assert(c->perm & BLK_PERM_WRITE);
+
+if (!drv->bdrv_make_empty) {
+error_setg(errp, "%s does not support emptying nodes",
+   drv->format_name);
+return -ENOTSUP;
+}
+
+ret = drv->bdrv_make_empty(c->bs);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to empty %s",
+ c->bs->filename);
+return ret;
+}
+
+return 0;
+}
-- 
2.25.4

[PATCH 4/4] block: Use blk_make_empty() after commits

2020-04-28 Thread Max Reitz

bdrv_commit() already has a BlockBackend pointing to the BDS that we
want to empty, it just has the wrong permissions.

qemu-img commit has no BlockBackend pointing to the old backing file
yet, but introducing one is simple.

After this commit, bdrv_make_empty() is the only remaining caller of
BlockDriver.bdrv_make_empty().

Signed-off-by: Max Reitz 
---
 block/commit.c |  8 +++-
 qemu-img.c | 19 ++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 8e672799af..24720ba67d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -493,10 +493,16 @@ int bdrv_commit(BlockDriverState *bs)
 }
 
 if (drv->bdrv_make_empty) {
-ret = drv->bdrv_make_empty(bs);
+ret = blk_set_perm(src, BLK_PERM_WRITE, BLK_PERM_ALL, NULL);
 if (ret < 0) {
 goto ro_cleanup;
 }
+
+ret = blk_make_empty(src, NULL);
+if (ret < 0) {
+goto ro_cleanup;
+}
+
 blk_flush(src);
 }
 
diff --git a/qemu-img.c b/qemu-img.c
index 821cbf610e..a5e8659867 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1065,11 +1065,20 @@ static int img_commit(int argc, char **argv)
 goto unref_backing;
 }
 
-if (!drop && bs->drv->bdrv_make_empty) {
-ret = bs->drv->bdrv_make_empty(bs);
-if (ret) {
-error_setg_errno(_err, -ret, "Could not empty %s",
- filename);
+if (!drop) {
+BlockBackend *old_backing_blk;
+
+old_backing_blk = blk_new_with_bs(bs, BLK_PERM_WRITE, BLK_PERM_ALL,
+  _err);
+if (!old_backing_blk) {
+goto unref_backing;
+}
+ret = blk_make_empty(old_backing_blk, _err);
+blk_unref(old_backing_blk);
+if (ret == -ENOTSUP) {
+error_free(local_err);
+local_err = NULL;
+} else if (ret < 0) {
 goto unref_backing;
 }
 }
-- 
2.25.4

[PATCH 2/4] block: Use bdrv_make_empty() where possible

2020-04-28 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block/replication.c | 6 ++
 block/vvfat.c   | 4 +---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index da013c2041..cc6a40d577 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -331,9 +331,8 @@ static void secondary_do_checkpoint(BDRVReplicationState 
*s, Error **errp)
 return;
 }
 
-ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
+ret = bdrv_make_empty(s->active_disk, errp);
 if (ret < 0) {
-error_setg(errp, "Cannot make active disk empty");
 return;
 }
 
@@ -343,9 +342,8 @@ static void secondary_do_checkpoint(BDRVReplicationState 
*s, Error **errp)
 return;
 }
 
-ret = s->hidden_disk->bs->drv->bdrv_make_empty(s->hidden_disk->bs);
+ret = bdrv_make_empty(s->hidden_disk, errp);
 if (ret < 0) {
-error_setg(errp, "Cannot make hidden disk empty");
 return;
 }
 }
diff --git a/block/vvfat.c b/block/vvfat.c
index ab800c4887..e3020b65c8 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2960,9 +2960,7 @@ static int do_commit(BDRVVVFATState* s)
 return ret;
 }
 
-if (s->qcow->bs->drv && s->qcow->bs->drv->bdrv_make_empty) {
-s->qcow->bs->drv->bdrv_make_empty(s->qcow->bs);
-}
+bdrv_make_empty(s->qcow, NULL);
 
 memset(s->used_clusters, 0, sector2cluster(s, s->sector_count));
 
-- 
2.25.4

Re: [PATCH v2 02/14] qcrypto/luks: implement encryption key management

2020-04-28 Thread Daniel P . Berrangé

On Sun, Mar 08, 2020 at 05:18:51PM +0200, Maxim Levitsky wrote:
> Next few patches will expose that functionality
> to the user.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  crypto/block-luks.c | 398 +++-
>  qapi/crypto.json|  61 ++-
>  2 files changed, 455 insertions(+), 4 deletions(-)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index 4861db810c..b11ee08c6d 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c

> +/*
> + * Erases an keyslot given its index
> + * Returns:
> + *0 if the keyslot was erased successfully
> + *   -1 if a error occurred while erasing the keyslot
> + *
> + */
> +static int
> +qcrypto_block_luks_erase_key(QCryptoBlock *block,
> + unsigned int slot_idx,
> + QCryptoBlockWriteFunc writefunc,
> + void *opaque,
> + Error **errp)
> +{
> +QCryptoBlockLUKS *luks = block->opaque;
> +QCryptoBlockLUKSKeySlot *slot = >header.key_slots[slot_idx];
> +g_autofree uint8_t *garbagesplitkey = NULL;
> +size_t splitkeylen = luks->header.master_key_len * slot->stripes;
> +size_t i;
> +
> +assert(slot_idx < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
> +assert(splitkeylen > 0);
> +garbagesplitkey = g_new0(uint8_t, splitkeylen);
> +
> +/* Reset the key slot header */
> +memset(slot->salt, 0, QCRYPTO_BLOCK_LUKS_SALT_LEN);
> +slot->iterations = 0;
> +slot->active = QCRYPTO_BLOCK_LUKS_KEY_SLOT_DISABLED;
> +
> +qcrypto_block_luks_store_header(block,  writefunc, opaque, errp);

This may set  errp and we don't return immediately, so

> +/*
> + * Now try to erase the key material, even if the header
> + * update failed
> + */
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_ERASE_ITERATIONS; i++) {
> +if (qcrypto_random_bytes(garbagesplitkey, splitkeylen, errp) < 0) {

...this may then set errp a second time, which is not permitted.

This call needs to use a "local_err", and error_propagate(errp, local_err).
The latter is a no-op if errp is already set.

> +/*
> + * If we failed to get the random data, still write
> + * at least zeros to the key slot at least once
> + */
> +if (i > 0) {
> +return -1;
> +}
> +}
> +if (writefunc(block,
> +  slot->key_offset_sector * 
> QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
> +  garbagesplitkey,
> +  splitkeylen,
> +  opaque,
> +  errp) != splitkeylen) {

same issue with errp here too.

> +return -1;
> +}
> +}
> +return 0;
> +}


> +/*
> + * Given LUKSKeyslotUpdate command, set @slots_bitmap with all slots
> + * that will be updated with new password (or erased)
> + * returns 0 on success, and -1 on failure
> + */
> +static int
> +qcrypto_block_luks_get_update_bitmap(QCryptoBlock *block,
> + QCryptoBlockReadFunc readfunc,
> + void *opaque,
> + const QCryptoBlockAmendOptionsLUKS 
> *opts,
> + unsigned long *slots_bitmap,
> + Error **errp)
> +{
> +const QCryptoBlockLUKS *luks = block->opaque;
> +size_t i;
> +
> +bitmap_zero(slots_bitmap, QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
> +
> +if (opts->has_keyslot) {
> +/* keyslot set, select only this keyslot */
> +int keyslot = opts->keyslot;
> +
> +if (keyslot < 0 || keyslot >= QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS) {
> +error_setg(errp,
> +   "Invalid slot %u specified, must be between 0 and %u",
> +   keyslot, QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS - 1);
> +return -1;
> +}
> +bitmap_set(slots_bitmap, keyslot, 1);
> +
> +} else if (opts->has_old_secret) {
> +/* initially select all active keyslots */
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
> +if (qcrypto_block_luks_slot_active(luks, i)) {
> +bitmap_set(slots_bitmap, i, 1);
> +}
> +}
> +} else {
> +/* find a free keyslot */
> +int slot = qcrypto_block_luks_find_free_keyslot(luks);
> +
> +if (slot == -1) {
> +error_setg(errp,
> +   "Can't add a keyslot - all key slots are in use");
> +return -1;
> +}
> +bitmap_set(slots_bitmap, slot, 1);
> +}
> +
> +if (opts->has_old_secret) {
> +/* now deselect all keyslots that don't contain the password */
> +g_autofree uint8_t *tmpkey = g_new0(uint8_t,
> +luks->header.master_key_len);
> +
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
> +

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Eric Blake


On 4/28/20 7:55 AM, Max Reitz wrote:


+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

This test doesn’t work with compat=0.10 (because we can’t store a
non-default compression type there) or data_file (because those don’t
support compression), so those options should be marked as unsupported.

(It does seem to work with any refcount_bits, though.)


Could I ask how to achieve that?
I can't find any _supported_* related.



It’s _unsupported_imgopts.


Test 036 is an example of this.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Max Reitz

On 24.04.20 21:09, Eric Blake wrote:
> In v3:
> - patch 1: fix error returns [patchew, Max], R-b dropped
> - patch 2,3: unchanged, so add R-b
> 
> Eric Blake (3):
>   block: Add blk_new_with_bs() helper
>   qcow2: Allow resize of images with internal snapshots
>   qcow2: Tweak comment about bitmaps vs. resize

Thanks, I’ve squashed the diff into patch 1 and applied the series to my
block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Max Reitz

On 28.04.20 13:41, Denis Plotnikov wrote:
> 
> 
> On 27.04.2020 16:29, Max Reitz wrote:
>> On 21.04.20 10:11, Denis Plotnikov wrote:
>>> The test checks fulfilling qcow2 requirements for the compression
>>> type feature and zstd compression type operability.
>>>
>>> Signed-off-by: Denis Plotnikov 
>>> ---
>>>   tests/qemu-iotests/287 | 146 +
>>>   tests/qemu-iotests/287.out |  67 +
>>>   tests/qemu-iotests/group   |   1 +
>>>   3 files changed, 214 insertions(+)
>>>   create mode 100755 tests/qemu-iotests/287
>>>   create mode 100644 tests/qemu-iotests/287.out
>>>
>>> diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
>>> new file mode 100755
>>> index 00..156acc40ad
>>> --- /dev/null
>>> +++ b/tests/qemu-iotests/287
>>> @@ -0,0 +1,146 @@
>>> +#!/usr/bin/env bash
>>> +#
>>> +# Test case for an image using zstd compression
>>> +#
>>> +# Copyright (c) 2020 Virtuozzo International GmbH
>>> +#
>>> +# This program is free software; you can redistribute it and/or modify
>>> +# it under the terms of the GNU General Public License as published by
>>> +# the Free Software Foundation; either version 2 of the License, or
>>> +# (at your option) any later version.
>>> +#
>>> +# This program is distributed in the hope that it will be useful,
>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +# GNU General Public License for more details.
>>> +#
>>> +# You should have received a copy of the GNU General Public License
>>> +# along with this program.  If not, see .
>>> +#
>>> +
>>> +# creator
>>> +owner=dplotni...@virtuozzo.com
>>> +
>>> +seq="$(basename $0)"
>>> +echo "QA output created by $seq"
>>> +
>>> +status=1    # failure is the default!
>>> +
>>> +# standard environment
>>> +. ./common.rc
>>> +. ./common.filter
>>> +
>>> +# This tests qocw2-specific low-level functionality
>>> +_supported_fmt qcow2
>>> +_supported_proto file
>>> +_supported_os Linux
>> This test doesn’t work with compat=0.10 (because we can’t store a
>> non-default compression type there) or data_file (because those don’t
>> support compression), so those options should be marked as unsupported.
>>
>> (It does seem to work with any refcount_bits, though.)
> 
> Could I ask how to achieve that?
> I can't find any _supported_* related.


It’s _unsupported_imgopts.

Max



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v3 1/3] block: Add blk_new_with_bs() helper

2020-04-28 Thread Eric Blake


On 4/28/20 1:34 AM, Max Reitz wrote:


block_crypto_co_create_generic(BlockDriverState *bs,
     PreallocMode prealloc,
     Error **errp)
   {
-    int ret;
+    int ret = -EPERM;


I’m not sure I’m a fan of this, because I feel like it makes the code
harder to read, due to having to look in three places (here, around the
blk_new_with_bs() call, and under the cleanup label) instead of in two
(not here) to verify that the error handling code is correct.

There’s also the fact that this is not really a default return value,
but one very specific error code for if one very specific function call
fails.

I suppose it comes down to whether one considers LoC a complexity
problem.  (I don’t, necessarily.)

(Also I realize it seems rather common in the kernel to set error return
variables before the function call, but I think the more common pattern
in qemu is to set it in the error path.)


I'm fine with either style.  Setting it up front is handy if that
particular error makes a good default, but in many of the functions I
touched, we were returning a variety of errors (-EIO, -EINVAL, -EPERM,
etc) such that there was no good default, and thus no reason to set a
default up front.  Is this something that would go through your tree,
and if so, are you okay making that tweak, or do I need to send v4?


I suppose I can do that, this is what I’d squash in, OK?


Yes, that change looks correct to me.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v10 00/14] iotests: use python logging

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 13:46 hat Max Reitz geschrieben:
> On 31.03.20 02:00, John Snow wrote:
> > This series uses python logging to enable output conditionally on
> > iotests.log(). We unify an initialization call (which also enables
> > debugging output for those tests with -d) and then make the switch
> > inside of iotests.
> > 
> > It will help alleviate the need to create logged/unlogged versions
> > of all the various helpers we have made.
> > 
> > Also, I got lost and accidentally delinted iotests while I was here.
> > Sorry about that. By version 9, it's now the overwhelming focus of
> > this series. No good deed, etc.
> 
> Seems like nobody else wants it, so I thank you and let you know that
> I’ve applied this series to my block-next branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

John said he wanted to address my comment on patch 14, so I expected him
to send another version. This need not stop this series (we can still
fix that on top), but just as an explanation why I didn't take it yet.

Kevin


signature.asc
Description: PGP signature

Re: [PATCH v10 00/14] iotests: use python logging

2020-04-28 Thread Max Reitz

On 31.03.20 02:00, John Snow wrote:
> This series uses python logging to enable output conditionally on
> iotests.log(). We unify an initialization call (which also enables
> debugging output for those tests with -d) and then make the switch
> inside of iotests.
> 
> It will help alleviate the need to create logged/unlogged versions
> of all the various helpers we have made.
> 
> Also, I got lost and accidentally delinted iotests while I was here.
> Sorry about that. By version 9, it's now the overwhelming focus of
> this series. No good deed, etc.

Seems like nobody else wants it, so I thank you and let you know that
I’ve applied this series to my block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov





On 27.04.2020 16:29, Max Reitz wrote:

On 21.04.20 10:11, Denis Plotnikov wrote:

The test checks fulfilling qcow2 requirements for the compression
type feature and zstd compression type operability.

Signed-off-by: Denis Plotnikov 
---
  tests/qemu-iotests/287 | 146 +
  tests/qemu-iotests/287.out |  67 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 214 insertions(+)
  create mode 100755 tests/qemu-iotests/287
  create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..156acc40ad
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,146 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

This test doesn’t work with compat=0.10 (because we can’t store a
non-default compression type there) or data_file (because those don’t
support compression), so those options should be marked as unsupported.

(It does seem to work with any refcount_bits, though.)


Could I ask how to achieve that?
I can't find any _supported_* related.

Denis



+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _cleanup_test_img
+   rm -f "$COMPR_IMG"

Using _rm_test_img() would be nicer.  There shouldn’t be a functional
difference here because there’d only be one with external data files (I
think), which won’t work with this test, but still.


+   rm -f "$RAND_FILE"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+IMGOPTS='compression_type=zlib' _make_test_img 64M

Please use -o so user options are still considered.

(i.e., _make_test_img -o compression_type=zlib)

[...]


+echo
+echo "=== Testing incompressible cluster processing with zstd ==="
+echo
+# create a 2M image and fill it with 1M likely incompressible data
+# and 1M compressible data
+dd if=/dev/urandom of="$RAND_FILE" bs=1M count=1 seek=1
+QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" \
+$QEMU_IO -f raw -c "write -P 0xFA 0 1M" "$RAND_FILE" | _filter_qemu_io
+$QEMU_IMG convert -f raw -O $IMGFMT -c "$RAND_FILE" "$TEST_IMG" | 
_filter_qemu_io
+
+$QEMU_IMG convert -O $IMGFMT -c -o compression_type=zstd \
+  "$TEST_IMG" "$COMPR_IMG"

Again, it would be nice to not discard the user-supplied options here,
and maybe it would also be nicer to explicitly pass the compression type
for the first convert, too.  So we’d use
   -o "$(_optstr_add "$IMGOPTS" "compression_type=zlib")"
for the first convert, and
   -o "$(_optstr_add "$IMGOPTS" "compression_type=zstd")"
for the second one.

Max

Re: backing chain & block status & filters

2020-04-28 Thread Kevin Wolf

Am 28.04.2020 um 13:08 hat Max Reitz geschrieben:
> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
> > Hi!
> > 
> > I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
> > is_allocated_above", and returned to all the inconsistencies about
> > block-status. I keep in mind Max's series about child-access functions,
> > and Andrey's work about using COR filter in block-stream, which depends
> > on Max's series (because, without them COR fitler with file child breaks
> > backing chains).. And, it seems that it's better to discuss some
> > questions before resending.
> > 
> > First, problems about block-status:
> > 
> > 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
> > 
> >    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> >    * BDRV_BLOCK_ZERO: offset reads as zero
> >    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
> > raw data
> >    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> >    *   layer rather than any backing, set by block
> > layer
> > 
> > This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
> > formats which doesn't support backing. So, all such format drivers must
> > return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
> > example, iscsi - doesn't.
> 
> Hm.  I could imagine that there are formats that have non-zero holes
> (e.g. 0xff or just garbage).  It would be a bit wrong for them to return
> ZERO or DATA then.
> 
> But OTOH we don’t care about such cases, do we?  We need to know whether
> ranges are zero, data, or unallocated.  If they aren’t zero, we only
> care about whether reading from it will return data from this layer or not.
> 
> So I suppose that anything that doesn’t support backing files (or
> filtered children) should always return ZERO and/or DATA.

I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.

Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.

Kevin


signature.asc
Description: PGP signature

Re: fdatasync semantics and block device backup

2020-04-28 Thread Kevin Wolf

Hi Bryan,

first of all, for your next question, please don't reply to a message in
an unrelated thread, but start a new email. This will give you a lot
more visibility because people generally use a threaded email view and
will decide whether to read an email or not depending on whether the
topic of that thread is interesting to them.

Am 27.04.2020 um 21:49 hat Bryan S Rosenburg geschrieben:
> Blockdev community,
> 
> Our group would like to write block device backups directly to an object 
> store, using an interface such as s3fs or rclone-mount. We've run into 
> problems with both interfaces, and in both cases the problems revolve 
> around fdatasync system calls. With s3fs, fdatasync calls are painfully 
> slow. With rclone-mount, the calls are very fast but don't do anything.
> 
> Syncing files to an object store is inherently problematic, as a proper 
> sync requires finalizing the object that holds the file. After 
> finalization, additional writes to the file require a new object to be 
> created and the old object to be copied and destroyed. This process 
> results in an N-squared performance problem for files that are synced 
> periodically as they are written, as is the case for qemu backups.
> 
> Empirically, s3fs implements fdatasync, and hence backups written to s3fs 
> take an untenably long time. I can provide data and straces, if needed.
> 
> Backups written to rclone-mount are much faster, but there are obvious 
> semantic problems. The backup job completes successfully before the file 
> is actually stable in the object store. And in fact, a lot of the work of 
> finalizing the file occurs during the "close" system call that is invoked 
> as part of the qmp_blockdev_del operation.The syscall causes that 
> operation to take so long that other commands time out waiting to "acquire 
> state change lock (held by monitor qemuProcessEventHandler)".
> 
> My questions for the group are: Has anyone else tried writing backups to 
> file systems that don't have good support for fdatasync, and do you have 
> any advice other than "Don't do that." ?

I think "don't do that" is a good answer actually.

You may want to put an NBD indirection between QEMU and your object
store, so that the close() syscall will just block a qemu-nbd process
that has already closed its connection to QEMU instead of blocking all
of QEMU.

It is possible to disable fdatasync() by specifying cache=unsafe for
the block device, so you could avoid the penalty of repeated syncs on
s3fs.

Of course, if s3fs requires an fsync before data is actually stable, in
this case you couldn't consider your backup completed when the backup
block job finishes successfully, but you would have to issue an fsync
manually and wait for its result before you can consider the backup
successful.

Kevin

Re: backing chain & block status & filters

2020-04-28 Thread Max Reitz

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
> is_allocated_above", and returned to all the inconsistencies about
> block-status. I keep in mind Max's series about child-access functions,
> and Andrey's work about using COR filter in block-stream, which depends
> on Max's series (because, without them COR fitler with file child breaks
> backing chains).. And, it seems that it's better to discuss some
> questions before resending.
> 
> First, problems about block-status:
> 
> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
> 
>    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>    * BDRV_BLOCK_ZERO: offset reads as zero
>    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
> raw data
>    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>    *   layer rather than any backing, set by block
> layer
> 
> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
> formats which doesn't support backing. So, all such format drivers must
> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
> example, iscsi - doesn't.

Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.

> 2. ZERO. The meaning differs a bit for generic block_status and for
> drivers.. I think, we at least should document it like this:
> 
> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
> this layer and read as ZERO. If generic block_status returns ZERO, it
> only mean that it reads as zero, but the region may be allocated on
> underlying level.

Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)

> 3. bdi.unallocated_blocks_are_zero
> 
> I think it's very bad, that we have formats, that supports backing, but
> doesn't report bdi.unallocated_blocks_are_zero as true. It means that
> UNALLOCATED region reads as zero if we have short backing file, and not
> as zero if we remove this short backing file.

What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream them, i.e. shouldn’t it stream zeros after the end of the
backing file?

> I can live with it but
> this is weird logic. These bad drivers are qcow (not qcow2), parallels
> and vmdk. I hope, they actually just forget to set
> unallocated_blocks_are_zero to true.

qcow definitely sounds like it.

> Next. But what about drivers which doesn't support backing? As we
> considered above, they should always return ZERO or DATA, as everything
> is allocated in this backing-chain level (last level, of course).. So
> again unallocated_blocks_are_zero is meaningless. So, I think, that
> driver which doesn't support backings, should be fixed to return always
> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all.

Agreed.

> 3.

The second 3.? :)

> Short backing files in allocated_above: we must consider space after
> EOF as ALLOCATED, if short backing file is inside requested
> backing-chain part, as it produced exactly because of this short file
> (and we never go to backing).

Sounds correct.

> (current realization of allocated_above is
> buggy, see my outdated series "[PATCH 0/4] fix & merge
> block_status_above and is_allocated_above")
> 
> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
> a backing chain of non-backing child.. I just remember that we didn't
> reach the consensus.

Possible? :)

> 5. Filters.. OK we have two functions for them:
> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
> think both are wrong:
> 
> bdrv_co_block_status_from_file leads to problem [4], when we can report
> UNALLOCATED, which refers not to the current backing chain, but to sub
> backing chain of file child, which is inconsistent with
> block_status_above and is_allocated_above iteration.
> 
> bdrv_co_block_status_from_backing is also is not consistent with
>

Re: [PATCH v20 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Max Reitz

On 28.04.20 09:23, Denis Plotnikov wrote:
> 
> 
> On 28.04.2020 09:16, Max Reitz wrote:
>> On 27.04.20 21:26, Denis Plotnikov wrote:
>>>
>>> On 27.04.2020 15:35, Max Reitz wrote:
 On 21.04.20 10:11, Denis Plotnikov wrote:
> zstd significantly reduces cluster compression time.
> It provides better compression performance maintaining
> the same level of the compression ratio in comparison with
> zlib, which, at the moment, is the only compression
> method available.
>
> The performance test results:
> Test compresses and decompresses qemu qcow2 image with just
> installed rhel-7.6 guest.
> Image cluster size: 64K. Image on disk size: 2.2G
>
> The test was conducted with brd disk to reduce the influence
> of disk subsystem to the test results.
> The results is given in seconds.
>
> compress cmd:
>     time ./qemu-img convert -O qcow2 -c -o
> compression_type=[zlib|zstd]
>     src.img [zlib|zstd]_compressed.img
> decompress cmd
>     time ./qemu-img convert -O qcow2
>     [zlib|zstd]_compressed.img uncompressed.img
>
>  compression   decompression
>    zlib   zstd   zlib zstd
> 
> real 65.5   16.3 (-75 %)    1.9  1.6 (-16 %)
> user 65.0   15.8    5.3  2.5
> sys   3.3    0.2    2.0  2.0
>
> Both ZLIB and ZSTD gave the same compression ratio: 1.57
> compressed image size in both cases: 1.4G
>
> Signed-off-by: Denis Plotnikov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Alberto Garcia 
> QAPI part:
> Acked-by: Markus Armbruster 
> ---
>    docs/interop/qcow2.txt |   1 +
>    configure  |   2 +-
>    qapi/block-core.json   |   3 +-
>    block/qcow2-threads.c  | 157
> +
>    block/qcow2.c  |   7 ++
>    5 files changed, 168 insertions(+), 2 deletions(-)
 [...]

> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
> index 7dbaf53489..0525718704 100644
> --- a/block/qcow2-threads.c
> +++ b/block/qcow2-threads.c
> @@ -28,6 +28,11 @@
>    #define ZLIB_CONST
>    #include 
>    +#ifdef CONFIG_ZSTD
> +#include 
> +#include 
> +#endif
> +
>    #include "qcow2.h"
>    #include "block/thread-pool.h"
>    #include "crypto.h"
> @@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void
> *dest, size_t dest_size,
>    return ret;
>    }
>    +#ifdef CONFIG_ZSTD
> +
> +/*
> + * qcow2_zstd_compress()
> + *
> + * Compress @src_size bytes of data using zstd compression method
> + *
> + * @dest - destination buffer, @dest_size bytes
> + * @src - source buffer, @src_size bytes
> + *
> + * Returns: compressed size on success
> + *  -ENOMEM destination buffer is not enough to store
> compressed data
> + *  -EIO    on any other error
> + */
> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
> +   const void *src, size_t src_size)
> +{
> +    ssize_t ret;
> +    ZSTD_outBuffer output = { dest, dest_size, 0 };
> +    ZSTD_inBuffer input = { src, src_size, 0 };
 Minor style note: I think it’d be nicer to use designated initializers
 here.

> +    ZSTD_CCtx *cctx = ZSTD_createCCtx();
> +
> +    if (!cctx) {
> +    return -EIO;
> +    }
> +    /*
> + * Use the zstd streamed interface for symmetry with
> decompression,
> + * where streaming is essential since we don't record the exact
> + * compressed size.
> + *
> + * In the loop, we try to compress all the data into one zstd
> frame.
> + * ZSTD_compressStream2 potentially can finish a frame earlier
> + * than the full input data is consumed. That's why we are
> looping
> + * until all the input data is consumed.
> + */
> +    while (input.pos < input.size) {
> +    size_t zstd_ret;
> +    /*
> + * ZSTD spec: "You must continue calling
> ZSTD_compressStream2()
> + * with ZSTD_e_end until it returns 0, at which point you are
> + * free to start a new frame". We assume that "start a new
> frame"
> + * means call ZSTD_compressStream2 in the very beginning or
> when
> + * ZSTD_compressStream2 has returned with 0.
> + */
> +    do {
> +    zstd_ret = ZSTD_compressStream2(cctx, , ,
> ZSTD_e_end);
 The spec makes it sound to me like ZSTD_e_end will always complete in a
 single call if there’s enough space in the output buffer.  So

Re: [PATCH v2 5/6] block/block-copy: move block_copy_task_create down

2020-04-28 Thread Max Reitz

On 28.04.20 11:17, Vladimir Sementsov-Ogievskiy wrote:
> 28.04.2020 12:06, Max Reitz wrote:
>> On 25.03.20 14:46, Vladimir Sementsov-Ogievskiy wrote:
>>> Simple movement without any change. It's needed for the following
>>> patch, as this function will need to use some staff which is currently
>>
>> *stuff
>>
>>> below it.
>>
>> Wouldn’t it be simpler to just declare block_copy_task_entry()?
>>
> 
> I just think, that it's good to keep native order of functions and avoid
> extra declarations. Still, may be I care too much. No actual difference,
> if you prefer declaration, I can drop this patch.

Personally, the native order doesn’t do me any good (cscope doesn’t
really care where the definition is), and also having functions in order
seems just like a C artifact.

I just prefer declarations because otherwise we end up moving functions
all the time with no real benefit.  Furthermore, moving functions has
the drawback of polluting git blame.

Max

signature.asc
Description: OpenPGP digital signature

1 2 >

1 - 100 of 113 matches

Mail list logo