[PULL 33/34] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. This would require
transforming all extended L2 entries into normal L2 entries but this
is not a simple task and there are no plans to implement this at the
moment.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.be...@igalia.com>
[mreitz: Fixed comment style]
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c  | 14 --
 tests/qemu-iotests/061 |  6 ++
 tests/qemu-iotests/061.out |  5 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fd506b4cb3..996b3314f4 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2166,6 +2166,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
+/* qcow2_downgrade() is not allowed in images with subclusters */
+assert(!has_subclusters(s));
+
 slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
@@ -2233,8 +2236,11 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 
 if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
 if (!bs->backing) {
-/* not backed; therefore we can simply deallocate the
- * cluster */
+/*
+ * not backed; therefore we can simply deallocate the
+ * cluster. No need to call set_l2_bitmap(), this
+ * function doesn't support images with subclusters.
+ */
 set_l2_entry(s, l2_slice, j, 0);
 l2_dirty = true;
 continue;
@@ -2305,6 +2311,10 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 } else {
 set_l2_entry(s, l2_slice, j, offset);
 }
+/*
+ * No need to call set_l2_bitmap() after set_l2_entry() because
+ * this function doesn't support images with subclusters.
+ */
 l2_dirty = true;
 }
 
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 08ddbdd10c..5747beb7ed 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -303,6 +303,12 @@ $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
 _img_info --format-specific
 _check_test_img
 
+echo
+echo "=== Testing version downgrade with extended L2 entries ==="
+echo
+_make_test_img -o "compat=1.1,extended_l2=on" 64M
+$QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
+
 echo
 echo "=== Try changing the external data file ==="
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index b0a1382046..ee30da2665 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -533,6 +533,11 @@ Format specific information:
 extended l2: false
 No errors were found on the image.
 
+=== Testing version downgrade with extended L2 entries ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Cannot downgrade an image with incompatible features 0x10 set
+
 === Try changing the external data file ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-- 
2.26.2




[PULL 17/34] qcow2: Add cluster type parameter to qcow2_get_host_offset()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).

We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.

This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 
<396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h |  3 ++-
 block/qcow2-cluster.c | 11 +++
 block/qcow2.c | 37 ++---
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index ea647c8bb5..74f65793bd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -893,7 +893,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error 
**errp);
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset);
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 2fe7a0f79c..ec0fe0e13b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -565,13 +565,14 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
  * cluster type and (if applicable) are stored contiguously in the image file.
+ * The cluster type is stored in *cluster_type.
  * Compressed clusters are always returned one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset)
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -713,7 +714,9 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-return type;
+*cluster_type = type;
+
+return 0;
 
 fail:
 qcow2_cache_put(s->l2_table_cache, (void **)_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index edbf9fbd0a..070f89c700 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2043,6 +2043,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
+QCow2ClusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -2054,7 +2055,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 }
 
 bytes = MIN(INT_MAX, count);
-ret = qcow2_get_host_offset(bs, offset, , _offset);
+ret = qcow2_get_host_offset(bs, offset, , _offset, );
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
@@ -2062,15 +2063,15 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
+if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
 !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2279,6 +2280,7 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 int ret = 0;
 unsigned int cur_bytes; /* number of bytes in current iteration */
 uint64_t host_offset = 0;
+QCow2ClusterType type;
 AioTaskPool *aio = NULL;
 
 while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2290,22 +2292,23 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 }
 
 qemu_co_mutex_lock(>lock);
-ret = qcow2_get_host_offset(bs, offset, _bytes, 

[PULL 26/34] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 
<04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 816ddc7639..1e84bd8e2e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -862,6 +862,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index, 0);
+}
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 
 *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.26.2




Re: [PATCH v2 2/5] qemu-iotests: Fix FilePaths docstring

2020-08-25 Thread Max Reitz
On 21.08.20 01:54, Nir Soffer wrote:
> When this class was extracted from FilePath, the docstring was not
> updated for generating multiple files, and the example usage was
> referencing unrelated file.
> 
> Fixes: de263986b5dc
> Signed-off-by: Nir Soffer 
> ---
>  tests/qemu-iotests/iotests.py | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 16a04df8a3..f34a1d7ef1 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -450,14 +450,16 @@ def file_pattern(name):
>  
>  class FilePaths:
>  """
> -FilePaths is an auto-generated filename that cleans itself up.
> +Context manager generating multiple file names. The generated files are
> +removed when exiting the context.
>  
> -Use this context manager to generate filenames and ensure that the file
> -gets deleted::
> +Example usage:
> +
> +with FilePaths(['test.img', 'test.sock']) as (img_path, sock_path):

On second thought, this isn’t a great example because image files and
sockets should go into different base directories.

Max

> +# Use img_path and sock_path here...
> +
> +# img_path and sock_path are automatically removed here.
>  
> -with FilePaths(['test.img']) as img_path:
> -qemu_img('create', img_path, '1G')
> -# migration_sock_path is automatically deleted
>  """
>  def __init__(self, names, base_dir=test_dir):
>  self.paths = []
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 30/58] ahci: Move QOM macros to header

2020-08-25 Thread Daniel P . Berrangé
On Wed, Aug 19, 2020 at 08:12:08PM -0400, Eduardo Habkost wrote:
> The TYPE_* constants and the typedefs are defined in ahci.h, so
> we can move the type checking macros there too.
> 
> This will make future conversion to OBJECT_DECLARE* easier.
> 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes series v1 -> v2: new patch in series v2
> 
> Cc: John Snow 
> Cc: qemu-block@nongnu.org
> Cc: qemu-de...@nongnu.org
> ---
>  hw/ide/ahci_internal.h | 5 -
>  include/hw/ide/ahci.h  | 3 +++
>  2 files changed, 3 insertions(+), 5 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 0/5] iotest.FilePath fixes and cleanups

2020-08-25 Thread Kevin Wolf
Am 21.08.2020 um 01:54 hat Nir Soffer geschrieben:
> Fix some issues introduced when iotests.FilePaths was added and merge it back
> into FilePath keeping the option to create multiple file names.

Reviewed-by: Kevin Wolf 




[PULL 34/34] iotests: Add tests for qcow2 images with extended L2 entries

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/271 | 901 +
 tests/qemu-iotests/271.out | 726 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 1628 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 00..39ff462328
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,901 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019-2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=be...@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+rm -f "$TEST_IMG.raw"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+_unsupported_imgopts extended_l2 compat=0.10 cluster_size data_file 
refcount_bits=1[^0-9]
+
+l2_offset=$((0x4))
+
+_verify_img()
+{
+$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are 
identical'
+$QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+grep -v 'No errors were found on the image'
+}
+
+# Compare the bitmap of an extended L2 entry against an expected value
+_verify_l2_bitmap()
+{
+entry_no="$1"# L2 entry number, starting from 0
+expected_alloc="$alloc"  # Space-separated list of allocated subcluster 
indexes
+expected_zero="$zero"# Space-separated list of zero subcluster indexes
+
+offset=$(($l2_offset + $entry_no * 16))
+entry=$(peek_file_be "$TEST_IMG" $offset 8)
+offset=$(($offset + 8))
+bitmap=$(peek_file_be "$TEST_IMG" $offset 8)
+
+expected_bitmap=0
+for bit in $expected_alloc; do
+expected_bitmap=$(($expected_bitmap | (1 << $bit)))
+done
+for bit in $expected_zero; do
+expected_bitmap=$(($expected_bitmap | (1 << (32 + $bit
+done
+printf -v expected_bitmap "%u" $expected_bitmap # Convert to unsigned
+
+printf "L2 entry #%d: 0x%016x %016x\n" "$entry_no" "$entry" "$bitmap"
+if [ "$bitmap" != "$expected_bitmap" ]; then
+printf "ERROR: expecting bitmap   0x%016x\n" "$expected_bitmap"
+fi
+}
+
+# This should be called as _run_test c=XXX sc=XXX off=XXX len=XXX cmd=XXX
+# c:   cluster number (0 if unset)
+# sc:  subcluster number inside cluster @c (0 if unset)
+# off: offset inside subcluster @sc, in kilobytes (0 if unset)
+# len: request length, passed directly to qemu-io (e.g: 256, 4k, 1M, ...)
+# cmd: the command to pass to qemu-io, must be one of
+#  write-> write
+#  zero -> write -z
+#  unmap-> write -z -u
+#  compress -> write -c
+#  discard  -> discard
+_run_test()
+{
+unset c sc off len cmd
+for var in "$@"; do eval "$var"; done
+case "${cmd:-write}" in
+zero)
+cmd="write -q -z";;
+unmap)
+cmd="write -q -z -u";;
+compress)
+pat=$((${pat:-0} + 1))
+cmd="write -q -c -P ${pat}";;
+write)
+pat=$((${pat:-0} + 1))
+cmd="write -q -P ${pat}";;
+discard)
+cmd="discard -q";;
+*)
+echo "Unknown option $cmd"
+exit 1;;
+esac
+c="${c:-0}"
+sc="${sc:-0}"
+off="${off:-0}"
+offset="$(($c * 64 + $sc * 2 + $off))"
+[ "$offset" != 0 ] && offset="${offset}k"
+cmd="$cmd ${offset} ${len}"
+raw_cmd=$(echo $cmd | sed s/-c//) # Raw images don't support -c
+echo $cmd | sed 's/-P [0-9][0-9]\?/-P PATTERN/'
+$QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_verify_img
+_verify_l2_bitmap "$c"
+}
+
+_reset_img()
+{
+size="$1"
+$QEMU_IMG create -f raw "$TEST_IMG.raw" "$size" | _filter_img_create
+if [ "$use_backing_file" = "yes" ]; then
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$size" | _filter_img_create
+$QEMU_IO -c "write -q -P 0xFF 0 $size" -f raw 

[PULL 18/34] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.

This patch only changes the data types, there are no semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2.h |  2 +-
 block/qcow2-cluster.c | 10 +++
 block/qcow2.c | 70 ++-
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 74f65793bd..5df761edc3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -894,7 +894,7 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type);
+  QCow2SubclusterType *subcluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ec0fe0e13b..5937937596 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -564,15 +564,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * The cluster type is stored in *cluster_type.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type.
+ * Compressed clusters are always processed one by one.
  *
  * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type)
+  QCow2SubclusterType *subcluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -714,7 +714,7 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-*cluster_type = type;
+*subcluster_type = qcow2_cluster_to_subcluster_type(type);
 
 return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 070f89c700..477d60dd71 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2043,7 +2043,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
-QCow2ClusterType type;
+QCow2SubclusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -2063,15 +2063,16 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
-!s->crypto) {
+if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2168,7 +2169,7 @@ typedef struct Qcow2AioTask {
 AioTask task;
 
 BlockDriverState *bs;
-QCow2ClusterType cluster_type; /* only for read */
+QCow2SubclusterType subcluster_type; /* only for read */
 uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
@@ -2181,7 +2182,7 @@ static coroutine_fn int 
qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
AioTaskPool *pool,
AioTaskFunc func,
-   QCow2ClusterType cluster_type,
+   QCow2SubclusterType subcluster_type,
uint64_t host_offset,
uint64_t offset,
 

[PULL 32/34] qcow2: Allow preallocation and backing files if extended_l2 is set

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Traditional qcow2 images don't allow preallocation if a backing file
is set. This is because once a cluster is allocated there is no way to
tell that its data should be read from the backing file.

Extended L2 entries have individual allocation bits for each
subcluster, and therefore it is perfectly possible to have an
allocated cluster with all its subclusters unallocated.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.c  | 7 ---
 tests/qemu-iotests/206.out | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 00cda5696b..da56b1a4df 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3449,10 +3449,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 qcow2_opts->preallocation = PREALLOC_MODE_OFF;
 }
 if (qcow2_opts->has_backing_file &&
-qcow2_opts->preallocation != PREALLOC_MODE_OFF)
+qcow2_opts->preallocation != PREALLOC_MODE_OFF &&
+!qcow2_opts->extended_l2)
 {
-error_setg(errp, "Backing file and preallocation cannot be used at "
-   "the same time");
+error_setg(errp, "Backing file and preallocation can only be used at "
+   "the same time if extended_l2 is on");
 ret = -EINVAL;
 goto out;
 }
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index 363c5abe35..a100849fcb 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -203,7 +203,7 @@ Job failed: Different refcount widths than 16 bits require 
compatibility level 1
 === Invalid backing file options ===
 {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": 
{"backing-file": "/dev/null", "driver": "qcow2", "file": "node0", 
"preallocation": "full", "size": 67108864}}}
 {"return": {}}
-Job failed: Backing file and preallocation cannot be used at the same time
+Job failed: Backing file and preallocation can only be used at the same time 
if extended_l2 is on
 {"execute": "job-dismiss", "arguments": {"id": "job0"}}
 {"return": {}}
 
-- 
2.26.2




[PULL 27/34] qcow2: Add subcluster support to handle_alloc_space()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 60192f1be3..9990535c46 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2421,6 +2421,9 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
 
 for (m = l2meta; m != NULL; m = m->next) {
 int ret;
+uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+m->cow_start.offset;
 
 if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
 continue;
@@ -2435,16 +2438,14 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
  * efficiently zero out the whole clusters
  */
 
-ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
 true);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
 BDRV_REQ_NO_FALLBACK);
 if (ret < 0) {
 if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.26.2




[PULL 22/34] qcow2: Add subcluster support to zero_in_l2_slice()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as
zeroes is only used in standard L2 entries. Extended L2 entries use
individual 'all zeroes' bits for each subcluster.

This must be taken into account when updating the L2 entry and also
when deciding that an existing entry does not need to be updated.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 08ecb4ca0c..5afcd72f5a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1957,7 +1957,6 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 int l2_index;
 int ret;
 int i;
-bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
 
 ret = get_cluster_table(bs, offset, _slice, _index);
 if (ret < 0) {
@@ -1969,28 +1968,31 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
-QCow2ClusterType cluster_type;
-
-old_offset = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
+bool unmap = (type == QCOW2_CLUSTER_COMPRESSED) ||
+((flags & BDRV_REQ_MAY_UNMAP) && qcow2_cluster_is_allocated(type));
+uint64_t new_l2_entry = unmap ? 0 : old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
+
+if (has_subclusters(s)) {
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry |= QCOW_OFLAG_ZERO;
+}
 
-/*
- * Minimize L2 changes if the cluster already reads back as
- * zeroes with correct allocation.
- */
-cluster_type = qcow2_get_cluster_type(bs, old_offset);
-if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-(cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
 continue;
 }
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
-} else {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+if (unmap) {
+qcow2_free_any_clusters(bs, old_l2_entry, 1, 
QCOW2_DISCARD_REQUEST);
+}
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
 }
 
-- 
2.26.2




[PULL 30/34] qcow2: Add prealloc field to QCowL2Meta

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This field allows us to indicate that the L2 metadata update does not
come from a write request with actual data but from a preallocation
request.

For traditional images this does not make any difference, but for
images with extended L2 entries this means that the clusters are
allocated normally in the L2 table but individual subclusters are
marked as unallocated.

This will allow preallocating images that have a backing file.

There is one special case: when we resize an existing image we can
also request that the new clusters are preallocated. If the image
already had a backing file then we have to hide any possible stale
data and zero out the new clusters (see commit 955c7d6687 for more
details).

In this case the subclusters cannot be left as unallocated so the L2
bitmap must be updated.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 8 
 block/qcow2-cluster.c | 2 +-
 block/qcow2.c | 6 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4ef4ae4ab0..f3499e53bf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -463,6 +463,14 @@ typedef struct QCowL2Meta
  */
 bool skip_cow;
 
+/**
+ * Indicates that this is not a normal write request but a preallocation.
+ * If the image has extended L2 entries this means that no new individual
+ * subclusters will be marked as allocated in the L2 bitmap (but any
+ * existing contents of that bitmap will be kept).
+ */
+bool prealloc;
+
 /**
  * The I/O vector with the data from the actual guest write request.
  * If non-NULL, this is meant to be merged together with the data
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9d349d61c6..fd506b4cb3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1067,7 +1067,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
 
 /* Update bitmap with the subclusters that were just written */
-if (has_subclusters(s)) {
+if (has_subclusters(s) && !m->prealloc) {
 uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
 unsigned written_from = m->cow_start.offset;
 unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
diff --git a/block/qcow2.c b/block/qcow2.c
index 54c9b7c119..7c03d41170 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2096,6 +2096,7 @@ static coroutine_fn int 
qcow2_handle_l2meta(BlockDriverState *bs,
 QCowL2Meta *next;
 
 if (link_l2) {
+assert(!l2meta->prealloc);
 ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
 if (ret) {
 goto out;
@@ -3130,6 +3131,7 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
 
 while (meta) {
 QCowL2Meta *next = meta->next;
+meta->prealloc = true;
 
 ret = qcow2_alloc_cluster_link_l2(bs, meta);
 if (ret < 0) {
@@ -4217,6 +4219,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 int64_t clusters_allocated;
 int64_t old_file_size, last_cluster, new_file_size;
 uint64_t nb_new_data_clusters, nb_new_l2_tables;
+bool subclusters_need_allocation = false;
 
 /* With a data file, preallocation means just allocating the metadata
  * and forwarding the truncate request to the data file */
@@ -4298,6 +4301,8 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
BDRV_REQ_ZERO_WRITE, NULL);
 if (ret >= 0) {
 flags &= ~BDRV_REQ_ZERO_WRITE;
+/* Ensure that we read zeroes and not backing file data */
+subclusters_need_allocation = true;
 }
 } else {
 ret = -1;
@@ -4336,6 +4341,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 .offset   = nb_clusters << s->cluster_bits,
 .nb_bytes = 0,
 },
+.prealloc = !subclusters_need_allocation,
 };
 qemu_co_queue_init(_requests);
 
-- 
2.26.2




Re: [PATCH v2 2/5] qemu-iotests: Fix FilePaths docstring

2020-08-25 Thread Max Reitz
On 21.08.20 01:54, Nir Soffer wrote:
> When this class was extracted from FilePath, the docstring was not
> updated for generating multiple files, and the example usage was
> referencing unrelated file.
> 
> Fixes: de263986b5dc
> Signed-off-by: Nir Soffer 
> ---
>  tests/qemu-iotests/iotests.py | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 4/5] qemu-iotests: Merge FilePaths and FilePath

2020-08-25 Thread Max Reitz
On 21.08.20 01:54, Nir Soffer wrote:
> FilePath creates now one temporary file:
> 
> with FilePath("a") as a:
> 
> Or more:
> 
> with FilePath("a", "b", "c") as (a, b, c):
> 
> This is also the behavior of the file_path() helper, used by some of the
> tests. Now we have only 2 helpers for creating temporary files instead
> of 3.
> 
> Signed-off-by: Nir Soffer 
> ---
>  tests/qemu-iotests/194|  2 +-
>  tests/qemu-iotests/208|  2 +-
>  tests/qemu-iotests/222|  2 +-
>  tests/qemu-iotests/257|  4 ++--
>  tests/qemu-iotests/iotests.py | 23 +++
>  5 files changed, 16 insertions(+), 17 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[PULL 14/34] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
 compressed. We do not support compression at the subcluster
 level.

  b) There are two different values for unallocated subclusters:
 QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
 cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
 which means that the cluster is allocated but the subcluster is
 not. The latter can only happen in images with extended L2
 entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
 entry has a value that violates the specification. The caller is
 responsible for handling these situations.

 To prevent compatibility problems with images that have invalid
 values but are currently being read by QEMU without causing side
 effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
 with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 126 +-
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 82b86f6cec..3aec6f452a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,21 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)(QCOW_OFLAG_SUB_ALLOC(X) << 32)
+/* Subclusters [X, Y) (0 <= X <= Y <= 32) are allocated */
+#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC(Y) - QCOW_OFLAG_SUB_ALLOC(X))
+/* Subclusters [X, Y) (0 <= X <= Y <= 32) read as zeroes */
+#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  (QCOW_OFLAG_SUB_ALLOC_RANGE(0, 32))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES (QCOW_OFLAG_SUB_ZERO_RANGE(0, 32))
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -462,6 +477,33 @@ typedef struct QCowL2Meta
 QLIST_ENTRY(QCowL2Meta) next_in_flight;
 } QCowL2Meta;
 
+/*
+ * In images with standard L2 entries all clusters are treated as if
+ * they had one subcluster so QCow2ClusterType and QCow2SubclusterType
+ * can be mapped to each other and have the exact same meaning
+ * (QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC cannot happen in these images).
+ *
+ * In images with extended L2 entries QCow2ClusterType refers to the
+ * complete cluster and QCow2SubclusterType to each of the individual
+ * subclusters, so there are several possible combinations:
+ *
+ * |--+---|
+ * | Cluster type | Possible subcluster types |
+ * |--+---|
+ * | UNALLOCATED  | UNALLOCATED_PLAIN |
+ * |  |ZERO_PLAIN |
+ * |--+---|
+ * | NORMAL   | UNALLOCATED_ALLOC |
+ * |  |ZERO_ALLOC |
+ * |  |NORMAL |
+ * |--+---|
+ * | COMPRESSED   |COMPRESSED |
+ * |--+---|
+ *
+ * QCOW2_SUBCLUSTER_INVALID means that the L2 entry is incorrect and
+ * the image should be marked corrupt.
+ */
+
 typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_UNALLOCATED,
 QCOW2_CLUSTER_ZERO_PLAIN,
@@ -470,6 +512,16 @@ typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+QCOW2_SUBCLUSTER_ZERO_PLAIN,
+

[PULL 13/34] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 46b351229a..82b86f6cec 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -533,15 +533,36 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+return be64_to_cpu(l2_slice[idx + 1]);
+} else {
+return 0; /* For convenience only; this value has no meaning. */
+}
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx, uint64_t entry)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+assert(has_subclusters(s));
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
-- 
2.26.2




[PULL 08/34] qcow2: Add dummy has_subclusters() function

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 
<905526221083581a1b7057bca1585487661c5c13.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index eecbadc4cb..2064dd3d85 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,6 +510,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+/* FIXME: Return false until this feature is complete */
+return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
-- 
2.26.2




Re: [PATCH v2 1/5] qemu-iotests: Fix FilePaths cleanup

2020-08-25 Thread Max Reitz
On 21.08.20 01:54, Nir Soffer wrote:
> If os.remove() fails to remove one of the paths, for example if the file
> was removed by the test, the cleanup loop would exit silently, without
> removing the rest of the files.
> 
> Fixes: de263986b5dc
> Signed-off-by: Nir Soffer 
> ---
>  tests/qemu-iotests/iotests.py | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 3/4] qapi: add filter-node-name to block-stream

2020-08-25 Thread Andrey Shinkevich

On 25.08.2020 09:37, Markus Armbruster wrote:

Andrey Shinkevich  writes:


Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 

[...]

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0b8ccd3..1db6ce1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2524,6 +2524,11 @@
  #'stop' and 'enospc' can only be used if the block device
  #supports io-status (see BlockInfo).  Since 1.3.
  #
+# @filter-node-name: the node name that should be assigned to the
+#filter driver that the stream job inserts into the graph
+#above @device. If this option is not given, a node name is
+#autogenerated. (Since: 5.1)

It's (Since: 5.2) now.



Well noted, Markus, thank you.

Andrey





+#
  # @auto-finalize: When false, this job will wait in a PENDING state after it 
has
  # finished its work, waiting for @block-job-finalize before
  # making any block graph changes.
@@ -2554,6 +2559,7 @@
'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
  '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
  '*on-error': 'BlockdevOnError',
+'*filter-node-name': 'str',
  '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
  
  ##




Re: [PATCH v2 5/5] qemu-iotests: Simplify FilePath __init__

2020-08-25 Thread Max Reitz
On 21.08.20 01:54, Nir Soffer wrote:
> Use list comprehension instead of append loop.
> 
> Signed-off-by: Nir Soffer 
> ---
>  tests/qemu-iotests/iotests.py | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 32/58] ahci: Move QOM macro to header

2020-08-25 Thread Daniel P . Berrangé
On Wed, Aug 19, 2020 at 08:12:10PM -0400, Eduardo Habkost wrote:
> Move the ALLWINNER_AHCI macro close to the TYPE_ALLWINNER_AHCI
> define.
> 
> This will make future conversion to OBJECT_DECLARE* easier.
> 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes series v1 -> v2: new patch in series v2
> 
> Cc: John Snow 
> Cc: qemu-block@nongnu.org
> Cc: qemu-de...@nongnu.org
> ---
>  include/hw/ide/ahci.h   | 2 ++
>  hw/ide/ahci-allwinner.c | 3 ---
>  2 files changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PULL 12/34] qcow2: Add l2_entry_size()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2.h  |  9 +
 block/qcow2-cluster.c  | 12 ++--
 block/qcow2-refcount.c | 14 --
 block/qcow2.c  |  8 
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4fe31adfd3..46b351229a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,10 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* Size of normal and extended L2 entries */
+#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
+#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -521,6 +525,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
 return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5bd1e1feb8..0b762502f6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -208,7 +208,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
uint64_t l2_offset, uint64_t **l2_slice)
 {
 BDRVQcow2State *s = bs->opaque;
-int start_of_slice = sizeof(uint64_t) *
+int start_of_slice = l2_entry_size(s) *
 (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
 return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -281,7 +281,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new l2 entry */
 
-l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
 if (l2_offset < 0) {
 ret = l2_offset;
 goto fail;
@@ -305,7 +305,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new entry in the l2 cache */
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -369,7 +369,7 @@ fail:
 }
 s->l1_table[l1_index] = old_l2_offset;
 if (l2_offset > 0) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_ALWAYS);
 }
 return ret;
@@ -717,7 +717,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t 
offset,
 
 /* Then decrease the refcount of the old table */
 if (l2_offset) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_OTHER);
 }
 
@@ -1921,7 +1921,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 04546838e8..770c5dbc83 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1254,7 +1254,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_slice = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 s->cache_discards = true;
@@ -1605,7 +1605,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int i, l2_size, nb_csectors, ret;
 
 /* Read L2 table from disk */
-l2_size = s->l2_size * sizeof(uint64_t);
+l2_size = s->l2_size * l2_entry_size(s);
 l2_table = g_malloc(l2_size);
 
 ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1680,15 +1680,16 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
 offset);
 if (fix & BDRV_FIX_ERRORS) {
+int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
 uint64_t 

[PULL 10/34] qcow2: Add offset_to_sc_index()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 
<56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index eee4c8de9c..2503374677 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -581,6 +581,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State 
*s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.26.2




[PULL 09/34] qcow2: Add subcluster-related fields to BDRVQcow2State

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one
subcluster per cluster (i.e. subcluster_size = cluster_size).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 
<55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 5 +
 block/qcow2.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 2064dd3d85..eee4c8de9c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -295,6 +297,9 @@ typedef struct BDRVQcow2State {
 int cluster_bits;
 int cluster_size;
 int l2_slice_size;
+int subcluster_bits;
+int subcluster_size;
+int subclusters_per_cluster;
 int l2_bits;
 int l2_size;
 int l1_size;
diff --git a/block/qcow2.c b/block/qcow2.c
index 6738daa247..fb4584d3ee 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1444,6 +1444,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 }
 }
 
+s->subclusters_per_cluster =
+has_subclusters(s) ? QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
+s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+s->subcluster_bits = ctz32(s->subcluster_size);
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
-- 
2.26.2




Re: [RFC PATCH 0/9] block/curl: Add caching of data downloaded from the remote server

2020-08-25 Thread Max Reitz
On 19.08.20 16:19, David Edmondson wrote:
> On Wednesday, 2020-08-19 at 15:11:37 +01, Stefan Hajnoczi wrote:
> 
>> On Tue, Aug 18, 2020 at 12:08:36PM +0100, David Edmondson wrote:
>>> When using qemu-img to convert an image that is hosted on an HTTP
>>> server to some faster local (or pseudo-local) storage, the overall
>>> performance can be improved by reading data from the HTTP server in
>>> larger blocks and by caching and re-using blocks already read. This
>>> set of patches implements both of these, and adds a further patch
>>> allowing an offset to be added to all of the HTTP requests.
>>
>> Hi David,
>> Thanks for posting this! Kevin and Max are the maintainers in this area,
>> but I wanted to ask an initial question:
>>
>> Is caching curl-specific or could this be implemented as a block filter
>> driver so that it can be stacked on top of other network protocols too?
> 
> This implementation is curl specific, as you probably surmised. I will
> look at implementing something similar as a block filter.

I think from an implementation standpoint the best would be if we could
just use such a generic caching block filter above all curl nodes so we
can drop all caching from curl.

However, I suppose then we’d at least have the problem of how to put
this cache node on top of all curl nodes without breaking compatibility,
which may be impossible.

OTOH, maybe it would be fine to leave the new cache optional, and just
leave the curl driver itself as it is.  Which would also mean that
wouldn’t need readahead support in the cache driver.

But if we do need this full cache directly in the curl driver, is it at
least possible to share most of the caching code between it and a
(potential future) dedicated cache block driver?

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] nbd: disable signals and forking on Windows builds

2020-08-25 Thread Daniel P . Berrangé
On Mon, Aug 24, 2020 at 12:12:53PM -0500, Eric Blake wrote:
> On 8/24/20 12:02 PM, Daniel P. Berrangé wrote:
> > Disabling these parts are sufficient to get the qemu-nbd program
> > compiling in a Windows build.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >   meson.build |  7 ++-
> >   qemu-nbd.c  | 10 +-
> >   2 files changed, 11 insertions(+), 6 deletions(-)
> 
> Feels a bit hacky at what it supports, but certainly better than nothing ;)
> 
> > 
> > diff --git a/meson.build b/meson.build
> > index df5bf728b5..1071871605 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1074,12 +1074,9 @@ if have_tools
> >dependencies: [authz, block, crypto, io, qom, qemuutil], 
> > install: true)
> > qemu_io = executable('qemu-io', files('qemu-io.c'),
> >dependencies: [block, qemuutil], install: true)
> > -  qemu_block_tools += [qemu_img, qemu_io]
> > -  if targetos == 'linux' or targetos == 'sunos' or targetos.endswith('bsd')
> > -qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
> > +  qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
> >  dependencies: [block, qemuutil], install: true)
> 
> Conflicts with this patch:
> https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg05546.html
> 
> but this one gets rid of the need for that one.
> 
> > -qemu_block_tools += [qemu_nbd]
> > -  endif
> > +  qemu_block_tools += [qemu_img, qemu_io, qemu_nbd]
> > subdir('storage-daemon')
> > subdir('contrib/rdmacm-mux')
> > diff --git a/qemu-nbd.c b/qemu-nbd.c
> > index b102874f0f..c6fd6524d3 100644
> > --- a/qemu-nbd.c
> > +++ b/qemu-nbd.c
> > @@ -155,12 +155,13 @@ QEMU_COPYRIGHT "\n"
> >   , name);
> >   }
> > +#ifndef WIN32
> >   static void termsig_handler(int signum)
> >   {
> >   atomic_cmpxchg(, RUNNING, TERMINATE);
> >   qemu_notify_event();
> >   }
> > -
> > +#endif
> 
> How does one terminate a long-running server on Windows if there is no
> SIGTERM handler?  I guess Ctrl-C does something, but without the state
> notification from a signal handler, you are getting less-clean shutdowns,
> which may explain the hangs you were seeing in testing?  But incremental
> progress is fine, and I see no reason to not take this patch as-is.

The hangs occurred even with windows client/ native server, just less
frequently so don't think it is related.

Re-reading the code I notice this SIGTERM handler is only there for
the NBD device client thread, so we should skip it if that is not
installed.

Ctrl-C does SIGINT, so that's unrelated to the SIGTERM handler.

> 
> Reviewed-by: Eric Blake 
> 
> I'm happy to queue this series through my NBD tree.

I'll post a v2

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 2/5] qemu-iotests: Fix FilePaths docstring

2020-08-25 Thread Nir Soffer
On Tue, Aug 25, 2020 at 1:48 PM Max Reitz  wrote:
>
> On 21.08.20 01:54, Nir Soffer wrote:
> > When this class was extracted from FilePath, the docstring was not
> > updated for generating multiple files, and the example usage was
> > referencing unrelated file.
> >
> > Fixes: de263986b5dc
> > Signed-off-by: Nir Soffer 
> > ---
> >  tests/qemu-iotests/iotests.py | 14 --
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> > index 16a04df8a3..f34a1d7ef1 100644
> > --- a/tests/qemu-iotests/iotests.py
> > +++ b/tests/qemu-iotests/iotests.py
> > @@ -450,14 +450,16 @@ def file_pattern(name):
> >
> >  class FilePaths:
> >  """
> > -FilePaths is an auto-generated filename that cleans itself up.
> > +Context manager generating multiple file names. The generated files are
> > +removed when exiting the context.
> >
> > -Use this context manager to generate filenames and ensure that the file
> > -gets deleted::
> > +Example usage:
> > +
> > +with FilePaths(['test.img', 'test.sock']) as (img_path, sock_path):
>
> On second thought, this isn’t a great example because image files and
> sockets should go into different base directories.

Right, will improve it in v3.

>
> Max
>
> > +# Use img_path and sock_path here...
> > +
> > +# img_path and sock_path are automatically removed here.
> >
> > -with FilePaths(['test.img']) as img_path:
> > -qemu_img('create', img_path, '1G')
> > -# migration_sock_path is automatically deleted
> >  """
> >  def __init__(self, names, base_dir=test_dir):
> >  self.paths = []
> >
>
>




[PULL 11/34] qcow2: Add offset_into_subcluster() and size_to_subclusters()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Like offset_into_cluster() and size_to_clusters(), but for
subclusters.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 2503374677..4fe31adfd3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -555,11 +555,21 @@ static inline int64_t offset_into_cluster(BDRVQcow2State 
*s, int64_t offset)
 return offset & (s->cluster_size - 1);
 }
 
+static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t offset)
+{
+return offset & (s->subcluster_size - 1);
+}
+
 static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size)
 {
 return (size + (s->cluster_size - 1)) >> s->cluster_bits;
 }
 
+static inline uint64_t size_to_subclusters(BDRVQcow2State *s, uint64_t size)
+{
+return (size + (s->subcluster_size - 1)) >> s->subcluster_bits;
+}
+
 static inline int64_t size_to_l1(BDRVQcow2State *s, int64_t size)
 {
 int shift = s->cluster_bits + s->l2_bits;
-- 
2.26.2




[PULL 20/34] qcow2: Add subcluster support to calculate_l2_meta()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster then we need
   copy-on-write. The previous contents of subclusters #0 to #3 must
   be copied to the new cluster. We can optimize this process by
   skipping all leading unallocated or zero subclusters (the status of
   those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

   2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
then we need copy-on-write (on subcluster #3 only).

   2.2) If subcluster #3 was already allocated then there is no need
for any copy-on-write. However we still need to update the L2
bitmap to reflect possible changes in the allocation status of
subclusters #4 to #31. Because of this, this function checks
if all the overwritten subclusters are already allocated and
in this case it returns without creating a new QCowL2Meta
structure.

After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 167 +-
 1 file changed, 133 insertions(+), 34 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5937937596..5e7ae0843d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -387,7 +387,6 @@ fail:
  * If the L2 entry is invalid return -errno and set @type to
  * QCOW2_SUBCLUSTER_INVALID.
  */
-G_GNUC_UNUSED
 static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
uint64_t l2_entry,
uint64_t l2_bitmap,
@@ -,56 +1110,148 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 0 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-  uint64_t host_cluster_offset,
-  uint64_t guest_offset, unsigned bytes,
-  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t 
host_cluster_offset,
+ uint64_t guest_offset, unsigned bytes,
+ uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-int l2_index = offset_to_l2_slice_index(s, guest_offset);
-uint64_t l2_entry;
+int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry, l2_bitmap;
 unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
-QCow2ClusterType type;
+QCow2SubclusterType type;
+int i;
+bool skip_cow = keep_old;
 
 assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-/* Return if there's no COW (all clusters are normal and we keep them) */
-if (keep_old) {
-int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
-break;
+/* Check the type of all affected subclusters */
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+if (skip_cow) {
+unsigned write_from = MAX(cow_start_to, i << s->cluster_bits);
+unsigned write_to = MIN(cow_end_from, (i + 1) << s->cluster_bits);
+int first_sc = offset_to_sc_index(s, write_from);
+int last_sc = offset_to_sc_index(s, write_to - 1);
+int cnt = qcow2_get_subcluster_range_type(bs, l2_entry, l2_bitmap,
+  first_sc, );
+/* Is any of the subclusters of type != QCOW2_SUBCLUSTER_NORMAL ? 
*/
+if (type != QCOW2_SUBCLUSTER_NORMAL || first_sc + cnt <= last_sc) {
+skip_cow = false;
 }
+} else {
+/* If we 

[PULL 23/34] qcow2: Add subcluster support to discard_in_l2_slice()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Two things need to be taken into account here:

1) With full_discard == true the L2 entry must be cleared completely.
   This also includes the L2 bitmap if the image has extended L2
   entries.

2) With full_discard == false we have to make the discarded cluster
   read back as zeroes. With normal L2 entries this is done with the
   QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done
   with the individual 'all zeroes' bits for each subcluster.

   Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2
   images so, if there is a backing file, discard cannot guarantee
   that the image will read back as zeroes. If this is important for
   the caller it should forbid it as qcow2_co_pdiscard() does (see
   80f5c01183 for more details).

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 52 +++
 1 file changed, 23 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5afcd72f5a..a41351aba5 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1848,11 +1848,17 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_l2_entry;
-
-old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+uint64_t new_l2_entry = old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
+QCow2ClusterType cluster_type =
+qcow2_get_cluster_type(bs, old_l2_entry);
 
 /*
+ * If full_discard is true, the cluster should not read back as zeroes,
+ * but rather fall through to the backing file.
+ *
  * If full_discard is false, make sure that a discarded area reads back
  * as zeroes for v3 images (we cannot do it for v2 without actually
  * writing a zero-filled buffer). We can skip the operation if the
@@ -1861,40 +1867,28 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
  *
  * TODO We might want to use bdrv_block_status(bs) here, but we're
  * holding s->lock, so that doesn't work today.
- *
- * If full_discard is true, the sector should not read back as zeroes,
- * but rather fall through to the backing file.
  */
-switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
-case QCOW2_CLUSTER_UNALLOCATED:
-if (full_discard || !bs->backing) {
-continue;
-}
-break;
-
-case QCOW2_CLUSTER_ZERO_PLAIN:
-if (!full_discard) {
-continue;
+if (full_discard) {
+new_l2_entry = new_l2_bitmap = 0;
+} else if (bs->backing || qcow2_cluster_is_allocated(cluster_type)) {
+if (has_subclusters(s)) {
+new_l2_entry = 0;
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0;
 }
-break;
-
-case QCOW2_CLUSTER_ZERO_ALLOC:
-case QCOW2_CLUSTER_NORMAL:
-case QCOW2_CLUSTER_COMPRESSED:
-break;
+}
 
-default:
-abort();
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
+continue;
 }
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (!full_discard && s->qcow_version >= 3) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-} else {
-set_l2_entry(s, l2_slice, l2_index + i, 0);
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
-
 /* Then decrease the refcount */
 qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
 }
-- 
2.26.2




[PULL 25/34] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated. This needs to happen even if the
cluster was already allocated and the L2 entry was otherwise valid.

In some cases however a write operation doesn't need change the L2
bitmap (because all affected subclusters were already allocated). This
is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2()
is never called in those cases.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a41351aba5..816ddc7639 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1062,6 +1062,24 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 assert((offset & L2E_OFFSET_MASK) == offset);
 
 set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+unsigned written_from = m->cow_start.offset;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+m->nb_clusters << s->cluster_bits;
+int first_sc, last_sc;
+/* Narrow written_from and written_to down to the current cluster 
*/
+written_from = MAX(written_from, i << s->cluster_bits);
+written_to   = MIN(written_to, (i + 1) << s->cluster_bits);
+assert(written_from < written_to);
+first_sc = offset_to_sc_index(s, written_from);
+last_sc  = offset_to_sc_index(s, written_to - 1);
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(first_sc, last_sc + 1);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(first_sc, last_sc + 1);
+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+}
  }
 
 
-- 
2.26.2




[PULL 16/34] qcow2: Add qcow2_cluster_is_allocated()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 3aec6f452a..ea647c8bb5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -780,6 +780,12 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
 }
 }
 
+static inline bool qcow2_cluster_is_allocated(QCow2ClusterType type)
+{
+return (type == QCOW2_CLUSTER_COMPRESSED || type == QCOW2_CLUSTER_NORMAL ||
+type == QCOW2_CLUSTER_ZERO_ALLOC);
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.26.2




Re: Dropping posix_fallocate for -o preallocation falloc

2020-08-25 Thread Kevin Wolf
Am 23.08.2020 um 16:46 hat Nir Soffer geschrieben:
> Using -o preallocation falloc works great on NFS 4.2 and local file system,
> when fallocate() is supported, but when it is not, posix_fallocate falls back
> to very inefficient way:
> https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
> 
> This will read the last byte for every 4k block, and if the byte is
> null, write one null byte.
> 
> This minimizes the amount of data sent over the wire, but is very slow.
> 
> In file-posix we optimize this flow by not truncating the file to the
> final size, so this will only write one null byte for every 4k block,
> but this is still very slow.
> 
> Except the poor performance, we have a bug showing that for some reason,
> this does not work well with OFD locking:
> https://bugzilla.redhat.com/1851097
> 
> In oVirt 4.4.2 we avoid the issue by not using -o preallocation
> falloc. Instead we use our own fallocate helper:
> https://github.com/oVirt/vdsm/blob/master/helpers/fallocate
> 
> (We got feedback that the name of this helper is confusing since it does
> destructive operation when fallocate() is not supported. We will
> change the name)
> 
> This helper is similar to posix_fallocate, but instead of falling back
> to writing one byte per 4k block, it falls back to writing zeros in
> large blocks.
> 
> Testing shows that this improves fallocation time by 385% for one disk, and
> 468% for 10 concurrent disk preallocation:
> https://bugzilla.redhat.com/1850267#c25
> 
> I think the next step is to move this change into qemu, so all users can
> benefit from this change.
> 
> I think the way to do this is to replace posix_fallocate() with fallocate(),
> and fallback to "full" preallocation if fallocate is not supported.

Yes, this makes sense to me.

However, what full preallocation uses, is not exactly a large buffer
either (64k). Especially with your desire to use this with O_DIRECT,
we'd better use something like a 2 MB buffer.

Actually, looking at the code, it's completely broken for O_DIRECT
because it g_malloc()s the (then misaligned) buffer. This needs to
become qemu_try_blockalign0(). It's currently failing:

$ ./qemu-img create -f raw ~/tmp/test.raw 64M
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=67108864
$ ./qemu-img resize --image-opts --preallocation=full 
driver=file,filename=/home/kwolf/tmp/test.raw,cache.direct=on 72M
qemu-img: Could not write zeros for preallocation: Invalid argument

> However with current code, in qemu-img create, we don't have a way to
> force O_DIRECT for the preallocation, and in qemu-img convert the
> preallocation step does not respect the -t none flag. Not using
> O_DIRECT in oVirt is very bad, and likely to cause timeouts in sanlock
> when the kernel flushes the page cache.

If we're talking about regular files, the easy way without changing QEMU
would be to use the same way as in my example above: Create a 0 byte
image file and then resize it with full preallocation.

But I can see that passing a cache mode to image creation is desirable.
So are probably some other options for image creation processes where
the image is actually opened with its block driver.

Possible options I see in the order from an ad-hoc hack to a fully
generic interface:

1. Just add a cache mode create option to file-posix. qemu-img create
   would mostly ignore this option for qcow2 (it would use O_DIRECT for
   creating the raw image file, but not for opening when writing the
   qcow2 data, including preallocation)

   QMP blockde-create would allow you to get the desired behaviour
   because you open the raw layer explicitly there. It just wouldn't be
   available in qemu-img create.

2. Add a cache mode/flags parameter to .bdrv_co_create_opts (the legacy
   callback used for qemu-img create, but not QMP blockdev-create).
   Format drivers would then use that for their bdrv_open() call on the
   protocol layer; file-posix would directly use it for opening the
   file.

3. Add a full BlockdevOptions parameter to .bdrv_co_create_opts() that
   is used for the bdrv_open() of the protocol layer. I guess this would
   bring in features that are currently not accessible from qemu-img
   create because they can't be encoded in a URL but require options
   (such as multiple servers for rbd or gluster).

4. Have some kind of full blockdev-create support in qemu-img create.
   This wouldn't require any changes to the block drivers, but finding a
   good syntax for qemu-img create might not be trivial.

Given that 1-3 are all hacks to the legacy interface, maybe 4 is what we
should be doing to expose the full set of QEMU features in qemu-img
create, even though it poses interface questions that we wouldn't have
to answer with the other options.

> So needed changes are:
> 
> 1. Add a way to control cache in qemu-img create (-t none? -o cache=none?)
> 2. Respect -t none in qemu-img convert -o preallocation falloc

-o cache=none would be what results from my option 1. 

[PATCH v2 1/3] block: add missing socket_init() calls to tools

2020-08-25 Thread Daniel P . Berrangé
Any tool that uses sockets needs to call socket_init() in order to work
on the Windows platform.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrangé 
---
 qemu-img.c | 2 ++
 qemu-io.c  | 2 ++
 qemu-nbd.c | 1 +
 3 files changed, 5 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 5308773811..eb2fc1f862 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -41,6 +41,7 @@
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "qemu/sockets.h"
 #include "qemu/units.h"
 #include "qom/object_interfaces.h"
 #include "sysemu/block-backend.h"
@@ -5410,6 +5411,7 @@ int main(int argc, char **argv)
 signal(SIGPIPE, SIG_IGN);
 #endif
 
+socket_init();
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
 qemu_init_exec_dir(argv[0]);
diff --git a/qemu-io.c b/qemu-io.c
index 3adc5a7d0d..7cc832b3d6 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -25,6 +25,7 @@
 #include "qemu/config-file.h"
 #include "qemu/readline.h"
 #include "qemu/log.h"
+#include "qemu/sockets.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qdict.h"
 #include "qom/object_interfaces.h"
@@ -542,6 +543,7 @@ int main(int argc, char **argv)
 signal(SIGPIPE, SIG_IGN);
 #endif
 
+socket_init();
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
 qemu_init_exec_dir(argv[0]);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index d2657b8db5..b102874f0f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -599,6 +599,7 @@ int main(int argc, char **argv)
 signal(SIGPIPE, SIG_IGN);
 #endif
 
+socket_init();
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
 qcrypto_init(_fatal);
-- 
2.26.2




[PATCH v2 0/3] nbd: build qemu-nbd on Windows

2020-08-25 Thread Daniel P . Berrangé
We are already building the NBD client and server on Windows when it is
used via the main system emulator binaries. This demonstrates there is
no fundamental blocker to buildig the qemu-nbd binary too.

Changed in v2:

 - Split second patch into two parts
 - Use  HAVE_NBD_DEVICE condition to disable SIGTERM handler not WIN32

Daniel P. Berrangé (3):
  block: add missing socket_init() calls to tools
  nbd: skip SIGTERM handler if NBD device support is not built
  nbd: disable signals and forking on Windows builds

 meson.build |  7 ++-
 qemu-img.c  |  2 ++
 qemu-io.c   |  2 ++
 qemu-nbd.c  | 11 ++-
 4 files changed, 16 insertions(+), 6 deletions(-)

-- 
2.26.2





Re: [PATCH v7 3/4] qapi: add filter-node-name to block-stream

2020-08-25 Thread Markus Armbruster
Andrey Shinkevich  writes:

> Provide the possibility to pass the 'filter-node-name' parameter to the
> block-stream job as it is done for the commit block job.
>
> Signed-off-by: Andrey Shinkevich 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
[...]
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0b8ccd3..1db6ce1 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2524,6 +2524,11 @@
>  #'stop' and 'enospc' can only be used if the block device
>  #supports io-status (see BlockInfo).  Since 1.3.
>  #
> +# @filter-node-name: the node name that should be assigned to the
> +#filter driver that the stream job inserts into the graph
> +#above @device. If this option is not given, a node name 
> is
> +#autogenerated. (Since: 5.1)

It's (Since: 5.2) now.

> +#
>  # @auto-finalize: When false, this job will wait in a PENDING state after it 
> has
>  # finished its work, waiting for @block-job-finalize before
>  # making any block graph changes.
> @@ -2554,6 +2559,7 @@
>'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
>  '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
>  '*on-error': 'BlockdevOnError',
> +'*filter-node-name': 'str',
>  '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
>  
>  ##




[PULL 01/34] qcow2: Make Qcow2AioTask store the full host offset

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.

(*) compressed clusters are the exception: in this case what
file_cluster_offset was storing was the full compressed cluster
descriptor (offset + size). This does not change with this patch
but it is documented now.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 
<07c4b15c644dcf06c9459f98846ac1c4ea96e26f.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.c  | 69 ++
 block/trace-events |  2 +-
 2 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6ad6bdc166..5ab1f45fe2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -74,7 +74,7 @@ typedef struct {
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t cluster_descriptor,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2113,7 +2113,7 @@ out:
 
 static coroutine_fn int
 qcow2_co_preadv_encrypted(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2140,16 +2140,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-ret = bdrv_co_pread(s->data_file,
-file_cluster_offset + offset_into_cluster(s, offset),
-bytes, buf, 0);
+ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
 if (ret < 0) {
 goto fail;
 }
 
-if (qcow2_co_decrypt(bs,
- file_cluster_offset + offset_into_cluster(s, offset),
- offset, buf, bytes) < 0)
+if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
 {
 ret = -EIO;
 goto fail;
@@ -2167,7 +2163,7 @@ typedef struct Qcow2AioTask {
 
 BlockDriverState *bs;
 QCow2ClusterType cluster_type; /* only for read */
-uint64_t file_cluster_offset;
+uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
 QEMUIOVector *qiov;
@@ -2180,7 +2176,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
AioTaskPool *pool,
AioTaskFunc func,
QCow2ClusterType cluster_type,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2195,7 +2191,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 .bs = bs,
 .cluster_type = cluster_type,
 .qiov = qiov,
-.file_cluster_offset = file_cluster_offset,
+.host_offset = host_offset,
 .offset = offset,
 .bytes = bytes,
 .qiov_offset = qiov_offset,
@@ -2204,7 +2200,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 
 trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
  func == qcow2_co_preadv_task_entry ? "read" : "write",
- cluster_type, file_cluster_offset, offset, bytes,
+ cluster_type, host_offset, offset, bytes,
  qiov, qiov_offset);
 
 if (!pool) {
@@ -2218,13 +2214,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
  QCow2ClusterType cluster_type,
- uint64_t file_cluster_offset,
+ uint64_t host_offset,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset)
 {
 BDRVQcow2State *s = bs->opaque;
-int offset_in_cluster = offset_into_cluster(s, offset);
 
 switch (cluster_type) {
 case QCOW2_CLUSTER_ZERO_PLAIN:
@@ 

[PULL 07/34] qcow2: Document the Extended L2 Entries feature

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Message-Id: 
<5199f2e1c717bcaa58b48142c9062b803145ff7f.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 docs/interop/qcow2.txt | 68 --
 docs/qcow2-cache.txt   | 19 +++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index f072e27900..7da0d81df8 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -42,6 +42,9 @@ The first cluster of a qcow2 image contains the file header:
 as the maximum cluster size and won't be able to open 
images
 with larger cluster sizes.
 
+Note: if the image has Extended L2 Entries then 
cluster_bits
+must be at least 14 (i.e. 16384 byte clusters).
+
  24 - 31:   size
 Virtual disk size in bytes.
 
@@ -117,7 +120,12 @@ the next fields through header_length.
 clusters. The compression_type field must be
 present and not zero.
 
-Bits 4-63:  Reserved (set to 0)
+Bit 4:  Extended L2 Entries.  If this bit is set then
+L2 table entries use an extended format that
+allows subcluster-based allocation. See the
+Extended L2 Entries section for more details.
+
+Bits 5-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -498,7 +506,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-l2_entries = (cluster_size / sizeof(uint64_t))
+l2_entries = (cluster_size / sizeof(uint64_t))[*]
 
 l2_index = (offset / cluster_size) % l2_entries
 l1_index = (offset / cluster_size) / l2_entries
@@ -508,6 +516,8 @@ obtained as follows:
 
 return cluster_offset + (offset % cluster_size)
 
+[*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
 Bit  0 -  8:Reserved (set to 0)
@@ -548,7 +558,8 @@ Standard Cluster Descriptor:
 nor is data read from the backing file if the cluster is
 unallocated.
 
-With version 2, this is always 0.
+With version 2 or with extended L2 entries (see the next
+section), this is always 0.
 
  1 -  8:Reserved (set to 0)
 
@@ -585,6 +596,57 @@ file (except if bit 0 in the Standard Cluster Descriptor 
is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains 
information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated the same as in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 - 31:Allocation status (one bit per subcluster)
+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the least significant
+one (i.e. bit x is used for subcluster x).
+
+32 - 63 Subcluster reads as zeros (one bit per subcluster)
+
+1: the 

[PULL 06/34] qcow2: Add get_l2_entry() and set_l2_entry()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 
<9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h  | 12 
 block/qcow2-cluster.c  | 63 ++
 block/qcow2-refcount.c | 17 ++--
 3 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 06475e0849..eecbadc4cb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,6 +510,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx)
+{
+return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx, uint64_t entry)
+{
+l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b7b7b37062..5bd1e1feb8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -383,12 +383,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 QCow2ClusterType first_cluster_type;
 uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
 uint64_t offset = first_entry & mask;
 
 first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -401,7 +402,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, 
int nb_clusters,
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
 if (offset + (uint64_t) i * cluster_size != l2_entry) {
 break;
 }
@@ -417,14 +418,16 @@ static int count_contiguous_clusters(BlockDriverState 
*bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
  int nb_clusters,
  uint64_t *l2_slice,
+ int l2_index,
  QCow2ClusterType wanted_type)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 
 assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = be64_to_cpu(l2_slice[i]);
+uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
 QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
 if (type != wanted_type) {
@@ -575,7 +578,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-l2_entry = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = get_l2_entry(s, l2_slice, l2_index);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -610,7 +613,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-  _slice[l2_index], type);
+  l2_slice, l2_index, type);
 break;
 case QCOW2_CLUSTER_ZERO_ALLOC:
 case QCOW2_CLUSTER_NORMAL: {
@@ -618,7 +621,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 *host_offset = host_cluster_offset + offset_in_cluster;
 /* how many allocated clusters ? */
 c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-  _slice[l2_index], QCOW_OFLAG_ZERO);
+  l2_slice, l2_index, QCOW_OFLAG_ZERO);
 if (offset_into_cluster(s, 

[PULL 00/34] Block patches

2020-08-25 Thread Max Reitz
The following changes since commit 30aa19446d82358a30eac3b556b4d6641e00b7c1:

  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20200812' 
into staging (2020-08-24 16:39:53 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2020-08-25

for you to fetch changes up to c576fd97d4ca77b5a1a27728df11a61083dbfa98:

  iotests: Add tests for qcow2 images with extended L2 entries (2020-08-25 
10:20:18 +0200)


Block patches:
- qcow2 subclusters (extended L2 entries)


Alberto Garcia (34):
  qcow2: Make Qcow2AioTask store the full host offset
  qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add offset_into_subcluster() and size_to_subclusters()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  qcow2: Add qcow2_get_subcluster_range_type()
  qcow2: Add qcow2_cluster_is_allocated()
  qcow2: Add cluster type parameter to qcow2_get_host_offset()
  qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  qcow2: Add subcluster support to calculate_l2_meta()
  qcow2: Add subcluster support to qcow2_get_host_offset()
  qcow2: Add subcluster support to zero_in_l2_slice()
  qcow2: Add subcluster support to discard_in_l2_slice()
  qcow2: Add subcluster support to check_refcounts_l2()
  qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  qcow2: Clear the L2 bitmap when allocating a compressed cluster
  qcow2: Add subcluster support to handle_alloc_space()
  qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()
  qcow2: Add subcluster support to qcow2_measure()
  qcow2: Add prealloc field to QCowL2Meta
  qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  qcow2: Allow preallocation and backing files if extended_l2 is set
  qcow2: Assert that expand_zero_clusters_in_l1() does not support
subclusters
  iotests: Add tests for qcow2 images with extended L2 entries

 docs/interop/qcow2.txt   |  68 ++-
 docs/qcow2-cache.txt |  19 +-
 qapi/block-core.json |   7 +
 block/qcow2.h| 211 ++-
 include/block/block_int.h|   1 +
 block/qcow2-cluster.c| 906 +--
 block/qcow2-refcount.c   |  47 +-
 block/qcow2.c| 302 +++
 block/trace-events   |   2 +-
 tests/qemu-iotests/031.out   |   8 +-
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   3 +-
 tests/qemu-iotests/061   |   6 +
 tests/qemu-iotests/061.out   |  25 +-
 tests/qemu-iotests/065   |  12 +-
 tests/qemu-iotests/082.out   |  39 +-
 tests/qemu-iotests/085.out   |  38 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +-
 tests/qemu-iotests/198   |   2 +
 tests/qemu-iotests/206.out   |   6 +-
 tests/qemu-iotests/242.out   |   5 +
 tests/qemu-iotests/255.out   |   8 +-
 tests/qemu-iotests/271   | 901 ++
 tests/qemu-iotests/271.out   | 726 +
 tests/qemu-iotests/274.out   |  49 +-
 tests/qemu-iotests/280.out   |   2 +-
 tests/qemu-iotests/291.out   |   2 +
 tests/qemu-iotests/302.out   |   1 +
 tests/qemu-iotests/303.out   |   4 +-
 tests/qemu-iotests/common.filter |   1 +
 tests/qemu-iotests/group |   1 +
 34 files changed, 2952 insertions(+), 570 deletions(-)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

-- 
2.26.2




[PULL 02/34] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2.h |  4 ++--
 block/qcow2-cluster.c | 41 +++--
 block/qcow2.c | 24 +++-
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 7ce2c23bdb..06475e0849 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -694,8 +694,8 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int 
l1_index);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error 
**errp);
 
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset);
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 550850b264..71ddb0c462 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -496,10 +496,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
 
 
 /*
- * get_cluster_offset
+ * get_host_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk find the equivalent host
+ * offset in the qcow2 file and store it in *host_offset. Neither
+ * offset needs to be aligned to a cluster boundary.
+ *
+ * If the cluster is unallocated then *host_offset will be 0.
+ * If the cluster is compressed then *host_offset will contain the
+ * complete compressed cluster descriptor.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
@@ -511,12 +516,12 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset)
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
-uint64_t l1_index, l2_offset, *l2_slice;
+uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
 int c;
 unsigned int offset_in_cluster;
 uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -537,7 +542,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 bytes_needed = bytes_available;
 }
 
-*cluster_offset = 0;
+*host_offset = 0;
 
 /* seek to the l2 offset in the l1 table */
 
@@ -570,7 +575,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,7 +583,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
  * true */
 assert(nb_clusters <= INT_MAX);
 
-type = qcow2_get_cluster_type(bs, *cluster_offset);
+type = qcow2_get_cluster_type(bs, l2_entry);
 if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
 qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -599,42 +604,42 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 }
 /* Compressed clusters can only be processed one by one */
 c = 1;
-*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+*host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
 break;
 case QCOW2_CLUSTER_ZERO_PLAIN:
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many 

[PULL 03/34] qcow2: Add calculate_l2_meta()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 77 +--
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 71ddb0c462..6447222ba0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1038,6 +1038,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 }
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+  uint64_t host_cluster_offset,
+  uint64_t guest_offset, unsigned bytes,
+  QCowL2Meta **m, bool keep_old)
+{
+BDRVQcow2State *s = bs->opaque;
+unsigned cow_start_from = 0;
+unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+unsigned cow_end_from = cow_start_to + bytes;
+unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+QCowL2Meta *old_m = *m;
+
+*m = g_malloc0(sizeof(**m));
+**m = (QCowL2Meta) {
+.next   = old_m,
+
+.alloc_offset   = host_cluster_offset,
+.offset = start_of_cluster(s, guest_offset),
+.nb_clusters= nb_clusters,
+
+.keep_old_clusters = keep_old,
+
+.cow_start = {
+.offset = cow_start_from,
+.nb_bytes   = cow_start_to - cow_start_from,
+},
+.cow_end = {
+.offset = cow_end_from,
+.nb_bytes   = cow_end_to - cow_end_from,
+},
+};
+
+qemu_co_queue_init(&(*m)->dependent_requests);
+QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1436,35 +1486,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
 int avail_bytes = nb_clusters << s->cluster_bits;
 int nb_bytes = MIN(requested_bytes, avail_bytes);
-QCowL2Meta *old_m = *m;
-
-*m = g_malloc0(sizeof(**m));
-
-**m = (QCowL2Meta) {
-.next   = old_m,
-
-.alloc_offset   = alloc_cluster_offset,
-.offset = start_of_cluster(s, guest_offset),
-.nb_clusters= nb_clusters,
-
-.keep_old_clusters  = keep_old_clusters,
-
-.cow_start = {
-.offset = 0,
-.nb_bytes   = offset_into_cluster(s, guest_offset),
-},
-.cow_end = {
-.offset = nb_bytes,
-.nb_bytes   = avail_bytes - nb_bytes,
-},
-};
-qemu_co_queue_init(&(*m)->dependent_requests);
-QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
 *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
+calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+  m, keep_old_clusters);
+
 return 1;
 
 fail:
-- 
2.26.2




[PULL 28/34] qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

This works now at the subcluster level and pwrite_zeroes_alignment is
updated accordingly.

qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with
the following changes:

   - The request can now be subcluster-aligned.

   - The cluster-aligned body of the request is still zeroized using
 zero_in_l2_slice() as before.

   - The subcluster-aligned head and tail of the request are zeroized
 with the new zero_l2_subclusters() function.

There is just one thing to take into account for a possible future
improvement: compressed clusters cannot be partially zeroized so
zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
This makes the caller repeat the *complete* request and write actual
zeroes to disk. This is sub-optimal because

   1) if the head area was compressed we would still be able to use
  the fast path for the body and possibly the tail.

   2) if the tail area was compressed we are writing zeroes to the
  head and the body areas, which are already zeroized.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 
<17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h |  4 +--
 block/qcow2-cluster.c | 81 +++
 block/qcow2.c | 33 +-
 3 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4fad40b96b..4ef4ae4ab0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -898,8 +898,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m);
 int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, enum qcow2_discard_type type,
   bool full_discard);
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags);
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 1e84bd8e2e..9d349d61c6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2016,12 +2016,59 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 return nb_clusters;
 }
 
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags)
+static int zero_l2_subclusters(BlockDriverState *bs, uint64_t offset,
+   unsigned nb_subclusters)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *l2_slice;
+uint64_t old_l2_bitmap, l2_bitmap;
+int l2_index, ret, sc = offset_to_sc_index(s, offset);
+
+/* For full clusters use zero_in_l2_slice() instead */
+assert(nb_subclusters > 0 && nb_subclusters < s->subclusters_per_cluster);
+assert(sc + nb_subclusters <= s->subclusters_per_cluster);
+assert(offset_into_subcluster(s, offset) == 0);
+
+ret = get_cluster_table(bs, offset, _slice, _index);
+if (ret < 0) {
+return ret;
+}
+
+switch (qcow2_get_cluster_type(bs, get_l2_entry(s, l2_slice, l2_index))) {
+case QCOW2_CLUSTER_COMPRESSED:
+ret = -ENOTSUP; /* We cannot partially zeroize compressed clusters */
+goto out;
+case QCOW2_CLUSTER_NORMAL:
+case QCOW2_CLUSTER_UNALLOCATED:
+break;
+default:
+g_assert_not_reached();
+}
+
+old_l2_bitmap = l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+
+l2_bitmap |=  QCOW_OFLAG_SUB_ZERO_RANGE(sc, sc + nb_subclusters);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ALLOC_RANGE(sc, sc + nb_subclusters);
+
+if (old_l2_bitmap != l2_bitmap) {
+set_l2_bitmap(s, l2_slice, l2_index, l2_bitmap);
+qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
+}
+
+ret = 0;
+out:
+qcow2_cache_put(s->l2_table_cache, (void **) _slice);
+
+return ret;
+}
+
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t end_offset = offset + bytes;
 uint64_t nb_clusters;
+unsigned head, tail;
 int64_t cleared;
 int ret;
 
@@ -2036,8 +2083,8 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t 
offset,
 }
 
 /* Caller must pass aligned values, except at image end */
-assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+assert(offset_into_subcluster(s, offset) == 0);
+assert(offset_into_subcluster(s, end_offset) == 0 ||
end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
 /*
@@ -2052,11 +2099,26 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, 
uint64_t offset,
 return -ENOTSUP;
 }
 
-/* 

[PULL 31/34] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 
<6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.be...@igalia.com>
[mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output]
Signed-off-by: Max Reitz 
---
 qapi/block-core.json |   7 +++
 block/qcow2.h|   8 ++-
 include/block/block_int.h|   1 +
 block/qcow2.c|  66 ++--
 tests/qemu-iotests/031.out   |   8 +--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  20 +++---
 tests/qemu-iotests/065   |  12 ++--
 tests/qemu-iotests/082.out   |  39 +---
 tests/qemu-iotests/085.out   |  38 ++--
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +--
 tests/qemu-iotests/198   |   2 +
 tests/qemu-iotests/206.out   |   4 ++
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/274.out   |  49 ---
 tests/qemu-iotests/280.out   |   2 +-
 tests/qemu-iotests/291.out   |   2 +
 tests/qemu-iotests/302.out   |   1 +
 tests/qemu-iotests/303.out   |   4 +-
 tests/qemu-iotests/common.filter |   1 +
 25 files changed, 256 insertions(+), 142 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 197bdc1c36..db08c58d78 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -67,6 +67,9 @@
 # standalone (read-only) raw image without looking at qcow2
 # metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries; only valid for
+#   compat >= 1.1 (since 5.2)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -88,6 +91,7 @@
   'compat': 'str',
   '*data-file': 'str',
   '*data-file-raw': 'bool',
+  '*extended-l2': 'bool',
   '*lazy-refcounts': 'bool',
   '*corrupt': 'bool',
   'refcount-bits': 'int',
@@ -4304,6 +4308,8 @@
 # @data-file-raw: True if the external data file must stay valid as a
 # standalone (read-only) raw image without looking at qcow2
 # metadata (default: false; since: 4.0)
+# @extended-l2  True to make the image have extended L2 entries
+#   (default: false; since 5.2)
 # @size: Size of the virtual disk in bytes
 # @version: Compatibility level (default: v3)
 # @backing-file: File name of the backing file if a backing file
@@ -4324,6 +4330,7 @@
   'data': { 'file': 'BlockdevRef',
 '*data-file':   'BlockdevRef',
 '*data-file-raw':   'bool',
+'*extended-l2': 'bool',
 'size': 'size',
 '*version': 'BlockdevQcow2Version',
 '*backing-file':'str',
diff --git a/block/qcow2.h b/block/qcow2.h
index f3499e53bf..065ec3df0b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -246,15 +246,18 @@ enum {
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
 QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
+QCOW2_INCOMPAT_EXTL2_BITNR  = 4,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
 QCOW2_INCOMPAT_COMPRESSION  = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
+QCOW2_INCOMPAT_EXTL2= 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
 QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
 | QCOW2_INCOMPAT_CORRUPT
 | QCOW2_INCOMPAT_DATA_FILE
-| QCOW2_INCOMPAT_COMPRESSION,
+| QCOW2_INCOMPAT_COMPRESSION
+| QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -581,8 +584,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-/* FIXME: Return false until this feature is complete */
-return false;
+return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38dec0275b..9da7a42927 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -59,6 +59,7 @@
 #define BLOCK_OPT_DATA_FILE "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"

[PULL 19/34] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 477d60dd71..60192f1be3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2064,7 +2064,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 
 if ((type == QCOW2_SUBCLUSTER_NORMAL ||
- type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+ type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
@@ -2072,7 +2073,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+   type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2235,6 +2237,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2303,7 +2306,8 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
 {
 qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
 } else {
@@ -3858,6 +3862,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 ret = qcow2_get_host_offset(bs, offset, , , );
 if (ret < 0 ||
 (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+ type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
  type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
  type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
 qemu_co_mutex_unlock(>lock);
@@ -3936,6 +3941,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
 switch (type) {
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 if (bs->backing && bs->backing->bs) {
 int64_t backing_length = bdrv_getlength(bs->backing->bs);
 if (src_offset >= backing_length) {
-- 
2.26.2




[PULL 15/34] qcow2: Add qcow2_get_subcluster_range_type()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

There are situations in which we want to know how many contiguous
subclusters of the same type there are in a given cluster. This can be
done by simply iterating over the subclusters and repeatedly calling
qcow2_get_subcluster_type() for each one of them.

However once we determined the type of a subcluster we can check the
rest efficiently by counting the number of adjacent ones (or zeroes)
in the bitmap. This is what this function does.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0b762502f6..2fe7a0f79c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -375,6 +375,57 @@ fail:
 return ret;
 }
 
+/*
+ * For a given L2 entry, count the number of contiguous subclusters of
+ * the same type starting from @sc_from. Compressed clusters are
+ * treated as if they were divided into subclusters of size
+ * s->subcluster_size.
+ *
+ * Return the number of contiguous subclusters and set @type to the
+ * subcluster type.
+ *
+ * If the L2 entry is invalid return -errno and set @type to
+ * QCOW2_SUBCLUSTER_INVALID.
+ */
+G_GNUC_UNUSED
+static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
+   uint64_t l2_entry,
+   uint64_t l2_bitmap,
+   unsigned sc_from,
+   QCow2SubclusterType *type)
+{
+BDRVQcow2State *s = bs->opaque;
+uint32_t val;
+
+*type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_from);
+
+if (*type == QCOW2_SUBCLUSTER_INVALID) {
+return -EINVAL;
+} else if (!has_subclusters(s) || *type == QCOW2_SUBCLUSTER_COMPRESSED) {
+return s->subclusters_per_cluster - sc_from;
+}
+
+switch (*type) {
+case QCOW2_SUBCLUSTER_NORMAL:
+val = l2_bitmap | QCOW_OFLAG_SUB_ALLOC_RANGE(0, sc_from);
+return cto32(val) - sc_from;
+
+case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+val = (l2_bitmap | QCOW_OFLAG_SUB_ZERO_RANGE(0, sc_from)) >> 32;
+return cto32(val) - sc_from;
+
+case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+val = ((l2_bitmap >> 32) | l2_bitmap)
+& ~QCOW_OFLAG_SUB_ALLOC_RANGE(0, sc_from);
+return ctz32(val) - sc_from;
+
+default:
+g_assert_not_reached();
+}
+}
+
 /*
  * Checks how many clusters in a given L2 slice are contiguous in the image
  * file. As soon as one of the flags in the bitmask stop_flags changes compared
-- 
2.26.2




[PULL 05/34] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
 allocated and have a reference count of 1. In those clusters we
 can simply write the data and there is no need to perform any
 copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
 either because they haven't been allocated yet, because their
 reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
 which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
 reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
 allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 252 +++---
 1 file changed, 139 insertions(+), 113 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 369689b27c..b7b7b37062 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1040,13 +1040,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1054,15 +1059,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
   uint64_t host_cluster_offset,
   uint64_t guest_offset, unsigned bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry of the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_start_from = cow_start_to;
+} else {
+cow_start_from = 0;
+}
+
+/* Get the L2 entry of the last cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_end_to = cow_end_from;
+} else {
+cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+}
 
 *m = g_malloc0(sizeof(**m));
 **m = (QCowL2Meta) {
@@ -1088,18 +1131,22 @@ static void calculate_l2_meta(BlockDriverState *bs,
 

[PULL 04/34] qcow2: Split cluster_needs_cow() out of count_cow_clusters()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

We are going to need it in other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: 
<65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6447222ba0..369689b27c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1088,6 +1088,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+switch (qcow2_get_cluster_type(bs, l2_entry)) {
+case QCOW2_CLUSTER_NORMAL:
+if (l2_entry & QCOW_OFLAG_COPIED) {
+return false;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return true;
+default:
+abort();
+}
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1100,25 +1118,11 @@ static int count_cow_clusters(BlockDriverState *bs, int 
nb_clusters,
 
 for (i = 0; i < nb_clusters; i++) {
 uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-switch(cluster_type) {
-case QCOW2_CLUSTER_NORMAL:
-if (l2_entry & QCOW_OFLAG_COPIED) {
-goto out;
-}
+if (!cluster_needs_cow(bs, l2_entry)) {
 break;
-case QCOW2_CLUSTER_UNALLOCATED:
-case QCOW2_CLUSTER_COMPRESSED:
-case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALLOC:
-break;
-default:
-abort();
 }
 }
 
-out:
 assert(i <= nb_clusters);
 return i;
 }
-- 
2.26.2




[PULL 21/34] qcow2: Add subcluster support to qcow2_get_host_offset()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

qcow2_cluster_to_subcluster_type() is not necessary as a separate
function anymore so it's inlined into its caller.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 

[mreitz: Initialize expected_type to anything]
Signed-off-by: Max Reitz 
---
 block/qcow2.h |  38 ---
 block/qcow2-cluster.c | 150 ++
 2 files changed, 92 insertions(+), 96 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 5df761edc3..4fad40b96b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -710,29 +710,6 @@ static inline QCow2ClusterType 
qcow2_get_cluster_type(BlockDriverState *bs,
 }
 }
 
-/*
- * For an image without extended L2 entries, return the
- * QCow2SubclusterType equivalent of a given QCow2ClusterType.
- */
-static inline
-QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
-{
-switch (type) {
-case QCOW2_CLUSTER_COMPRESSED:
-return QCOW2_SUBCLUSTER_COMPRESSED;
-case QCOW2_CLUSTER_ZERO_PLAIN:
-return QCOW2_SUBCLUSTER_ZERO_PLAIN;
-case QCOW2_CLUSTER_ZERO_ALLOC:
-return QCOW2_SUBCLUSTER_ZERO_ALLOC;
-case QCOW2_CLUSTER_NORMAL:
-return QCOW2_SUBCLUSTER_NORMAL;
-case QCOW2_CLUSTER_UNALLOCATED:
-return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
-default:
-g_assert_not_reached();
-}
-}
-
 /*
  * In an image without subsclusters @l2_bitmap is ignored and
  * @sc_index must be 0.
@@ -776,7 +753,20 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
 g_assert_not_reached();
 }
 } else {
-return qcow2_cluster_to_subcluster_type(type);
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+return QCOW2_SUBCLUSTER_COMPRESSED;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+case QCOW2_CLUSTER_NORMAL:
+return QCOW2_SUBCLUSTER_NORMAL;
+case QCOW2_CLUSTER_UNALLOCATED:
+return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+default:
+g_assert_not_reached();
+}
 }
 }
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5e7ae0843d..08ecb4ca0c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -426,66 +426,66 @@ static int 
qcow2_get_subcluster_range_type(BlockDriverState *bs,
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
+ * Compressed clusters are always processed one by one but for the
+ * purpose of this count they are treated as if they were divided into
+ * subclusters of size s->subcluster_size.
+ * On failure return -errno and update @l2_index to point to the
+ * invalid entry.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+unsigned sc_index, uint64_t *l2_slice,
+unsigned *l2_index)
 {
 BDRVQcow2State *s = bs->opaque;
-int i;
-QCow2ClusterType first_cluster_type;
-uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-uint64_t offset = first_entry & mask;
+int i, count = 0;
+bool check_offset = false;
+uint64_t expected_offset = 0;
+QCow2SubclusterType expected_type = QCOW2_SUBCLUSTER_NORMAL, type;
 
-first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-return 0;
-}
-
-/* must be allocated */
-assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-   first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
+assert(*l2_index + nb_clusters <= s->l2_slice_size);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;

[PULL 29/34] qcow2: Add subcluster support to qcow2_measure()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 
<7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0cf0b0a9fb..54c9b7c119 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3232,28 +3232,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, 
size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
 size_t cluster_size,
-int refcount_order)
+int refcount_order,
+bool extended_l2)
 {
 int64_t meta_size = 0;
 uint64_t nl1e, nl2e;
 int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
 
 /* total size of L2 tables */
 nl2e = aligned_total_size / cluster_size;
-nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-meta_size += nl2e * sizeof(uint64_t);
+nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+meta_size += nl2e * l2e_size;
 
 /* total size of L1 tables */
-nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = nl2e * l2e_size / cluster_size;
 nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
 meta_size += nl1e * sizeof(uint64_t);
 
@@ -4838,6 +4841,8 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 PreallocMode prealloc;
 bool has_backing_file;
 bool has_luks;
+bool extended_l2 = false; /* Set to false until the option is added */
+size_t l2e_size;
 
 /* Parse image creation options */
 cluster_size = qcow2_opt_get_cluster_size_del(opts, _err);
@@ -4896,8 +4901,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 virtual_size = ROUND_UP(virtual_size, cluster_size);
 
 /* Check that virtual disk size is valid */
+l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
- cluster_size / sizeof(uint64_t));
+ cluster_size / l2e_size);
 if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
 error_setg(_err, "The image size is too large "
"(try using a larger cluster size)");
@@ -4960,9 +4966,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 }
 
 info = g_new0(BlockMeasureInfo, 1);
-info->fully_allocated =
+info->fully_allocated = luks_payload_size +
 qcow2_calc_prealloc_size(virtual_size, cluster_size,
- ctz32(refcount_bits)) + luks_payload_size;
+ ctz32(refcount_bits), extended_l2);
 
 /*
  * Remove data clusters that are not required.  This overestimates the
-- 
2.26.2




Re: [PATCH v2 3/5] qemu-iotests: Support varargs syntax in FilePaths

2020-08-25 Thread Max Reitz
On 21.08.20 01:54, Nir Soffer wrote:
> Accept variable number of names instead of a sequence:
> 
> with FilePaths("a", "b", "c") as (a, b, c):
> 
> The disadvantage is that base_dir must be used as kwarg:
> 
> with FilePaths("a", "b", base_dir=soc_dir) as (sock1, sock2):
> 
> But this is more clear and calling optional argument as positional
> arguments is bad idea anyway.
> 
> Signed-off-by: Nir Soffer 
> ---
>  tests/qemu-iotests/194|  4 ++--
>  tests/qemu-iotests/257| 10 --
>  tests/qemu-iotests/iotests.py |  6 +++---
>  3 files changed, 9 insertions(+), 11 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[PULL 24/34] qcow2: Add subcluster support to check_refcounts_l2()

2020-08-25 Thread Max Reitz
From: Alberto Garcia 

The offset field of an uncompressed cluster's L2 entry must be aligned
to the cluster size, otherwise it is invalid. If the cluster has no
data then it means that the offset points to a preallocation, so we
can clear the offset field without affecting the guest-visible data.
This is what 'qemu-img check' does when run in repair mode.

On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO
is set, and repairing such entries turns the clusters from ZERO_ALLOC
into ZERO_PLAIN.

Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO
but the idea is the same: if none of the subclusters are allocated
then we can clear the offset field and leave the bitmap untouched.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Message-Id: 
<9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-refcount.c | 16 +++-
 tests/qemu-iotests/060.out |  2 +-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 770c5dbc83..aae52607eb 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1669,12 +1669,18 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 
 /* Correct offsets are cluster aligned */
 if (offset_into_cluster(s, offset)) {
+bool contains_data;
 res->corruptions++;
 
-if (qcow2_get_cluster_type(bs, l2_entry) ==
-QCOW2_CLUSTER_ZERO_ALLOC)
-{
-fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated zero "
+if (has_subclusters(s)) {
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
+contains_data = (l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC);
+} else {
+contains_data = !(l2_entry & QCOW_OFLAG_ZERO);
+}
+
+if (!contains_data) {
+fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated "
 "cluster is not properly aligned; L2 entry "
 "corrupted.\n",
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
@@ -1686,7 +1692,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
-l2_entry = QCOW_OFLAG_ZERO;
+l2_entry = has_subclusters(s) ? 0 : QCOW_OFLAG_ZERO;
 set_l2_entry(s, l2_table, i, l2_entry);
 ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, l2_entry_size(s), false);
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index e574c38797..b2804a0d07 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -320,7 +320,7 @@ discard 65536/65536 bytes at offset 0
 qcow2: Marking image as corrupt: Preallocated zero cluster offset 0x2a00 
unaligned (guest offset: 0); further corruption events will be suppressed
 write failed: Input/output error
 --- Repairing ---
-Repairing offset=2a00: Preallocated zero cluster is not properly aligned; L2 
entry corrupted.
+Repairing offset=2a00: Preallocated cluster is not properly aligned; L2 entry 
corrupted.
 The following inconsistencies were found and repaired:
 
 0 leaked clusters
-- 
2.26.2




[PATCH v2 2/3] nbd: skip SIGTERM handler if NBD device support is not built

2020-08-25 Thread Daniel P . Berrangé
The termsig_handler function is used by the client thread handling the
host NBD device connection to do a graceful shutdown. IOW, if we have
disabled NBD device support at compile time, we don't need the SIGTERM
handler. This fixes a build issue for Windows.

Signed-off-by: Daniel P. Berrangé 
---
 qemu-nbd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index b102874f0f..dc6ef089af 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -155,12 +155,13 @@ QEMU_COPYRIGHT "\n"
 , name);
 }
 
+#if HAVE_NBD_DEVICE
 static void termsig_handler(int signum)
 {
 atomic_cmpxchg(, RUNNING, TERMINATE);
 qemu_notify_event();
 }
-
+#endif /* HAVE_NBD_DEVICE */
 
 static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
 const char *hostname)
@@ -587,6 +588,7 @@ int main(int argc, char **argv)
 unsigned socket_activation;
 const char *pid_file_name = NULL;
 
+#if HAVE_NBD_DEVICE
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
  */
@@ -594,6 +596,7 @@ int main(int argc, char **argv)
 memset(_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, _sigterm, NULL);
+#endif /* HAVE_NBD_DEVICE */
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
-- 
2.26.2




[PATCH v2 3/3] nbd: disable signals and forking on Windows builds

2020-08-25 Thread Daniel P . Berrangé
Disabling these parts are sufficient to get the qemu-nbd program
compiling in a Windows build.

Signed-off-by: Daniel P. Berrangé 
---
 meson.build | 7 ++-
 qemu-nbd.c  | 5 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index df5bf728b5..1071871605 100644
--- a/meson.build
+++ b/meson.build
@@ -1074,12 +1074,9 @@ if have_tools
  dependencies: [authz, block, crypto, io, qom, qemuutil], install: 
true)
   qemu_io = executable('qemu-io', files('qemu-io.c'),
  dependencies: [block, qemuutil], install: true)
-  qemu_block_tools += [qemu_img, qemu_io]
-  if targetos == 'linux' or targetos == 'sunos' or targetos.endswith('bsd')
-qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
+  qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
dependencies: [block, qemuutil], install: true)
-qemu_block_tools += [qemu_nbd]
-  endif
+  qemu_block_tools += [qemu_img, qemu_io, qemu_nbd]
 
   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
diff --git a/qemu-nbd.c b/qemu-nbd.c
index dc6ef089af..33476a1000 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -899,6 +899,7 @@ int main(int argc, char **argv)
 #endif
 
 if ((device && !verbose) || fork_process) {
+#ifndef WIN32
 int stderr_fd[2];
 pid_t pid;
 int ret;
@@ -962,6 +963,10 @@ int main(int argc, char **argv)
  */
 exit(errors);
 }
+#else /* WIN32 */
+error_report("Unable to fork into background on Windows hosts");
+exit(EXIT_FAILURE);
+#endif /* WIN32 */
 }
 
 if (device != NULL && sockpath == NULL) {
-- 
2.26.2




Re: [PATCH v2 1/2] util/hexdump: Convert to take a void pointer argument

2020-08-25 Thread Alistair Francis
On Sat, Aug 22, 2020 at 11:10 AM Philippe Mathieu-Daudé  wrote:
>
> Most uses of qemu_hexdump() do not take an array of char
> as input, forcing use of cast. Since we can use this
> helper to dump any kind of buffer, use a pointer to void
> argument instead.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
> Since v1:
> - renamed argument 'bufptr' (Peter Maydell)
> ---
>  include/qemu-common.h|  3 ++-
>  hw/dma/xlnx_dpdma.c  |  2 +-
>  hw/net/fsl_etsec/etsec.c |  2 +-
>  hw/sd/sd.c   |  2 +-
>  hw/usb/redirect.c|  2 +-
>  net/colo-compare.c   | 12 ++--
>  net/net.c|  2 +-
>  util/hexdump.c   |  4 +++-
>  8 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index bb9496bd80f..6834883822f 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -138,7 +138,8 @@ int os_parse_cmd_args(int index, const char *optarg);
>   * Hexdump a buffer to a file. An optional string prefix is added to every 
> line
>   */
>
> -void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t 
> size);
> +void qemu_hexdump(const void *bufptr, FILE *fp,
> +  const char *prefix, size_t size);
>
>  /*
>   * helper to parse debug environment variables
> diff --git a/hw/dma/xlnx_dpdma.c b/hw/dma/xlnx_dpdma.c
> index b40c897de2c..d75a8069426 100644
> --- a/hw/dma/xlnx_dpdma.c
> +++ b/hw/dma/xlnx_dpdma.c
> @@ -388,7 +388,7 @@ static void xlnx_dpdma_dump_descriptor(DPDMADescriptor 
> *desc)
>  {
>  if (DEBUG_DPDMA) {
>  qemu_log("DUMP DESCRIPTOR:\n");
> -qemu_hexdump((char *)desc, stdout, "", sizeof(DPDMADescriptor));
> +qemu_hexdump(desc, stdout, "", sizeof(DPDMADescriptor));
>  }
>  }
>
> diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
> index 7035cf4eb97..c817a28decd 100644
> --- a/hw/net/fsl_etsec/etsec.c
> +++ b/hw/net/fsl_etsec/etsec.c
> @@ -357,7 +357,7 @@ static ssize_t etsec_receive(NetClientState *nc,
>
>  #if defined(HEX_DUMP)
>  fprintf(stderr, "%s receive size:%zd\n", nc->name, size);
> -qemu_hexdump((void *)buf, stderr, "", size);
> +qemu_hexdump(buf, stderr, "", size);
>  #endif
>  /* Flush is unnecessary as are already in receiving path */
>  etsec->need_flush = false;
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index fad9cf1ee7a..190e4cf1232 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1781,7 +1781,7 @@ send_response:
>  }
>
>  #ifdef DEBUG_SD
> -qemu_hexdump((const char *)response, stderr, "Response", rsplen);
> +qemu_hexdump(response, stderr, "Response", rsplen);
>  #endif
>
>  return rsplen;
> diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
> index 417a60a2e68..09edb0d81c0 100644
> --- a/hw/usb/redirect.c
> +++ b/hw/usb/redirect.c
> @@ -240,7 +240,7 @@ static void usbredir_log_data(USBRedirDevice *dev, const 
> char *desc,
>  if (dev->debug < usbredirparser_debug_data) {
>  return;
>  }
> -qemu_hexdump((char *)data, stderr, desc, len);
> +qemu_hexdump(data, stderr, desc, len);
>  }
>
>  /*
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 2c20de1537d..550272b3baa 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -494,9 +494,9 @@ sec:
>  g_queue_push_head(>secondary_list, spkt);
>
>  if (trace_event_get_state_backends(TRACE_COLO_COMPARE_MISCOMPARE)) {
> -qemu_hexdump((char *)ppkt->data, stderr,
> +qemu_hexdump(ppkt->data, stderr,
>  "colo-compare ppkt", ppkt->size);
> -qemu_hexdump((char *)spkt->data, stderr,
> +qemu_hexdump(spkt->data, stderr,
>  "colo-compare spkt", spkt->size);
>  }
>
> @@ -535,9 +535,9 @@ static int colo_packet_compare_udp(Packet *spkt, Packet 
> *ppkt)
>  trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
>  trace_colo_compare_udp_miscompare("Secondary pkt size", spkt->size);
>  if (trace_event_get_state_backends(TRACE_COLO_COMPARE_MISCOMPARE)) {
> -qemu_hexdump((char *)ppkt->data, stderr, "colo-compare pri pkt",
> +qemu_hexdump(ppkt->data, stderr, "colo-compare pri pkt",
>   ppkt->size);
> -qemu_hexdump((char *)spkt->data, stderr, "colo-compare sec pkt",
> +qemu_hexdump(spkt->data, stderr, "colo-compare sec pkt",
>   spkt->size);
>  }
>  return -1;
> @@ -578,9 +578,9 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet 
> *ppkt)
>  trace_colo_compare_icmp_miscompare("Secondary pkt size",
> spkt->size);
>  if (trace_event_get_state_backends(TRACE_COLO_COMPARE_MISCOMPARE)) {
> -qemu_hexdump((char *)ppkt->data, stderr, "colo-compare pri pkt",
> +qemu_hexdump(ppkt->data, stderr, "colo-compare 

Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-25 Thread Brian Foster
On Tue, Aug 25, 2020 at 02:24:58PM +0200, Alberto Garcia wrote:
> On Fri 21 Aug 2020 07:02:32 PM CEST, Brian Foster wrote:
> >> I was running fio with --ramp_time=5 which ignores the first 5 seconds
> >> of data in order to let performance settle, but if I remove that I can
> >> see the effect more clearly. I can observe it with raw files (in 'off'
> >> and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and
> >> preallocation=off the performance is stable during the whole test.
> >
> > That's interesting. I ran your fio command (without --ramp_time and
> > with --runtime=5m) against a file on XFS (so no qcow2, no zero_range)
> > once with sparse file with a 64k extent size hint and again with a
> > fully preallocated 25GB file and I saw similar results in terms of the
> > delta.  This was just against an SSD backed vdisk in my local dev VM,
> > but I saw ~5800 iops for the full preallocation test and ~6200 iops
> > with the extent size hint.
> >
> > I do notice an initial iops burst as described for both tests, so I
> > switched to use a 60s ramp time and 60s runtime. With that longer ramp
> > up time, I see ~5000 iops with the 64k extent size hint and ~5500 iops
> > with the full 25GB prealloc. Perhaps the unexpected performance delta
> > with qcow2 is similarly transient towards the start of the test and
> > the runtime is short enough that it skews the final results..?
> 
> I also tried running directly against a file on xfs (no qcow2, no VMs)
> but it doesn't really matter whether I use --ramp_time=5 or 60.
> 
> Here are the results:
> 
> |---+---+---|
> | preallocation |   xfs |  ext4 |
> |---+---+---|
> | off   |  7277 | 43260 |
> | fallocate |  7299 | 42810 |
> | full  | 88404 | 83197 |
> |---+---+---|
> 
> I ran the first case (no preallocation) for 5 minutes and I said there's
> a peak during the first 5 seconds, but then the number remains under 10k
> IOPS for the rest of the 5 minutes.
> 

I don't think we're talking about the same thing. I was referring to the
difference between full file preallocation and the extent size hint in
XFS, and how the latter was faster with the shorter ramp time but that
swapped around when the test ramped up for longer. Here, it looks like
you're comparing XFS to ext4 writing direct to a file..

If I compare this 5m fio test between XFS and ext4 on a couple of my
systems (with either no prealloc or full file prealloc), I end up seeing
ext4 run slightly faster on my vm and XFS slightly faster on bare metal.
Either way, I don't see that huge disparity where ext4 is 5-6 times
faster than XFS. Can you describe the test, filesystem and storage in
detail where you observe such a discrepancy?

Brian




Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-25 Thread Alberto Garcia
On Tue 25 Aug 2020 06:54:15 PM CEST, Brian Foster wrote:
> If I compare this 5m fio test between XFS and ext4 on a couple of my
> systems (with either no prealloc or full file prealloc), I end up seeing
> ext4 run slightly faster on my vm and XFS slightly faster on bare metal.
> Either way, I don't see that huge disparity where ext4 is 5-6 times
> faster than XFS. Can you describe the test, filesystem and storage in
> detail where you observe such a discrepancy?

Here's the test:

fio --filename=/path/to/file.raw --direct=1 --randrepeat=1 \
--eta=always --ioengine=libaio --iodepth=32 --numjobs=1 \
--name=test --size=25G --io_limit=25G --ramp_time=0 \
--rw=randwrite --bs=4k --runtime=300 --time_based=1

The size of the XFS filesystem is 126 GB and it's almost empty, here's
the xfs_info output:

meta-data=/dev/vg/test   isize=512agcount=4, agsize=8248576
blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=1,
 rmapbt=0
 =   reflink=0
data =   bsize=4096   blocks=32994304, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
log  =internal log   bsize=4096   blocks=16110, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

The size of the ext4 filesystem is 99GB, of which 49GB are free (that
is, without the file used in this test). The filesystem uses 4KB
blocks, a 128M journal and these features:

Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal ext_attr resize_inode dir_index
  filetype needs_recovery extent flex_bg
  sparse_super large_file huge_file uninit_bg
  dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options:user_xattr acl

In both cases I'm using LVM on top of LUKS and the hard drive is a
Samsung SSD 850 PRO 1TB.

The Linux version is 4.19.132-1 from Debian.

Berto



Re: [PATCH v4 2/6] util: refactor qemu_open_old to split off variadic args handling

2020-08-25 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> This simple refactoring prepares for future patches. The variadic args
> handling is split from the main bulk of the open logic. The duplicated
> calls to open() are removed in favour of updating the "flags" variable
> to have O_CLOEXEC.
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  util/osdep.c | 40 +++-
>  1 file changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 9df1b6adec..9ff92551e7 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -22,6 +22,7 @@
>   * THE SOFTWARE.
>   */
>  #include "qemu/osdep.h"
> +#include "qapi/error.h"
>  
>  /* Needed early for CONFIG_BSD etc. */
>  
> @@ -282,10 +283,10 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t 
> len, bool exclusive)
>  /*
>   * Opens a file with FD_CLOEXEC set
>   */
> -int qemu_open_old(const char *name, int flags, ...)
> +static int
> +qemu_open_internal(const char *name, int flags, mode_t mode)
>  {
>  int ret;
> -int mode = 0;
>  
>  #ifndef _WIN32
>  const char *fdset_id_str;
> @@ -323,22 +324,35 @@ int qemu_open_old(const char *name, int flags, ...)
>  }
>  #endif
>  
> -if (flags & O_CREAT) {
> -va_list ap;
> -
> -va_start(ap, flags);
> -mode = va_arg(ap, int);
> -va_end(ap);
> -}
> -
>  #ifdef O_CLOEXEC
> -ret = open(name, flags | O_CLOEXEC, mode);
> -#else
> +flags |= O_CLOEXEC;
> +#endif /* O_CLOEXEC */
> +
>  ret = open(name, flags, mode);
> +
> +#ifndef O_CLOEXEC
>  if (ret >= 0) {
>  qemu_set_cloexec(ret);
>  }
> -#endif
> +#endif /* ! O_CLOEXEC */

I dislike this part, to be honest.  I find

#ifdef O_CLOEXEC
flags |= O_CLOEXEC;
#endif /* O_CLOEXEC */

ret = open(name, flags, mode);

#ifndef O_CLOEXEC
if (ret >= 0) {
qemu_set_cloexec(ret);
}
#endif /* ! O_CLOEXEC */

harder to read than

#ifdef O_CLOEXEC
ret = open(name, flags | O_CLOEXEC, mode);
#else
ret = open(name, flags, mode);
if (ret >= 0) {
qemu_set_cloexec(ret);
}
#endif

> +
> +return ret;
> +}
> +
> +
> +int qemu_open_old(const char *name, int flags, ...)
> +{
> +va_list ap;
> +mode_t mode = 0;
> +int ret;
> +
> +va_start(ap, flags);
> +if (flags & O_CREAT) {
> +mode = va_arg(ap, int);
> +}
> +va_end(ap);
> +
> +ret = qemu_open_internal(name, flags, mode);
>  
>  #ifdef O_DIRECT
>  if (ret == -1 && errno == EINVAL && (flags & O_DIRECT)) {




Re: [PATCH v4 4/6] util: introduce qemu_open and qemu_create with error reporting

2020-08-25 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> qemu_open_old() works like open(): set errno and return -1 on failure.
> It has even more failure modes, though.  Reporting the error clearly
> to users is basically impossible for many of them.
>
> Our standard cure for "errno is too coarse" is the Error object.
> Introduce two new helper methods:
>
>   int qemu_open(const char *name, int flags, Error **errp);
>   int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
>
> Note that with this design we no longer require or even accept the
> O_CREAT flag. Avoiding overloading the two distinct operations
> means we can avoid variable arguments which would prevent 'errp' from
> being the last argument. It also gives us a guarantee that the 'mode' is
> given when creating files, avoiding a latent security bug.
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  include/qemu/osdep.h |  6 ++
>  util/osdep.c | 21 +
>  2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 18333e9006..13a821845b 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -497,7 +497,13 @@ int qemu_madvise(void *addr, size_t len, int advice);
>  int qemu_mprotect_rwx(void *addr, size_t size);
>  int qemu_mprotect_none(void *addr, size_t size);
>  
> +/*
> + * Don't introduce new usage of this function, prefer the following
> + * qemu_open/qemu_create that take an "Error **errp"
> + */
>  int qemu_open_old(const char *name, int flags, ...);
> +int qemu_open(const char *name, int flags, Error **errp);
> +int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
>  int qemu_close(int fd);
>  int qemu_unlink(const char *name);
>  #ifndef _WIN32
> diff --git a/util/osdep.c b/util/osdep.c
> index 9c7118d3cb..a4956fbf6b 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -344,10 +344,7 @@ qemu_open_internal(const char *name, int flags, mode_t 
> mode, Error **errp)
>  #endif /* ! O_CLOEXEC */
>  
>  if (ret == -1) {
> -const char *action = "open";
> -if (flags & O_CREAT) {
> -action = "create";
> -}
> +const char *action = flags & O_CREAT ? "create" : "open";
>  error_setg_errno(errp, errno, "Could not %s '%s' flags 0x%x",
>   action, name, flags);
>  }

Squash this hunk into PATCH 3, please.

> @@ -357,6 +354,22 @@ qemu_open_internal(const char *name, int flags, mode_t 
> mode, Error **errp)
>  }
>  
>  
> +int qemu_open(const char *name, int flags, Error **errp)
> +{
> +assert(!(flags & O_CREAT));
> +
> +return qemu_open_internal(name, flags, 0, errp);
> +}
> +
> +
> +int qemu_create(const char *name, int flags, mode_t mode, Error **errp)
> +{
> +assert(!(flags & O_CREAT));
> +
> +return qemu_open_internal(name, flags | O_CREAT, mode, errp);
> +}
> +
> +
>  int qemu_open_old(const char *name, int flags, ...)
>  {
>  va_list ap;

Other than that:
Reviewed-by: Markus Armbruster 




Re: [PATCH v4 6/6] block/fileb: switch to use qemu_open/qemu_create for improved errors

2020-08-25 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> Currently at startup if using cache=none on a filesystem lacking
> O_DIRECT such as tmpfs, at startup QEMU prints
>
> qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not 
> support O_DIRECT
> qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open 
> '/tmp/foo.img': Invalid argument
>
> while at QMP level the hint is missing, so QEMU reports just
>
>   "error": {
>   "class": "GenericError",
>   "desc": "Could not open '/tmp/foo.img': Invalid argument"
>   }
>
> which is close to useless for the end user trying to figure out what
> they did wrong.
>
> With this change at startup QEMU prints
>
> qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open 
> '/tmp/foo.img' flags 0x4000: filesystem does not support O_DIRECT
>
> while at the QMP level QEMU reports a massively more informative
>
>   "error": {
>  "class": "GenericError",
>  "desc": "Unable to open '/tmp/foo.img' flags 0x4002: filesystem does not 
> support O_DIRECT"
>   }
>
> Reviewed-by: Eric Blake 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Daniel P. Berrangé 

Both commit message and expected iotest results demonstrate the less
than helpful flags error reporting I pointed out in my review of PATCH
3.

Reviewed-by: Markus Armbruster 




Re: [PULL 00/34] Block patches

2020-08-25 Thread Peter Maydell
On Tue, 25 Aug 2020 at 09:33, Max Reitz  wrote:
>
> The following changes since commit 30aa19446d82358a30eac3b556b4d6641e00b7c1:
>
>   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20200812' 
> into staging (2020-08-24 16:39:53 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/XanClic/qemu.git tags/pull-block-2020-08-25
>
> for you to fetch changes up to c576fd97d4ca77b5a1a27728df11a61083dbfa98:
>
>   iotests: Add tests for qcow2 images with extended L2 entries (2020-08-25 
> 10:20:18 +0200)
>
> 
> Block patches:
> - qcow2 subclusters (extended L2 entries)

This fails 'make check' on the BSDs because it assumes it has a bash
in /bin/bash, which isn't necessarily true:


  TESTiotest-qcow2: 271 [fail]
QEMU  --
"/home/qemu/qemu-test.dvSnX6/build/tests/qemu-iotests/../../qemu-system-aarch64"
-nodefaults -display none -accel qtest -machine virt
QEMU_IMG  --
"/home/qemu/qemu-test.dvSnX6/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO   --
"/home/qemu/qemu-test.dvSnX6/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD  --
"/home/qemu/qemu-test.dvSnX6/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT-- qcow2 (compat=1.1)
IMGPROTO  -- file
PLATFORM  -- NetBSD/amd64 localhost 9.0
TEST_DIR  -- /home/qemu/qemu-test.dvSnX6/build/tests/qemu-iotests/scratch
SOCK_DIR  -- /tmp/mktemp.fPDlHdxw
SOCKET_SCM_HELPER --

--- /home/qemu/qemu-test.dvSnX6/src/tests/qemu-iotests/271.out
2020-08-25 12:59:52.0 +
+++ /home/qemu/qemu-test.dvSnX6/build/tests/qemu-iotests/271.out.bad
 2020-08-25 13:11:25.483774595 +
@@ -1,726 +1 @@
-QA output created by 271
-
-### Standard write tests (backing file: yes) ###
-
-Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=raw size=1048576
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=raw
-write -q -P PATTERN 0 1k
-L2 entry #0: 0x8005 0001
-write -q -P PATTERN 3k 512
-L2 entry #0: 0x8005 0003
-write -q -P PATTERN 5k 1k
[skip rest of expected output]
-wrote 2048/2048 bytes at offset 40960
-2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-*** done
+./check: ./271: /bin/bash: bad interpreter: No such file or directory
  TESTiotest-qcow2: 283


thanks
-- PMM



Re: [PATCH v4 2/6] util: refactor qemu_open_old to split off variadic args handling

2020-08-25 Thread Daniel P . Berrangé
On Tue, Aug 25, 2020 at 04:56:40PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > This simple refactoring prepares for future patches. The variadic args
> > handling is split from the main bulk of the open logic. The duplicated
> > calls to open() are removed in favour of updating the "flags" variable
> > to have O_CLOEXEC.
> >
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  util/osdep.c | 40 +++-
> >  1 file changed, 27 insertions(+), 13 deletions(-)
> >
> > diff --git a/util/osdep.c b/util/osdep.c
> > index 9df1b6adec..9ff92551e7 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -22,6 +22,7 @@
> >   * THE SOFTWARE.
> >   */
> >  #include "qemu/osdep.h"
> > +#include "qapi/error.h"
> >  
> >  /* Needed early for CONFIG_BSD etc. */
> >  
> > @@ -282,10 +283,10 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t 
> > len, bool exclusive)
> >  /*
> >   * Opens a file with FD_CLOEXEC set
> >   */
> > -int qemu_open_old(const char *name, int flags, ...)
> > +static int
> > +qemu_open_internal(const char *name, int flags, mode_t mode)
> >  {
> >  int ret;
> > -int mode = 0;
> >  
> >  #ifndef _WIN32
> >  const char *fdset_id_str;
> > @@ -323,22 +324,35 @@ int qemu_open_old(const char *name, int flags, ...)
> >  }
> >  #endif
> >  
> > -if (flags & O_CREAT) {
> > -va_list ap;
> > -
> > -va_start(ap, flags);
> > -mode = va_arg(ap, int);
> > -va_end(ap);
> > -}
> > -
> >  #ifdef O_CLOEXEC
> > -ret = open(name, flags | O_CLOEXEC, mode);
> > -#else
> > +flags |= O_CLOEXEC;
> > +#endif /* O_CLOEXEC */
> > +
> >  ret = open(name, flags, mode);
> > +
> > +#ifndef O_CLOEXEC
> >  if (ret >= 0) {
> >  qemu_set_cloexec(ret);
> >  }
> > -#endif
> > +#endif /* ! O_CLOEXEC */
> 
> I dislike this part, to be honest.  I find
> 
> #ifdef O_CLOEXEC
> flags |= O_CLOEXEC;
> #endif /* O_CLOEXEC */
> 
> ret = open(name, flags, mode);
> 
> #ifndef O_CLOEXEC
> if (ret >= 0) {
> qemu_set_cloexec(ret);
> }
> #endif /* ! O_CLOEXEC */
> 
> harder to read than
> 
> #ifdef O_CLOEXEC
> ret = open(name, flags | O_CLOEXEC, mode);
> #else
> ret = open(name, flags, mode);
> if (ret >= 0) {
> qemu_set_cloexec(ret);
> }
> #endif

We're re-using 'flags' variable again in a later patch and want to
have O_CLOEXEC present in it then too.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 3/6] util: add Error object for qemu_open_internal error reporting

2020-08-25 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> Instead of relying on the limited information from errno, we can now
> also provide detailed error messages.

The more detailed error messages are currently always ignored, but the
next patches will fix that.

> Signed-off-by: Daniel P. Berrangé 
> ---
>  util/osdep.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 9ff92551e7..9c7118d3cb 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -284,7 +284,7 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, 
> bool exclusive)
>   * Opens a file with FD_CLOEXEC set
>   */
>  static int
> -qemu_open_internal(const char *name, int flags, mode_t mode)
> +qemu_open_internal(const char *name, int flags, mode_t mode, Error **errp)
>  {
>  int ret;
>  
> @@ -298,24 +298,31 @@ qemu_open_internal(const char *name, int flags, mode_t 
> mode)
>  
>  fdset_id = qemu_parse_fdset(fdset_id_str);
>  if (fdset_id == -1) {
> +error_setg(errp, "Could not parse fdset %s", name);
>  errno = EINVAL;
>  return -1;
>  }
>  
>  fd = monitor_fdset_get_fd(fdset_id, flags);
>  if (fd < 0) {
> +error_setg_errno(errp, -fd, "Could not acquire FD for %s flags 
> %x",
> + name, flags);
>  errno = -fd;
>  return -1;
>  }
>  
>  dupfd = qemu_dup_flags(fd, flags);
>  if (dupfd == -1) {
> +error_setg_errno(errp, errno, "Could not dup FD for %s flags %x",
> + name, flags);
>  return -1;
>  }
>  
>  ret = monitor_fdset_dup_fd_add(fdset_id, dupfd);
>  if (ret == -1) {
>  close(dupfd);
> +error_setg(errp, "Could not save FD for %s flags %x",
> +   name, flags);

Can this happen?

>  errno = EINVAL;
>  return -1;
>  }
> @@ -336,6 +343,16 @@ qemu_open_internal(const char *name, int flags, mode_t 
> mode)
>  }
>  #endif /* ! O_CLOEXEC */
>  
> +if (ret == -1) {
> +const char *action = "open";
> +if (flags & O_CREAT) {
> +action = "create";
> +}
> +error_setg_errno(errp, errno, "Could not %s '%s' flags 0x%x",
> + action, name, flags);

Not a good user experience:

Could not open '/etc/shadow' flags 0x0: Permission denied

Better:

Could not open '/etc/shadow' for reading: Permission denied

Are you sure flags other than the access mode (O_RDONLY, O_WRONLY,
O_RDWR) must be included in the error message?

If you must report flags in hexadecimal, then please reporting them more
consistently.  Right now you have

for %s flags 0x%x
'%s' flags %x

Perhaps '%s' with flags 0x%x

> +}
> +
> +
>  return ret;
>  }
>  
> @@ -352,7 +369,7 @@ int qemu_open_old(const char *name, int flags, ...)
>  }
>  va_end(ap);
>  
> -ret = qemu_open_internal(name, flags, mode);
> +ret = qemu_open_internal(name, flags, mode, NULL);
>  
>  #ifdef O_DIRECT
>  if (ret == -1 && errno == EINVAL && (flags & O_DIRECT)) {




Re: [PATCH 0/2] nbd: build qemu-nbd on Windows

2020-08-25 Thread Daniel P . Berrangé
On Mon, Aug 24, 2020 at 06:02:16PM +0100, Daniel P. Berrangé wrote:
> We are already building the NBD client and server on Windows when it is
> used via the main system emulator binaries. This demonstrates there is
> no fundamental blocker to buildig the qemu-nbd binary too.
> 
> 
> In testing this I used:
> 
>wine qemu-nbd.exe -t -p 9000 demo.img 
> 
> and
> 
>wine qemu-img.exe info nbd:localhost:9000
> 
> In fact I tested the full matrix of native vs windows client and native
> vs windows server.
> 
> I did notice that native qemu-img will sometimes hang when talking to
> the windows qemu-nbd.exe binary, and the windows qemu-img will almost
> always hang.
> 
> The hang happens in the "blk_unref" call in collect_image_info_list().
> This suggests something related to the socket teardown / cleanup in
> the NBD code.
> 
> While we should obviously investigate and fix that, I didn't consider
> it a blocker for enabling build of qemu-nbd.exe, since we're already
> building the same (buggy) NBD code in the system emulators on Windows.

After more debugging here's where it is getting stuck...

The main NBD client coroutine in qemu-img.exe is getting stuck in
nbd_read_eof() call where it has done qio_channel_readv() and got
QIO_CHANNEL_ERR_BLOCK. It has thus run qio_channel_yield(G_IO_IN)
and is waiting for that to return control, where upon it will exit
on EOF

Meanwhile the main qemu-img thread is in nbd_teardown_connection
having run qio_channel_shutdown(BOTH), and is now stuck forever in
BDRV_POLL_WHILE(bs, s->connection_co); waiting for the main NBD
coroutine to exit.


On native builds, we'll get G_IO_IN|G_IO_HUP in the coroutine after
calling the qio_channel_shutdown() in the main thread, and thus
the coroutine exits.

On windows builds running under Wine this doesn't seem to happen.
If I strace the wine program it does happen. IOW there's is some
kind of race condition wrt socket shutdown in QEMU when run in
Wine.

On windows builds running Windows Server 2008 r2, it appears to
work correctly. Maybe this is just luck, or maybe the bug really
is only affecting Wine. 

I don't intend to investigate this hang any more though, given
that it doesn't seem to reproduce in native Windows


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 5/6] util: give a specific error message when O_DIRECT doesn't work

2020-08-25 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> A common error scenario is to tell QEMU to use O_DIRECT in combination
> with a filesystem that doesn't support it. To aid users to diagnosing
> their mistake we want to provide a clear error message when this happens.
>
> Reviewed-by: Eric Blake 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  util/osdep.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index a4956fbf6b..6c24985f7a 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -345,6 +345,19 @@ qemu_open_internal(const char *name, int flags, mode_t 
> mode, Error **errp)
>  
>  if (ret == -1) {
>  const char *action = flags & O_CREAT ? "create" : "open";
> +#ifdef O_DIRECT
> +if (errno == EINVAL && (flags & O_DIRECT)) {
> +ret = open(name, flags & ~O_DIRECT, mode);
> +if (ret != -1) {
> +close(ret);
> +error_setg(errp, "Could not %s '%s' flags 0x%x: "
> +   "filesystem does not support O_DIRECT",
> +   action, name, flags);
> +errno = EINVAL; /* close() clobbered earlier errno */

More precise: close() may have clobbered

Sure open() can only fail with EINVAL here?

> +return -1;
> +}
> +}
> +#endif /* O_DIRECT */
>  error_setg_errno(errp, errno, "Could not %s '%s' flags 0x%x",
>   action, name, flags);
>  }

There is no qemu_set_cloexec().  Intentional?




Re: [PATCH v5 05/10] block: bdrv_mark_request_serialising: split non-waiting function

2020-08-25 Thread Max Reitz
On 21.08.20 16:11, Vladimir Sementsov-Ogievskiy wrote:
> We'll need a separate function, which will only "mark" request
> serialising with specified align but not wait for conflicting
> requests. So, it will be like old bdrv_mark_request_serialising(),
> before merging bdrv_wait_serialising_requests_locked() into it.
> 
> To reduce the possible mess, let's do the following:
> 
> Public function that does both marking and waiting will be called
> bdrv_make_request_serialising, and private function which will only
> "mark" will be called tracked_request_set_serialising().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block_int.h |  3 ++-
>  block/file-posix.c|  2 +-
>  block/io.c| 35 +++
>  3 files changed, 26 insertions(+), 14 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PULL 00/34] Block patches

2020-08-25 Thread Alberto Garcia
On Tue 25 Aug 2020 04:01:14 PM CEST, Peter Maydell  
wrote:
> On Tue, 25 Aug 2020 at 09:33, Max Reitz  wrote:
>>
>> The following changes since commit 30aa19446d82358a30eac3b556b4d6641e00b7c1:
>>
>>   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20200812' 
>> into staging (2020-08-24 16:39:53 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://github.com/XanClic/qemu.git tags/pull-block-2020-08-25
>>
>> for you to fetch changes up to c576fd97d4ca77b5a1a27728df11a61083dbfa98:
>>
>>   iotests: Add tests for qcow2 images with extended L2 entries (2020-08-25 
>> 10:20:18 +0200)
>>
>> 
>> Block patches:
>> - qcow2 subclusters (extended L2 entries)
>
> This fails 'make check' on the BSDs because it assumes it has a bash
> in /bin/bash, which isn't necessarily true:

I guess it needs to be replaced by '#!/usr/bin/env bash', shall I send
the patches again, or can you fix it Max?

Berto



Re: [PATCH v5 06/10] block: introduce BDRV_REQ_NO_WAIT flag

2020-08-25 Thread Max Reitz
On 21.08.20 16:11, Vladimir Sementsov-Ogievskiy wrote:
> Add flag to make serialising request no wait: if there are conflicting
> requests, just return error immediately. It's will be used in upcoming
> preallocate filter.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block.h |  9 -
>  block/io.c| 11 ++-
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index b8f4e86e8d..877fda06a4 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -67,8 +67,15 @@ typedef enum {
>   * written to qiov parameter which may be NULL.
>   */
>  BDRV_REQ_PREFETCH  = 0x200,
> +
> +/*
> + * If we need to wait for other requests, just fail immediately. Used
> + * only together with BDRV_REQ_SERIALISING.
> + */
> +BDRV_REQ_NO_WAIT = 0x400,
> +
>  /* Mask of valid flags */
> -BDRV_REQ_MASK   = 0x3ff,
> +BDRV_REQ_MASK   = 0x7ff,
>  } BdrvRequestFlags;
>  
>  typedef struct BlockSizes {
> diff --git a/block/io.c b/block/io.c
> index dd28befb08..c93b1e98a3 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1912,9 +1912,18 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
> offset, uint64_t bytes,
>  assert(!(bs->open_flags & BDRV_O_INACTIVE));
>  assert((bs->open_flags & BDRV_O_NO_IO) == 0);
>  assert(!(flags & ~BDRV_REQ_MASK));
> +assert(!((flags & BDRV_REQ_NO_WAIT) && !(flags & BDRV_REQ_SERIALISING)));
>  
>  if (flags & BDRV_REQ_SERIALISING) {
> -bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
> +QEMU_LOCK_GUARD(>reqs_lock);
> +
> +tracked_request_set_serialising(req, bdrv_get_cluster_size(bs));
> +
> +if ((flags & BDRV_REQ_NO_WAIT) && 
> bdrv_find_conflicting_request(req)) {

bdrv_find_conflicting_request() will return NULL even if there are
conflicting requests, but those have a non-NULL waiting_for.  Is that
something to consider?

(I would like to think that will never have a real impact because then
we must find some other conflicting request; but isn’t is possible that
we find an overlapping request that waits for another request with which
it overlaps, but our request does not?)

Max

> +return -EBUSY;
> +}
> +
> +bdrv_wait_serialising_requests_locked(req);
>  } else {
>  bdrv_wait_serialising_requests(req);
>  }
> 




signature.asc
Description: OpenPGP digital signature


Re: [PULL 00/34] Block patches

2020-08-25 Thread Max Reitz
On 25.08.20 16:22, Alberto Garcia wrote:
> On Tue 25 Aug 2020 04:01:14 PM CEST, Peter Maydell  
> wrote:
>> On Tue, 25 Aug 2020 at 09:33, Max Reitz  wrote:
>>>
>>> The following changes since commit 30aa19446d82358a30eac3b556b4d6641e00b7c1:
>>>
>>>   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20200812' 
>>> into staging (2020-08-24 16:39:53 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   https://github.com/XanClic/qemu.git tags/pull-block-2020-08-25
>>>
>>> for you to fetch changes up to c576fd97d4ca77b5a1a27728df11a61083dbfa98:
>>>
>>>   iotests: Add tests for qcow2 images with extended L2 entries (2020-08-25 
>>> 10:20:18 +0200)
>>>
>>> 
>>> Block patches:
>>> - qcow2 subclusters (extended L2 entries)
>>
>> This fails 'make check' on the BSDs because it assumes it has a bash
>> in /bin/bash, which isn't necessarily true:

:(

> I guess it needs to be replaced by '#!/usr/bin/env bash', shall I send
> the patches again, or can you fix it Max?

Sure, I’ll fix it and send a v2 tomorrow.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-25 Thread Alberto Garcia
On Fri 21 Aug 2020 07:02:32 PM CEST, Brian Foster wrote:
>> I was running fio with --ramp_time=5 which ignores the first 5 seconds
>> of data in order to let performance settle, but if I remove that I can
>> see the effect more clearly. I can observe it with raw files (in 'off'
>> and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and
>> preallocation=off the performance is stable during the whole test.
>
> That's interesting. I ran your fio command (without --ramp_time and
> with --runtime=5m) against a file on XFS (so no qcow2, no zero_range)
> once with sparse file with a 64k extent size hint and again with a
> fully preallocated 25GB file and I saw similar results in terms of the
> delta.  This was just against an SSD backed vdisk in my local dev VM,
> but I saw ~5800 iops for the full preallocation test and ~6200 iops
> with the extent size hint.
>
> I do notice an initial iops burst as described for both tests, so I
> switched to use a 60s ramp time and 60s runtime. With that longer ramp
> up time, I see ~5000 iops with the 64k extent size hint and ~5500 iops
> with the full 25GB prealloc. Perhaps the unexpected performance delta
> with qcow2 is similarly transient towards the start of the test and
> the runtime is short enough that it skews the final results..?

I also tried running directly against a file on xfs (no qcow2, no VMs)
but it doesn't really matter whether I use --ramp_time=5 or 60.

Here are the results:

|---+---+---|
| preallocation |   xfs |  ext4 |
|---+---+---|
| off   |  7277 | 43260 |
| fallocate |  7299 | 42810 |
| full  | 88404 | 83197 |
|---+---+---|

I ran the first case (no preallocation) for 5 minutes and I said there's
a peak during the first 5 seconds, but then the number remains under 10k
IOPS for the rest of the 5 minutes.

Berto



Re: [PATCH v5 07/10] block: introduce preallocate filter

2020-08-25 Thread Max Reitz
On 21.08.20 16:11, Vladimir Sementsov-Ogievskiy wrote:
> It's intended to be inserted between format and protocol nodes to
> preallocate additional space (expanding protocol file) on writes
> crossing EOF. It improves performance for file-systems with slow
> allocation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/system/qemu-block-drivers.rst.inc |  26 +++
>  qapi/block-core.json   |  20 +-
>  block/preallocate.c| 291 +
>  block/Makefile.objs|   1 +
>  4 files changed, 337 insertions(+), 1 deletion(-)
>  create mode 100644 block/preallocate.c

Looks good to me in essence.  Besides minor details, I wonder most about
whether truncating the file on close can be safe, but more about that below.

> diff --git a/docs/system/qemu-block-drivers.rst.inc 
> b/docs/system/qemu-block-drivers.rst.inc
> index b052a6d14e..5e8a35c571 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -952,3 +952,29 @@ on host and see if there are locks held by the QEMU 
> process on the image file.
>  More than one byte could be locked by the QEMU instance, each byte of which
>  reflects a particular permission that is acquired or protected by the running
>  block driver.
> +
> +Filter drivers
> +~~
> +
> +QEMU supports several filter drivers, which don't store any data, but do some

s/do/perform/

> +additional tasks, hooking io requests.
> +
> +.. program:: filter-drivers
> +.. option:: preallocate
> +
> +  Preallocate filter driver is intended to be inserted between format

*The preallocate filter driver

> +  and protocol nodes and does preallocation of some additional space

I’d simplify this as s/does preallocation of/preallocates/

> +  (expanding the protocol file) on write. It may be used for

I’d complicate this as s/on write/when writing past the file’s end/, or
“when data is written to the file after its end”, or at least “on
post-EOF writes”.

Maybe also s/It may be used for/This can be useful for/?

> +  file-systems with slow allocation.
> +
> +  Supported options:
> +
> +  .. program:: preallocate
> +  .. option:: prealloc-align
> +
> +On preallocation, align file length to this number, default 1M.

*the file length

As for “number”...  Well, it is a number.  But “value” might fit better.
 Or “length (in bytes)”?

> +
> +  .. program:: preallocate
> +  .. option:: prealloc-size
> +
> +How much to preallocate, default 128M.
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 197bdc1c36..b40448063b 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2805,7 +2805,7 @@
>  'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 
> 'ftps',
>  'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
>  'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
> -'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> +'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>  { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
>  'sheepdog',
>  'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> @@ -3074,6 +3074,23 @@
>'data': { 'aes': 'QCryptoBlockOptionsQCow',
>  'luks': 'QCryptoBlockOptionsLUKS'} }
>  
> +##
> +# @BlockdevOptionsPreallocate:
> +#
> +# Filter driver intended to be inserted between format and protocol node
> +# and do preallocation in protocol node on write.
> +#
> +# @prealloc-align: on preallocation, align file length to this number,
> +# default 1048576 (1M)

Speaking of alignment, this second line isn’t properly aligned.

> +#
> +# @prealloc-size: how much to preallocate, default 134217728 (128M)
> +#
> +# Since: 5.2
> +##
> +{ 'struct': 'BlockdevOptionsPreallocate',
> +  'base': 'BlockdevOptionsGenericFormat',
> +  'data': { '*prealloc-align': 'int', '*prealloc-size': 'int' } }
> +
>  ##
>  # @BlockdevOptionsQcow2:
>  #
> @@ -3979,6 +3996,7 @@
>'null-co':'BlockdevOptionsNull',
>'nvme':   'BlockdevOptionsNVMe',
>'parallels':  'BlockdevOptionsGenericFormat',
> +  'preallocate':'BlockdevOptionsPreallocate',
>'qcow2':  'BlockdevOptionsQcow2',
>'qcow':   'BlockdevOptionsQcow',
>'qed':'BlockdevOptionsGenericCOWFormat',
> diff --git a/block/preallocate.c b/block/preallocate.c
> new file mode 100644
> index 00..bdf54dbd2f
> --- /dev/null
> +++ b/block/preallocate.c
> @@ -0,0 +1,291 @@
> +/*
> + * preallocate filter driver
> + *
> + * The driver performs preallocate operation: it is injected above
> + * some node, and before each write over EOF it does additional preallocating
> + * write-zeroes request.
> + *
> + * Copyright (c) 2020 Virtuozzo International GmbH.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir 
> + *
> + * This program is free 

Re: [PATCH v4 5/6] util: give a specific error message when O_DIRECT doesn't work

2020-08-25 Thread Daniel P . Berrangé
On Tue, Aug 25, 2020 at 05:19:53PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > A common error scenario is to tell QEMU to use O_DIRECT in combination
> > with a filesystem that doesn't support it. To aid users to diagnosing
> > their mistake we want to provide a clear error message when this happens.
> >
> > Reviewed-by: Eric Blake 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  util/osdep.c | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/util/osdep.c b/util/osdep.c
> > index a4956fbf6b..6c24985f7a 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -345,6 +345,19 @@ qemu_open_internal(const char *name, int flags, mode_t 
> > mode, Error **errp)
> >  
> >  if (ret == -1) {
> >  const char *action = flags & O_CREAT ? "create" : "open";
> > +#ifdef O_DIRECT
> > +if (errno == EINVAL && (flags & O_DIRECT)) {
> > +ret = open(name, flags & ~O_DIRECT, mode);
> > +if (ret != -1) {
> > +close(ret);
> > +error_setg(errp, "Could not %s '%s' flags 0x%x: "
> > +   "filesystem does not support O_DIRECT",
> > +   action, name, flags);
> > +errno = EINVAL; /* close() clobbered earlier errno */
> 
> More precise: close() may have clobbered

Either open or close in fact.

> 
> Sure open() can only fail with EINVAL here?

We don't care about the errno from the open() call seen in this context,
we're restoring the EINVAL from the previous open() call above this patch
contxt, that we match on with  "if (errno == EINVAL && ...)" line.

> 
> > +return -1;
> > +}
> > +}
> > +#endif /* O_DIRECT */
> >  error_setg_errno(errp, errno, "Could not %s '%s' flags 0x%x",
> >   action, name, flags);
> >  }
> 
> There is no qemu_set_cloexec().  Intentional?

We've called close() immediately after open(). Adding qemu_set_cloexec()
doesnt make it any less racy on platforms lacking O_CLOSEXEC

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 3/6] util: add Error object for qemu_open_internal error reporting

2020-08-25 Thread Daniel P . Berrangé
On Tue, Aug 25, 2020 at 05:14:21PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > Instead of relying on the limited information from errno, we can now
> > also provide detailed error messages.
> 
> The more detailed error messages are currently always ignored, but the
> next patches will fix that.
> 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  util/osdep.c | 21 +++--
> >  1 file changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/util/osdep.c b/util/osdep.c
> > index 9ff92551e7..9c7118d3cb 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -284,7 +284,7 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t 
> > len, bool exclusive)
> >   * Opens a file with FD_CLOEXEC set
> >   */
> >  static int
> > -qemu_open_internal(const char *name, int flags, mode_t mode)
> > +qemu_open_internal(const char *name, int flags, mode_t mode, Error **errp)
> >  {
> >  int ret;
> >  
> > @@ -298,24 +298,31 @@ qemu_open_internal(const char *name, int flags, 
> > mode_t mode)
> >  
> >  fdset_id = qemu_parse_fdset(fdset_id_str);
> >  if (fdset_id == -1) {
> > +error_setg(errp, "Could not parse fdset %s", name);
> >  errno = EINVAL;
> >  return -1;
> >  }
> >  
> >  fd = monitor_fdset_get_fd(fdset_id, flags);
> >  if (fd < 0) {
> > +error_setg_errno(errp, -fd, "Could not acquire FD for %s flags 
> > %x",
> > + name, flags);
> >  errno = -fd;
> >  return -1;
> >  }
> >  
> >  dupfd = qemu_dup_flags(fd, flags);
> >  if (dupfd == -1) {
> > +error_setg_errno(errp, errno, "Could not dup FD for %s flags 
> > %x",
> > + name, flags);
> >  return -1;
> >  }
> >  
> >  ret = monitor_fdset_dup_fd_add(fdset_id, dupfd);
> >  if (ret == -1) {
> >  close(dupfd);
> > +error_setg(errp, "Could not save FD for %s flags %x",
> > +   name, flags);
> 
> Can this happen?

Well there's code in monitor_fdset_dup_fd_add that can return -1.

> 
> >  errno = EINVAL;
> >  return -1;
> >  }
> > @@ -336,6 +343,16 @@ qemu_open_internal(const char *name, int flags, mode_t 
> > mode)
> >  }
> >  #endif /* ! O_CLOEXEC */
> >  
> > +if (ret == -1) {
> > +const char *action = "open";
> > +if (flags & O_CREAT) {
> > +action = "create";
> > +}
> > +error_setg_errno(errp, errno, "Could not %s '%s' flags 0x%x",
> > + action, name, flags);
> 
> Not a good user experience:
> 
> Could not open '/etc/shadow' flags 0x0: Permission denied
> 
> Better:
> 
> Could not open '/etc/shadow' for reading: Permission denied
> 
> Are you sure flags other than the access mode (O_RDONLY, O_WRONLY,
> O_RDWR) must be included in the error message?

It was the flags other than access mode that I was thinking were
more important to log. I'm ambivalent htough, so can drop the
flags if it is thought to be overkill.

> 
> If you must report flags in hexadecimal, then please reporting them more
> consistently.  Right now you have
> 
> for %s flags 0x%x
> '%s' flags %x
> 
> Perhaps '%s' with flags 0x%x
> 
> > +}
> > +
> > +
> >  return ret;
> >  }
> >  
> > @@ -352,7 +369,7 @@ int qemu_open_old(const char *name, int flags, ...)
> >  }
> >  va_end(ap);
> >  
> > -ret = qemu_open_internal(name, flags, mode);
> > +ret = qemu_open_internal(name, flags, mode, NULL);
> >  
> >  #ifdef O_DIRECT
> >  if (ret == -1 && errno == EINVAL && (flags & O_DIRECT)) {
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v3 72/74] [automated] Remove redundant instance_size/class_size fields

2020-08-25 Thread Eduardo Habkost
This will remove instance_size/class_size fields from TypeInfo
variables when the value is exactly the same as the one in the
parent class.

Generated by:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=RedundantTypeSizes $(git grep -l TypeInfo -- '*.[ch]')

Signed-off-by: Eduardo Habkost 
---
Changes series v2 -> v3: this is a new patch in series v3

---
Cc: "Marc-André Lureau" 
Cc: Paolo Bonzini 
Cc: "Daniel P. Berrangé" 
Cc: "Cédric Le Goater" 
Cc: Peter Maydell 
Cc: Andrew Jeffery 
Cc: Joel Stanley 
Cc: Jan Kiszka 
Cc: Eduardo Habkost 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: Richard Henderson 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Gerd Hoffmann 
Cc: Aleksandar Markovic 
Cc: "Philippe Mathieu-Daudé" 
Cc: Aurelien Jarno 
Cc: Jiaxun Yang 
Cc: Aleksandar Rikalo 
Cc: Laurent Vivier 
Cc: Mark Cave-Ayland 
Cc: David Gibson 
Cc: Cornelia Huck 
Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Fam Zheng 
Cc: Beniamino Galvani 
Cc: Andrew Baumann 
Cc: Michael Walle 
Cc: Andrzej Zaborowski 
Cc: Yoshinori Sato 
Cc: Alex Williamson 
Cc: qemu-de...@nongnu.org
Cc: qemu-...@nongnu.org
Cc: xen-de...@lists.xenproject.org
Cc: qemu-...@nongnu.org
Cc: qemu-s3...@nongnu.org
Cc: qemu-block@nongnu.org

---
Cc: "Marc-André Lureau" 
Cc: Paolo Bonzini 
Cc: "Daniel P. Berrangé" 
Cc: "Cédric Le Goater" 
Cc: Peter Maydell 
Cc: Andrew Jeffery 
Cc: Joel Stanley 
Cc: Jan Kiszka 
Cc: Eduardo Habkost 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: Richard Henderson 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Gerd Hoffmann 
Cc: Aleksandar Markovic 
Cc: "Philippe Mathieu-Daudé" 
Cc: Aurelien Jarno 
Cc: Jiaxun Yang 
Cc: Aleksandar Rikalo 
Cc: Laurent Vivier 
Cc: Mark Cave-Ayland 
Cc: David Gibson 
Cc: Cornelia Huck 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Fam Zheng 
Cc: Beniamino Galvani 
Cc: Andrew Baumann 
Cc: Michael Walle 
Cc: Andrzej Zaborowski 
Cc: Yoshinori Sato 
Cc: Alex Williamson 
Cc: qemu-de...@nongnu.org
Cc: qemu-...@nongnu.org
Cc: xen-de...@lists.xenproject.org
Cc: qemu-...@nongnu.org
Cc: qemu-s3...@nongnu.org
Cc: qemu-block@nongnu.org
---
 chardev/char-null.c| 1 -
 crypto/tls-cipher-suites.c | 2 --
 hw/arm/aspeed_ast2600.c| 2 --
 hw/arm/aspeed_soc.c| 2 --
 hw/arm/musicpal.c  | 1 -
 hw/core/sysbus.c   | 1 -
 hw/i386/kvm/apic.c | 1 -
 hw/i386/pc_piix.c  | 1 -
 hw/i386/xen/xen_apic.c | 1 -
 hw/input/virtio-input-hid.c| 3 ---
 hw/intc/apic.c | 1 -
 hw/intc/ioapic.c   | 1 -
 hw/isa/isa-bus.c   | 1 -
 hw/mips/gt64xxx_pci.c  | 1 -
 hw/misc/aspeed_scu.c   | 3 ---
 hw/misc/ivshmem.c  | 2 --
 hw/nubus/nubus-bridge.c| 1 -
 hw/pci-bridge/dec.c| 2 --
 hw/pci-bridge/pci_bridge_dev.c | 1 -
 hw/pci-host/grackle.c  | 1 -
 hw/pci-host/uninorth.c | 4 
 hw/pci-host/versatile.c| 1 -
 hw/pci-host/xen_igd_pt.c   | 1 -
 hw/ppc/pnv_homer.c | 2 --
 hw/ppc/pnv_occ.c   | 2 --
 hw/ppc/ppc4xx_pci.c| 1 -
 hw/s390x/s390-skeys-kvm.c  | 2 --
 hw/s390x/sclpcpu.c | 2 --
 hw/s390x/sclpquiesce.c | 2 --
 hw/s390x/tod-kvm.c | 2 --
 hw/s390x/tod-qemu.c| 2 --
 hw/s390x/virtio-ccw-input.c| 3 ---
 hw/scsi/scsi-generic.c | 1 -
 hw/sd/allwinner-sdhost.c   | 1 -
 hw/sd/bcm2835_sdhost.c | 1 -
 hw/sd/milkymist-memcard.c  | 1 -
 hw/sd/pl181.c  | 1 -
 hw/sd/pxa2xx_mmci.c| 1 -
 hw/sd/sdhci.c  | 1 -
 hw/sh4/sh_pci.c| 1 -
 hw/timer/pxa2xx_timer.c| 2 --
 hw/vfio/pci.c  | 1 -
 hw/virtio/virtio-mmio.c| 1 -
 hw/watchdog/wdt_aspeed.c   | 3 ---
 hw/xen/xen-legacy-backend.c| 1 -
 45 files changed, 69 deletions(-)

diff --git a/chardev/char-null.c b/chardev/char-null.c
index ce43ccdda6..2736b2ff2b 100644
--- a/chardev/char-null.c
+++ b/chardev/char-null.c
@@ -44,7 +44,6 @@ static void char_null_class_init(ObjectClass *oc, void *data)
 static const TypeInfo char_null_type_info = {
 .name = TYPE_CHARDEV_NULL,
 .parent = TYPE_CHARDEV,
-.instance_size = sizeof(Chardev),
 .class_init = char_null_class_init,
 };
 TYPE_INFO(char_null_type_info)
diff --git a/crypto/tls-cipher-suites.c b/crypto/tls-cipher-suites.c
index e92a380a24..0c9713d301 100644
--- a/crypto/tls-cipher-suites.c
+++ b/crypto/tls-cipher-suites.c
@@ -108,8 +108,6 @@ static void 
qcrypto_tls_cipher_suites_class_init(ObjectClass *oc, void *data)
 static const TypeInfo qcrypto_tls_cipher_suites_info = {
 .parent = TYPE_QCRYPTO_TLS_CREDS,
 .name = TYPE_QCRYPTO_TLS_CIPHER_SUITES,
-.instance_size = sizeof(QCryptoTLSCreds),
-.class_size = sizeof(QCryptoTLSCredsClass),
 .class_init = 

Re: [PATCH v3] block: Raise an error when backing file parameter is an empty string

2020-08-25 Thread Connor Kuehl

On 8/13/20 8:47 AM, Connor Kuehl wrote:

Providing an empty string for the backing file parameter like so:

qemu-img create -f qcow2 -b '' /tmp/foo

allows the flow of control to reach and subsequently fail an assert
statement because passing an empty string to

bdrv_get_full_backing_filename_from_filename()

simply results in NULL being returned without an error being raised.

To fix this, let's check for an empty string when getting the value from
the opts list.

Reported-by: Attila Fazekas 
Fixes: https://bugzilla.redhat.com/1809553
Signed-off-by: Connor Kuehl 
---
v3:
   - Moved test case into 049 instead of taking up
 298.

v2:
   - Removed 4 spaces to resolve pylint warning
   - Updated format to be 'iotests.imgfmt' instead
 of hardcoding 'qcow2'
   - Use temporary file instead of '/tmp/foo'
   - Give a size parameter to qemu-img
   - Run test for qcow2, qcow, qed and *not* raw


Ping




[PATCH v3 16/74] throttle-groups: Move ThrottleGroup typedef to header

2020-08-25 Thread Eduardo Habkost
Move typedef closer to the type check macros, to make it easier
to convert the code to OBJECT_DEFINE_TYPE() in the future.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eduardo Habkost 
---
Changes v2 -> v3: none

Changes v1 -> v2: none

---
Cc: Alberto Garcia 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-block@nongnu.org
Cc: qemu-de...@nongnu.org
---
 include/block/throttle-groups.h | 1 +
 block/throttle-groups.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index 712a8e64b4..5e77db700f 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -59,6 +59,7 @@ typedef struct ThrottleGroupMember {
 } ThrottleGroupMember;
 
 #define TYPE_THROTTLE_GROUP "throttle-group"
+typedef struct ThrottleGroup ThrottleGroup;
 #define THROTTLE_GROUP(obj) OBJECT_CHECK(ThrottleGroup, (obj), 
TYPE_THROTTLE_GROUP)
 
 const char *throttle_group_get_name(ThrottleGroupMember *tgm);
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 98fea7fd47..4e28365d8d 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -63,7 +63,7 @@ static void timer_cb(ThrottleGroupMember *tgm, bool is_write);
  * access some other ThrottleGroupMember's timers only after verifying that
  * that ThrottleGroupMember has throttled requests in the queue.
  */
-typedef struct ThrottleGroup {
+struct ThrottleGroup {
 Object parent_obj;
 
 /* refuse individual property change if initialization is complete */
@@ -79,7 +79,7 @@ typedef struct ThrottleGroup {
 
 /* This field is protected by the global QEMU mutex */
 QTAILQ_ENTRY(ThrottleGroup) list;
-} ThrottleGroup;
+};
 
 /* This is protected by the global QEMU mutex */
 static QTAILQ_HEAD(, ThrottleGroup) throttle_groups =
-- 
2.26.2




[PATCH v3 30/74] ahci: Move QOM macros to header

2020-08-25 Thread Eduardo Habkost
The TYPE_* constants and the typedefs are defined in ahci.h, so
we can move the type checking macros there too.

This will make future conversion to OBJECT_DECLARE* easier.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eduardo Habkost 
---
Changes v2 -> v3: none

Changes series v1 -> v2: new patch in series v2

Cc: John Snow 
Cc: qemu-block@nongnu.org
Cc: qemu-de...@nongnu.org
---
 hw/ide/ahci_internal.h | 5 -
 include/hw/ide/ahci.h  | 3 +++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/ide/ahci_internal.h b/hw/ide/ahci_internal.h
index bab0459774..ac9bdead7b 100644
--- a/hw/ide/ahci_internal.h
+++ b/hw/ide/ahci_internal.h
@@ -332,9 +332,6 @@ struct AHCIPCIState {
 AHCIState ahci;
 };
 
-#define ICH_AHCI(obj) \
-OBJECT_CHECK(AHCIPCIState, (obj), TYPE_ICH9_AHCI)
-
 extern const VMStateDescription vmstate_ahci;
 
 #define VMSTATE_AHCI(_field, _state) {   \
@@ -394,6 +391,4 @@ void ahci_uninit(AHCIState *s);
 
 void ahci_reset(AHCIState *s);
 
-#define SYSBUS_AHCI(obj) OBJECT_CHECK(SysbusAHCIState, (obj), TYPE_SYSBUS_AHCI)
-
 #endif /* HW_IDE_AHCI_INTERNAL_H */
diff --git a/include/hw/ide/ahci.h b/include/hw/ide/ahci.h
index b44e3000cf..ce2bf8a5f8 100644
--- a/include/hw/ide/ahci.h
+++ b/include/hw/ide/ahci.h
@@ -53,11 +53,14 @@ typedef struct AHCIState {
 typedef struct AHCIPCIState AHCIPCIState;
 
 #define TYPE_ICH9_AHCI "ich9-ahci"
+#define ICH_AHCI(obj) \
+OBJECT_CHECK(AHCIPCIState, (obj), TYPE_ICH9_AHCI)
 
 int32_t ahci_get_num_ports(PCIDevice *dev);
 void ahci_ide_create_devs(PCIDevice *dev, DriveInfo **hd);
 
 #define TYPE_SYSBUS_AHCI "sysbus-ahci"
+#define SYSBUS_AHCI(obj) OBJECT_CHECK(SysbusAHCIState, (obj), TYPE_SYSBUS_AHCI)
 
 typedef struct SysbusAHCIState {
 /*< private >*/
-- 
2.26.2




[PATCH v3 68/74] [semi-automated] Use DECLARE_*CHECKER* when possible (--force mode)

2020-08-25 Thread Eduardo Habkost
Separate run of the TypeCheckMacro converter using the --force
flag, for the cases where typedefs weren't found in the same
header nor in typedefs.h.

Generated initially using:

 $ ./scripts/codeconverter/converter.py --force -i \
   --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]')

Then each case was manually reviewed, and a comment was added
indicating what's unusual about those type checking
macros/functions.  Despite not following the usual pattern, the
changes in this patch were found to be safe.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eduardo Habkost 
---
Changes v2 -> v3: none

Changes v1 -> v2:
* Most of the old changes in this patch are now being handled by
  the regular TypeCheckMacro patch (without --force mode)
* Added comments added explaining why these unusual changes
  remain

---
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Peter Maydell 
Cc: Beniamino Galvani 
Cc: Andrew Baumann 
Cc: "Philippe Mathieu-Daudé" 
Cc: Andrzej Zaborowski 
Cc: David Gibson 
Cc: qemu-de...@nongnu.org
Cc: qemu-...@nongnu.org
Cc: qemu-block@nongnu.org
Cc: qemu-...@nongnu.org
---
 include/hw/intc/arm_gic.h   | 9 +++--
 include/hw/intc/arm_gicv3.h | 8 +++-
 include/hw/ppc/xics_spapr.h | 4 +++-
 include/hw/virtio/virtio-mmio.h | 9 +++--
 hw/intc/apic.c  | 5 +++--
 hw/intc/arm_gic_kvm.c   | 9 +++--
 hw/intc/arm_gicv3_its_kvm.c | 8 +++-
 hw/intc/arm_gicv3_kvm.c | 9 +++--
 hw/sd/allwinner-sdhost.c| 5 +++--
 hw/sd/bcm2835_sdhost.c  | 5 +++--
 hw/sd/pxa2xx_mmci.c | 4 +++-
 hw/sd/sdhci.c   | 4 +++-
 12 files changed, 36 insertions(+), 43 deletions(-)

diff --git a/include/hw/intc/arm_gic.h b/include/hw/intc/arm_gic.h
index 704ef2b751..116ccbb5a9 100644
--- a/include/hw/intc/arm_gic.h
+++ b/include/hw/intc/arm_gic.h
@@ -74,12 +74,9 @@
 
 #define TYPE_ARM_GIC "arm_gic"
 typedef struct ARMGICClass ARMGICClass;
-#define ARM_GIC(obj) \
- OBJECT_CHECK(GICState, (obj), TYPE_ARM_GIC)
-#define ARM_GIC_CLASS(klass) \
- OBJECT_CLASS_CHECK(ARMGICClass, (klass), TYPE_ARM_GIC)
-#define ARM_GIC_GET_CLASS(obj) \
- OBJECT_GET_CLASS(ARMGICClass, (obj), TYPE_ARM_GIC)
+/* This is reusing the GICState typedef from TYPE_ARM_GIC_COMMON */
+DECLARE_OBJ_CHECKERS(GICState, ARMGICClass,
+ ARM_GIC, TYPE_ARM_GIC)
 
 struct ARMGICClass {
 /*< private >*/
diff --git a/include/hw/intc/arm_gicv3.h b/include/hw/intc/arm_gicv3.h
index 58e9131a33..a81a6ae7ec 100644
--- a/include/hw/intc/arm_gicv3.h
+++ b/include/hw/intc/arm_gicv3.h
@@ -17,11 +17,9 @@
 
 #define TYPE_ARM_GICV3 "arm-gicv3"
 typedef struct ARMGICv3Class ARMGICv3Class;
-#define ARM_GICV3(obj) OBJECT_CHECK(GICv3State, (obj), TYPE_ARM_GICV3)
-#define ARM_GICV3_CLASS(klass) \
- OBJECT_CLASS_CHECK(ARMGICv3Class, (klass), TYPE_ARM_GICV3)
-#define ARM_GICV3_GET_CLASS(obj) \
- OBJECT_GET_CLASS(ARMGICv3Class, (obj), TYPE_ARM_GICV3)
+/* This is reusing the GICState typedef from TYPE_ARM_GICV3_COMMON */
+DECLARE_OBJ_CHECKERS(GICv3State, ARMGICv3Class,
+ ARM_GICV3, TYPE_ARM_GICV3)
 
 struct ARMGICv3Class {
 /*< private >*/
diff --git a/include/hw/ppc/xics_spapr.h b/include/hw/ppc/xics_spapr.h
index 09e428de4e..0b8182e40b 100644
--- a/include/hw/ppc/xics_spapr.h
+++ b/include/hw/ppc/xics_spapr.h
@@ -31,7 +31,9 @@
 #include "qom/object.h"
 
 #define TYPE_ICS_SPAPR "ics-spapr"
-#define ICS_SPAPR(obj) OBJECT_CHECK(ICSState, (obj), TYPE_ICS_SPAPR)
+/* This is reusing the ICSState typedef from TYPE_ICS */
+DECLARE_INSTANCE_CHECKER(ICSState, ICS_SPAPR,
+ TYPE_ICS_SPAPR)
 
 int xics_kvm_connect(SpaprInterruptController *intc, uint32_t nr_servers,
  Error **errp);
diff --git a/include/hw/virtio/virtio-mmio.h b/include/hw/virtio/virtio-mmio.h
index dca651fd14..6a1c2c20d4 100644
--- a/include/hw/virtio/virtio-mmio.h
+++ b/include/hw/virtio/virtio-mmio.h
@@ -28,12 +28,9 @@
 /* QOM macros */
 /* virtio-mmio-bus */
 #define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
-#define VIRTIO_MMIO_BUS(obj) \
-OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
-OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_CLASS(klass) \
-OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
+/* This is reusing the VirtioBusState typedef from TYPE_VIRTIO_BUS */
+DECLARE_OBJ_CHECKERS(VirtioBusState, VirtioBusClass,
+ VIRTIO_MMIO_BUS, TYPE_VIRTIO_MMIO_BUS)
 
 /* virtio-mmio */
 #define TYPE_VIRTIO_MMIO "virtio-mmio"
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index afbb653497..dadbfd9a75 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -40,8 +40,9 @@
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 #define TYPE_APIC "apic"
-#define APIC(obj) \
-OBJECT_CHECK(APICCommonState, (obj), TYPE_APIC)
+/*This is reusing the APICCommonState typedef from APIC_COMMON 

[PATCH v3 69/74] [automated] Use OBJECT_DECLARE_TYPE where possible

2020-08-25 Thread Eduardo Habkost
Replace DECLARE_OBJ_CHECKERS with OBJECT_DECLARE_TYPE where the
typedefs can be safely removed.

Generated running:

$ ./scripts/codeconverter/converter.py -i \
  --pattern=DeclareObjCheckers $(git grep -l '' -- '*.[ch]')

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eduardo Habkost 
---
Changes v2 -> v3:
* Removed hunk due to rebase conflict: include/hw/ppc/xive.h
* Reviewed-by line from Daniel was kept, as no additional hunks
  are introduced in this version

Changes v1 -> v2:
* Script re-run after typedefs and macros were moved, and now the
  patch also touches:
  - TYPE_ARM_SSE
  - TYPE_SD_BUS

Signed-off-by: Eduardo Habkost 

---
Cc: "Marc-André Lureau" 
Cc: Gerd Hoffmann 
Cc: "Michael S. Tsirkin" 
Cc: "Daniel P. Berrangé" 
Cc: Peter Maydell 
Cc: Corey Minyard 
Cc: "Cédric Le Goater" 
Cc: David Gibson 
Cc: Cornelia Huck 
Cc: Thomas Huth 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: "Philippe Mathieu-Daudé" 
Cc: Alistair Francis 
Cc: David Hildenbrand 
Cc: Laurent Vivier 
Cc: Amit Shah 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Paolo Bonzini 
Cc: Fam Zheng 
Cc: "Gonglei (Arei)" 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: Stefan Berger 
Cc: Richard Henderson 
Cc: Michael Rolnik 
Cc: Sarah Harris 
Cc: "Edgar E. Iglesias" 
Cc: Michael Walle 
Cc: Aleksandar Markovic 
Cc: Aurelien Jarno 
Cc: Jiaxun Yang 
Cc: Aleksandar Rikalo 
Cc: Anthony Green 
Cc: Chris Wulff 
Cc: Marek Vasut 
Cc: Stafford Horne 
Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Yoshinori Sato 
Cc: Mark Cave-Ayland 
Cc: Artyom Tarasenko 
Cc: Guan Xuetao 
Cc: Max Filippov 
Cc: qemu-de...@nongnu.org
Cc: qemu-...@nongnu.org
Cc: qemu-...@nongnu.org
Cc: qemu-s3...@nongnu.org
Cc: qemu-block@nongnu.org
Cc: xen-de...@lists.xenproject.org
Cc: qemu-ri...@nongnu.org
---
 hw/audio/intel-hda.h| 6 ++
 hw/display/virtio-vga.h | 6 ++
 include/authz/base.h| 6 ++
 include/authz/list.h| 6 ++
 include/authz/listfile.h| 6 ++
 include/authz/pamacct.h | 6 ++
 include/authz/simple.h  | 6 ++
 include/crypto/secret_common.h  | 6 ++
 include/crypto/secret_keyring.h | 6 ++
 include/hw/arm/armsse.h | 6 ++
 include/hw/hyperv/vmbus.h   | 6 ++
 include/hw/i2c/i2c.h| 6 ++
 include/hw/i2c/smbus_slave.h| 6 ++
 include/hw/ipack/ipack.h| 6 ++
 include/hw/ipmi/ipmi.h  | 6 ++
 include/hw/mem/pc-dimm.h| 6 ++
 include/hw/ppc/pnv.h| 6 ++
 include/hw/ppc/pnv_core.h   | 6 ++
 include/hw/ppc/pnv_homer.h  | 6 ++
 include/hw/ppc/pnv_occ.h| 6 ++
 include/hw/ppc/pnv_psi.h| 6 ++
 include/hw/ppc/pnv_xive.h   | 6 ++
 include/hw/ppc/spapr_cpu_core.h | 6 ++
 include/hw/ppc/spapr_drc.h  | 6 ++
 include/hw/ppc/spapr_vio.h  | 6 ++
 include/hw/ppc/spapr_xive.h | 6 ++
 include/hw/ppc/xics.h   | 6 ++
 include/hw/s390x/event-facility.h   | 6 ++
 include/hw/s390x/s390_flic.h| 6 ++
 include/hw/s390x/sclp.h | 6 ++
 include/hw/sd/sd.h  | 6 ++
 include/hw/ssi/ssi.h| 6 ++
 include/hw/sysbus.h | 6 ++
 include/hw/virtio/virtio-gpu.h  | 6 ++
 include/hw/virtio/virtio-input.h| 6 ++
 include/hw/virtio/virtio-mem.h  | 6 ++
 include/hw/virtio/virtio-pmem.h | 6 ++
 include/hw/virtio/virtio-serial.h   | 6 ++
 include/hw/xen/xen-bus.h| 6 ++
 include/io/channel.h| 6 ++
 include/io/dns-resolver.h   | 6 ++
 include/io/net-listener.h   | 6 ++
 include/scsi/pr-manager.h   | 6 ++
 include/sysemu/cryptodev.h  | 6 ++
 include/sysemu/hostmem.h| 6 ++
 include/sysemu/rng.h| 6 ++
 include/sysemu/tpm_backend.h| 6 ++
 include/sysemu/vhost-user-backend.h | 6 ++
 target/alpha/cpu-qom.h  | 6 ++
 target/arm/cpu-qom.h| 6 ++
 target/avr/cpu-qom.h| 6 ++
 target/cris/cpu-qom.h   | 6 ++
 target/hppa/cpu-qom.h   | 6 ++
 target/i386/cpu-qom.h   | 6 ++
 target/lm32/cpu-qom.h   | 6 ++
 target/m68k/cpu-qom.h   | 6 ++
 target/microblaze/cpu-qom.h | 6 ++
 target/mips/cpu-qom.h   | 6 ++
 target/moxie/cpu.h  | 6 ++
 target/nios2/cpu.h  | 6 ++
 target/openrisc/cpu.h   | 6 ++
 target/ppc/cpu-qom.h| 6 ++
 target/riscv/cpu.h  | 6 ++
 target/s390x/cpu-qom.h  | 6 ++
 target/sh4/cpu-qom.h| 6 ++
 target/sparc/cpu-qom.h  | 6 ++
 target/tilegx/cpu.h   

[PATCH v3 49/74] swim: Rename struct SWIM to Swim

2020-08-25 Thread Eduardo Habkost
Currently we have a SWIM typedef and a SWIM type checking macro,
but OBJECT_DECLARE* would transform the SWIM macro into a
function, and the function name would conflict with the SWIM
typedef name.

Rename the struct and typedef to "Swim". This will make future
conversion to OBJECT_DECLARE* easier.

Signed-off-by: Eduardo Habkost 
---
Changes series v2 -> v3: new patch added to series v3

---
Cc: Laurent Vivier 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-block@nongnu.org
Cc: qemu-de...@nongnu.org

Signed-off-by: Eduardo Habkost 
---
 include/hw/block/swim.h |  6 +++---
 hw/block/swim.c | 10 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/hw/block/swim.h b/include/hw/block/swim.h
index 6add3499d0..9d8b65c561 100644
--- a/include/hw/block/swim.h
+++ b/include/hw/block/swim.h
@@ -67,10 +67,10 @@ struct SWIMCtrl {
 };
 
 #define TYPE_SWIM "swim"
-#define SWIM(obj) OBJECT_CHECK(SWIM, (obj), TYPE_SWIM)
+#define SWIM(obj) OBJECT_CHECK(Swim, (obj), TYPE_SWIM)
 
-typedef struct SWIM {
+typedef struct Swim {
 SysBusDevice parent_obj;
 SWIMCtrl ctrl;
-} SWIM;
+} Swim;
 #endif
diff --git a/hw/block/swim.c b/hw/block/swim.c
index 74f56e8f46..20133a814c 100644
--- a/hw/block/swim.c
+++ b/hw/block/swim.c
@@ -387,7 +387,7 @@ static const MemoryRegionOps swimctrl_mem_ops = {
 
 static void sysbus_swim_reset(DeviceState *d)
 {
-SWIM *sys = SWIM(d);
+Swim *sys = SWIM(d);
 SWIMCtrl *ctrl = >ctrl;
 int i;
 
@@ -408,7 +408,7 @@ static void sysbus_swim_reset(DeviceState *d)
 static void sysbus_swim_init(Object *obj)
 {
 SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-SWIM *sbs = SWIM(obj);
+Swim *sbs = SWIM(obj);
 SWIMCtrl *swimctrl = >ctrl;
 
 memory_region_init_io(>iomem, obj, _mem_ops, swimctrl,
@@ -418,7 +418,7 @@ static void sysbus_swim_init(Object *obj)
 
 static void sysbus_swim_realize(DeviceState *dev, Error **errp)
 {
-SWIM *sys = SWIM(dev);
+Swim *sys = SWIM(dev);
 SWIMCtrl *swimctrl = >ctrl;
 
 qbus_create_inplace(>bus, sizeof(SWIMBus), TYPE_SWIM_BUS, dev,
@@ -460,7 +460,7 @@ static const VMStateDescription vmstate_sysbus_swim = {
 .name = "SWIM",
 .version_id = 1,
 .fields = (VMStateField[]) {
-VMSTATE_STRUCT(ctrl, SWIM, 0, vmstate_swim, SWIMCtrl),
+VMSTATE_STRUCT(ctrl, Swim, 0, vmstate_swim, SWIMCtrl),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -477,7 +477,7 @@ static void sysbus_swim_class_init(ObjectClass *oc, void 
*data)
 static const TypeInfo sysbus_swim_info = {
 .name  = TYPE_SWIM,
 .parent= TYPE_SYS_BUS_DEVICE,
-.instance_size = sizeof(SWIM),
+.instance_size = sizeof(Swim),
 .instance_init = sysbus_swim_init,
 .class_init= sysbus_swim_class_init,
 };
-- 
2.26.2




Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-25 Thread Brian Foster
On Tue, Aug 25, 2020 at 07:18:19PM +0200, Alberto Garcia wrote:
> On Tue 25 Aug 2020 06:54:15 PM CEST, Brian Foster wrote:
> > If I compare this 5m fio test between XFS and ext4 on a couple of my
> > systems (with either no prealloc or full file prealloc), I end up seeing
> > ext4 run slightly faster on my vm and XFS slightly faster on bare metal.
> > Either way, I don't see that huge disparity where ext4 is 5-6 times
> > faster than XFS. Can you describe the test, filesystem and storage in
> > detail where you observe such a discrepancy?
> 
> Here's the test:
> 
> fio --filename=/path/to/file.raw --direct=1 --randrepeat=1 \
> --eta=always --ioengine=libaio --iodepth=32 --numjobs=1 \
> --name=test --size=25G --io_limit=25G --ramp_time=0 \
> --rw=randwrite --bs=4k --runtime=300 --time_based=1
> 

My fio fallocates the entire file by default with this command. Is that
the intent of this particular test? I added --fallocate=none to my test
runs to incorporate the allocation cost in the I/Os.

> The size of the XFS filesystem is 126 GB and it's almost empty, here's
> the xfs_info output:
> 
> meta-data=/dev/vg/test   isize=512agcount=4, agsize=8248576
> blks
>  =   sectsz=512   attr=2, projid32bit=1
>  =   crc=1finobt=1, sparse=1,
>  rmapbt=0
>  =   reflink=0
> data =   bsize=4096   blocks=32994304, imaxpct=25
>  =   sunit=0  swidth=0 blks
> naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
> log  =internal log   bsize=4096   blocks=16110, version=2
>  =   sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> 
> The size of the ext4 filesystem is 99GB, of which 49GB are free (that
> is, without the file used in this test). The filesystem uses 4KB
> blocks, a 128M journal and these features:
> 
> Filesystem revision #:1 (dynamic)
> Filesystem features:  has_journal ext_attr resize_inode dir_index
>   filetype needs_recovery extent flex_bg
>   sparse_super large_file huge_file uninit_bg
>   dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options:user_xattr acl
> 
> In both cases I'm using LVM on top of LUKS and the hard drive is a
> Samsung SSD 850 PRO 1TB.
> 
> The Linux version is 4.19.132-1 from Debian.
> 

Thanks. I don't have LUKS in the mix on my box, but I was running on a
more recent kernel (Fedora 5.7.15-100). I threw v4.19 on the box and saw
a bit more of a delta between XFS (~14k iops) and ext4 (~24k). The same
test shows ~17k iops for XFS and ~19k iops for ext4 on v5.7. If I
increase the size of the LVM volume from 126G to >1TB, ext4 runs at
roughly the same rate and XFS closes the gap to around ~19k iops as
well. I'm not sure what might have changed since v4.19, but care to see
if this is still an issue on a more recent kernel?

Brian

> Berto
> 




qcow2 overlay performance

2020-08-25 Thread Yoonho Park
I have been measuring the performance of qcow2 overlays, and I am hoping to
get some help in understanding the data I collected. In my experiments, I
created a VM and attached a 16G qcow2 disk to it using "qemu-img create"
and "virsh attach-disk". I use fio to fill it. I create some number of
snapshots (overlays) using "virsh snapshot-create-as". To mimic user
activity between taking snapshots, I use fio to randomly write to 10% of
each overlay right after I create it. After creating the overlays, I use
fio to measure random read performance and random write performance with 2
different blocks sizes, 4K and 64K. 64K is the qcow2 cluster size used by
the 16G qcow2 disk and the overlays (verified with "qemu-img info"). fio is
using the attached disk as a block device to avoid as much file system
overhead as possible. The VM, 16G disk, and snapshots (overlays) all reside
on local disk. Below are the measurements I collected for up to 5 overlays.


  4K blocks64K blocks

olays rd bw rd iops wr bw  wr iops rd bw rd iops wr bw  wr iops

0 4510  1127438028 109507  67854 1060521808 8153

1 4692  11732924   731 66801 1043104297 1629

2 4524  11312781   695 66801 1043104297 1629

3 4573  11433034   758 65500 102395627  1494

4 4556  11392971   742 67973 1062108099 1689

5 4471  11172937   734 66615 104098472  1538


Read performance is not affected by overlays. However, write performance
drops even with a single overlay. My understanding is that writing 4K
blocks requires a read-modify-write because you must fetch a complete
cluster from deeper in the overlay chain before writing to the active
overlay. However, this does not explain the drop in performance when
writing 64K blocks. The performance drop is not as significant, but if the
write block size matches the cluster size then it seems that there should
not be any performance drop because the write can go directly to the active
overlay.


Another issue I hit is that I cannot set or change the cluster size of
overlays. Is this possible with "virsh snapshot-create-as"?


I am using qemu-system-x86_64 version 4.2.0 and virsh version 6.0.0.


Thank you for any insights or advice you have.


[PATCH v3 02/74] megasas: Rename QOM class cast macros

2020-08-25 Thread Eduardo Habkost
Rename the MEGASAS_DEVICE_CLASS() and MEGASAS_DEVICE_GET_CLASS()
macros to be consistent with the MEGASAS() instance cast macro.

This will allow us to register the type cast macros using
OBJECT_DECLARE_TYPE later.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eduardo Habkost 
---
Changes v2 -> v3: none

Changes v1 -> v2: none

---
Cc: Paolo Bonzini 
Cc: Fam Zheng 
Cc: Hannes Reinecke 
Cc: qemu-block@nongnu.org
Cc: qemu-de...@nongnu.org
---
 hw/scsi/megasas.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 5cfd1bf22e..390c2f2edb 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -134,9 +134,9 @@ typedef struct MegasasBaseClass {
 #define MEGASAS(obj) \
 OBJECT_CHECK(MegasasState, (obj), TYPE_MEGASAS_BASE)
 
-#define MEGASAS_DEVICE_CLASS(oc) \
+#define MEGASAS_CLASS(oc) \
 OBJECT_CLASS_CHECK(MegasasBaseClass, (oc), TYPE_MEGASAS_BASE)
-#define MEGASAS_DEVICE_GET_CLASS(oc) \
+#define MEGASAS_GET_CLASS(oc) \
 OBJECT_GET_CLASS(MegasasBaseClass, (oc), TYPE_MEGASAS_BASE)
 
 #define MEGASAS_INTR_DISABLED_MASK 0x
@@ -733,7 +733,7 @@ static int megasas_ctrl_get_info(MegasasState *s, 
MegasasCmd *cmd)
 {
 PCIDevice *pci_dev = PCI_DEVICE(s);
 PCIDeviceClass *pci_class = PCI_DEVICE_GET_CLASS(pci_dev);
-MegasasBaseClass *base_class = MEGASAS_DEVICE_GET_CLASS(s);
+MegasasBaseClass *base_class = MEGASAS_GET_CLASS(s);
 struct mfi_ctrl_info info;
 size_t dcmd_size = sizeof(info);
 BusChild *kid;
@@ -1999,7 +1999,7 @@ static uint64_t megasas_mmio_read(void *opaque, hwaddr 
addr,
 {
 MegasasState *s = opaque;
 PCIDevice *pci_dev = PCI_DEVICE(s);
-MegasasBaseClass *base_class = MEGASAS_DEVICE_GET_CLASS(s);
+MegasasBaseClass *base_class = MEGASAS_GET_CLASS(s);
 uint32_t retval = 0;
 
 switch (addr) {
@@ -2322,7 +2322,7 @@ static const struct SCSIBusInfo megasas_scsi_info = {
 static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
 {
 MegasasState *s = MEGASAS(dev);
-MegasasBaseClass *b = MEGASAS_DEVICE_GET_CLASS(s);
+MegasasBaseClass *b = MEGASAS_GET_CLASS(s);
 uint8_t *pci_conf;
 int i, bar_type;
 Error *err = NULL;
@@ -2506,7 +2506,7 @@ static void megasas_class_init(ObjectClass *oc, void 
*data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
-MegasasBaseClass *e = MEGASAS_DEVICE_CLASS(oc);
+MegasasBaseClass *e = MEGASAS_CLASS(oc);
 const MegasasInfo *info = data;
 
 pc->realize = megasas_scsi_realize;
-- 
2.26.2




[PATCH v3 32/74] ahci: Move QOM macro to header

2020-08-25 Thread Eduardo Habkost
Move the ALLWINNER_AHCI macro close to the TYPE_ALLWINNER_AHCI
define.

This will make future conversion to OBJECT_DECLARE* easier.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eduardo Habkost 
---
Changes v2 -> v3: none

Changes series v1 -> v2: new patch in series v2

Cc: John Snow 
Cc: qemu-block@nongnu.org
Cc: qemu-de...@nongnu.org
---
 include/hw/ide/ahci.h   | 2 ++
 hw/ide/ahci-allwinner.c | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/hw/ide/ahci.h b/include/hw/ide/ahci.h
index ce2bf8a5f8..41bb517047 100644
--- a/include/hw/ide/ahci.h
+++ b/include/hw/ide/ahci.h
@@ -72,6 +72,8 @@ typedef struct SysbusAHCIState {
 } SysbusAHCIState;
 
 #define TYPE_ALLWINNER_AHCI "allwinner-ahci"
+#define ALLWINNER_AHCI(obj) \
+OBJECT_CHECK(AllwinnerAHCIState, (obj), TYPE_ALLWINNER_AHCI)
 
 #define ALLWINNER_AHCI_MMIO_OFF  0x80
 #define ALLWINNER_AHCI_MMIO_SIZE 0x80
diff --git a/hw/ide/ahci-allwinner.c b/hw/ide/ahci-allwinner.c
index 8536b9eb5a..227e747ba7 100644
--- a/hw/ide/ahci-allwinner.c
+++ b/hw/ide/ahci-allwinner.c
@@ -25,9 +25,6 @@
 
 #include "trace.h"
 
-#define ALLWINNER_AHCI(obj) \
-OBJECT_CHECK(AllwinnerAHCIState, (obj), TYPE_ALLWINNER_AHCI)
-
 #define ALLWINNER_AHCI_BISTAFR((0xa0 - ALLWINNER_AHCI_MMIO_OFF) / 4)
 #define ALLWINNER_AHCI_BISTCR ((0xa4 - ALLWINNER_AHCI_MMIO_OFF) / 4)
 #define ALLWINNER_AHCI_BISTFCTR   ((0xa8 - ALLWINNER_AHCI_MMIO_OFF) / 4)
-- 
2.26.2