date:20201211

[PATCH v4 30/32] qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr()

2020-12-11 Thread Eduardo Habkost

The function will be moved to common QOM code, as it is not
specific to TYPE_DEVICE anymore.
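
For readers unfamiliar with the helper, here is a minimal standalone sketch of
what a "field property pointer" lookup amounts to. It assumes the property
records a byte offset into the object's struct. All Demo* names are
hypothetical stand-ins, not QEMU's Object and Property types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's Object and Property types. */
typedef struct DemoObject {
    int refcount;
} DemoObject;

typedef struct DemoProperty {
    const char *name;
    ptrdiff_t offset;   /* byte offset of the field within the object struct */
} DemoProperty;

/* A made-up device type embedding the base object as its first member. */
typedef struct DemoDevice {
    DemoObject parent;
    uint32_t queue_size;
} DemoDevice;

/* Return a pointer to the struct field that backs the property. */
static void *demo_field_prop_ptr(DemoObject *obj, const DemoProperty *prop)
{
    assert(obj != NULL && prop != NULL);
    return (char *)obj + prop->offset;
}

/* Store a value through the property and read it back from the struct. */
static uint32_t demo_set_queue_size(DemoDevice *dev, uint32_t value)
{
    static const DemoProperty prop = {
        .name = "queue-size",
        .offset = offsetof(DemoDevice, queue_size),
    };
    uint32_t *field = demo_field_prop_ptr(&dev->parent, &prop);

    *field = value;
    return dev->queue_size;
}
```

Nothing in the lookup depends on the object being a device, which is the
property of the helper that the rename reflects.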

Reviewed-by: Stefan Berger 
Signed-off-by: Eduardo Habkost 
---
Changes v1 -> v2:
* Rename to object_field_prop_ptr() instead of object_static_prop_ptr()
---
Cc: Stefan Berger 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Paolo Bonzini 
Cc: "Daniel P. Berrangé" 
Cc: Eduardo Habkost 
Cc: Cornelia Huck 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: David Hildenbrand 
Cc: Thomas Huth 
Cc: Matthew Rosato 
Cc: Alex Williamson 
Cc: qemu-de...@nongnu.org
Cc: xen-de...@lists.xenproject.org
Cc: qemu-block@nongnu.org
Cc: qemu-s3...@nongnu.org
---
 include/hw/qdev-properties.h |  2 +-
 backends/tpm/tpm_util.c  |  6 ++--
 hw/block/xen-block.c |  4 +--
 hw/core/qdev-properties-system.c | 50 +-
 hw/core/qdev-properties.c| 60 
 hw/s390x/css.c   |  4 +--
 hw/s390x/s390-pci-bus.c  |  4 +--
 hw/vfio/pci-quirks.c |  4 +--
 8 files changed, 67 insertions(+), 67 deletions(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 90222822f1..97bb9494ae 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -193,7 +193,7 @@ void qdev_prop_set_macaddr(DeviceState *dev, const char 
*name,
const uint8_t *value);
 void qdev_prop_set_enum(DeviceState *dev, const char *name, int value);
 
-void *qdev_get_prop_ptr(Object *obj, Property *prop);
+void *object_field_prop_ptr(Object *obj, Property *prop);
 
 void qdev_prop_register_global(GlobalProperty *prop);
 const GlobalProperty *qdev_find_global_prop(Object *obj,
diff --git a/backends/tpm/tpm_util.c b/backends/tpm/tpm_util.c
index 39b45fa46d..a6e6d3e72f 100644
--- a/backends/tpm/tpm_util.c
+++ b/backends/tpm/tpm_util.c
@@ -35,7 +35,7 @@
 static void get_tpm(Object *obj, Visitor *v, const char *name, void *opaque,
 Error **errp)
 {
-TPMBackend **be = qdev_get_prop_ptr(obj, opaque);
+TPMBackend **be = object_field_prop_ptr(obj, opaque);
 char *p;
 
 p = g_strdup(*be ? (*be)->id : "");
@@ -47,7 +47,7 @@ static void set_tpm(Object *obj, Visitor *v, const char 
*name, void *opaque,
 Error **errp)
 {
 Property *prop = opaque;
-TPMBackend *s, **be = qdev_get_prop_ptr(obj, prop);
+TPMBackend *s, **be = object_field_prop_ptr(obj, prop);
 char *str;
 
 if (!visit_type_str(v, name, &str, errp)) {
@@ -67,7 +67,7 @@ static void set_tpm(Object *obj, Visitor *v, const char 
*name, void *opaque,
 static void release_tpm(Object *obj, const char *name, void *opaque)
 {
 Property *prop = opaque;
-TPMBackend **be = qdev_get_prop_ptr(obj, prop);
+TPMBackend **be = object_field_prop_ptr(obj, prop);
 
 if (*be) {
 tpm_backend_reset(*be);
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index bd1aef63a7..718d886e5c 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -336,7 +336,7 @@ static void xen_block_get_vdev(Object *obj, Visitor *v, 
const char *name,
void *opaque, Error **errp)
 {
 Property *prop = opaque;
-XenBlockVdev *vdev = qdev_get_prop_ptr(obj, prop);
+XenBlockVdev *vdev = object_field_prop_ptr(obj, prop);
 char *str;
 
 switch (vdev->type) {
@@ -396,7 +396,7 @@ static void xen_block_set_vdev(Object *obj, Visitor *v, 
const char *name,
void *opaque, Error **errp)
 {
 Property *prop = opaque;
-XenBlockVdev *vdev = qdev_get_prop_ptr(obj, prop);
+XenBlockVdev *vdev = object_field_prop_ptr(obj, prop);
 char *str, *p;
 const char *end;
 
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 590c5f3d97..e6d378a34e 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -62,7 +62,7 @@ static void get_drive(Object *obj, Visitor *v, const char 
*name, void *opaque,
   Error **errp)
 {
 Property *prop = opaque;
-void **ptr = qdev_get_prop_ptr(obj, prop);
+void **ptr = object_field_prop_ptr(obj, prop);
 const char *value;
 char *p;
 
@@ -88,7 +88,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-void **ptr = qdev_get_prop_ptr(obj, prop);
+void **ptr = object_field_prop_ptr(obj, prop);
 char *str;
 BlockBackend *blk;
 bool blk_created = false;
@@ -181,7 +181,7 @@ static void release_drive(Object *obj, const char *name, 
void *opaque)
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-BlockBackend **ptr = qdev_get_prop_ptr(obj, prop);
+BlockBackend **ptr = object_field_prop_ptr(obj, prop);
 
 if (*ptr) {
 AioContext *ctx = blk_get_aio_context(*ptr);
@@ -214,7 +214,

[PATCH v4 23/32] qdev: Move dev->realized check to qdev_property_set()

2020-12-11 Thread Eduardo Habkost

Every single qdev property setter function manually checks
dev->realized.  We can just check dev->realized inside
qdev_property_set() instead.

The check is being added as a separate function
(qdev_prop_allow_set()) because it will become a callback later.
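
The refactoring can be pictured with a small standalone sketch: the
"already realized" test moves out of each setter and into one common entry
point. The Demo* types and demo_* functions are hypothetical and only mirror
the shape of the change, not QEMU's actual signatures or error reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoDevice {
    bool realized;
    int value;
} DemoDevice;

typedef void (*DemoSetter)(DemoDevice *dev, int value);

/* Central check: properties may only be set before the device is realized. */
static bool demo_prop_allow_set(const DemoDevice *dev)
{
    return !dev->realized;
}

/* Type-specific setter: it no longer needs its own realized check. */
static void demo_set_value(DemoDevice *dev, int value)
{
    dev->value = value;
}

/* Common entry point; returns 0 on success, -1 if the set is rejected. */
static int demo_property_set(DemoDevice *dev, DemoSetter set, int value)
{
    assert(dev != NULL && set != NULL);

    if (!demo_prop_allow_set(dev)) {
        return -1;
    }
    set(dev, value);
    return 0;
}
```

Keeping the check in its own function, as the commit message notes, leaves
room to turn it into a callback later.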

Reviewed-by: Stefan Berger 
Signed-off-by: Eduardo Habkost 
---
Changes v1 -> v2:
* Removed unused variable at xen_block_set_vdev()
* Redone patch after changes in the previous patches in the
  series
---
Cc: Stefan Berger 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Paolo Bonzini 
Cc: "Daniel P. Berrangé" 
Cc: Eduardo Habkost 
Cc: Cornelia Huck 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: David Hildenbrand 
Cc: Thomas Huth 
Cc: Matthew Rosato 
Cc: Alex Williamson 
Cc: Mark Cave-Ayland 
Cc: Artyom Tarasenko 
Cc: qemu-de...@nongnu.org
Cc: xen-de...@lists.xenproject.org
Cc: qemu-block@nongnu.org
Cc: qemu-s3...@nongnu.org
---
 backends/tpm/tpm_util.c  |   6 --
 hw/block/xen-block.c |   6 --
 hw/core/qdev-properties-system.c |  70 --
 hw/core/qdev-properties.c| 100 ++-
 hw/s390x/css.c   |   6 --
 hw/s390x/s390-pci-bus.c  |   6 --
 hw/vfio/pci-quirks.c |   6 --
 target/sparc/cpu.c   |   6 --
 8 files changed, 18 insertions(+), 188 deletions(-)

diff --git a/backends/tpm/tpm_util.c b/backends/tpm/tpm_util.c
index a5d997e7dc..39b45fa46d 100644
--- a/backends/tpm/tpm_util.c
+++ b/backends/tpm/tpm_util.c
@@ -46,16 +46,10 @@ static void get_tpm(Object *obj, Visitor *v, const char 
*name, void *opaque,
 static void set_tpm(Object *obj, Visitor *v, const char *name, void *opaque,
 Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
 TPMBackend *s, **be = qdev_get_prop_ptr(obj, prop);
 char *str;
 
-if (dev->realized) {
-qdev_prop_set_after_realize(dev, name, errp);
-return;
-}
-
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 905e4acd97..bd1aef63a7 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -395,17 +395,11 @@ static int vbd_name_to_disk(const char *name, const char 
**endp,
 static void xen_block_set_vdev(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
 XenBlockVdev *vdev = qdev_get_prop_ptr(obj, prop);
 char *str, *p;
 const char *end;
 
-if (dev->realized) {
-qdev_prop_set_after_realize(dev, name, errp);
-return;
-}
-
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 42529c3b65..f31aea3de1 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -94,11 +94,6 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 bool blk_created = false;
 int ret;
 
-if (dev->realized) {
-qdev_prop_set_after_realize(dev, name, errp);
-return;
-}
-
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
@@ -230,17 +225,11 @@ static void get_chr(Object *obj, Visitor *v, const char 
*name, void *opaque,
 static void set_chr(Object *obj, Visitor *v, const char *name, void *opaque,
 Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
 CharBackend *be = qdev_get_prop_ptr(obj, prop);
 Chardev *s;
 char *str;
 
-if (dev->realized) {
-qdev_prop_set_after_realize(dev, name, errp);
-return;
-}
-
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
@@ -311,18 +300,12 @@ static void get_mac(Object *obj, Visitor *v, const char 
*name, void *opaque,
 static void set_mac(Object *obj, Visitor *v, const char *name, void *opaque,
 Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
 MACAddr *mac = qdev_get_prop_ptr(obj, prop);
 int i, pos;
 char *str;
 const char *p;
 
-if (dev->realized) {
-qdev_prop_set_after_realize(dev, name, errp);
-return;
-}
-
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
@@ -390,7 +373,6 @@ static void get_netdev(Object *obj, Visitor *v, const char 
*name,
 static void set_netdev(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
 NICPeers *peers_ptr = qdev_get_prop_ptr(obj, prop);
 NetClientState **ncs = peers_ptr->ncs;
@@ -398,11 +380,6 @@ static void set_netdev(Object *obj, Visitor *v, const char 
*name,
 int queues, err = 0, i = 0;
 char *str;
 
-if (dev->reali

[PATCH v4 09/32] qdev: Make qdev_get_prop_ptr() get Object* arg

2020-12-11 Thread Eduardo Habkost

Make the code more generic and not specific to TYPE_DEVICE.

Reviewed-by: Marc-André Lureau 
Reviewed-by: Cornelia Huck  #s390 parts
Signed-off-by: Eduardo Habkost 
---
Changes v1 -> v2:
- Fix build error with CONFIG_XEN
  I took the liberty of keeping the Reviewed-by line from
  Marc-André as the build fix is a trivial one line change
---
Cc: Stefan Berger 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Paolo Bonzini 
Cc: "Daniel P. Berrangé" 
Cc: Eduardo Habkost 
Cc: Cornelia Huck 
Cc: Thomas Huth 
Cc: Richard Henderson 
Cc: David Hildenbrand 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Matthew Rosato 
Cc: Alex Williamson 
Cc: qemu-de...@nongnu.org
Cc: xen-de...@lists.xenproject.org
Cc: qemu-block@nongnu.org
Cc: qemu-s3...@nongnu.org
---
 include/hw/qdev-properties.h |  2 +-
 backends/tpm/tpm_util.c  |  8 ++--
 hw/block/xen-block.c |  5 +-
 hw/core/qdev-properties-system.c | 57 +-
 hw/core/qdev-properties.c| 82 +---
 hw/s390x/css.c   |  5 +-
 hw/s390x/s390-pci-bus.c  |  4 +-
 hw/vfio/pci-quirks.c |  5 +-
 8 files changed, 68 insertions(+), 100 deletions(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 0ea822e6a7..0b92cfc761 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -302,7 +302,7 @@ void qdev_prop_set_macaddr(DeviceState *dev, const char 
*name,
const uint8_t *value);
 void qdev_prop_set_enum(DeviceState *dev, const char *name, int value);
 
-void *qdev_get_prop_ptr(DeviceState *dev, Property *prop);
+void *qdev_get_prop_ptr(Object *obj, Property *prop);
 
 void qdev_prop_register_global(GlobalProperty *prop);
 const GlobalProperty *qdev_find_global_prop(DeviceState *dev,
diff --git a/backends/tpm/tpm_util.c b/backends/tpm/tpm_util.c
index e6aeb63587..3973105658 100644
--- a/backends/tpm/tpm_util.c
+++ b/backends/tpm/tpm_util.c
@@ -35,8 +35,7 @@
 static void get_tpm(Object *obj, Visitor *v, const char *name, void *opaque,
 Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
-TPMBackend **be = qdev_get_prop_ptr(dev, opaque);
+TPMBackend **be = qdev_get_prop_ptr(obj, opaque);
 char *p;
 
 p = g_strdup(*be ? (*be)->id : "");
@@ -49,7 +48,7 @@ static void set_tpm(Object *obj, Visitor *v, const char 
*name, void *opaque,
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-TPMBackend *s, **be = qdev_get_prop_ptr(dev, prop);
+TPMBackend *s, **be = qdev_get_prop_ptr(obj, prop);
 char *str;
 
 if (dev->realized) {
@@ -73,9 +72,8 @@ static void set_tpm(Object *obj, Visitor *v, const char 
*name, void *opaque,
 
 static void release_tpm(Object *obj, const char *name, void *opaque)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-TPMBackend **be = qdev_get_prop_ptr(dev, prop);
+TPMBackend **be = qdev_get_prop_ptr(obj, prop);
 
 if (*be) {
 tpm_backend_reset(*be);
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 8a7a3f5452..905e4acd97 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -335,9 +335,8 @@ static char *disk_to_vbd_name(unsigned int disk)
 static void xen_block_get_vdev(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-XenBlockVdev *vdev = qdev_get_prop_ptr(dev, prop);
+XenBlockVdev *vdev = qdev_get_prop_ptr(obj, prop);
 char *str;
 
 switch (vdev->type) {
@@ -398,7 +397,7 @@ static void xen_block_set_vdev(Object *obj, Visitor *v, 
const char *name,
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-XenBlockVdev *vdev = qdev_get_prop_ptr(dev, prop);
+XenBlockVdev *vdev = qdev_get_prop_ptr(obj, prop);
 char *str, *p;
 const char *end;
 
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 77b31eb9dc..9ac9b95852 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -59,9 +59,8 @@ static bool check_prop_still_unset(DeviceState *dev, const 
char *name,
 static void get_drive(Object *obj, Visitor *v, const char *name, void *opaque,
   Error **errp)
 {
-DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-void **ptr = qdev_get_prop_ptr(dev, prop);
+void **ptr = qdev_get_prop_ptr(obj, prop);
 const char *value;
 char *p;
 
@@ -87,7 +86,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-void **ptr = qdev_get_prop_ptr(dev, prop);
+void **ptr = qdev_get_prop_ptr(obj, prop);
 char *str;
 BlockBackend *blk;
 bool blk_created = false;
@@ -185,7 +184,7 @@ static void release_drive(Object *obj, const char *name,

Re: [PATCH 19/20] block: Use GString instead of QString to build filenames

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 20:11, Markus Armbruster wrote:

QString supports modifying its string, but it's quite limited: you can
only append.  Just one caller remains:
bdrv_parse_filename_strip_prefix() uses it just for building an
initial string.

Change it to build the initial string with GString.  This is
another step towards making QString immutable.

Cc: Kevin Wolf
Cc: Max Reitz
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

[PATCH v4 14/16] block/io: support int64_t bytes in bdrv_co_p{read, write}v_part()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy via

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

The main motivation is to implement a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).
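
As an illustration of the signed-type argument, here is a minimal sketch of a
helper whose int64_t return value carries either a byte count or a negative
errno value. demo_clamp_request() is a hypothetical function written for this
note; it is not part of the block layer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a request to the end of an image and return the
 * number of bytes that can be processed, or a negative errno value on error.
 */
static int64_t demo_clamp_request(int64_t offset, int64_t bytes,
                                  int64_t image_size)
{
    assert(image_size >= 0);

    if (offset < 0 || bytes < 0) {
        return -EINVAL;
    }
    if (offset >= image_size) {
        return 0;
    }
    /* Subtract instead of adding so that offset + bytes cannot overflow. */
    if (bytes > image_size - offset) {
        return image_size - offset;
    }
    return bytes;
}
```

With an unsigned bytes type, the same function would need a separate output
parameter or a second return channel for errors.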

So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.

bdrv_pad_request() is updated at the same time, as a pointer to bytes is
passed to it from both bdrv_co_pwritev_part() and bdrv_co_preadv_part().

So, all callers of bdrv_pad_request() are updated to pass 64-bit bytes.
bdrv_pad_request() already handles 64-bit requests; add a
corresponding assertion.

Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part().
The type is widened, so callers are safe. Let's look inside the functions.

In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other interfaces that already take int64_t (plus some obviously safe
calculations), which is OK.

In bdrv_co_do_zero_pwritev() aligned_bytes may now become large, but
it is passed to bdrv_aligned_pwritev(), which supports int64_t bytes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  4 ++--
 block/io.c| 14 --
 block/trace-events|  4 ++--
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 04c1e5cb58..55b1039872 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1031,13 +1031,13 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
 int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
-int64_t offset, unsigned int bytes,
+int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
 int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
 int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
-int64_t offset, unsigned int bytes,
+int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
 
 static inline int coroutine_fn bdrv_co_pread(BdrvChild *child,
diff --git a/block/io.c b/block/io.c
index 93a89a56e3..5200658224 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1697,11 +1697,13 @@ static void bdrv_padding_destroy(BdrvRequestPadding 
*pad)
  */
 static int bdrv_pad_request(BlockDriverState *bs,
 QEMUIOVector **qiov, size_t *qiov_offset,
-int64_t *offset, unsigned int *bytes,
+int64_t *offset, int64_t *bytes,
 BdrvRequestPadding *pad, bool *padded)
 {
 int ret;
 
+bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, 
&error_abort);
+
 if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
 if (padded) {
 *padded = false;
@@ -1736,7 +1738,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 }
 
 int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
-int64_t offset, unsigned int bytes,
+int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, size_t qiov_offset,
 BdrvRequestFlags flags)
 {
@@ -1745,7 +1747,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 BdrvRequestPadding pad;
 int ret;
 
-trace_bdrv_co_preadv(bs, offset, bytes, flags);
+trace_bdrv_co_preadv_part(bs, offset, bytes, flags);
 
 if (!bdrv_is_inserted(bs)) {
 return -ENOMEDIUM;
@@ -2089,7 +2091,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 
 static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 int64_t offset,
-unsigned int bytes,
+int64_t bytes,
 BdrvRequestFlags flags,
 BdrvTrackedRequest *req)
 {
@@ -2163,7 +2165,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 }
 
 int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
+int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
 BdrvRequestFlags flags)
 {
 BlockDriverState *bs = child->bs;
@@ -2173,7 +2175,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 int ret;
 bool padded = false;
 
-trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
+trace_bdrv_co_pwritev_part(child->bs, offset, bytes, flags);
 
 if (!bdrv_is_inserted(bs)) {
 return -ENOMEDIUM;
diff --git a/block/trace-events b/block/trace-events
index a5f6ffb7da..91a0f70575 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -11,8 +11,8 @

[PATCH v4 13/16] block/io: support int64_t bytes in bdrv_aligned_preadv()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

The main motivation is to implement a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).

So, prepare bdrv_aligned_preadv() now.

Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is
defined only to be passed to bdrv_aligned_preadv().

All bdrv_aligned_preadv() callers are safe, as the type is widened. Let's
look inside:

 - add a new-style assertion that the request is good.
 - the callees bdrv_is_allocated() and bdrv_co_do_copy_on_readv() support
   int64_t bytes
 - the conversion of bytes_remaining is OK, as we never have requests
   overflowing BDRV_MAX_LENGTH
 - looping through bytes_remaining is OK, as num is updated to int64_t
   - for bdrv_driver_preadv() we have the same max_transfer limit
   - qemu_iovec_memset() is OK, as bytes + qiov_offset should not overflow
     qiov->size anyway (thanks to bdrv_check_qiov_request())
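
The loop-related points in the list above can be sketched in isolation. This
is a simplified, hypothetical model of walking a 64-bit request in chunks
bounded by an int max_transfer. demo_count_chunks() and DEMO_MIN() are
invented for the example and do no actual I/O.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical sketch: walk a request of 'bytes' bytes in chunks of at most
 * 'max_transfer' bytes and return how many chunks were issued.
 */
static int64_t demo_count_chunks(int64_t bytes, int max_transfer)
{
    int64_t bytes_remaining = bytes;
    int64_t chunks = 0;

    assert(bytes >= 0 && max_transfer > 0);

    while (bytes_remaining) {
        /* num is int64_t, so the minimum is computed without truncation */
        int64_t num = DEMO_MIN(bytes_remaining, (int64_t)max_transfer);

        /* a real implementation would issue the driver read here */
        bytes_remaining -= num;
        chunks++;
    }
    return chunks;
}
```

Because each chunk is capped by max_transfer, the per-chunk size still fits in
an int even when the overall request does not.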

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index d8c07fac56..93a89a56e3 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1453,15 +1453,16 @@ err:
  * reads; any other features must be implemented by the caller.
  */
 static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
-BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
+BdrvTrackedRequest *req, int64_t offset, int64_t bytes,
 int64_t align, QEMUIOVector *qiov, size_t qiov_offset, int flags)
 {
 BlockDriverState *bs = child->bs;
 int64_t total_bytes, max_bytes;
 int ret = 0;
-uint64_t bytes_remaining = bytes;
+int64_t bytes_remaining = bytes;
 int max_transfer;
 
+bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, &error_abort);
 assert(is_power_of_2(align));
 assert((offset & (align - 1)) == 0);
 assert((bytes & (align - 1)) == 0);
@@ -1518,7 +1519,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 }
 
 while (bytes_remaining) {
-int num;
+int64_t num;
 
 if (max_bytes) {
 num = MIN(bytes_remaining, MIN(max_bytes, max_transfer));
@@ -1624,7 +1625,7 @@ static int bdrv_padding_rmw_read(BdrvChild *child,
 assert(req->serialising && pad->buf);
 
 if (pad->head || pad->merge_reads) {
-uint64_t bytes = pad->merge_reads ? pad->buf_len : align;
+int64_t bytes = pad->merge_reads ? pad->buf_len : align;
 
 qemu_iovec_init_buf(&local_qiov, pad->buf, bytes);
 
-- 
2.25.4

[PATCH v4 08/16] block: use int64_t as bytes type in tracked requests

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

The main motivation is to implement a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).

No request in block/io may overflow BDRV_MAX_LENGTH, and all
external users of BdrvTrackedRequest already have corresponding
assertions, so we are safe. Still, add some assertions.
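
To make the overflow argument concrete, here is a minimal sketch of an
interval-overlap test on signed 64-bit values, guarded by an assertion that
offset + bytes cannot overflow. DemoRequest and demo_request_overlaps() are
hypothetical. The second comparison is the conventional counterpart of the
first and is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields the overlap test needs. */
typedef struct DemoRequest {
    int64_t overlap_offset;
    int64_t overlap_bytes;
} DemoRequest;

static bool demo_request_overlaps(const DemoRequest *req,
                                  int64_t offset, int64_t bytes)
{
    /* With these bounds checked, the additions below cannot overflow. */
    assert(offset >= 0 && bytes >= 0 && offset <= INT64_MAX - bytes);

    /* The new request starts at or after the end of the tracked one. */
    if (offset >= req->overlap_offset + req->overlap_bytes) {
        return false;
    }
    /* The tracked request starts at or after the end of the new one. */
    if (req->overlap_offset >= offset + bytes) {
        return false;
    }
    return true;
}
```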

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  4 ++--
 block/io.c| 14 +-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index ff29f31451..04c1e5cb58 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -79,12 +79,12 @@ enum BdrvTrackedRequestType {
 typedef struct BdrvTrackedRequest {
 BlockDriverState *bs;
 int64_t offset;
-uint64_t bytes;
+int64_t bytes;
 enum BdrvTrackedRequestType type;
 
 bool serialising;
 int64_t overlap_offset;
-uint64_t overlap_bytes;
+int64_t overlap_bytes;
 
 QLIST_ENTRY(BdrvTrackedRequest) list;
 Coroutine *co; /* owner, used for deadlock detection */
diff --git a/block/io.c b/block/io.c
index 42e687a388..5dec6ab925 100644
--- a/block/io.c
+++ b/block/io.c
@@ -717,10 +717,10 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 static void tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
   int64_t offset,
-  uint64_t bytes,
+  int64_t bytes,
   enum BdrvTrackedRequestType type)
 {
-assert(bytes <= INT64_MAX && offset <= INT64_MAX - bytes);
+bdrv_check_request(offset, bytes, &error_abort);
 
 *req = (BdrvTrackedRequest){
 .bs = bs,
@@ -741,8 +741,10 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 }
 
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
- int64_t offset, uint64_t bytes)
+ int64_t offset, int64_t bytes)
 {
+bdrv_check_request(offset, bytes, &error_abort);
+
 /*    */
 if (offset >= req->overlap_offset + req->overlap_bytes) {
 return false;
@@ -798,10 +800,12 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest 
*req, uint64_t align)
 {
 BlockDriverState *bs = req->bs;
 int64_t overlap_offset = req->offset & ~(align - 1);
-uint64_t overlap_bytes = ROUND_UP(req->offset + req->bytes, align)
-   - overlap_offset;
+int64_t overlap_bytes =
+ROUND_UP(req->offset + req->bytes, align) - overlap_offset;
 bool waited;
 
+bdrv_check_request(req->offset, req->bytes, &error_abort);
+
 qemu_co_mutex_lock(&bs->reqs_lock);
 if (!req->serialising) {
 qatomic_inc(&req->bs->serialising_in_flight);
-- 
2.25.4

[PATCH v4 07/16] block/io: improve bdrv_check_request: check qiov too

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

Operations with a qiov add more restrictions on bytes; let's cover them.
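
The extra restrictions boil down to two comparisons, sketched below in a
standalone form. DemoIOVector and demo_check_qiov() are hypothetical
simplifications. Only the vector's total size is modelled, and errors are
returned as -EIO without an error message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an I/O vector; only the total size matters here. */
typedef struct DemoIOVector {
    size_t size;
} DemoIOVector;

static int demo_check_qiov(int64_t bytes, const DemoIOVector *qiov,
                           size_t qiov_offset)
{
    assert(bytes >= 0);

    if (!qiov) {
        return 0;   /* nothing to check without a vector */
    }
    if (qiov_offset > qiov->size) {
        return -EIO;
    }
    /* Compare against the remaining space instead of adding, to avoid overflow. */
    if ((uint64_t)bytes > qiov->size - qiov_offset) {
        return -EIO;
    }
    return 0;
}
```

Checking qiov_offset first guarantees that the subtraction in the second test
cannot wrap around.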

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 46 +++---
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4a057660f8..42e687a388 100644
--- a/block/io.c
+++ b/block/io.c
@@ -898,8 +898,14 @@ static bool coroutine_fn 
bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 return waited;
 }
 
-int bdrv_check_request(int64_t offset, int64_t bytes, Error **errp)
+static int bdrv_check_qiov_request(int64_t offset, int64_t bytes,
+   QEMUIOVector *qiov, size_t qiov_offset,
+   Error **errp)
 {
+/*
+ * Check generic offset/bytes correctness
+ */
+
 if (offset < 0) {
 error_setg(errp, "offset is negative: %" PRIi64, offset);
 return -EIO;
@@ -929,12 +935,38 @@ int bdrv_check_request(int64_t offset, int64_t bytes, 
Error **errp)
 return -EIO;
 }
 
+if (!qiov) {
+return 0;
+}
+
+/*
+ * Check qiov and qiov_offset
+ */
+
+if (qiov_offset > qiov->size) {
+error_setg(errp, "qiov_offset(%zu) overflow io vector size(%zu)",
+   qiov_offset, qiov->size);
+return -EIO;
+}
+
+if (bytes > qiov->size - qiov_offset) {
+error_setg(errp, "bytes(%" PRIi64 ") + qiov_offset(%zu) overflow io "
+   "vector size(%zu)", bytes, qiov_offset, qiov->size);
+return -EIO;
+}
+
 return 0;
 }
 
-static int bdrv_check_request32(int64_t offset, int64_t bytes)
+int bdrv_check_request(int64_t offset, int64_t bytes, Error **errp)
+{
+return bdrv_check_qiov_request(offset, bytes, NULL, 0, errp);
+}
+
+static int bdrv_check_request32(int64_t offset, int64_t bytes,
+QEMUIOVector *qiov, size_t qiov_offset)
 {
-int ret = bdrv_check_request(offset, bytes, NULL);
+int ret = bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, NULL);
 if (ret < 0) {
 return ret;
 }
@@ -1708,7 +1740,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 return -ENOMEDIUM;
 }
 
-ret = bdrv_check_request32(offset, bytes);
+ret = bdrv_check_request32(offset, bytes, qiov, qiov_offset);
 if (ret < 0) {
 return ret;
 }
@@ -2129,7 +2161,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 return -ENOMEDIUM;
 }
 
-ret = bdrv_check_request32(offset, bytes);
+ret = bdrv_check_request32(offset, bytes, qiov, qiov_offset);
 if (ret < 0) {
 return ret;
 }
@@ -3135,7 +3167,7 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 if (!dst || !dst->bs || !bdrv_is_inserted(dst->bs)) {
 return -ENOMEDIUM;
 }
-ret = bdrv_check_request32(dst_offset, bytes);
+ret = bdrv_check_request32(dst_offset, bytes, NULL, 0);
 if (ret) {
 return ret;
 }
@@ -3146,7 +3178,7 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 if (!src || !src->bs || !bdrv_is_inserted(src->bs)) {
 return -ENOMEDIUM;
 }
-ret = bdrv_check_request32(src_offset, bytes);
+ret = bdrv_check_request32(src_offset, bytes, NULL, 0);
 if (ret) {
 return ret;
 }
-- 
2.25.4

[PATCH v4 12/16] block/io: support int64_t bytes in bdrv_co_do_copy_on_readv()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

The main motivation is to implement a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).

So, prepare bdrv_co_do_copy_on_readv() now.

The 'bytes' type is widened, so callers are safe. Look at the function
itself:

bytes, skip_bytes and progress become int64_t.

bdrv_round_to_clusters() is OK; cluster_bytes may now be large.
trace_bdrv_co_do_copy_on_readv() is OK.

Looping through cluster_bytes is still OK.

pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we
are going to do a COR operation. Therefore the calculations in
qemu_iovec_from_buf() and bdrv_driver_preadv() should not change.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 8 +---
 block/trace-events | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index b48f54..d8c07fac56 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1289,7 +1289,7 @@ bdrv_driver_pwritev_compressed(BlockDriverState *bs, 
int64_t offset,
 }
 
 static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
+int64_t offset, int64_t bytes, QEMUIOVector *qiov,
 size_t qiov_offset, int flags)
 {
 BlockDriverState *bs = child->bs;
@@ -1304,13 +1304,15 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BdrvChild *child,
 BlockDriver *drv = bs->drv;
 int64_t cluster_offset;
 int64_t cluster_bytes;
-size_t skip_bytes;
+int64_t skip_bytes;
 int ret;
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
 BDRV_REQUEST_MAX_BYTES);
-unsigned int progress = 0;
+int64_t progress = 0;
 bool skip_write;
 
+bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, &error_abort);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
diff --git a/block/trace-events b/block/trace-events
index 8368f4acb0..a5f6ffb7da 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -14,7 +14,7 @@ blk_root_detach(void *child, void *blk, void *bs) "child %p 
blk %p bs %p"
 bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) 
"bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) 
"bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p 
offset %"PRId64" count %d flags 0x%x"
-bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t 
cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u 
cluster_offset %"PRId64" cluster_bytes %"PRId64
+bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t 
cluster_offset, int64_t cluster_bytes) "bs %p offset %" PRId64 " bytes %" 
PRId64 " cluster_offset %" PRId64 " cluster_bytes %" PRId64
 bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t 
dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset 
%"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x"
 bdrv_co_copy_range_to(void *src, uint64_t src_offset, void *dst, uint64_t 
dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset 
%"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x"
 
-- 
2.25.4

[PATCH v4 11/16] block/io: support int64_t bytes in bdrv_aligned_pwritev()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

The main motivation is to implement a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).

So, prepare bdrv_aligned_pwritev() now and convert the dependencies:
bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed
type bytes.

The conversion of bdrv_co_write_req_prepare() and
bdrv_co_write_req_finish() is definitely safe, as no request in
block/io may overflow BDRV_MAX_LENGTH. Still, add assertions.

For bdrv_aligned_pwritev() the 'bytes' type is widened, so callers are
safe. Let's check the usage of the parameter inside the function.

Passing it to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish()
is OK.

Passing it to qemu_iovec_* is OK after the new assertion. All other
callees are already updated to int64_t.

The alignment check is unchanged; the offset + bytes and qiov_offset +
bytes calculations are safe (thanks to the new assertions).

max_transfer is kept as int for now. It has a default of INT_MAX
here, and some drivers may rely on it. It is to be refactored later.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index c6a476559a..b48f54 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1904,12 +1904,13 @@ fail:
 }
 
 static inline int coroutine_fn
-bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
+bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, int64_t bytes,
   BdrvTrackedRequest *req, int flags)
 {
 BlockDriverState *bs = child->bs;
 bool waited;
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
+bdrv_check_request(offset, bytes, &error_abort);
 
 if (bs->read_only) {
 return -EPERM;
@@ -1935,7 +1936,8 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
 
 assert(req->overlap_offset <= offset);
 assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
-assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE);
+assert(offset + bytes <= bs->total_sectors * BDRV_SECTOR_SIZE ||
+   child->perm & BLK_PERM_RESIZE);
 
 switch (req->type) {
 case BDRV_TRACKED_WRITE:
@@ -1956,12 +1958,14 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
 }
 
 static inline void coroutine_fn
-bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, uint64_t bytes,
+bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, int64_t bytes,
  BdrvTrackedRequest *req, int ret)
 {
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 BlockDriverState *bs = child->bs;
 
+bdrv_check_request(offset, bytes, &error_abort);
+
 qatomic_inc(&bs->write_gen);
 
 /*
@@ -1998,16 +2002,18 @@ bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, uint64_t bytes,
  * after possibly fragmenting it.
  */
 static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
-BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
+BdrvTrackedRequest *req, int64_t offset, int64_t bytes,
 int64_t align, QEMUIOVector *qiov, size_t qiov_offset, int flags)
 {
 BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
 int ret;
 
-uint64_t bytes_remaining = bytes;
+int64_t bytes_remaining = bytes;
 int max_transfer;
 
+bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, &error_abort);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -2019,7 +2025,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
 assert(is_power_of_2(align));
 assert((offset & (align - 1)) == 0);
 assert((bytes & (align - 1)) == 0);
-assert(!qiov || qiov_offset + bytes <= qiov->size);
 max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
align);
 
@@ -2118,7 +2123,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 assert(!bytes || (offset & (align - 1)) == 0);
 if (bytes >= align) {
 /* Write the aligned part in the middle. */
-uint64_t aligned_bytes = bytes & ~(align - 1);
+int64_t aligned_bytes = bytes & ~(align - 1);
 ret = bdrv_aligned_pwritev(child, req, offset, aligned_bytes, align,
NULL, 0, flags);
 if (ret < 0) {
-- 
2.25.4

[PATCH v4 16/16] block/io: use int64_t bytes in copy_range

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert now copy_range parameters which are already 64bit to signed
type.

It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH
(which is less than INT64_MAX), and do check the requests in
bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls
bdrv_check_request()).
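
A minimal sketch of the kind of bounds check that makes the signed
conversion safe. SKETCH_MAX_LENGTH and sketch_request_ok() are hypothetical
stand-ins chosen for illustration; they are not the real BDRV_MAX_LENGTH
value or the bdrv_check_request() implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: any value safely below INT64_MAX works for the sketch. */
#define SKETCH_MAX_LENGTH (INT64_MAX / 2)

/*
 * Return true if [offset, offset + bytes) is non-negative and ends at or
 * below the limit. The sum is never computed directly, so the check
 * itself cannot overflow int64_t.
 */
static bool sketch_request_ok(int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0) {
        return false;
    }
    if (bytes > SKETCH_MAX_LENGTH) {
        return false;
    }
    if (offset > SKETCH_MAX_LENGTH - bytes) {
        return false;
    }
    return true;
}
```

Once a request passes such a check, offset + bytes is known to fit in
int64_t, which is why later arithmetic on the signed values is safe.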

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  6 +++---
 include/block/block_int.h | 12 ++--
 block/io.c| 22 +++---
 block/trace-events|  4 ++--
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 3549125f1d..88629eb3a6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -843,8 +843,8 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host);
  *
  * Returns: 0 if succeeded; negative error code if failed.
  **/
-int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
-BdrvChild *dst, uint64_t dst_offset,
-uint64_t bytes, BdrvRequestFlags read_flags,
+int coroutine_fn bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
+BdrvChild *dst, int64_t dst_offset,
+int64_t bytes, BdrvRequestFlags read_flags,
 BdrvRequestFlags write_flags);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5e482a8f08..cee5cb5f85 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1343,14 +1343,14 @@ void bdrv_dec_in_flight(BlockDriverState *bs);
 
 void blockdev_close_all_bdrv_states(void);
 
-int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
- BdrvChild *dst, uint64_t dst_offset,
- uint64_t bytes,
+int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, int64_t src_offset,
+ BdrvChild *dst, int64_t dst_offset,
+ int64_t bytes,
  BdrvRequestFlags read_flags,
  BdrvRequestFlags write_flags);
-int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
-   BdrvChild *dst, uint64_t dst_offset,
-   uint64_t bytes,
+int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, int64_t src_offset,
+   BdrvChild *dst, int64_t dst_offset,
+   int64_t bytes,
BdrvRequestFlags read_flags,
BdrvRequestFlags write_flags);
 
diff --git a/block/io.c b/block/io.c
index 34dae81fa7..28680a1f64 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3173,8 +3173,8 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
 }
 
 static int coroutine_fn bdrv_co_copy_range_internal(
-BdrvChild *src, uint64_t src_offset, BdrvChild *dst,
-uint64_t dst_offset, uint64_t bytes,
+BdrvChild *src, int64_t src_offset, BdrvChild *dst,
+int64_t dst_offset, int64_t bytes,
 BdrvRequestFlags read_flags, BdrvRequestFlags write_flags,
 bool recurse_src)
 {
@@ -3252,9 +3252,9 @@ static int coroutine_fn bdrv_co_copy_range_internal(
  *
  * See the comment of bdrv_co_copy_range for the parameter and return value
  * semantics. */
-int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
- BdrvChild *dst, uint64_t dst_offset,
- uint64_t bytes,
+int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, int64_t src_offset,
+ BdrvChild *dst, int64_t dst_offset,
+ int64_t bytes,
  BdrvRequestFlags read_flags,
  BdrvRequestFlags write_flags)
 {
@@ -3268,9 +3268,9 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
  *
  * See the comment of bdrv_co_copy_range for the parameter and return value
  * semantics. */
-int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
-   BdrvChild *dst, uint64_t dst_offset,
-   uint64_t bytes,
+int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, int64_t src_offset,
+

[PATCH v4 05/16] block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

Make bdrv_pad_request() honest: return an error if
qemu_iovec_init_extended() fails.

Also update bdrv_padding_destroy() to zero the structure for safety.
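
A minimal sketch of the calling convention this patch moves to: an int
return value for errors plus an optional bool out-parameter that reports
whether padding happened. sketch_pad_length() is a hypothetical helper that
pads a plain length to an alignment; it is not the QEMU function:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round *len up to a multiple of align.
 *  - on failure, or when no padding is needed, *len is left unchanged
 *  - on success with padding, *len holds the padded length
 * padded may be NULL if the caller does not care.
 */
static int sketch_pad_length(size_t *len, size_t align, bool *padded)
{
    size_t rem, extra;

    if (align == 0) {
        return -EINVAL;
    }

    rem = *len % align;
    if (rem == 0) {
        if (padded) {
            *padded = false;
        }
        return 0;
    }

    extra = align - rem;
    if (SIZE_MAX - *len < extra) {
        return -EINVAL; /* padded length would overflow size_t */
    }

    *len += extra;
    if (padded) {
        *padded = true;
    }
    return 0;
}
```

Callers that only need the error path can pass NULL for the flag, which
mirrors how the read path in the diff below passes NULL for @padded.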

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index dcfab267f8..4a057660f8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1637,6 +1637,7 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
 qemu_vfree(pad->buf);
 qemu_iovec_destroy(&pad->local_qiov);
 }
+memset(pad, 0, sizeof(*pad));
 }
 
 /*
@@ -1646,33 +1647,42 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
  * read of padding, bdrv_padding_rmw_read() should be called separately if
  * needed.
  *
- * All parameters except @bs are in-out: they represent original request at
- * function call and padded (if padding needed) at function finish.
- *
- * Function always succeeds.
+ * Request parameters (@qiov, &qiov_offset, &offset, &bytes) are in-out:
+ *  - on function start they represent original request
+ *  - on failure or when padding is not needed they are unchanged
+ *  - on success when padding is needed they represent padded request
  */
-static bool bdrv_pad_request(BlockDriverState *bs,
- QEMUIOVector **qiov, size_t *qiov_offset,
- int64_t *offset, unsigned int *bytes,
- BdrvRequestPadding *pad)
+static int bdrv_pad_request(BlockDriverState *bs,
+QEMUIOVector **qiov, size_t *qiov_offset,
+int64_t *offset, unsigned int *bytes,
+BdrvRequestPadding *pad, bool *padded)
 {
 int ret;
 
 if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
-return false;
+if (padded) {
+*padded = false;
+}
+return 0;
 }
 
 ret = qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
*qiov, *qiov_offset, *bytes,
pad->buf + pad->buf_len - pad->tail,
pad->tail);
-assert(ret == 0);
+if (ret < 0) {
+bdrv_padding_destroy(pad);
+return ret;
+}
 *bytes += pad->head + pad->tail;
 *offset -= pad->head;
 *qiov = &pad->local_qiov;
 *qiov_offset = 0;
+if (padded) {
+*padded = true;
+}
 
-return true;
+return 0;
 }
 
 int coroutine_fn bdrv_co_preadv(BdrvChild *child,
@@ -1722,7 +1732,11 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 flags |= BDRV_REQ_COPY_ON_READ;
 }
 
-bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad);
+ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
+   NULL);
+if (ret < 0) {
+return ret;
+}
 
 tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
 ret = bdrv_aligned_preadv(child, &req, offset, bytes,
@@ -2145,8 +2159,11 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
  * bdrv_co_do_zero_pwritev() does aligning by itself, so, we do
  * alignment only if there is no ZERO flag.
  */
-padded = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes,
-  &pad);
+ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
+   &padded);
+if (ret < 0) {
+return ret;
+}
 }
 
 bdrv_inc_in_flight(bs);
-- 
2.25.4

[PATCH v4 10/16] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_pwrite_zeroes() now.

Callers are safe, as converting int to int64_t is safe. Concentrate on
'bytes' usage in the function (thx to Eric Blake):

compute 'int tail' via % 'int alignment' - safe
fragmentation loop 'int num' - still fragments with a cap on
  max_transfer

use of 'num' within the loop
MIN(bytes, max_transfer) as well as %alignment - still works, so
 calculations in if (head) {} are safe
clamp size by 'int max_write_zeroes' - safe
drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
clamp size by 'int max_transfer' - safe
buf allocation is still clamped to max_transfer
qemu_iovec_init_buf(size_t) - safe because of clamping
bdrv_driver_pwritev(uint64_t) - safe
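
The clamping argument above can be illustrated with a small sketch of a
fragmentation loop. sketch_count_fragments() is a hypothetical helper written
for this illustration, not code from block/io.c; it only shows why a 64-bit
'bytes' stays safe when each chunk is capped by an 'int' limit:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Split a 64-bit request into chunks of at most max_transfer bytes and
 * return the number of chunks. Each chunk is clamped before it is
 * narrowed to int, so the narrowing never truncates.
 */
static int64_t sketch_count_fragments(int64_t bytes, int max_transfer)
{
    int64_t fragments = 0;

    assert(bytes >= 0);
    assert(max_transfer > 0);

    while (bytes > 0) {
        int64_t num = bytes;
        int chunk;

        if (num > max_transfer) {
            num = max_transfer;
        }
        chunk = (int)num; /* safe: num <= max_transfer <= INT_MAX */

        bytes -= chunk;
        fragments++;
    }

    return fragments;
}
```

For example, a 4 GiB request with a cap of INT_MAX splits into three
chunks: two of INT_MAX bytes and one of 2 bytes.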

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index b2bf18038b..c6a476559a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -41,7 +41,7 @@
 
 static void bdrv_parent_cb_resize(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
-int64_t offset, int bytes, BdrvRequestFlags flags);
+int64_t offset, int64_t bytes, BdrvRequestFlags flags);
 
 static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
   bool ignore_bds_parents)
@@ -1791,7 +1791,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 }
 
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
-int64_t offset, int bytes, BdrvRequestFlags flags)
+int64_t offset, int64_t bytes, BdrvRequestFlags flags)
 {
 BlockDriver *drv = bs->drv;
 QEMUIOVector qiov;
@@ -1806,6 +1806,8 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 bs->bl.request_alignment);
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
 
+bdrv_check_request(offset, bytes, &error_abort);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -1821,7 +1823,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 assert(max_write_zeroes >= bs->bl.request_alignment);
 
 while (bytes > 0 && !ret) {
-int num = bytes;
+int64_t num = bytes;
 
 /* Align request.  Block drivers can expect the "bulk" of the request
  * to be aligned, and that unaligned requests do not cross cluster
-- 
2.25.4

[PATCH v4 09/16] block/io: use int64_t bytes in driver wrappers

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert driver wrappers parameters which are already 64bit to
signed type.

Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less
than INT64_MAX), which makes the conversion to signed 64bit type safe.

Add corresponding assertions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5dec6ab925..b2bf18038b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1103,7 +1103,7 @@ static void bdrv_co_io_em_complete(void *opaque, int ret)
 }
 
 static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
-   uint64_t offset, uint64_t bytes,
+   int64_t offset, int64_t bytes,
QEMUIOVector *qiov,
size_t qiov_offset, int flags)
 {
@@ -1113,6 +1113,7 @@ static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
 QEMUIOVector local_qiov;
 int ret;
 
+bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, &error_abort);
 assert(!(flags & ~BDRV_REQ_MASK));
 assert(!(flags & BDRV_REQ_NO_FALLBACK));
 
@@ -1172,7 +1173,7 @@ out:
 }
 
 static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
-uint64_t offset, uint64_t bytes,
+int64_t offset, int64_t bytes,
 QEMUIOVector *qiov,
 size_t qiov_offset, int flags)
 {
@@ -1182,6 +1183,7 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 QEMUIOVector local_qiov;
 int ret;
 
+bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, &error_abort);
 assert(!(flags & ~BDRV_REQ_MASK));
 assert(!(flags & BDRV_REQ_NO_FALLBACK));
 
@@ -1252,14 +1254,16 @@ emulate_flags:
 }
 
 static int coroutine_fn
-bdrv_driver_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
-   uint64_t bytes, QEMUIOVector *qiov,
+bdrv_driver_pwritev_compressed(BlockDriverState *bs, int64_t offset,
+   int64_t bytes, QEMUIOVector *qiov,
size_t qiov_offset)
 {
 BlockDriver *drv = bs->drv;
 QEMUIOVector local_qiov;
 int ret;
 
+bdrv_check_qiov_request(offset, bytes, qiov, qiov_offset, &error_abort);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
-- 
2.25.4

[PATCH v4 15/16] block/io: support int64_t bytes in read/write wrappers

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Now that bdrv_co_preadv_part() and bdrv_co_pwritev_part() are updated,
update all their wrappers.

For all of them the type of 'bytes' is widened, so callers are safe. We
have to update request_fn in blkverify.c simultaneously. Still, it's just a
pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and the type is
widened for callers of request_fn anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h | 11 ++-
 include/block/block_int.h |  4 ++--
 block/blkverify.c |  2 +-
 block/io.c| 15 ---
 block/trace-events|  2 +-
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 5b81e33e94..3549125f1d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -390,12 +390,13 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
 void bdrv_reopen_commit(BDRVReopenState *reopen_state);
 void bdrv_reopen_abort(BDRVReopenState *reopen_state);
 int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
-   int bytes, BdrvRequestFlags flags);
+   int64_t bytes, BdrvRequestFlags flags);
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags);
-int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int bytes);
-int bdrv_pwrite(BdrvChild *child, int64_t offset, const void *buf, int bytes);
+int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int64_t bytes);
+int bdrv_pwrite(BdrvChild *child, int64_t offset, const void *buf,
+int64_t bytes);
 int bdrv_pwrite_sync(BdrvChild *child, int64_t offset,
- const void *buf, int count);
+ const void *buf, int64_t bytes);
 /*
  * Efficiently zero a region of the disk image.  Note that this is a regular
  * I/O request like read or write and should have a reasonable size.  This
@@ -403,7 +404,7 @@ int bdrv_pwrite_sync(BdrvChild *child, int64_t offset,
  * because it may allocate memory for the entire region.
  */
 int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
-   int bytes, BdrvRequestFlags flags);
+   int64_t bytes, BdrvRequestFlags flags);
 BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
 const char *backing_file);
 void bdrv_refresh_filename(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 55b1039872..5e482a8f08 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1028,13 +1028,13 @@ extern BlockDriver bdrv_raw;
 extern BlockDriver bdrv_qcow2;
 
 int coroutine_fn bdrv_co_preadv(BdrvChild *child,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
+int64_t offset, int64_t bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
 int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
 int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
+int64_t offset, int64_t bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
 int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 int64_t offset, int64_t bytes,
diff --git a/block/blkverify.c b/block/blkverify.c
index 4aed53ab59..943e62be9c 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -31,7 +31,7 @@ typedef struct BlkverifyRequest {
 uint64_t bytes;
 int flags;
 
-int (*request_fn)(BdrvChild *, int64_t, unsigned int, QEMUIOVector *,
+int (*request_fn)(BdrvChild *, int64_t, int64_t, QEMUIOVector *,
   BdrvRequestFlags);
 
 int ret;/* test image result */
diff --git a/block/io.c b/block/io.c
index 5200658224..34dae81fa7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -983,7 +983,7 @@ static int bdrv_check_request32(int64_t offset, int64_t bytes,
 }
 
 int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
-   int bytes, BdrvRequestFlags flags)
+   int64_t bytes, BdrvRequestFlags flags)
 {
 return bdrv_pwritev(child, offset, bytes, NULL,
 BDRV_REQ_ZERO_WRITE | flags);
@@ -1031,7 +1031,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 }
 
 /* See bdrv_pwrite() for the return codes */
-int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int bytes)
+int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int64_t bytes)
 {
 int ret;
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
@@

[PATCH v4 06/16] block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.

This patch is a first in the 64-bit-blocklayer series, so we are
generally moving to int64_t for both offset and bytes parameters on all
io paths. Main motivation is realization of 64-bit write_zeroes
operation for fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Patch-correctness audit by Eric Blake:

  Caller has 32-bit, this patch now causes widening which is safe:
  block/block-backend.c: blk_do_preadv() passes 'unsigned int'
  block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
  block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
  block/throttle.c: throttle_co_pdiscard() passes 'int'

  Caller has 64-bit, this patch fixes potential bug where pre-patch
  could narrow, except it's easy enough to trace that callers are still
  capped at 2G actions:
  block/throttle.c: throttle_co_preadv() passes 'uint64_t'
  block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

  Implementation in question: block/throttle-groups.c
  throttle_group_co_io_limits_intercept() takes 'unsigned int bytes'
  and uses it: argument to util/throttle.c throttle_account(uint64_t)

  All safe: it patches a latent bug, and does not introduce any 64-bit
  gotchas once throttle_co_p{read,write}v are relaxed, and assuming
  throttle_account() is not buggy.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 include/block/throttle-groups.h | 2 +-
 block/throttle-groups.c | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index 8bf7d233fa..9541b32432 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -77,7 +77,7 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm);
 void throttle_group_restart_tgm(ThrottleGroupMember *tgm);
 
 void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm,
-unsigned int bytes,
+int64_t bytes,
 bool is_write);
 void throttle_group_attach_aio_context(ThrottleGroupMember *tgm,
AioContext *new_context);
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index abd16ed9db..fb203c3ced 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -358,12 +358,15 @@ static void schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
  * @is_write:  the type of operation (read/write)
  */
 void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm,
-unsigned int bytes,
+int64_t bytes,
 bool is_write)
 {
 bool must_wait;
 ThrottleGroupMember *token;
 ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts);
+
+assert(bytes >= 0);
+
 qemu_mutex_lock(&tg->lock);
 
 /* First we check if this I/O has to be throttled. */
-- 
2.25.4

[PATCH v4 02/16] util/iov: make qemu_iovec_init_extended() honest

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

Actually, we can't extend the io vector in all cases. Handle possible
IOV_MAX and size_t overflows.

For now, add assertions to the callers (they rely on success anyway)
and fix them in a following patch.

Also add some additional assertions to qemu_iovec_init_slice() while
at it.
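
A minimal sketch of the size_t overflow check described above, using the
same subtraction pattern as the diff below. sketch_lengths_fit() is a
hypothetical standalone helper written for illustration, not part of
util/iov.c:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if head_len + mid_len + tail_len fits in size_t.
 * The check subtracts from SIZE_MAX instead of adding the lengths, so
 * it cannot wrap around itself.
 */
static bool sketch_lengths_fit(size_t head_len, size_t mid_len,
                               size_t tail_len)
{
    if (SIZE_MAX - head_len < mid_len) {
        return false;
    }
    if (SIZE_MAX - head_len - mid_len < tail_len) {
        return false;
    }
    return true;
}
```

The second subtraction is only reached when the first check passed, so
SIZE_MAX - head_len - mid_len cannot underflow.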

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/qemu/iov.h |  2 +-
 block/io.c | 10 +++---
 util/iov.c | 25 +++--
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index b6b283a5e5..9330746680 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -222,7 +222,7 @@ static inline void *qemu_iovec_buf(QEMUIOVector *qiov)
 
 void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint);
 void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov);
-void qemu_iovec_init_extended(
+int qemu_iovec_init_extended(
 QEMUIOVector *qiov,
 void *head_buf, size_t head_len,
 QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len,
diff --git a/block/io.c b/block/io.c
index e076236db2..21e8a50725 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1652,13 +1652,17 @@ static bool bdrv_pad_request(BlockDriverState *bs,
  int64_t *offset, unsigned int *bytes,
  BdrvRequestPadding *pad)
 {
+int ret;
+
 if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
 return false;
 }
 
-qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
- *qiov, *qiov_offset, *bytes,
- pad->buf + pad->buf_len - pad->tail, pad->tail);
+ret = qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
+   *qiov, *qiov_offset, *bytes,
+   pad->buf + pad->buf_len - pad->tail,
+   pad->tail);
+assert(ret == 0);
 *bytes += pad->head + pad->tail;
 *offset -= pad->head;
 *qiov = &pad->local_qiov;
diff --git a/util/iov.c b/util/iov.c
index f3a9e92a37..58c7b3 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -415,7 +415,7 @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len)
  * Compile new iovec, combining @head_buf buffer, sub-qiov of @mid_qiov,
  * and @tail_buf buffer into new qiov.
  */
-void qemu_iovec_init_extended(
+int qemu_iovec_init_extended(
 QEMUIOVector *qiov,
 void *head_buf, size_t head_len,
 QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len,
@@ -425,12 +425,24 @@ void qemu_iovec_init_extended(
 int total_niov, mid_niov = 0;
 struct iovec *p, *mid_iov = NULL;
 
+assert(mid_qiov->niov <= IOV_MAX);
+
+if (SIZE_MAX - head_len < mid_len ||
+SIZE_MAX - head_len - mid_len < tail_len)
+{
+return -EINVAL;
+}
+
 if (mid_len) {
 mid_iov = qiov_slice(mid_qiov, mid_offset, mid_len,
  &mid_head, &mid_tail, &mid_niov);
 }
 
 total_niov = !!head_len + mid_niov + !!tail_len;
+if (total_niov > IOV_MAX) {
+return -EINVAL;
+}
+
 if (total_niov == 1) {
 qemu_iovec_init_buf(qiov, NULL, 0);
 p = &qiov->local_iov;
@@ -459,6 +471,8 @@ void qemu_iovec_init_extended(
 p->iov_base = tail_buf;
 p->iov_len = tail_len;
 }
+
+return 0;
 }
 
 /*
@@ -492,7 +506,14 @@ bool qemu_iovec_is_zero(QEMUIOVector *qiov, size_t offset, size_t bytes)
 void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
size_t offset, size_t len)
 {
-qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0);
+int ret;
+
+assert(source->size >= len);
+assert(source->size - len >= offset);
+
+/* We shrink the request, so we can't overflow neither size_t nor MAX_IOV */
+ret = qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0);
+assert(ret == 0);
 }
 
 void qemu_iovec_destroy(QEMUIOVector *qiov)
-- 
2.25.4

[PATCH v4 03/16] block: fix theoretical overflow in bdrv_init_padding()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

Calculation of sum may theoretically overflow, so use 64bit type and
add some good assertions.

Use int64_t consistently.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 21e8a50725..d9bc67f1b0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1537,8 +1537,12 @@ static bool bdrv_init_padding(BlockDriverState *bs,
   int64_t offset, int64_t bytes,
   BdrvRequestPadding *pad)
 {
-uint64_t align = bs->bl.request_alignment;
-size_t sum;
+int64_t align = bs->bl.request_alignment;
+int64_t sum;
+
+bdrv_check_request(offset, bytes, &error_abort);
+assert(align <= INT_MAX); /* documented in block/block_int.h */
+assert(align * 2 <= SIZE_MAX); /* so we can allocate the buffer */
 
 memset(pad, 0, sizeof(*pad));
 
-- 
2.25.4

[PATCH v4 04/16] block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

Prepare for the following patch, in which bdrv_pad_request() will be able
to fail. Update the comments.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index d9bc67f1b0..dcfab267f8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2107,6 +2107,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 uint64_t align = bs->bl.request_alignment;
 BdrvRequestPadding pad;
 int ret;
+bool padded = false;
 
 trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
 
@@ -2138,20 +2139,32 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 return 0;
 }
 
+if (!(flags & BDRV_REQ_ZERO_WRITE)) {
+/*
+ * Pad request for following read-modify-write cycle.
+ * bdrv_co_do_zero_pwritev() does aligning by itself, so, we do
+ * alignment only if there is no ZERO flag.
+ */
+padded = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes,
+  &pad);
+}
+
 bdrv_inc_in_flight(bs);
-/*
- * Align write if necessary by performing a read-modify-write cycle.
- * Pad qiov with the read parts and be sure to have a tracked request not
- * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
- */
 tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
 
 if (flags & BDRV_REQ_ZERO_WRITE) {
+assert(!padded);
 ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
 goto out;
 }
 
-if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
+if (padded) {
+/*
+ * Request was unaligned to request_alignment and therefore padded.
+ * We are going to do read-modify-write. User is not prepared to widened
+ * request intersections with other requests, so we serialize the
+ * request.
+ */
 bdrv_mark_request_serialising(&req, align);
 bdrv_padding_rmw_read(child, &req, &pad, false);
 }
-- 
2.25.4

Re: [PATCH 10/20] block: Avoid qobject_get_try_str()

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

11.12.2020 20:11, Markus Armbruster wrote:

I'm about to remove qobject_get_try_str().  Use qstring_get_str()
instead.  Safe because the argument is known to be a QString here.

Cc: Kevin Wolf
Cc: Max Reitz
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

[PATCH v4 00/16] 64bit block-layer: part I

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

Hi all!

We want 64bit write-zeroes, and for this, convert all io functions to
64bit.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Please refer to initial cover-letter 
 https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
for more info.

v4: I found that some more work is needed for block/block-backend, so I
decided to split out part I, which converts block/io.

v4 is based on Kevin's block branch ([PULL 00/34] Block layer patches)
   for BDRV_MAX_LENGTH

changes:
01-05: new
06: add Alberto's r-b
07: new
08-16: rebase, add new-style request check, improve commit-msg, drop r-bs

Based-on: <20201211170812.228643-1-kw...@redhat.com>

Vladimir Sementsov-Ogievskiy (16):
  block: refactor bdrv_check_request: add errp
  util/iov: make qemu_iovec_init_extended() honest
  block: fix theoretical overflow in bdrv_init_padding()
  block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up
  block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure
  block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes
  block/io: improve bdrv_check_request: check qiov too
  block: use int64_t as bytes type in tracked requests
  block/io: use int64_t bytes in driver wrappers
  block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()
  block/io: support int64_t bytes in bdrv_aligned_pwritev()
  block/io: support int64_t bytes in bdrv_co_do_copy_on_readv()
  block/io: support int64_t bytes in bdrv_aligned_preadv()
  block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()
  block/io: support int64_t bytes in read/write wrappers
  block/io: use int64_t bytes in copy_range

 include/block/block.h   |  17 +-
 include/block/block_int.h   |  26 +--
 include/block/throttle-groups.h |   2 +-
 include/qemu/iov.h  |   2 +-
 block/blkverify.c   |   2 +-
 block/file-posix.c  |   2 +-
 block/io.c  | 274 ++--
 block/throttle-groups.c |   5 +-
 tests/test-write-threshold.c|   5 +-
 util/iov.c  |  25 ++-
 block/trace-events  |  12 +-
 11 files changed, 252 insertions(+), 120 deletions(-)

-- 
2.25.4

[PATCH v4 01/16] block: refactor bdrv_check_request: add errp

2020-12-11 Thread Vladimir Sementsov-Ogievskiy

It's better to pass &error_abort than just assert that result is 0: on
crash, we'll immediately see the reason in the backtrace.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h|  2 +-
 block/file-posix.c   |  2 +-
 block/io.c   | 29 ++---
 tests/test-write-threshold.c |  5 +++--
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1eeafc118c..ff29f31451 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -93,7 +93,7 @@ typedef struct BdrvTrackedRequest {
 struct BdrvTrackedRequest *waiting_for;
 } BdrvTrackedRequest;
 
-int bdrv_check_request(int64_t offset, int64_t bytes);
+int bdrv_check_request(int64_t offset, int64_t bytes, Error **errp);
 
 struct BlockDriver {
 const char *format_name;
diff --git a/block/file-posix.c b/block/file-posix.c
index 83e2cc5530..fc35a47832 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2951,7 +2951,7 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes,
 
 req->bytes = BDRV_MAX_LENGTH - req->offset;
 
-assert(bdrv_check_request(req->offset, req->bytes) == 0);
+bdrv_check_request(req->offset, req->bytes, &error_abort);
 
 bdrv_mark_request_serialising(req, bs->bl.request_alignment);
 }
diff --git a/block/io.c b/block/io.c
index 24205f5168..e076236db2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -898,17 +898,34 @@ static bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 return waited;
 }
 
-int bdrv_check_request(int64_t offset, int64_t bytes)
+int bdrv_check_request(int64_t offset, int64_t bytes, Error **errp)
 {
-if (offset < 0 || bytes < 0) {
+if (offset < 0) {
+error_setg(errp, "offset is negative: %" PRIi64, offset);
+return -EIO;
+}
+
+if (bytes < 0) {
+error_setg(errp, "bytes is negative: %" PRIi64, bytes);
 return -EIO;
 }
 
 if (bytes > BDRV_MAX_LENGTH) {
+error_setg(errp, "bytes(%" PRIi64 ") exceeds maximum(%" PRIi64 ")",
+   bytes, BDRV_MAX_LENGTH);
+return -EIO;
+}
+
+if (offset > BDRV_MAX_LENGTH) {
+error_setg(errp, "offset(%" PRIi64 ") exceeds maximum(%" PRIi64 ")",
+   offset, BDRV_MAX_LENGTH);
 return -EIO;
 }
 
 if (offset > BDRV_MAX_LENGTH - bytes) {
+error_setg(errp, "sum of offset(%" PRIi64 ") and bytes(%" PRIi64 ") "
+   "exceeds maximum(%" PRIi64 ")", offset, bytes,
+   BDRV_MAX_LENGTH);
 return -EIO;
 }
 
@@ -917,7 +934,7 @@ int bdrv_check_request(int64_t offset, int64_t bytes)
 
 static int bdrv_check_request32(int64_t offset, int64_t bytes)
 {
-int ret = bdrv_check_request(offset, bytes);
+int ret = bdrv_check_request(offset, bytes, NULL);
 if (ret < 0) {
 return ret;
 }
@@ -2819,7 +2836,7 @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
 return -EPERM;
 }
 
-ret = bdrv_check_request(offset, bytes);
+ret = bdrv_check_request(offset, bytes, NULL);
 if (ret < 0) {
 return ret;
 }
@@ -3221,10 +3238,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
 return -EINVAL;
 }
 
-ret = bdrv_check_request(offset, 0);
+ret = bdrv_check_request(offset, 0, errp);
 if (ret < 0) {
-error_setg(errp, "Required too big image size, it must be not greater "
-   "than %" PRId64, BDRV_MAX_LENGTH);
 return ret;
 }
 
diff --git a/tests/test-write-threshold.c b/tests/test-write-threshold.c
index 4cf032652d..fc1c45a2eb 100644
--- a/tests/test-write-threshold.c
+++ b/tests/test-write-threshold.c
@@ -7,6 +7,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "block/block_int.h"
 #include "block/write-threshold.h"
 
@@ -64,7 +65,7 @@ static void test_threshold_not_trigger(void)
 req.offset = 1024;
 req.bytes = 1024;
 
-assert(bdrv_check_request(req.offset, req.bytes) == 0);
+bdrv_check_request(req.offset, req.bytes, &error_abort);
 
 bdrv_write_threshold_set(&bs, threshold);
 amount = bdrv_write_threshold_exceeded(&bs, &req);
@@ -84,7 +85,7 @@ static void test_threshold_trigger(void)
 req.offset = (4 * 1024 * 1024) - 1024;
 req.bytes = 2 * 1024;
 
-assert(bdrv_check_request(req.offset, req.bytes) == 0);
+bdrv_check_request(req.offset, req.bytes, &error_abort);
 
 bdrv_write_threshold_set(&bs, threshold);
 amount = bdrv_write_threshold_exceeded(&bs, &req);
-- 
2.25.4
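
The order of the checks in the patch above matters: comparing offset against
"maximum - bytes" instead of computing "offset + bytes" avoids signed
overflow. A minimal standalone sketch of the same idea follows. It assumes an
illustrative limit SKETCH_MAX_LENGTH (QEMU defines its own BDRV_MAX_LENGTH)
and reports the reason through a plain string pointer rather than QEMU's
Error API; sketch_check_request() is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limit for this sketch only, not QEMU's BDRV_MAX_LENGTH. */
#define SKETCH_MAX_LENGTH ((int64_t)1 << 40)

/*
 * Hypothetical stand-in for bdrv_check_request(): returns 0 if the
 * request is valid, -EIO otherwise. If reason is non-NULL, it receives
 * a short description of the first failed check.
 */
int sketch_check_request(int64_t offset, int64_t bytes, const char **reason)
{
    const char *why = NULL;

    if (offset < 0) {
        why = "offset is negative";
    } else if (bytes < 0) {
        why = "bytes is negative";
    } else if (bytes > SKETCH_MAX_LENGTH) {
        why = "bytes exceeds maximum";
    } else if (offset > SKETCH_MAX_LENGTH) {
        why = "offset exceeds maximum";
    } else if (offset > SKETCH_MAX_LENGTH - bytes) {
        /* Subtraction cannot overflow: 0 <= bytes <= SKETCH_MAX_LENGTH. */
        why = "sum of offset and bytes exceeds maximum";
    }

    if (why != NULL) {
        if (reason != NULL) {
            *reason = why;
        }
        return -EIO;
    }
    return 0;
}
```

Passing NULL for reason mirrors the callers in the patch that pass NULL for
errp when they only care about the return code.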

Re: [PATCH v14 13/13] block: apply COR-filter to block-stream jobs

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 20:21, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

This patch completes the series with the COR-filter applied to
block-stream operations.

Adding the filter makes it possible in future implement discarding
copied regions in backing files during the block-stream job, to reduce
the disk overuse (we need control on permissions).

Also, the filter now is smart enough to do copy-on-read with specified
base, so we have benefit on guest reads even when doing block-stream of
the part of the backing chain.

Several iotests are slightly modified due to filter insertion.

Signed-off-by: Andrey Shinkevich 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/stream.c | 78 ++
  tests/qemu-iotests/030 |  8 ++--
  tests/qemu-iotests/141.out |  2 +-
  tests/qemu-iotests/245 | 20 ++
  4 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index a7fd8945ad..b92f7de55b 100644
--- a/block/stream.c
+++ b/block/stream.c


[...]


@@ -295,17 +287,49 @@ void stream_start(const char *job_id, BlockDriverState *bs,


[...]


+    opts = qdict_new();
+
+    qdict_put_str(opts, "driver", "copy-on-read");
+    qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+    /* Pass the base_overlay node name as 'bottom' to COR driver */
+    qdict_put_str(opts, "bottom", base_overlay->node_name);


Hm.  Should we set this option even if no base was specified?

On one hand, omitting this option would make cor_co_preadv_part() a bit quicker.

On the other, what happens when you add a backing file below the bottom node 
during streaming (yes, a largely theoretical case)...


Yes, that's what I was thinking about.

And more: we are moving to "bottom" and deprecating "base". So bottom is 
the main concept, and it can't be NULL. If the user doesn't specify it, then the 
default bottom is the current bottom node in the chain.

I think we are not going to introduce a different behavior for stream "without 
bottom", where the user can add more nodes to the chain during the job and all of 
them are removed after the job. That would require rethinking freezing and keeping 
references on intermediate nodes, at least.


  Now, all data from it is ignored.  That seemed a bit strange to me at first, 
but on second thought, it makes more sense.  Doing anything else would produce 
a garbage result basically, because stream_run() doesn’t take such a change 
into account.



yes


So...  After all I think I agree with setting @bottom unconditionally.

And that’s the only comment I had. :)

Reviewed-by: Max Reitz 



Thanks! v15 will come next week)

--
Best regards,
Vladimir

Re: [PATCH v14 13/13] block: apply COR-filter to block-stream jobs

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

This patch completes the series with the COR-filter applied to
block-stream operations.

Adding the filter makes it possible in future implement discarding
copied regions in backing files during the block-stream job, to reduce
the disk overuse (we need control on permissions).

Also, the filter now is smart enough to do copy-on-read with specified
base, so we have benefit on guest reads even when doing block-stream of
the part of the backing chain.

Several iotests are slightly modified due to filter insertion.

Signed-off-by: Andrey Shinkevich 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/stream.c | 78 ++
  tests/qemu-iotests/030 |  8 ++--
  tests/qemu-iotests/141.out |  2 +-
  tests/qemu-iotests/245 | 20 ++
  4 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index a7fd8945ad..b92f7de55b 100644
--- a/block/stream.c
+++ b/block/stream.c


[...]


@@ -295,17 +287,49 @@ void stream_start(const char *job_id, BlockDriverState *bs,


[...]


+opts = qdict_new();
+
+qdict_put_str(opts, "driver", "copy-on-read");
+qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+/* Pass the base_overlay node name as 'bottom' to COR driver */
+qdict_put_str(opts, "bottom", base_overlay->node_name);


Hm.  Should we set this option even if no base was specified?

On one hand, omitting this option would make cor_co_preadv_part() a bit quicker.

On the other, what happens when you add a backing file below the bottom 
node during streaming (yes, a largely theoretical case)...  Now, all 
data from it is ignored.  That seemed a bit strange to me at first, but 
on second thought, it makes more sense.  Doing anything else would 
produce a garbage result basically, because stream_run() doesn’t take 
such a change into account.


So...  After all I think I agree with setting @bottom unconditionally.

And that’s the only comment I had. :)

Reviewed-by: Max Reitz

[PATCH 10/20] block: Avoid qobject_get_try_str()

2020-12-11 Thread Markus Armbruster

I'm about to remove qobject_get_try_str().  Use qstring_get_str()
instead.  Safe because the argument is known to be a QString here.

Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
---
 block.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 487b2b1497..94d3a15081 100644
--- a/block.c
+++ b/block.c
@@ -4015,7 +4015,7 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
 new_backing_bs = NULL;
 break;
 case QTYPE_QSTRING:
-str = qobject_get_try_str(value);
+str = qstring_get_str(qobject_to(QString, value));
 new_backing_bs = bdrv_lookup_bs(NULL, str, errp);
 if (new_backing_bs == NULL) {
 return -EINVAL;
@@ -4278,8 +4278,8 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
 }
 
 if (child) {
-const char *str = qobject_get_try_str(new);
-if (!strcmp(child->bs->node_name, str)) {
+if (!strcmp(child->bs->node_name,
+qstring_get_str(qobject_to(QString, new)))) {
 continue; /* Found child with this name, skip option */
 }
 }
-- 
2.26.2

[PATCH 19/20] block: Use GString instead of QString to build filenames

2020-12-11 Thread Markus Armbruster

QString supports modifying its string, but it's quite limited: you can
only append.  Just one caller remains:
bdrv_parse_filename_strip_prefix() uses it just for building an
initial string.

Change it to do build the initial string with GString.  This is
another step towards making QString immutable.

Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
---
 block.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 94d3a15081..75ffbe9092 100644
--- a/block.c
+++ b/block.c
@@ -216,7 +216,7 @@ void bdrv_parse_filename_strip_prefix(const char *filename, const char *prefix,
 /* Stripping the explicit protocol prefix may result in a protocol
  * prefix being (wrongly) detected (if the filename contains a colon) */
 if (path_has_protocol(filename)) {
-QString *fat_filename;
+GString *fat_filename;
 
 /* This means there is some colon before the first slash; therefore,
  * this cannot be an absolute path */
@@ -224,12 +224,13 @@ void bdrv_parse_filename_strip_prefix(const char *filename, const char *prefix,
 
 /* And we can thus fix the protocol detection issue by prefixing it
  * by "./" */
-fat_filename = qstring_from_str("./");
-qstring_append(fat_filename, filename);
+fat_filename = g_string_new("./");
+g_string_append(fat_filename, filename);
 
-assert(!path_has_protocol(qstring_get_str(fat_filename)));
+assert(!path_has_protocol(fat_filename->str));
 
-qdict_put(options, "filename", fat_filename);
+qdict_put(options, "filename",
+  qstring_from_gstring(fat_filename));
 } else {
 /* If no protocol prefix was detected, we can use the shortened
  * filename as-is */
-- 
2.26.2

Re: [PATCH v14 10/13] qapi: block-stream: add "bottom" argument

2020-12-11 Thread Max Reitz


On 11.12.20 18:42, Vladimir Sementsov-Ogievskiy wrote:

11.12.2020 20:24, Max Reitz wrote:

On 11.12.20 17:50, Vladimir Sementsov-Ogievskiy wrote:

11.12.2020 19:05, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.

Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:

  - drop difference between above_base and base_overlay, which will be
    renamed to just bottom, when old interface dropped

  - clean way to work with parallel streams/commits on the same backing
    chain, which otherwise become a problem when we introduce a filter
    for stream job

  - cleaner interface. Nobody will surprised the fact that base node may
    disappear during block-stream, when there is no word about "base" in
    the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json   |  8 +++--
  include/block/block_int.h  |  1 +
  block/monitor/block-hmp-cmds.c |  3 +-
  block/stream.c | 50 +++-
  blockdev.c | 61 --
  5 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04055ef50c..5d6681a35d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2522,6 +2522,10 @@
  # @base-node: the node name of the backing file.
  # It cannot be set if @base is also set. (Since 2.8)
  #
+# @bottom: the last node in the chain that should be streamed into
+#  top. It cannot be set any of @base, @base-node or @backing-file


s/set any/set if any/

But what’s the problem with backing-file?  The fact that specifying 
backing-file means that stream will look for that filename in the 
backing chain when the job is done (so if you use @bottom, we 
generally don’t want to rely on the presence of any nodes below it)?


I just wanted to deprecate 'backing-file' together with base and 
base-node as a next step. If user wants to set backing file unrelated 
to current backing-chain, is it correct at all? It's a direct 
violation of what's going on, and I doubt that other parts of Qemu 
working with backing-file are prepared for such situation. User can 
do it by hand later.. Anyway, we'll have three releases deprecation 
period for people to come and cry that this is a really needed 
option, so we can support it later on demand.




(If so, I would have thought that we actually want the user to 
specify backing-file so we don’t have to look down below @bottom to 
look for a filename.  Perhaps a @backing-fmt parameter would help.)


If we decide that 'backing-file' is really needed, then yes we should 
require backing-fmt to be specified together with backing-file when 
using new "bottom" interface.

Before I can agree on removing backing-file (or deprecating it), I 
need to know what it’s actually used for.  I actually don’t, though.  
The only reason I could imagine was because the user wanted to write 
some string into there that is different from base.filename.


(The original commit 13d8cc515df does mention cases like FD passing, 
where qemu has no idea what an appropriate filename would be (it can 
only see /dev/fd/*).  From that, it does appear to me that it’ll be 
needed even with @bottom.)




I should have checked it myself.. That's one more reason for my "RFC: 
don't store backing filename in qcow2 image"..


OK, do you think we can require backing-fmt to be specified if 
backing-file and bottom are specified?


Sure.

Or allow omitting it and 
deprecate this thing? We actually already have deprecation message in 
bdrv_change_backing_file(), and how we are trying to workaround it in 
block-stream will not work with file descriptors anyway (hmm, and old 
code works, so, actually 09 is a regression?)


I think requiring backing-fmt for bottom + backing-file would be the 
most simple and clean way, hopefully saving us some headaches.


Max

Re: [PATCH v14 10/13] qapi: block-stream: add "bottom" argument

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 20:24, Max Reitz wrote:

On 11.12.20 17:50, Vladimir Sementsov-Ogievskiy wrote:

11.12.2020 19:05, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.

Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:

  - drop difference between above_base and base_overlay, which will be
    renamed to just bottom, when old interface dropped

  - clean way to work with parallel streams/commits on the same backing
    chain, which otherwise become a problem when we introduce a filter
    for stream job

  - cleaner interface. Nobody will surprised the fact that base node may
    disappear during block-stream, when there is no word about "base" in
    the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json   |  8 +++--
  include/block/block_int.h  |  1 +
  block/monitor/block-hmp-cmds.c |  3 +-
  block/stream.c | 50 +++-
  blockdev.c | 61 --
  5 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04055ef50c..5d6681a35d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2522,6 +2522,10 @@
  # @base-node: the node name of the backing file.
  # It cannot be set if @base is also set. (Since 2.8)
  #
+# @bottom: the last node in the chain that should be streamed into
+#  top. It cannot be set any of @base, @base-node or @backing-file


s/set any/set if any/

But what’s the problem with backing-file?  The fact that specifying 
backing-file means that stream will look for that filename in the backing chain 
when the job is done (so if you use @bottom, we generally don’t want to rely on 
the presence of any nodes below it)?


I just wanted to deprecate 'backing-file' together with base and base-node as a 
next step. If user wants to set backing file unrelated to current 
backing-chain, is it correct at all? It's a direct violation of what's going 
on, and I doubt that other parts of Qemu working with backing-file are prepared 
for such situation. User can do it by hand later.. Anyway, we'll have three 
releases deprecation period for people to come and cry that this is a really 
needed option, so we can support it later on demand.



(If so, I would have thought that we actually want the user to specify 
backing-file so we don’t have to look down below @bottom to look for a 
filename.  Perhaps a @backing-fmt parameter would help.)


If we decide that 'backing-file' is really needed, then yes we should require backing-fmt 
to be specified together with backing-file when using new "bottom" interface.

Before I can agree on removing backing-file (or deprecating it), I need to know 
what it’s actually used for.  I actually don’t, though.  The only reason I 
could imagine was because the user wanted to write some string into there that 
is different from base.filename.

(The original commit 13d8cc515df does mention cases like FD passing, where qemu 
has no idea what an appropriate filename would be (it can only see /dev/fd/*).  
From that, it does appear to me that it’ll be needed even with @bottom.)



I should have checked it myself.. That's one more reason for my "RFC: don't store 
backing filename in qcow2 image"..

OK, do you think we can require backing-fmt to be specified if backing-file and 
bottom are specified? Or allow omitting it and deprecate this thing? We 
actually already have deprecation message in bdrv_change_backing_file(), and 
how we are trying to workaround it in block-stream will not work with file 
descriptors anyway (hmm, and old code works, so, actually 09 is a regression?)

--
Best regards,
Vladimir

[PULL 33/34] block: Fix locking in qmp_block_resize()

2020-12-11 Thread Kevin Wolf

The drain functions assume that we hold the AioContext lock of the
drained block node. Make sure to actually take the lock.

Cc: qemu-sta...@nongnu.org
Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
Signed-off-by: Kevin Wolf 
Message-Id: <20201203172311.68232-3-kw...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 blockdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 660c735c81..412354b4b6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2481,13 +2481,16 @@ void coroutine_fn qmp_block_resize(bool has_device, const char *device,
 return;
 }
 
+bdrv_co_lock(bs);
 bdrv_drained_begin(bs);
+bdrv_co_unlock(bs);
+
 old_ctx = bdrv_co_enter(bs);
 blk_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
 bdrv_co_leave(bs, old_ctx);
-bdrv_drained_end(bs);
 
 bdrv_co_lock(bs);
+bdrv_drained_end(bs);
 blk_unref(blk);
 bdrv_co_unlock(bs);
 }
-- 
2.29.2

Re: [PATCH v14 10/13] qapi: block-stream: add "bottom" argument

2020-12-11 Thread Max Reitz


On 11.12.20 17:50, Vladimir Sementsov-Ogievskiy wrote:

11.12.2020 19:05, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.

Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:

  - drop difference between above_base and base_overlay, which will be
    renamed to just bottom, when old interface dropped

  - clean way to work with parallel streams/commits on the same backing
    chain, which otherwise become a problem when we introduce a filter
    for stream job

  - cleaner interface. Nobody will surprised the fact that base node may
    disappear during block-stream, when there is no word about "base" in
    the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json   |  8 +++--
  include/block/block_int.h  |  1 +
  block/monitor/block-hmp-cmds.c |  3 +-
  block/stream.c | 50 +++-
  blockdev.c | 61 --
  5 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04055ef50c..5d6681a35d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2522,6 +2522,10 @@
  # @base-node: the node name of the backing file.
  # It cannot be set if @base is also set. (Since 2.8)
  #
+# @bottom: the last node in the chain that should be streamed into
+#  top. It cannot be set any of @base, @base-node or @backing-file


s/set any/set if any/

But what’s the problem with backing-file?  The fact that specifying 
backing-file means that stream will look for that filename in the 
backing chain when the job is done (so if you use @bottom, we 
generally don’t want to rely on the presence of any nodes below it)?


I just wanted to deprecate 'backing-file' together with base and 
base-node as a next step. If user wants to set backing file unrelated to 
current backing-chain, is it correct at all? It's a direct violation of 
what's going on, and I doubt that other parts of Qemu working with 
backing-file are prepared for such situation. User can do it by hand 
later.. Anyway, we'll have three releases deprecation period for people 
to come and cry that this is a really needed option, so we can support 
it later on demand.




(If so, I would have thought that we actually want the user to specify 
backing-file so we don’t have to look down below @bottom to look for a 
filename.  Perhaps a @backing-fmt parameter would help.)


If we decide that 'backing-file' is really needed, then yes we should 
require backing-fmt to be specified together with backing-file when 
using new "bottom" interface.

Before I can agree on removing backing-file (or deprecating it), I need 
to know what it’s actually used for.  I actually don’t, though.  The 
only reason I could imagine was because the user wanted to write some 
string into there that is different from base.filename.


(The original commit 13d8cc515df does mention cases like FD passing, 
where qemu has no idea what an appropriate filename would be (it can 
only see /dev/fd/*).  From that, it does appear to me that it’ll be 
needed even with @bottom.)


Max

[PULL 28/34] block/file-posix: fix workaround in raw_do_pwrite_zeroes()

2020-12-11 Thread Kevin Wolf

From: Vladimir Sementsov-Ogievskiy 

We should not set overlap_bytes:

1. Don't worry: it is calculated by bdrv_mark_request_serialising() and
   will be equal to or greater than bytes anyway.

2. If the request was already aligned up to some greater alignment,
   than we may break things: we reduce overlap_bytes, and further
   bdrv_mark_request_serialising() may not help, as it will not restore
   old bigger alignment.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20201203222713.13507-2-vsement...@virtuozzo.com>
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 806764f7e3..9bee3d88d0 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2952,7 +2952,6 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes,
 
 end = INT64_MAX & -(uint64_t)bs->bl.request_alignment;
 req->bytes = end - req->offset;
-req->overlap_bytes = req->bytes;
 
 bdrv_mark_request_serialising(req, bs->bl.request_alignment);
 }
-- 
2.29.2
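
For readers unfamiliar with the masking expression in the context line
"end = INT64_MAX & -(uint64_t)bs->bl.request_alignment", here is a small
standalone sketch of what it computes. It assumes the alignment is a power
of two; sketch_max_aligned_end() is a hypothetical name used only here.

```c
#include <stdint.h>

/*
 * Largest multiple of alignment that still fits in int64_t.
 * For a power-of-two alignment A, -(uint64_t)A is a mask with the low
 * log2(A) bits cleared, so the AND rounds INT64_MAX down to a multiple
 * of A. Assumes alignment is a non-zero power of two.
 */
int64_t sketch_max_aligned_end(uint32_t alignment)
{
    return (int64_t)(INT64_MAX & -(uint64_t)alignment);
}
```

For an alignment of 512 the result is INT64_MAX - 511, which is the end
position the workaround uses to compute req->bytes.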

[PULL 21/34] iotests: Give access to the qemu-storage-daemon

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-18-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/check | 11 +++
 tests/qemu-iotests/common.rc | 17 +
 2 files changed, 28 insertions(+)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 678b6e4910..3c1fa4435a 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -644,6 +644,17 @@ if [ -z $QEMU_NBD_PROG ]; then
 fi
 export QEMU_NBD_PROG="$(type -p "$QEMU_NBD_PROG")"
 
+if [ -z "$QSD_PROG" ]; then
+if [ -x "$build_iotests/qemu-storage-daemon" ]; then
+export QSD_PROG="$build_iotests/qemu-storage-daemon"
+elif [ -x "$build_root/storage-daemon/qemu-storage-daemon" ]; then
+export QSD_PROG="$build_root/storage-daemon/qemu-storage-daemon"
+else
+_init_error "qemu-storage-daemon not found"
+fi
+fi
+export QSD_PROG="$(type -p "$QSD_PROG")"
+
 if [ -x "$build_iotests/socket_scm_helper" ]
 then
 export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 23f46da2db..20589e59a5 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,6 +124,7 @@ fi
 : ${VALGRIND_QEMU_IMG=$VALGRIND_QEMU}
 : ${VALGRIND_QEMU_IO=$VALGRIND_QEMU}
 : ${VALGRIND_QEMU_NBD=$VALGRIND_QEMU}
+: ${VALGRIND_QSD=$VALGRIND_QEMU}
 
 # The Valgrind own parameters may be set with
 # its environment variable VALGRIND_OPTS, e.g.
@@ -211,6 +212,21 @@ _qemu_nbd_wrapper()
 return $RETVAL
 }
 
+_qemu_storage_daemon_wrapper()
+{
+local VALGRIND_LOGFILE="${TEST_DIR}"/$$.valgrind
+(
+if [ -n "${QSD_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-storage-daemon.pid"
+fi
+VALGRIND_QEMU="${VALGRIND_QSD}" _qemu_proc_exec "${VALGRIND_LOGFILE}" \
+"$QSD_PROG" $QSD_OPTIONS "$@"
+)
+RETVAL=$?
+_qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
+return $RETVAL
+}
+
 # Valgrind bug #409141 https://bugs.kde.org/show_bug.cgi?id=409141
 # Until valgrind 3.16+ is ubiquitous, we must work around a hang in
 # valgrind when issuing sigkill. Disable valgrind for this invocation.
@@ -223,6 +239,7 @@ export QEMU=_qemu_wrapper
 export QEMU_IMG=_qemu_img_wrapper
 export QEMU_IO=_qemu_io_wrapper
 export QEMU_NBD=_qemu_nbd_wrapper
+export QSD=_qemu_storage_daemon_wrapper
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 DRIVER="driver=$IMGFMT"
-- 
2.29.2

[PULL 25/34] file-posix: check the use_lock before setting the file lock

2020-12-11 Thread Kevin Wolf

From: Li Feng 

The scenario is that when accessing a volume on an NFS filesystem
without supporting the file lock,  Qemu will complain "Failed to lock
byte 100", even when setting the file.locking = off.

We should do file lock related operations only when the file.locking is
enabled, otherwise, the syscall of 'fcntl' will return non-zero.

Signed-off-by: Li Feng 
Message-Id: <1607341446-85506-1-git-send-email-fen...@smartx.com>
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d5fd1dbcd2..806764f7e3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3104,7 +3104,7 @@ static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared,
 }
 
 /* Copy locks to the new fd */
-if (s->perm_change_fd) {
+if (s->perm_change_fd && s->use_lock) {
 ret = raw_apply_lock_bytes(NULL, s->perm_change_fd, perm, ~shared,
false, errp);
 if (ret < 0) {
-- 
2.29.2

[PULL 32/34] block: Simplify qmp_block_resize() error paths

2020-12-11 Thread Kevin Wolf

The only thing that happens after the 'out:' label is blk_unref(blk).
However, blk = NULL in all of the error cases, so instead of jumping to
'out:', we can just return directly.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Message-Id: <20201203172311.68232-2-kw...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 blockdev.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6c7be7c522..660c735c81 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2454,7 +2454,7 @@ void coroutine_fn qmp_block_resize(bool has_device, const char *device,
int64_t size, Error **errp)
 {
 Error *local_err = NULL;
-BlockBackend *blk = NULL;
+BlockBackend *blk;
 BlockDriverState *bs;
 AioContext *old_ctx;
 
@@ -2468,17 +2468,17 @@ void coroutine_fn qmp_block_resize(bool has_device, const char *device,
 
 if (size < 0) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
-goto out;
+return;
 }
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_RESIZE, NULL)) {
 error_setg(errp, QERR_DEVICE_IN_USE, device);
-goto out;
+return;
 }
 
 blk = blk_new_with_bs(bs, BLK_PERM_RESIZE, BLK_PERM_ALL, errp);
 if (!blk) {
-goto out;
+return;
 }
 
 bdrv_drained_begin(bs);
@@ -2487,7 +2487,6 @@ void coroutine_fn qmp_block_resize(bool has_device, const char *device,
 bdrv_co_leave(bs, old_ctx);
 bdrv_drained_end(bs);
 
-out:
 bdrv_co_lock(bs);
 blk_unref(blk);
 bdrv_co_unlock(bs);
-- 
2.29.2

[PULL 20/34] storage-daemon: Call bdrv_close_all() on exit

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Otherwise, exports and block devices are not properly shut down and
closed, unless the user explicitly issues blockdev-del and
block-export-del commands for each of them.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-17-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 storage-daemon/qemu-storage-daemon.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c
index 7c914b0dc1..e0c87edbdd 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -314,6 +314,9 @@ int main(int argc, char *argv[])
 main_loop_wait(false);
 }
 
+bdrv_drain_all_begin();
+bdrv_close_all();
+
 monitor_cleanup();
 qemu_chr_cleanup();
 user_creatable_cleanup();
-- 
2.29.2

[PULL 34/34] block: Fix deadlock in bdrv_co_yield_to_drain()

2020-12-11 Thread Kevin Wolf

If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.

As long as executing the BH is the very next thing that the event loop
of the node's AioContext does, this actually happens to work, but when
it tries to execute something else that wants to take the AioContext
lock, it will deadlock. (In the bug report, this other thing is a
virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)

Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.

This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.

Cc: qemu-sta...@nongnu.org
Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf 
Message-Id: <20201203172311.68232-4-kw...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 41 -
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/block/io.c b/block/io.c
index 6343d85476..24205f5168 100644
--- a/block/io.c
+++ b/block/io.c
@@ -312,17 +312,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 
 if (bs) {
 AioContext *ctx = bdrv_get_aio_context(bs);
-AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
-
-/*
- * When the coroutine yielded, the lock for its home context was
- * released, so we need to re-acquire it here. If it explicitly
- * acquired a different context, the lock is still held and we don't
- * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
- */
-if (ctx == co_ctx) {
-aio_context_acquire(ctx);
-}
+aio_context_acquire(ctx);
 bdrv_dec_in_flight(bs);
 if (data->begin) {
 assert(!data->drained_end_counter);
@@ -334,9 +324,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 data->ignore_bds_parents,
 data->drained_end_counter);
 }
-if (ctx == co_ctx) {
-aio_context_release(ctx);
-}
+aio_context_release(ctx);
 } else {
 assert(data->begin);
 bdrv_drain_all_begin();
@@ -354,13 +342,16 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
 int *drained_end_counter)
 {
 BdrvCoDrainData data;
+Coroutine *self = qemu_coroutine_self();
+AioContext *ctx = bdrv_get_aio_context(bs);
+AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
 
 /* Calling bdrv_drain() from a BH ensures the current coroutine yields and
  * other coroutines run if they were queued by aio_co_enter(). */
 
 assert(qemu_in_coroutine());
 data = (BdrvCoDrainData) {
-.co = qemu_coroutine_self(),
+.co = self,
 .bs = bs,
 .done = false,
 .begin = begin,
@@ -374,13 +365,29 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
 if (bs) {
 bdrv_inc_in_flight(bs);
 }
-replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
- bdrv_co_drain_bh_cb, &data);
+
+/*
+ * Temporarily drop the lock across yield or we would get deadlocks.
+ * bdrv_co_drain_bh_cb() reacquires the lock as needed.
+ *
+ * When we yield below, the lock for the current context will be
+ * released, so if this is actually the lock that protects bs, don't drop
+ * it a second time.
+ */
+if (ctx != co_ctx) {
+aio_context_release(ctx);
+}
+replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data);
 
 qemu_coroutine_yield();
 /* If we are resumed from some other event (such as an aio completion or a
  * timer callback), it is a bug in the caller that should be fixed. */
 assert(data.done);
+
+/* Reacquire the AioContext of bs if we dropped it */
+if (ctx != co_ctx) {
+aio_context_acquire(ctx);
+}
 }
 
 void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-- 
2.29.2

[PULL 29/34] block/io: bdrv_refresh_limits(): use ERRP_GUARD

2020-12-11 Thread Kevin Wolf

From: Vladimir Sementsov-Ogievskiy 

This simplifies the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20201203222713.13507-3-vsement...@virtuozzo.com>
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index ec5e152bb7..3e91074c9f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -135,10 +135,10 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
+ERRP_GUARD();
 BlockDriver *drv = bs->drv;
 BdrvChild *c;
 bool have_limits;
-Error *local_err = NULL;
 
 memset(&bs->bl, 0, sizeof(bs->bl));
 
@@ -156,9 +156,8 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 QLIST_FOREACH(c, &bs->children, next) {
 if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | BDRV_CHILD_COW))
 {
-bdrv_refresh_limits(c->bs, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+bdrv_refresh_limits(c->bs, errp);
+if (*errp) {
 return;
 }
 bdrv_merge_limits(&bs->bl, &c->bs->bl);
-- 
2.29.2

[PULL 26/34] iotests/221: Discard image before qemu-img map

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

See the new comment for why this should be done.

I do not have a reproducer on master, but when using FUSE block exports,
this test breaks depending on the underlying filesystem (for me, it
works on tmpfs, but fails on xfs, because the block allocated by
file-posix has 16 kB there instead of 4 kB).

Suggested-by: Kevin Wolf 
Signed-off-by: Max Reitz 
Message-Id: <20201207152245.66987-1-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/221 |  7 +++
 tests/qemu-iotests/221.out | 14 ++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/221 b/tests/qemu-iotests/221
index ca62b3baa1..7e6086b205 100755
--- a/tests/qemu-iotests/221
+++ b/tests/qemu-iotests/221
@@ -46,6 +46,13 @@ echo "=== Check mapping of unaligned raw image ==="
 echo
 
 _make_test_img 65537 # qemu-img create rounds size up
+
+# file-posix allocates the first block of any images when it is created;
+# the size of this block depends on the host page size and the file
+# system block size, none of which are constant.  Discard the whole
+# image so we will not see this allocation in qemu-img map's output.
+$QEMU_IO -c 'discard 0 65537' "$TEST_IMG" | _filter_qemu_io
+
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 
 truncate --size=65537 "$TEST_IMG" # so we resize it and check again
diff --git a/tests/qemu-iotests/221.out b/tests/qemu-iotests/221.out
index dca024a0c3..93846c7dab 100644
--- a/tests/qemu-iotests/221.out
+++ b/tests/qemu-iotests/221.out
@@ -3,18 +3,16 @@ QA output created by 221
 === Check mapping of unaligned raw image ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65537
-[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 4096, "length": 61952, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
-[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 4096, "length": 61952, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+discard 65537/65537 bytes at offset 0
+64.001 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 66048, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 66048, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
 wrote 1/1 bytes at offset 65536
 1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 4096, "length": 61440, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
 { "start": 65536, "length": 1, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 65537, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
-[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 4096, "length": 61440, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
 { "start": 65536, "length": 1, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 65537, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
 *** done
-- 
2.29.2

[PULL 24/34] iotests/308: Add test for FUSE exports

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

We have good coverage of the normal I/O paths now, but what remains is a
test that tests some more special cases: Exporting an image on itself
(thus turning a formatted image into a raw one), some error cases, and
non-writable and non-growable exports.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-21-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/308 | 339 +
 tests/qemu-iotests/308.out |  97 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 437 insertions(+)
 create mode 100755 tests/qemu-iotests/308
 create mode 100644 tests/qemu-iotests/308.out

diff --git a/tests/qemu-iotests/308 b/tests/qemu-iotests/308
new file mode 100755
index 00..b30f4400f6
--- /dev/null
+++ b/tests/qemu-iotests/308
@@ -0,0 +1,339 @@
+#!/usr/bin/env bash
+#
+# Test FUSE exports (in ways that are not captured by the generic
+# tests)
+#
+# Copyright (C) 2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+seq=$(basename "$0")
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+_cleanup_test_img
+rmdir "$EXT_MP" 2>/dev/null
+rm -f "$EXT_MP"
+rm -f "$COPIED_IMG"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# Generic format, but needs a plain filename
+_supported_fmt generic
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+_unsupported_fmt $IMGFMT
+fi
+# We need the image to have exactly the specified size, and VPC does
+# not allow that by default
+_unsupported_fmt vpc
+
+_supported_proto file # We create the FUSE export manually
+_supported_os Linux # We need /dev/urandom
+
+# $1: Export ID
+# $2: Options (beyond the node-name and ID)
+# $3: Expected return value (defaults to 'return')
+# $4: Node to export (defaults to 'node-format')
+fuse_export_add()
+{
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-add',
+  'arguments': {
+  'type': 'fuse',
+  'id': '$1',
+  'node-name': '${4:-node-format}',
+  $2
+  } }" \
+"${3:-return}" \
+| _filter_imgfmt
+}
+
+# $1: Export ID
+fuse_export_del()
+{
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-del',
+  'arguments': {
+  'id': '$1'
+  } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+'' \
+'BLOCK_EXPORT_DELETED'
+}
+
+# Return the length of the protocol file
+# $1: Protocol node export mount point
+# $2: Original file (to compare)
+get_proto_len()
+{
+len1=$(stat -c '%s' "$1")
+len2=$(stat -c '%s' "$2")
+
+if [ "$len1" != "$len2" ]; then
+echo 'ERROR: Length of export and original differ:' >&2
+echo "$len1 != $len2" >&2
+else
+echo '(OK: Lengths of export and original are the same)' >&2
+fi
+
+echo "$len1"
+}
+
+COPIED_IMG="$TEST_IMG.copy"
+EXT_MP="$TEST_IMG.fuse"
+
+echo '=== Set up ==='
+
+# Create image with random data
+_make_test_img 64M
+$QEMU_IO -c 'write -s /dev/urandom 0 64M' "$TEST_IMG" | _filter_qemu_io
+
+_launch_qemu
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'qmp_capabilities'}" \
+'return'
+
+# Separate blockdev-add calls for format and protocol so we can remove
+# the format layer later on
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': 'file',
+  'node-name': 'node-protocol',
+  'filename': '$TEST_IMG'
+  } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': '$IMGFMT',
+  'node-name': 'node-format',
+  'file': 'node-protocol'
+  } }" \
+'return'
+
+echo
+echo '=== Mountpoint not present ==='
+
+rmdir "$EXT_MP" 2>/dev/null
+rm -f "$EXT_MP"
+output=$(fuse_export_add 'export-err' "'mountpoint': '$EXT_MP'" error)
+
+if echo "$output" | grep -q "Invalid parameter 'fuse'"; then
+_notrun 'No FUSE support'
+fi
+
+echo "$output"
+
+echo
+echo '=== Mountpoint is a directory ==='
+
+mkdir "$EXT_MP"
+fuse_export_add 'export-err' "'mountpoint': '$EXT_MP'" error
+rmdir "$EXT_MP"
+
+echo
+echo '=== Mountpoint is a regular file ==='
+
+touch "

[PULL 18/34] iotests: Let _make_test_img guess $TEST_IMG_FILE

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

When most iotests want to create a test image that is named differently
from the default $TEST_IMG, they do something like this:

TEST_IMG="$TEST_IMG.base" _make_test_img $options

This works fine with the "file" protocol, but not so much for anything
else: _make_test_img tries to create an image under $TEST_IMG_FILE
first, and only under $TEST_IMG if the former is not set; and on
everything but "file", $TEST_IMG_FILE is set.

There are two ways we can fix this: First, we could make all tests
adjust not only TEST_IMG, but also TEST_IMG_FILE if that is present
(e.g. with something like _set_test_img_suffix $suffix that would affect
not only TEST_IMG but also TEST_IMG_FILE, if necessary).  This is a
pretty clean solution, and this is maybe what we should have done from
the start.

But it would also require changes to most existing bash tests.  So the
alternative is this: Let _make_test_img see whether $TEST_IMG_FILE still
points to the original value.  If so, it is possible that the caller has
adjusted $TEST_IMG but not $TEST_IMG_FILE.  In such a case, we can (for
most protocols) derive the corresponding $TEST_IMG_FILE value from
$TEST_IMG value and thus work around what technically is the caller
misbehaving.

This second solution is less clean, but it is robust against people
keeping their old habit of adjusting TEST_IMG only, and requires far
fewer changes.  So this patch implements it.
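
The decision described above can be sketched in isolation as follows.
This is a simplified illustration, not the code from the patch: the
helper names img_to_img_file and pick_img_name, the scratch paths, and
the reduced protocol list are assumptions made for the example, and the
IMGOPTSSYNTAX check from the real change is left out.

```shell
# Hypothetical stand-ins for the iotests environment (not the real common.rc).
IMGPROTO=nfs
ORIG_TEST_IMG_FILE=/tmp/scratch/t.qcow2
TEST_IMG_FILE=$ORIG_TEST_IMG_FILE

# Strip the protocol prefix to get a local path; fail for unknown protocols.
img_to_img_file()
{
    case "$IMGPROTO" in
        file) printf '%s\n' "$1" ;;
        nfs)  printf '%s\n' "${1#nfs://127.0.0.1}" ;;
        *)    return 1 ;;
    esac
}

# Decide which path the image should be created under.
pick_img_name()
{
    if [ -z "$TEST_IMG_FILE" ]; then
        printf '%s\n' "$TEST_IMG"
    elif [ "$TEST_IMG_FILE" = "$ORIG_TEST_IMG_FILE" ]; then
        # The caller may have changed only TEST_IMG: derive the file name
        # from it, and fall back to TEST_IMG_FILE if that is not possible.
        img_to_img_file "$TEST_IMG" || printf '%s\n' "$TEST_IMG_FILE"
    else
        # TEST_IMG_FILE was modified by the test, so trust it.
        printf '%s\n' "$TEST_IMG_FILE"
    fi
}

# A test that only adjusted TEST_IMG, as most existing tests do.
TEST_IMG="nfs://127.0.0.1/tmp/scratch/t.qcow2.base"
pick_img_name
```

With these values the derived name follows the ".base" suffix the test
put on TEST_IMG, even though TEST_IMG_FILE was never touched.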

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-15-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/common.rc | 40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 494490a272..23f46da2db 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -268,6 +268,7 @@ else
 TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
 fi
+ORIG_TEST_IMG_FILE=$TEST_IMG_FILE
 ORIG_TEST_IMG="$TEST_IMG"
 
 if [ -z "$TEST_DIR" ]; then
@@ -330,6 +331,30 @@ _get_data_file()
 | sed -e "s#\\\$TEST_IMG#$1#"
 }
 
+# Translate a $TEST_IMG to its corresponding $TEST_IMG_FILE for
+# different protocols
+_test_img_to_test_img_file()
+{
+case "$IMGPROTO" in
+file)
+echo "$1"
+;;
+
+nfs)
+echo "$1" | sed -e "s#nfs://127.0.0.1##"
+;;
+
+ssh)
+echo "$1" | \
+sed -e "s#ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?##"
+;;
+
+*)
+return 1
+;;
+esac
+}
+
 _make_test_img()
 {
 # extra qemu-img options can be added by tests
@@ -343,10 +368,19 @@ _make_test_img()
 local opts_param=false
 local misc_params=()
 
-if [ -n "$TEST_IMG_FILE" ]; then
-img_name=$TEST_IMG_FILE
-else
+if [ -z "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG
+elif [ "$IMGOPTSSYNTAX" != "true" -a \
+   "$TEST_IMG_FILE" = "$ORIG_TEST_IMG_FILE" ]; then
+# Handle cases of tests only updating TEST_IMG, but not TEST_IMG_FILE
+img_name=$(_test_img_to_test_img_file "$TEST_IMG")
+if [ "$?" != 0 ]; then
+img_name=$TEST_IMG_FILE
+fi
+else
+# $TEST_IMG_FILE is not the default value, so it definitely has been
+# modified by the test
+img_name=$TEST_IMG_FILE
 fi
 
 if [ -n "$IMGOPTS" ]; then
-- 
2.29.2

[PULL 15/34] iotests: Derive image names from $TEST_IMG

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Avoid creating images with custom filenames in $TEST_DIR, because
non-file protocols may want to keep $TEST_IMG (and all other test
images) in some other directory.
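
As a rough illustration of the idea (the directory names and the
derive_names helper are made up for this sketch and are not part of the
patch), deriving secondary names from $TEST_IMG keeps them next to the
main image wherever that happens to live:

```shell
# Hypothetical value; in a real run common.rc sets TEST_DIR.
TEST_DIR=/tmp/iotests-scratch

derive_names()
{
    # $1: the current TEST_IMG, whatever directory or protocol it uses
    BACKING_IMG="$1.base"
    DEST_IMG="$1.dest"
}

# The second path stands for an image kept outside TEST_DIR.
for img in "$TEST_DIR/t.qcow2" "/mnt/fuse-export/t.qcow2"; do
    derive_names "$img"
    echo "$img -> $BACKING_IMG, $DEST_IMG"
done
```

A name hard-coded as "$TEST_DIR/backing.img" would stay in TEST_DIR in
both iterations, which is what the change avoids.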

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-12-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/200 | 3 +--
 tests/qemu-iotests/200.out | 4 ++--
 tests/qemu-iotests/229 | 3 +--
 tests/qemu-iotests/229.out | 6 +++---
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index 59f7854b9f..a7aabbd032 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -44,8 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2 qed
 _supported_proto file
 
-BACKING_IMG="${TEST_DIR}/backing.img"
-TEST_IMG="${TEST_DIR}/test.img"
+BACKING_IMG="$TEST_IMG.base"
 
 TEST_IMG="$BACKING_IMG" _make_test_img 512M
 _make_test_img -F $IMGFMT -b "$BACKING_IMG" 512M
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
index a6776070e4..5883f16ac3 100644
--- a/tests/qemu-iotests/200.out
+++ b/tests/qemu-iotests/200.out
@@ -1,6 +1,6 @@
 QA output created by 200
-Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
-Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
 wrote 314572800/314572800 bytes at offset 512
 300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/229 b/tests/qemu-iotests/229
index 89a5359f32..5f759fa587 100755
--- a/tests/qemu-iotests/229
+++ b/tests/qemu-iotests/229
@@ -51,8 +51,7 @@ _supported_os Linux
 _unsupported_imgopts data_file
 
 
-DEST_IMG="$TEST_DIR/d.$IMGFMT"
-TEST_IMG="$TEST_DIR/b.$IMGFMT"
+DEST_IMG="$TEST_IMG.dest"
 BLKDEBUG_CONF="$TEST_DIR/blkdebug.conf"
 
 _make_test_img 2M
diff --git a/tests/qemu-iotests/229.out b/tests/qemu-iotests/229.out
index 4de6dfaa28..7eed393013 100644
--- a/tests/qemu-iotests/229.out
+++ b/tests/qemu-iotests/229.out
@@ -1,6 +1,6 @@
 QA output created by 229
-Formatting 'TEST_DIR/b.IMGFMT', fmt=IMGFMT size=2097152
-Formatting 'TEST_DIR/d.IMGFMT', fmt=IMGFMT size=2097152
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+Formatting 'TEST_DIR/t.IMGFMT.dest', fmt=IMGFMT size=2097152
 wrote 2097152/2097152 bytes at offset 0
 2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {'execute': 'qmp_capabilities'}
@@ -8,7 +8,7 @@ wrote 2097152/2097152 bytes at offset 0
 
 === Starting drive-mirror, causing error & stop  ===
 
-{'execute': 'drive-mirror', 'arguments': {'device': 'testdisk', 'format': 'IMGFMT', 'target': 'blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/d.IMGFMT', 'sync': 'full', 'mode': 'existing', 'on-source-error': 'stop', 'on-target-error': 'stop' }}
+{'execute': 'drive-mirror', 'arguments': {'device': 'testdisk', 'format': 'IMGFMT', 'target': 'blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT.dest', 'sync': 'full', 'mode': 'existing', 'on-source-error': 'stop', 'on-target-error': 'stop' }}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "testdisk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "testdisk"}}
 {"return": {}}
-- 
2.29.2

[PULL 14/34] iotests/046: Avoid renaming images

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

This generally does not work on non-file protocols.  It is better to
create the image with the final name from the start, and most tests do
this already.  Let 046 follow suit.
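
The save-and-restore pattern this relies on can be sketched without
QEMU. In the sketch below, make_img_stub is a hypothetical stand-in for
_make_test_img that only creates an empty file, and the scratch
directory is an assumption of the example:

```shell
# Stand-in for _make_test_img: just creates an empty file at $TEST_IMG.
make_img_stub()
{
    : > "$TEST_IMG"
}

scratch=$(mktemp -d)
TEST_IMG="$scratch/t.img"

# Create the backing image under its final name right away.
TEST_IMG_SAVE=$TEST_IMG
TEST_IMG="$TEST_IMG.base"
make_img_stub

# Restore the default name for the overlay; no mv was needed.
TEST_IMG=$TEST_IMG_SAVE
make_img_stub

ls "$scratch"
```

Both files exist at the end, and neither was ever renamed, so the same
sequence also works where the image is not a plain local file.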

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-11-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/046 | 5 +++--
 tests/qemu-iotests/046.out | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/046 b/tests/qemu-iotests/046
index 88b3363c19..40a9f30087 100755
--- a/tests/qemu-iotests/046
+++ b/tests/qemu-iotests/046
@@ -47,6 +47,8 @@ size=128M
 echo
 echo "== creating backing file for COW tests =="
 
+TEST_IMG_SAVE=$TEST_IMG
+TEST_IMG="$TEST_IMG.base"
 _make_test_img $size
 
 backing_io()
@@ -67,8 +69,7 @@ backing_io()
 
 backing_io 0 32 write | $QEMU_IO "$TEST_IMG" | _filter_qemu_io
 
-mv "$TEST_IMG" "$TEST_IMG.base"
-
+TEST_IMG=$TEST_IMG_SAVE
 _make_test_img -b "$TEST_IMG.base" -F $IMGFMT 6G
 
 echo
diff --git a/tests/qemu-iotests/046.out b/tests/qemu-iotests/046.out
index b022bcddd5..66ad987ab3 100644
--- a/tests/qemu-iotests/046.out
+++ b/tests/qemu-iotests/046.out
@@ -1,7 +1,7 @@
 QA output created by 046
 
 == creating backing file for COW tests ==
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 65536
-- 
2.29.2

[PULL 13/34] iotests: Use convert -n in some cases

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

qemu-img convert (without -n) can often be replaced by a combination of
_make_test_img + qemu-img convert -n.  Doing so allows converting to
protocols that do not allow direct file creation, such as FUSE exports.
The only problem is that for formats other than qcow2 and qed (qcow1 at
least), this may lead to high disk usage for some reason, so we cannot
do it everywhere.

But we can do it in 028 and 089, so let us do that so they can run on
FUSE exports.  Also, in 028 this allows us to remove a 9-line comment
that used to explain why we cannot safely filter drive-backup's image
creation output.
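
Outside the test harness, the two-step replacement looks roughly like
the sketch below. The scratch file names and the 1M size are
assumptions made for the example, and plain qemu-img create stands in
for _make_test_img; the block skips itself if qemu-img is not
installed.

```shell
# Sketch only: file names below are hypothetical scratch files.
convert_status=skipped
if command -v qemu-img >/dev/null 2>&1; then
    dir=$(mktemp -d)
    # Source image with a known size.
    qemu-img create -f raw "$dir/src.raw" 1M >/dev/null
    # Step 1: create the target up front (what _make_test_img does in a test).
    qemu-img create -f qcow2 "$dir/dst.qcow2" 1M >/dev/null
    # Step 2: copy the data; -n tells convert not to create the target itself.
    if qemu-img convert -f raw -O qcow2 -n "$dir/src.raw" "$dir/dst.qcow2"; then
        convert_status=ok
    else
        convert_status=failed
    fi
    rm -rf "$dir"
fi
echo "convert -n: $convert_status"
```

Because the target already exists when convert runs, the same sequence
can be pointed at a target that cannot be created directly, which is
the case the commit message gives for FUSE exports.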

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-10-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/028 | 14 --
 tests/qemu-iotests/028.out |  3 +++
 tests/qemu-iotests/089 |  3 ++-
 tests/qemu-iotests/089.out |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 6dd3ae09a3..864dc4a4e2 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -116,16 +116,10 @@ else
 QEMU_COMM_TIMEOUT=1
 fi
 
-# Silence output since it contains the disk image path and QEMU's readline
-# character echoing makes it very hard to filter the output. Plus, there
-# is no telling how many times the command will repeat before succeeding.
-# (Note that creating the image results in a "Formatting..." message over
-# stdout, which is the same channel the monitor uses.  We cannot reliably
-# wait for it because the monitor output may interact with it in such a
-# way that _timed_wait_for cannot read it.  However, once the block job is
-# done, we know that the "Formatting..." message must have appeared
-# already, so the output is still deterministic.)
-silent=y _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)"
+TEST_IMG="$TEST_IMG.copy" _make_test_img $image_size
+_send_qemu_cmd $h "drive_backup -n disk ${TEST_IMG}.copy" "(qemu)" \
+| _filter_imgfmt
+
 silent=y qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs"
 _send_qemu_cmd $h "info block-jobs" "No active jobs"
 _send_qemu_cmd $h 'quit' ""
diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
index 5a68de5c46..e580488216 100644
--- a/tests/qemu-iotests/028.out
+++ b/tests/qemu-iotests/028.out
@@ -468,6 +468,9 @@ No errors were found on the image.
 
 block-backup
 
+Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) drive_backup -n disk TEST_DIR/t.IMGFMT.copy
 (qemu) info block-jobs
 No active jobs
 === IO: pattern 195
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index 66c5415abe..03a2ccf1e8 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -62,7 +62,8 @@ TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
  -c 'write -P 66 1024 512' "$TEST_IMG.base" | _filter_qemu_io
 
-$QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
+_make_test_img $IMG_SIZE
+$QEMU_IMG convert -f raw -O $IMGFMT -n "$TEST_IMG.base" "$TEST_IMG"
 
 $QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE \
  -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
index 15682c2886..c53fc4823a 100644
--- a/tests/qemu-iotests/089.out
+++ b/tests/qemu-iotests/089.out
@@ -9,6 +9,7 @@ wrote 512/512 bytes at offset 512
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 512/512 bytes at offset 1024
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 512/512 bytes at offset 512
-- 
2.29.2

[PATCH 00/20] Immutable QString, and also one JSON writer less

2020-12-11 Thread Markus Armbruster

Based-on: <20201210161452.2813491-1-arm...@redhat.com>

Cc: Daniel P. Berrangé 
Cc: Dr. David Alan Gilbert 
Cc: Eduardo Habkost 
Cc: Juan Quintela 
Cc: Kevin Wolf 
Cc: Marcel Apfelbaum 
Cc: Max Reitz 
Cc: Paolo Bonzini 
Cc: Yuval Shaia 
Cc: qemu-block@nongnu.org

Markus Armbruster (20):
  hmp: Simplify how qmp_human_monitor_command() gets output
  monitor: Use GString instead of QString for output buffer
  qobject: Make qobject_to_json_pretty() take a pretty argument
  qobject: Use GString instead of QString to accumulate JSON
  qobject: Change qobject_to_json()'s value to GString
  Revert "qstring: add qstring_free()"
  hw/rdma: Replace QList by GQueue
  qobject: Move internals to qobject-internal.h
  qmp: Fix tracing of non-string command IDs
  block: Avoid qobject_get_try_str()
  Revert "qobject: let object_property_get_str() use new API"
  qobject: Drop qobject_get_try_str()
  qobject: Drop qstring_get_try_str()
  qobject: Factor quoted_str() out of to_json()
  qobject: Factor JSON writer out of qobject_to_json()
  migration: Replace migration's JSON writer by the general one
  json: Use GString instead of QString to accumulate strings
  keyval: Use GString to accumulate value strings
  block: Use GString instead of QString to build filenames
  qobject: Make QString immutable

 hw/rdma/rdma_backend_defs.h|   2 +-
 hw/rdma/rdma_utils.h   |  15 +-
 include/migration/vmstate.h|   7 +-
 include/qapi/qmp/json-writer.h |  35 
 include/qapi/qmp/qbool.h   |   2 -
 include/qapi/qmp/qdict.h   |   2 -
 include/qapi/qmp/qjson.h   |   4 +-
 include/qapi/qmp/qlist.h   |   2 -
 include/qapi/qmp/qnull.h   |   2 -
 include/qapi/qmp/qnum.h|   3 -
 include/qapi/qmp/qobject.h |   9 +-
 include/qapi/qmp/qstring.h |  14 +-
 include/qemu/typedefs.h|   4 +-
 migration/qjson.h  |  29 
 monitor/monitor-internal.h |   2 +-
 qobject/qobject-internal.h |  39 +
 block.c|  23 +--
 block/rbd.c|   2 +-
 hw/display/virtio-gpu.c|   2 +-
 hw/intc/s390_flic_kvm.c|   2 +-
 hw/nvram/eeprom93xx.c  |   2 +-
 hw/nvram/fw_cfg.c  |   2 +-
 hw/pci/msix.c  |   2 +-
 hw/pci/pci.c   |   4 +-
 hw/pci/shpc.c  |   2 +-
 hw/rdma/rdma_backend.c |  10 +-
 hw/rdma/rdma_utils.c   |  29 ++--
 hw/rtc/twl92230.c  |   2 +-
 hw/scsi/scsi-bus.c |   2 +-
 hw/usb/redirect.c  |   7 +-
 hw/virtio/virtio.c |   4 +-
 migration/qjson.c  | 114 -
 migration/savevm.c |  53 ---
 migration/vmstate-types.c  |  38 ++---
 migration/vmstate.c|  52 +++---
 monitor/misc.c |   6 +-
 monitor/monitor.c  |  20 +--
 monitor/qmp.c  |  46 +++---
 qemu-img.c |  33 ++--
 qga/main.c |  22 +--
 qobject/json-parser.c  |  30 ++--
 qobject/json-writer.c  | 247 +
 qobject/qbool.c|   1 +
 qobject/qdict.c|   1 +
 qobject/qjson.c| 144 -
 qobject/qlist.c|   1 +
 qobject/qnull.c|   1 +
 qobject/qnum.c |   6 +-
 qobject/qobject.c  |   1 +
 qobject/qstring.c  | 117 +++---
 qom/object.c   |   9 +-
 qom/object_interfaces.c|   4 +-
 qom/qom-hmp-cmds.c |   7 +-
 target/alpha/machine.c |   2 +-
 target/arm/machine.c   |   6 +-
 target/avr/machine.c   |   4 +-
 target/hppa/machine.c  |   4 +-
 target/microblaze/machine.c|   2 +-
 target/mips/machine.c  |   4 +-
 target/openrisc/machine.c  |   2 +-
 target/ppc/machine.c   |  10 +-
 target/sparc/machine.c |   2 +-
 tests/check-qjson.c|  67 
 tests/check-qobject.c  |   3 +-
 tests/check-qstring.c  |  16 --
 tests/qtest/libqtest.c |  20 ++-
 tests/test-visitor-serialization.c |   6 +-
 util/keyval.c  |  11 +-
 migration/meson.build  |   1 -
 qobject/meson.build|   5 +-
 70 files changed, 679 insertions(+), 705 deletions(-)
 create mode 100644 include/qapi/qmp/json-writer.h
 delete mode 100644 migration/qjson.h
 create mode 100644 qobject/qobject-internal.h
 delete mode 100644 migration/qjson.c
 create mode 100644 qobject/json-writer.c

-- 
2.26.2

[PULL 23/34] iotests: Enable fuse for many tests

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Many tests (that do not support generic protocols) can run just fine
with FUSE-exported images, so allow them to.  Note that this does not
attempt to be complete.  There are some tests that might be modified to
run on FUSE, but this patch still skips them.  This patch only tries to
pick the low-hanging fruit.

Note that 221 and 250 only pass when .lseek is correctly implemented,
which is only possible with a libfuse that is 3.8 or newer.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-20-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/025 | 2 +-
 tests/qemu-iotests/026 | 2 +-
 tests/qemu-iotests/028 | 2 +-
 tests/qemu-iotests/031 | 2 +-
 tests/qemu-iotests/034 | 2 +-
 tests/qemu-iotests/036 | 2 +-
 tests/qemu-iotests/037 | 2 +-
 tests/qemu-iotests/038 | 2 +-
 tests/qemu-iotests/039 | 2 +-
 tests/qemu-iotests/046 | 2 +-
 tests/qemu-iotests/050 | 2 +-
 tests/qemu-iotests/054 | 2 +-
 tests/qemu-iotests/060 | 2 +-
 tests/qemu-iotests/071 | 2 +-
 tests/qemu-iotests/079 | 2 +-
 tests/qemu-iotests/080 | 2 +-
 tests/qemu-iotests/089 | 2 +-
 tests/qemu-iotests/090 | 2 +-
 tests/qemu-iotests/091 | 2 +-
 tests/qemu-iotests/095 | 2 +-
 tests/qemu-iotests/097 | 2 +-
 tests/qemu-iotests/098 | 2 +-
 tests/qemu-iotests/102 | 2 +-
 tests/qemu-iotests/103 | 2 +-
 tests/qemu-iotests/106 | 2 +-
 tests/qemu-iotests/107 | 2 +-
 tests/qemu-iotests/108 | 2 +-
 tests/qemu-iotests/111 | 2 +-
 tests/qemu-iotests/112 | 2 +-
 tests/qemu-iotests/115 | 2 +-
 tests/qemu-iotests/117 | 2 +-
 tests/qemu-iotests/120 | 2 +-
 tests/qemu-iotests/121 | 2 +-
 tests/qemu-iotests/127 | 2 +-
 tests/qemu-iotests/133 | 2 +-
 tests/qemu-iotests/137 | 2 +-
 tests/qemu-iotests/138 | 2 +-
 tests/qemu-iotests/140 | 2 +-
 tests/qemu-iotests/154 | 2 +-
 tests/qemu-iotests/161 | 2 +-
 tests/qemu-iotests/171 | 2 +-
 tests/qemu-iotests/175 | 2 +-
 tests/qemu-iotests/176 | 2 +-
 tests/qemu-iotests/177 | 2 +-
 tests/qemu-iotests/179 | 2 +-
 tests/qemu-iotests/183 | 2 +-
 tests/qemu-iotests/186 | 2 +-
 tests/qemu-iotests/187 | 2 +-
 tests/qemu-iotests/191 | 2 +-
 tests/qemu-iotests/195 | 2 +-
 tests/qemu-iotests/200 | 2 +-
 tests/qemu-iotests/204 | 2 +-
 tests/qemu-iotests/214 | 2 +-
 tests/qemu-iotests/217 | 2 +-
 tests/qemu-iotests/220 | 2 +-
 tests/qemu-iotests/221 | 2 +-
 tests/qemu-iotests/229 | 2 +-
 tests/qemu-iotests/247 | 2 +-
 tests/qemu-iotests/249 | 2 +-
 tests/qemu-iotests/250 | 2 +-
 tests/qemu-iotests/252 | 2 +-
 tests/qemu-iotests/265 | 2 +-
 tests/qemu-iotests/268 | 2 +-
 tests/qemu-iotests/272 | 2 +-
 tests/qemu-iotests/273 | 2 +-
 tests/qemu-iotests/279 | 2 +-
 tests/qemu-iotests/286 | 2 +-
 tests/qemu-iotests/287 | 2 +-
 tests/qemu-iotests/289 | 2 +-
 tests/qemu-iotests/290 | 2 +-
 tests/qemu-iotests/291 | 2 +-
 tests/qemu-iotests/292 | 2 +-
 tests/qemu-iotests/293 | 2 +-
 tests/qemu-iotests/294 | 2 +-
 tests/qemu-iotests/305 | 2 +-
 75 files changed, 75 insertions(+), 75 deletions(-)

diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index e05d833452..1569d912f4 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 _supported_fmt raw qcow2 qed luks
-_supported_proto file sheepdog rbd nfs
+_supported_proto file sheepdog rbd nfs fuse
 
 echo "=== Creating image"
 echo
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index b9713eb591..9ecc5880b1 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -41,7 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Currently only qcow2 supports rebasing
 _supported_fmt qcow2
-_supported_proto file
+_supported_proto file fuse
 _default_cache_mode writethrough
 _supported_cache_modes writethrough none
 # The refcount table tests expect a certain minimum width for refcount entries
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 864dc4a4e2..57d34aae99 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -46,7 +46,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting backing files except vmdk and qcow which do not support
 # smaller backing files.
 _supported_fmt qcow2 qed
-_supported_proto file
+_supported_proto file fuse
 _supported_os Linux
 
 # Choose a size that is not necessarily a cluster size multiple for image
diff --git a/tests/qemu-iotests/031 b/tests/qemu-iotests/031
index 646ecd593f..2bcbc5886e 100755
--- a/tests/qemu-iotests/031
+++ b/tests/qemu-iotests/031
@@ -39,7 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
-_supported_proto file
+_supported_proto file fuse
 # We want to test compat=0.10, which does not support external data
 # files or refcount widths other than 16
 _unsupported_imgopts data_file 'refcount_bits=\([^1]\|.\([^6]\|$\)\)'
diff --git a/tests/qemu-iotests/034 b/tests/qemu-iotests/034
index ac2d687c71..08f7aea6d5 100755
--

[PULL 11/34] iotests: Do not needlessly filter _make_test_img

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

In most cases, _make_test_img does not need a _filter_imgfmt on top.  It
does that by itself.

(The exception is when IMGFMT has been overwritten but TEST_IMG has not.
In such cases, we do need a _filter_imgfmt on top to filter the test's
original IMGFMT from TEST_IMG.)

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-8-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/161 | 12 ++--
 tests/qemu-iotests/175 |  6 +++---
 tests/qemu-iotests/249 |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/161 b/tests/qemu-iotests/161
index e270976d87..bbf7dbbc5c 100755
--- a/tests/qemu-iotests/161
+++ b/tests/qemu-iotests/161
@@ -48,9 +48,9 @@ _supported_os Linux
 IMG_SIZE=1M
 
 # Create the images
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT | _filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT -F $IMGFMT
 
 # First test: reopen $TEST.IMG changing the detect-zeroes option on
 # its backing file ($TEST_IMG.int).
@@ -105,9 +105,9 @@ echo
 echo "*** Commit and then change an option on the backing file"
 echo
 # Create the images again
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT| _filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT
 
 _launch_qemu -drive if=none,file="${TEST_IMG}"
 _send_qemu_cmd $QEMU_HANDLE \
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index 00a626aa63..c3c2aed653 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -89,20 +89,20 @@ min_blocks=$(stat -c '%b' "$TEST_DIR/empty")
 
 echo
 echo "== creating image with default preallocation =="
-_make_test_img -o extent_size_hint=0 $size | _filter_imgfmt
+_make_test_img -o extent_size_hint=0 $size
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
 
 for mode in off full falloc; do
 echo
 echo "== creating image with preallocation $mode =="
-_make_test_img -o preallocation=$mode,extent_size_hint=0 $size | _filter_imgfmt
+_make_test_img -o preallocation=$mode,extent_size_hint=0 $size
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
 done
 
 for new_size in 4096 1048576; do
 echo
 echo "== resize empty image with block_resize =="
-_make_test_img -o extent_size_hint=0 0 | _filter_imgfmt
+_make_test_img -o extent_size_hint=0 0
 _block_resize $TEST_IMG $new_size >/dev/null
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $new_size
 done
diff --git a/tests/qemu-iotests/249 b/tests/qemu-iotests/249
index 68f13ed328..a9aa9303eb 100755
--- a/tests/qemu-iotests/249
+++ b/tests/qemu-iotests/249
@@ -48,9 +48,9 @@ _supported_os Linux
 IMG_SIZE=1M
 
 # Create the images: base <- int <- active
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT | _filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT
 
 # Launch QEMU with these two drives:
 # none0: base (read-only)
-- 
2.29.2

[PULL 10/34] fuse: Implement hole detection through lseek

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to add a dedicated
check whether it is available before making use of it.
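
For orientation, the extent walk that an lseek handler of this kind
performs can be sketched without libfuse.  The following is a minimal,
self-contained C illustration and not the code from this patch:
`struct extent` and `seek_extent()` are hypothetical names, and the
extent map merely stands in for whatever allocation status the block
layer would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a run of bytes that is either data or a hole. */
struct extent {
    int64_t length;
    bool is_data;
};

/*
 * Return the first offset >= start that lies in data (want_data == true)
 * or in a hole (want_data == false), or -ENXIO if there is none.
 * The end of the image counts as an implicit hole, as with lseek(2).
 */
static int64_t seek_extent(const struct extent *map, size_t n,
                           int64_t start, bool want_data)
{
    int64_t pos = 0;    /* start offset of the extent being examined */

    assert(start >= 0);
    for (size_t i = 0; i < n; i++) {
        int64_t end = pos + map[i].length;

        /* The extent reaches past start and has the wanted type. */
        if (start < end && map[i].is_data == want_data) {
            return start > pos ? start : pos;
        }
        pos = end;
    }

    /* pos is now EOF: only a hole query at or before EOF succeeds. */
    if (!want_data && start <= pos) {
        return pos;
    }
    return -ENXIO;
}
```

With a map of 4 KiB data, 8 KiB hole, 4 KiB data, a data query at offset
5000 lands on 12288, and a hole query inside the last data extent lands
on the end of the image.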

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-7-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 meson_options.txt   |  2 ++
 configure   |  8 -
 block/export/fuse.c | 77 +
 meson.build | 20 
 4 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/meson_options.txt b/meson_options.txt
index 8f9f2e3df6..74ac853548 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -68,6 +68,8 @@ option('vhost_user_blk_server', type: 'feature', value: 'auto',
description: 'build vhost-user-blk server')
 option('fuse', type: 'feature', value: 'auto',
description: 'FUSE block device export')
+option('fuse_lseek', type : 'feature', value : 'auto',
+   description: 'SEEK_HOLE/SEEK_DATA support for FUSE exports')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/configure b/configure
index b2f96c0da2..3f823ed163 100755
--- a/configure
+++ b/configure
@@ -450,6 +450,7 @@ ninja=""
 skip_meson=no
 gettext=""
 fuse="auto"
+fuse_lseek="auto"
 
 bogus_os="no"
 malloc_trim="auto"
@@ -1530,6 +1531,10 @@ for opt do
   ;;
   --disable-fuse) fuse="disabled"
   ;;
+  --enable-fuse-lseek) fuse_lseek="enabled"
+  ;;
+  --disable-fuse-lseek) fuse_lseek="disabled"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1856,6 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
   libdaxctl   libdaxctl support
   fuseFUSE block device export
+  fuse-lseek  SEEK_HOLE/SEEK_DATA support for FUSE exports
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -7020,7 +7026,7 @@ NINJA=$ninja $meson setup \
 -Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server \
--Dfuse=$fuse \
+-Dfuse=$fuse -Dfuse_lseek=$fuse_lseek \
 $cross_arg \
 "$PWD" "$source_path"
 
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 0b9d226b2f..38f74c94da 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -627,6 +627,80 @@ static void fuse_flush(fuse_req_t req, fuse_ino_t inode,
 fuse_fsync(req, inode, 1, fi);
 }
 
+#ifdef CONFIG_FUSE_LSEEK
+/**
+ * Let clients inquire allocation status.
+ */
+static void fuse_lseek(fuse_req_t req, fuse_ino_t inode, off_t offset,
+   int whence, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+
+if (whence != SEEK_HOLE && whence != SEEK_DATA) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+while (true) {
+int64_t pnum;
+int ret;
+
+ret = bdrv_block_status_above(blk_bs(exp->common.blk), NULL,
+  offset, INT64_MAX, &pnum, NULL, NULL);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+
+if (!pnum && (ret & BDRV_BLOCK_EOF)) {
+int64_t blk_len;
+
+/*
+ * If blk_getlength() rounds (e.g. by sectors), then the
+ * export length will be rounded, too.  However,
+ * bdrv_block_status_above() may return EOF at unaligned
+ * offsets.  We must not let this become visible and thus
+ * always simulate a hole between @offset (the real EOF)
+ * and @blk_len (the client-visible EOF).
+ */
+
+blk_len = blk_getlength(exp->common.blk);
+if (blk_len < 0) {
+fuse_reply_err(req, -blk_len);
+return;
+}
+
+if (offset > blk_len || whence == SEEK_DATA) {
+fuse_reply_err(req, ENXIO);
+} else {
+fuse_reply_lseek(req, offset);
+}
+return;
+}
+
+if (ret & BDRV_BLOCK_DATA) {
+if (whence == SEEK_DATA) {
+fuse_reply_lseek(req, offset);
+return;
+}
+} else {
+if (whence == SEEK_HOLE) {
+fuse_reply_lseek(req, offset);
+return;
+}
+}
+
+/* Safety check against infinite loops */
+if (!pnum) {
+fuse_reply_err(req, ENXIO);
+return;
+}
+
+offset += pnum;
+}
+}
+#endif
+
 static const struct fuse_lowlevel_ops fuse_ops = {
 .init   = fuse_init,
 .lookup = fuse_lookup,
@@ -638,6 +712,9 @@ static const struct fuse_lowlevel_ops fus

[PULL 09/34] fuse: (Partially) implement fallocate()

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.
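
Long discard or zero ranges have to be split into requests that stay
under a per-request limit.  The sketch below shows that chunking
pattern in isolation, under stated assumptions: `CHUNK_MAX`,
`chunk_fn`, `for_each_chunk()` and the counting callback are all
hypothetical names for illustration, and the limit is an arbitrary
1 MiB rather than QEMU's BDRV_REQUEST_MAX_BYTES.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-request limit, chosen only for this sketch. */
#define CHUNK_MAX ((int64_t)1 << 20)

/* Hypothetical callback that processes one chunk; returns 0 or -errno. */
typedef int (*chunk_fn)(int64_t offset, int64_t bytes, void *opaque);

/*
 * Apply fn to [offset, offset + length) in pieces of at most CHUNK_MAX
 * bytes, stopping at the first error.  Returns 0 or the first error.
 */
static int for_each_chunk(int64_t offset, int64_t length,
                          chunk_fn fn, void *opaque)
{
    assert(offset >= 0 && length >= 0);
    while (length > 0) {
        int64_t size = length < CHUNK_MAX ? length : CHUNK_MAX;
        int ret = fn(offset, size, opaque);

        if (ret < 0) {
            return ret;
        }
        offset += size;
        length -= size;
    }
    return 0;
}

/* Example callback state: number of calls and total bytes seen. */
struct chunk_stats {
    int calls;
    int64_t bytes;
};

/* Example callback: records the chunk and always succeeds. */
static int count_chunk(int64_t offset, int64_t bytes, void *opaque)
{
    struct chunk_stats *s = opaque;

    (void)offset;
    s->calls++;
    s->bytes += bytes;
    return 0;
}
```

Note that this while loop issues no request at all for a zero-length
range, whereas a do/while form issues one zero-sized request; which
behaviour is preferable depends on the caller.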

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-6-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 block/export/fuse.c | 84 +
 1 file changed, 84 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 92d2f50bcc..0b9d226b2f 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -521,6 +521,89 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
 }
 }
 
+/**
+ * Let clients perform various fallocate() operations.
+ */
+static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode,
+   off_t offset, off_t length,
+   struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int64_t blk_len;
+int ret;
+
+if (!exp->writable) {
+fuse_reply_err(req, EACCES);
+return;
+}
+
+blk_len = blk_getlength(exp->common.blk);
+if (blk_len < 0) {
+fuse_reply_err(req, -blk_len);
+return;
+}
+
+if (mode & FALLOC_FL_KEEP_SIZE) {
+length = MIN(length, blk_len - offset);
+}
+
+if (mode & FALLOC_FL_PUNCH_HOLE) {
+if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+do {
+int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
+
+ret = blk_pdiscard(exp->common.blk, offset, size);
+offset += size;
+length -= size;
+} while (ret == 0 && length > 0);
+} else if (mode & FALLOC_FL_ZERO_RANGE) {
+if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + length > blk_len) {
+/* No need for zeroes, we are going to write them ourselves */
+ret = fuse_do_truncate(exp, offset + length, false,
+   PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+}
+
+do {
+int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
+
+ret = blk_pwrite_zeroes(exp->common.blk,
+offset, size, 0);
+offset += size;
+length -= size;
+} while (ret == 0 && length > 0);
+} else if (!mode) {
+/* We can only fallocate at the EOF with a truncate */
+if (offset < blk_len) {
+fuse_reply_err(req, EOPNOTSUPP);
+return;
+}
+
+if (offset > blk_len) {
+/* No preallocation needed here */
+ret = fuse_do_truncate(exp, offset, true, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+}
+
+ret = fuse_do_truncate(exp, offset + length, true,
+   PREALLOC_MODE_FALLOC);
+} else {
+ret = -EOPNOTSUPP;
+}
+
+fuse_reply_err(req, ret < 0 ? -ret : 0);
+}
+
 /**
  * Let clients fsync the exported image.
  */
@@ -552,6 +635,7 @@ static const struct fuse_lowlevel_ops fuse_ops = {
 .open   = fuse_open,
 .read   = fuse_read,
 .write  = fuse_write,
+.fallocate  = fuse_fallocate,
 .flush  = fuse_flush,
 .fsync  = fuse_fsync,
 };
-- 
2.29.2

[PULL 19/34] iotests/287: Clean up subshell test image

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

287 creates an image in a subshell (thanks to the pipe) to see whether
that is possible with compression_type=zstd.  If _make_test_img were to
modify any global state, this global state would then be lost before we
could clean up the image.

When using FUSE as the test protocol, this global state is important, so
clean up the image before the state is lost.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-16-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/287 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
index f98a4cadc1..036cc09e82 100755
--- a/tests/qemu-iotests/287
+++ b/tests/qemu-iotests/287
@@ -51,8 +51,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 CLUSTER_SIZE=65536
 
 # Check if we can run this test.
-if IMGOPTS='compression_type=zstd' _make_test_img 64M |
-grep "Invalid parameter 'zstd'"; then
+output=$(_make_test_img -o 'compression_type=zstd' 64M; _cleanup_test_img)
+if echo "$output" | grep -q "Invalid parameter 'zstd'"; then
 _notrun "ZSTD is disabled"
 fi
 
-- 
2.29.2

[PULL 22/34] iotests: Allow testing FUSE exports

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

This pretends FUSE exports are a kind of protocol.  As such, they are
always tested under the format node.  This is probably the best way to
test them, actually, because this will generate more I/O load and more
varied patterns.

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-19-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/check |   6 ++
 tests/qemu-iotests/common.filter |   5 +-
 tests/qemu-iotests/common.rc | 124 +++
 3 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 3c1fa4435a..952762d5ed 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -270,6 +270,7 @@ image protocol options
 -rbdtest rbd
 -sheepdog   test sheepdog
 -nbdtest nbd
+-fuse   test fuse
 -sshtest ssh
 -nfstest nfs
 
@@ -382,6 +383,11 @@ testlist options
 xpand=false
 ;;
 
+-fuse)
+IMGPROTO=fuse
+xpand=false
+;;
+
 -ssh)
 IMGPROTO=ssh
 xpand=false
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 838ed15793..172ea5752e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -44,7 +44,8 @@ _filter_qom_path()
 _filter_testdir()
 {
 $SED -e "s#$TEST_DIR/#TEST_DIR/#g" \
- -e "s#$SOCK_DIR/#SOCK_DIR/#g"
+ -e "s#$SOCK_DIR/#SOCK_DIR/#g" \
+ -e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
 }
 
 # replace occurrences of the actual IMGFMT value with IMGFMT
@@ -127,6 +128,7 @@ _filter_img_create_filenames()
 -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
+-e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd:127.0.0.1:[0-9]\\+#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g'
@@ -227,6 +229,7 @@ _filter_img_info()
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd://127.0.0.1:[0-9]\\+$#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g' \
+-e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "/encrypted: yes/d" \
 -e "/cluster_size: [0-9]\\+/d" \
 -e "/table_size: [0-9]\\+/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 20589e59a5..29354654cc 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -257,6 +257,9 @@ if [ "$IMGOPTSSYNTAX" = "true" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 TEST_IMG="$DRIVER,file.driver=nbd,file.type=unix"
 TEST_IMG="$TEST_IMG,file.path=$SOCK_DIR/nbd"
+elif [ "$IMGPROTO" = "fuse" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="$DRIVER,file.filename=$SOCK_DIR/fuse-t.$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
TEST_IMG="$DRIVER,file.driver=ssh,file.host=127.0.0.1,file.path=$TEST_IMG_FILE"
@@ -273,6 +276,9 @@ else
 elif [ "$IMGPROTO" = "nbd" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 TEST_IMG="nbd+unix:///?socket=$SOCK_DIR/nbd"
+elif [ "$IMGPROTO" = "fuse" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="$SOCK_DIR/fuse-t.$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
REMOTE_TEST_DIR="ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?$TEST_DIR"
@@ -288,6 +294,9 @@ fi
 ORIG_TEST_IMG_FILE=$TEST_IMG_FILE
 ORIG_TEST_IMG="$TEST_IMG"
 
+FUSE_PIDS=()
+FUSE_EXPORTS=()
+
 if [ -z "$TEST_DIR" ]; then
 TEST_DIR=$PWD/scratch
 fi
@@ -357,6 +366,10 @@ _test_img_to_test_img_file()
 echo "$1"
 ;;
 
+fuse)
+echo "$1" | sed -e "s#$SOCK_DIR/fuse-#$TEST_DIR/#"
+;;
+
 nfs)
 echo "$1" | sed -e "s#nfs://127.0.0.1##"
 ;;
@@ -385,6 +398,11 @@ _make_test_img()
 local opts_param=false
 local misc_params=()
 
+if [[ $IMGPROTO == fuse && $TEST_IMG == $SOCK_DIR/fuse-* ]]; then
+# The caller may be trying to overwrite an existing image
+_rm_test_img "$TEST_IMG"
+fi
+
 if [ -z "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG
 elif [ "$IMGOPTSSYNTAX" != "true" -a \
@@ -469,11 +487,105 @@ _make_test_img()
 eval "$QEMU_NBD -v -t -k '$SOCK_DIR/nbd' -f $IMGFMT -e 42 -x '' $TEST_IMG_FILE >/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi
+
+if [ $IMGPROTO = "fuse" -a -f "$img_name" ]; then
+local export_mp
+local pid
+local pidfile
+local timeout
+
+export_mp=$(echo "$img_name" | sed -e "s#$TEST_DIR/#$SOCK_DIR/fuse-#")
+if ! echo "$export_mp" | grep -q "^$SOCK

[PULL 05/34] meson: Detect libfuse

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-2-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 meson_options.txt | 2 ++
 configure | 7 +++
 meson.build   | 6 ++
 3 files changed, 15 insertions(+)

diff --git a/meson_options.txt b/meson_options.txt
index f6f64785fe..8f9f2e3df6 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -66,6 +66,8 @@ option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
 option('vhost_user_blk_server', type: 'feature', value: 'auto',
description: 'build vhost-user-blk server')
+option('fuse', type: 'feature', value: 'auto',
+   description: 'FUSE block device export')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/configure b/configure
index 18c26e0389..b2f96c0da2 100755
--- a/configure
+++ b/configure
@@ -449,6 +449,7 @@ meson=""
 ninja=""
 skip_meson=no
 gettext=""
+fuse="auto"
 
 bogus_os="no"
 malloc_trim="auto"
@@ -1525,6 +1526,10 @@ for opt do
   ;;
   --disable-libdaxctl) libdaxctl=no
   ;;
+  --enable-fuse) fuse="enabled"
+  ;;
+  --disable-fuse) fuse="disabled"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1850,6 +1855,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   xkbcommon   xkbcommon support
   rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
   libdaxctl   libdaxctl support
+  fuseFUSE block device export
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -7014,6 +7020,7 @@ NINJA=$ninja $meson setup \
 -Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server \
+-Dfuse=$fuse \
 $cross_arg \
 "$PWD" "$source_path"
 
diff --git a/meson.build b/meson.build
index 9ea05ab49f..6e8ef151d8 100644
--- a/meson.build
+++ b/meson.build
@@ -773,6 +773,10 @@ elif get_option('vhost_user_blk_server').disabled() or not have_system
 have_vhost_user_blk_server = false
 endif
 
+fuse = dependency('fuse3', required: get_option('fuse'),
+  version: '>=3.1', method: 'pkg-config',
+  static: enable_static)
+
 #
 # config-host.h #
 #
@@ -807,6 +811,7 @@ config_host_data.set('CONFIG_KEYUTILS', keyutils.found())
 config_host_data.set('CONFIG_GETTID', has_gettid)
 config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
 config_host_data.set('CONFIG_STATX', has_statx)
+config_host_data.set('CONFIG_FUSE', fuse.found())
 config_host_data.set('QEMU_VERSION', '"@0@"'.format(meson.project_version()))
 config_host_data.set('QEMU_VERSION_MAJOR', meson.project_version().split('.')[0])
 config_host_data.set('QEMU_VERSION_MINOR', meson.project_version().split('.')[1])
@@ -2208,6 +2213,7 @@ endif
 summary_info += {'thread sanitizer':  config_host.has_key('CONFIG_TSAN')}
 summary_info += {'rng-none':  config_host.has_key('CONFIG_RNG_NONE')}
 summary_info += {'Linux keyring': config_host.has_key('CONFIG_SECRET_KEYRING')}
+summary_info += {'FUSE exports':  fuse.found()}
 summary(summary_info, bool_yn: true)
 
 if not supported_cpus.contains(cpu)
-- 
2.29.2

[PULL 30/34] block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()

2020-12-11 Thread Kevin Wolf

From: Vladimir Sementsov-Ogievskiy 

Move bdrv_is_inserted() calls into callers.

We are going to make bdrv_check_byte_request() a clean thing.
bdrv_is_inserted() is not about checking the request, it's about
checking the bs. So, it should be separate.

With this patch we probably change the error path for some failure
scenarios. But depending on the fact that submitting a too-big request
to an empty cdrom (or a corrupted qcow2 node with no drv) results in
EIO and not ENOMEDIUM would be very strange. Moreover, we are going to
move to 64-bit requests, so larger requests will be allowed anyway.

Moreover, keeping in mind that cdrom is the only driver that has a
.bdrv_is_inserted() handler, it's strange that we should care so much
about it in the generic block layer. Intuitively, we should just do the
read or write, and the cdrom driver should return correct errors if no
medium is inserted. But that is work for another series.
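
The separation described above can be illustrated in isolation.  This
is a hedged sketch, not QEMU code: `struct sketch_node`,
`sketch_check_byte_request()`, `sketch_read()` and the size limit are
hypothetical stand-ins, and the ENOMEDIUM fallback is only there for
platforms whose errno.h lacks it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV    /* fallback for non-Linux hosts (assumption) */
#endif

/* Hypothetical per-request size limit for this sketch. */
#define SKETCH_REQUEST_MAX_BYTES ((size_t)INT32_MAX)

/* Hypothetical stand-in for a block node; only tracks media presence. */
struct sketch_node {
    bool inserted;
};

/* Pure request check: depends only on offset and size, not on the node. */
static int sketch_check_byte_request(int64_t offset, size_t size)
{
    if (size > SKETCH_REQUEST_MAX_BYTES || offset < 0) {
        return -EIO;
    }
    return 0;
}

/* The caller checks the node first, then validates the request. */
static int sketch_read(const struct sketch_node *node,
                       int64_t offset, size_t size)
{
    assert(node != NULL);
    if (!node->inserted) {
        return -ENOMEDIUM;
    }
    return sketch_check_byte_request(offset, size);
}
```

In this arrangement an oversized request on a node without a medium
reports -ENOMEDIUM rather than -EIO, which is the kind of error-path
change the message mentions.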

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20201203222713.13507-4-vsement...@virtuozzo.com>
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3e91074c9f..ef75a5abb4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -884,17 +884,12 @@ static bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 return waited;
 }
 
-static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
-   size_t size)
+static int bdrv_check_byte_request(int64_t offset, size_t size)
 {
 if (size > BDRV_REQUEST_MAX_BYTES) {
 return -EIO;
 }
 
-if (!bdrv_is_inserted(bs)) {
-return -ENOMEDIUM;
-}
-
 if (offset < 0) {
 return -EIO;
 }
@@ -1642,7 +1637,11 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 
 trace_bdrv_co_preadv(bs, offset, bytes, flags);
 
-ret = bdrv_check_byte_request(bs, offset, bytes);
+if (!bdrv_is_inserted(bs)) {
+return -ENOMEDIUM;
+}
+
+ret = bdrv_check_byte_request(offset, bytes);
 if (ret < 0) {
 return ret;
 }
@@ -2054,11 +2053,11 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 
 trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
 
-if (!bs->drv) {
+if (!bdrv_is_inserted(bs)) {
 return -ENOMEDIUM;
 }
 
-ret = bdrv_check_byte_request(bs, offset, bytes);
+ret = bdrv_check_byte_request(offset, bytes);
 if (ret < 0) {
 return ret;
 }
@@ -3045,10 +3044,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 assert(!(read_flags & BDRV_REQ_NO_FALLBACK));
 assert(!(write_flags & BDRV_REQ_NO_FALLBACK));
 
-if (!dst || !dst->bs) {
+if (!dst || !dst->bs || !bdrv_is_inserted(dst->bs)) {
 return -ENOMEDIUM;
 }
-ret = bdrv_check_byte_request(dst->bs, dst_offset, bytes);
+ret = bdrv_check_byte_request(dst_offset, bytes);
 if (ret) {
 return ret;
 }
@@ -3056,10 +3055,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, write_flags);
 }
 
-if (!src || !src->bs) {
+if (!src || !src->bs || !bdrv_is_inserted(src->bs)) {
 return -ENOMEDIUM;
 }
-ret = bdrv_check_byte_request(src->bs, src_offset, bytes);
+ret = bdrv_check_byte_request(src_offset, bytes);
 if (ret) {
 return ret;
 }
-- 
2.29.2

[PULL 31/34] block: introduce BDRV_MAX_LENGTH

2020-12-11 Thread Kevin Wolf

From: Vladimir Sementsov-Ogievskiy 

We are going to modify the block layer to work with 64-bit requests. The
first step is moving to the int64_t type for both the offset and bytes
arguments in all block-request-related functions.

It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.

So, let's first establish the set of requests we want to work with.
First, signed int64_t should be enough, as off_t is signed anyway. Then,
obviously, offset + bytes should not overflow.

And most interesting: (offset + bytes) being aligned up should not
overflow either. Aligned to what alignment? The first thing that comes
to mind is bs->bl.request_alignment, as we align requests up to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns the request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.

Happily, bdrv_get_cluster_size() returns int, and
bs->bl.request_alignment has a 32-bit unsigned type but is defined to be
a power of 2 less than INT_MAX. So, we may establish that INT_MAX is the
absolute maximum for any kind of alignment that may occur with the
request.

Note that bdrv_get_cluster_size() is not documented to return a power
of 2; still, bdrv_mark_request_serialising() behaves as if it did.
Also, backup uses bdi.cluster_size and is not prepared for it not being
a power of 2.
So, let's establish that QEMU supports only power-of-2 clusters and
alignments.

So, alignment can't be greater than 2^30.

Finally, to be safe with calculations and to avoid calculating different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as the absolute maximum byte length for QEMU. Actually, it's not much
less than INT64_MAX.
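
The arithmetic behind this limit can be checked with a small
standalone sketch.  The names below (`SKETCH_MAX_ALIGNMENT`,
`SKETCH_MAX_LENGTH`, `sketch_check_request()`, `sketch_align_up()`)
are hypothetical and only mirror the idea described here; the check
function is an assumption about what such a validation could look
like, not the function added by this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Largest alignment considered: 2^30 (sketch-only names). */
#define SKETCH_MAX_ALIGNMENT ((int64_t)1 << 30)
/* INT64_MAX rounded down to a multiple of 2^30, i.e. 2^63 - 2^30. */
#define SKETCH_MAX_LENGTH \
    (INT64_MAX / SKETCH_MAX_ALIGNMENT * SKETCH_MAX_ALIGNMENT)

/*
 * Hypothetical validity check: offset, bytes and offset + bytes must
 * all lie within [0, SKETCH_MAX_LENGTH].  Written so that the check
 * itself cannot overflow.
 */
static int sketch_check_request(int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0) {
        return -EIO;
    }
    if (bytes > SKETCH_MAX_LENGTH || offset > SKETCH_MAX_LENGTH - bytes) {
        return -EIO;
    }
    return 0;
}

/*
 * Round n up to a power-of-2 alignment.  For n <= SKETCH_MAX_LENGTH and
 * align <= 2^30, n + align - 1 is at most INT64_MAX, so no overflow.
 */
static int64_t sketch_align_up(int64_t n, int64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    assert(align <= SKETCH_MAX_ALIGNMENT);
    assert(n >= 0 && n <= SKETCH_MAX_LENGTH);
    return (n + align - 1) & ~(align - 1);
}
```

The point of rounding the limit down is visible in `sketch_align_up()`:
any accepted end offset can be aligned up by any permitted alignment
without leaving the int64_t range.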

OK, then, let's apply it to block/io.

Let's consider all block/io entry points that take offset/bytes:

There are 4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard(). We check them all with bdrv_check_request().

We also have one entry point with only an offset: bdrv_co_truncate().
Check the offset.

And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:

 file-posix.c: adapted by this patch
 write-threshold.c: only reads fields
 test-write-threshold.c: sets obviously small constant values

It would be better to make the structure private and add corresponding
interfaces. Still, it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.

After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that an offset/bytes pair always
satisfies the new restrictions, and make corresponding assertions where
needed. If we reach some offset/bytes point in block/io.c without going
through bdrv_check_request(), it is considered a bug. Likewise, if
block/io.c modifies an offset/bytes request, expanding it more than
aligning up to request_alignment, it's a bug too.

For all I/O requests except discard, we keep the old restriction of a
32-bit request length for now.

The iotest 206 output error message changed, as the test disk size is
now larger than the new limit. Add one more test case with the new
maximum disk size to cover the too-big-L1 case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20201203222713.13507-5-vsement...@virtuozzo.com>
Signed-off-by: Kevin Wolf 
---
 include/block/block.h| 10 +++
 include/block/block_int.h|  8 ++
 block.c  | 17 +++-
 block/file-posix.c   |  6 ++---
 block/io.c   | 51 +---
 tests/test-write-threshold.c |  4 +++
 tests/qemu-iotests/206   |  2 +-
 tests/qemu-iotests/206.out   |  6 +
 8 files changed, 90 insertions(+), 14 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..5b81e33e94 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -142,6 +142,16 @@ typedef struct HDGeometry {
INT_MAX >> BDRV_SECTOR_BITS)
 #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
 
+/*
+ * We want allow aligning requests and disk length up to any 32bit alignment
+ * and don't afraid of overflow.
+ * To achieve it, and in the same time use some pretty number as maximum disk
+ * size, let's define maximum "length" (a limit for any offset/bytes request and
+ * for disk size) to be the greatest power of 2 less than INT64_MAX.
+ */
+#define BDRV_MAX_ALIGNMENT (1L << 30)
+#define BDRV_MAX_LENGTH (QEMU_ALIGN_DOWN(INT64_MAX, BDRV_MAX_ALIGNMENT))
+
 /*
  * Allocation status flags for bdrv_block_status() and friends.
  *
diff --git a/include/block/block_int.h b/include/

[PULL 27/34] can-host: Fix crash when 'canbus' property is not set

2020-12-11 Thread Kevin Wolf

Providing the 'if' property, but not 'canbus' segfaults like this:

 #0  0x55b0f14d in can_bus_insert_client (bus=0x0, client=0x56aa9af0) at ../net/can/can_core.c:88
 #1  0x559c3803 in can_host_connect (ch=0x56aa9ac0, errp=0x7fffd568) at ../net/can/can_host.c:62
 #2  0x559c386a in can_host_complete (uc=0x56aa9ac0, errp=0x7fffd568) at ../net/can/can_host.c:72
 #3  0x55d52de9 in user_creatable_complete (uc=0x56aa9ac0, errp=0x7fffd5c8) at ../qom/object_interfaces.c:23

Add the missing NULL check.

Signed-off-by: Kevin Wolf 
Message-Id: <20201130105615.21799-5-kw...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 net/can/can_host.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/can/can_host.c b/net/can/can_host.c
index be4547d913..ba7f163d0a 100644
--- a/net/can/can_host.c
+++ b/net/can/can_host.c
@@ -53,6 +53,11 @@ static void can_host_connect(CanHostState *ch, Error **errp)
 CanHostClass *chc = CAN_HOST_GET_CLASS(ch);
 Error *local_err = NULL;
 
+if (ch->bus == NULL) {
+error_setg(errp, "'canbus' property not set");
+return;
+}
+
 chc->connect(ch, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-- 
2.29.2

[PULL 16/34] iotests/091: Use _cleanup_qemu instead of "wait"

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

If the test environment has some other child processes running (like a
storage daemon that provides a FUSE export), then "wait" will never
finish.  Use wait=yes _cleanup_qemu instead.
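
The underlying difference can be sketched in C. This is only an
illustration with hypothetical sketch_* helpers, and it assumes that
wait=yes _cleanup_qemu amounts to waiting for the specific qemu processes:
waiting for one particular child returns as soon as that child exits, even
while an unrelated child (standing in for a storage daemon) keeps running.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sleeps for `seconds` (0: not at all), then exits. */
static pid_t sketch_spawn(unsigned seconds, int code)
{
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        if (seconds) {
            sleep(seconds);
        }
        _exit(code);
    }
    return pid;
}

/* Wait only for `target`, ignoring all other children. */
static int sketch_wait_for(pid_t target)
{
    int status;
    pid_t reaped = waitpid(target, &status, 0);

    assert(reaped == target && WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

A caller that has both a long-running child and a short-lived one can call
sketch_wait_for() on the short-lived one and continue immediately, whereas
waiting for every child would block until the long-running one is gone.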

(We need to discard the output so there is no change to the reference
output.)

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-13-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/091 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091
index 68fbfd777b..8a4ce5b7e2 100755
--- a/tests/qemu-iotests/091
+++ b/tests/qemu-iotests/091
@@ -96,7 +96,8 @@ _send_qemu_cmd $h2 'qemu-io disk flush' "(qemu)"
 _send_qemu_cmd $h2 'quit' ""
 _send_qemu_cmd $h1 'quit' ""
 
-wait
+wait=yes _cleanup_qemu >/dev/null
+
 echo "Check image pattern"
 ${QEMU_IO} -c "read -P 0x22 0 4M" "${TEST_IMG}" | _filter_testdir | _filter_qemu_io
 
-- 
2.29.2

[PULL 02/34] block/curl: Use lock guard macros

2020-12-11 Thread Kevin Wolf

From: Gan Qixin 

Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl.
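
For readers unfamiliar with the idiom, here is a minimal sketch of a
scope-based lock guard in plain C. It is not QEMU's implementation: the
SKETCH_LOCK_GUARD macro and the sketch_* helpers are hypothetical, and the
sketch assumes a compiler that supports the GCC/Clang cleanup attribute.
The lock is taken where the guard is declared and dropped automatically on
every path out of the enclosing scope.

```c
#include <assert.h>
#include <pthread.h>

/* Called automatically when the guard variable goes out of scope. */
static void sketch_guard_release(pthread_mutex_t **m)
{
    if (*m) {
        pthread_mutex_unlock(*m);
    }
}

static pthread_mutex_t *sketch_guard_acquire(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);

    assert(rc == 0); /* sketch only: a real caller would handle errors */
    return m;
}

/* Hold `m` until the end of the enclosing scope. */
#define SKETCH_LOCK_GUARD(m) \
    pthread_mutex_t *sketch_guard_ \
        __attribute__((cleanup(sketch_guard_release), unused)) = \
        sketch_guard_acquire(m)

static pthread_mutex_t sketch_counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_counter;

/* Both return paths drop the lock; there is no explicit unlock call. */
static int sketch_bump(int limit)
{
    SKETCH_LOCK_GUARD(&sketch_counter_lock);

    if (sketch_counter >= limit) {
        return -1;
    }
    return ++sketch_counter;
}
```

The benefit is the same one the conversions in this series aim for: an
early return can no longer forget the matching unlock.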

Signed-off-by: Gan Qixin 
Reviewed-by: Paolo Bonzini 
Message-Id: <20201203075055.127773-3-ganqi...@huawei.com>
Signed-off-by: Kevin Wolf 
---
 block/curl.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 4f907c47be..d24a4c5897 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -564,23 +564,23 @@ static void curl_detach_aio_context(BlockDriverState *bs)
 BDRVCURLState *s = bs->opaque;
 int i;
 
-qemu_mutex_lock(&s->mutex);
-for (i = 0; i < CURL_NUM_STATES; i++) {
-if (s->states[i].in_use) {
-curl_clean_state(&s->states[i]);
+WITH_QEMU_LOCK_GUARD(&s->mutex) {
+for (i = 0; i < CURL_NUM_STATES; i++) {
+if (s->states[i].in_use) {
+curl_clean_state(&s->states[i]);
+}
+if (s->states[i].curl) {
+curl_easy_cleanup(s->states[i].curl);
+s->states[i].curl = NULL;
+}
+g_free(s->states[i].orig_buf);
+s->states[i].orig_buf = NULL;
 }
-if (s->states[i].curl) {
-curl_easy_cleanup(s->states[i].curl);
-s->states[i].curl = NULL;
+if (s->multi) {
+curl_multi_cleanup(s->multi);
+s->multi = NULL;
 }
-g_free(s->states[i].orig_buf);
-s->states[i].orig_buf = NULL;
-}
-if (s->multi) {
-curl_multi_cleanup(s->multi);
-s->multi = NULL;
 }
-qemu_mutex_unlock(&s->mutex);
 
 timer_del(&s->timer);
 }
-- 
2.29.2

[PULL 08/34] fuse: Allow growable exports

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

As an optimization, keep the RESIZE permission for growable exports so
we do not have to take it for every post-EOF write.  (This permission is
not released when the export is destroyed, because at that point the
BlockBackend is destroyed altogether anyway.)
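
The intended write semantics can be sketched with an in-memory stand-in
for an export. Everything here (struct sketch_export, sketch_write()) is
hypothetical and only models the behaviour described above: a write past
the EOF either grows the export, zero-filling any gap, or is shortened to
what fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an exported image. */
struct sketch_export {
    uint8_t *data;
    size_t length;
    bool growable;
};

/* Returns the number of bytes written, or -1 if growing failed. */
static long sketch_write(struct sketch_export *e, size_t offset,
                         const void *buf, size_t size)
{
    assert(e != NULL && buf != NULL);

    if (offset + size > e->length) {
        if (e->growable) {
            size_t new_len = offset + size;
            uint8_t *p = realloc(e->data, new_len);

            if (!p) {
                return -1;
            }
            /* Zero the newly added area, including any gap before offset */
            memset(p + e->length, 0, new_len - e->length);
            e->data = p;
            e->length = new_len;
        } else {
            /* Fixed size: short write, possibly of zero bytes */
            size = offset < e->length ? e->length - offset : 0;
        }
    }

    if (size > 0) {
        memcpy(e->data + offset, buf, size);
    }
    return (long)size;
}
```

With growable unset, a 4-byte write at offset 2 of a 4-byte export stores
only the first two bytes; with growable set, a 2-byte write at offset 6
extends the export to 8 bytes.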

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-5-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 qapi/block-export.json |  6 +-
 block/export/fuse.c| 44 ++
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 430bc69f35..e819e70cac 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -129,10 +129,14 @@
 # @mountpoint: Path on which to export the block device via FUSE.
 #  This must point to an existing regular file.
 #
+# @growable: Whether writes beyond the EOF should grow the block node
+#accordingly. (default: false)
+#
 # Since: 6.0
 ##
 { 'struct': 'BlockExportOptionsFuse',
-  'data': { 'mountpoint': 'str' },
+  'data': { 'mountpoint': 'str',
+'*growable': 'bool' },
   'if': 'defined(CONFIG_FUSE)' }
 
 ##
diff --git a/block/export/fuse.c b/block/export/fuse.c
index d995829ab7..92d2f50bcc 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -45,6 +45,7 @@ typedef struct FuseExport {
 
 char *mountpoint;
 bool writable;
+bool growable;
 } FuseExport;
 
 static GHashTable *exports;
@@ -72,6 +73,19 @@ static int fuse_export_create(BlockExport *blk_exp,
 
 assert(blk_exp_args->type == BLOCK_EXPORT_TYPE_FUSE);
 
+/* For growable exports, take the RESIZE permission */
+if (args->growable) {
+uint64_t blk_perm, blk_shared_perm;
+
+blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
+
+ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
+   blk_shared_perm, errp);
+if (ret < 0) {
+return ret;
+}
+}
+
 init_exports_table();
 
 /*
@@ -102,6 +116,7 @@ static int fuse_export_create(BlockExport *blk_exp,
 
 exp->mountpoint = g_strdup(args->mountpoint);
 exp->writable = blk_exp_args->writable;
+exp->growable = args->growable;
 
 ret = setup_fuse_export(exp, args->mountpoint, errp);
 if (ret < 0) {
@@ -349,19 +364,24 @@ static int fuse_do_truncate(const FuseExport *exp, int64_t size,
 truncate_flags |= BDRV_REQ_ZERO_WRITE;
 }
 
-blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
+/* Growable exports have a permanent RESIZE permission */
+if (!exp->growable) {
+blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
 
-ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
-   blk_shared_perm, NULL);
-if (ret < 0) {
-return ret;
+ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
+   blk_shared_perm, NULL);
+if (ret < 0) {
+return ret;
+}
 }
 
 ret = blk_truncate(exp->common.blk, size, true, prealloc,
truncate_flags, NULL);
 
-/* Must succeed, because we are only giving up the RESIZE permission */
-blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, &error_abort);
+if (!exp->growable) {
+/* Must succeed, because we are only giving up the RESIZE permission */
+blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, &error_abort);
+}
 
 return ret;
 }
@@ -482,7 +502,15 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
 }
 
 if (offset + size > length) {
-size = length - offset;
+if (exp->growable) {
+ret = fuse_do_truncate(exp, offset + size, true, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+} else {
+size = length - offset;
+}
 }
 
 ret = blk_pwrite(exp->common.blk, offset, buf, size, 0);
-- 
2.29.2

[PULL 03/34] block/throttle-groups: Use lock guard macros

2020-12-11 Thread Kevin Wolf

From: Gan Qixin 

Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups.

Signed-off-by: Gan Qixin 
Message-Id: <20201203075055.127773-4-ganqi...@huawei.com>
Signed-off-by: Kevin Wolf 
---
 block/throttle-groups.c | 48 -
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index e2f2813c0f..abd16ed9db 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -546,7 +546,7 @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
 tgm->aio_context = ctx;
 qatomic_set(&tgm->restart_pending, 0);
 
-qemu_mutex_lock(&tg->lock);
+QEMU_LOCK_GUARD(&tg->lock);
 /* If the ThrottleGroup is new set this ThrottleGroupMember as the token */
 for (i = 0; i < 2; i++) {
 if (!tg->tokens[i]) {
@@ -565,8 +565,6 @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
 qemu_co_mutex_init(&tgm->throttled_reqs_lock);
 qemu_co_queue_init(&tgm->throttled_reqs[0]);
 qemu_co_queue_init(&tgm->throttled_reqs[1]);
-
-qemu_mutex_unlock(&tg->lock);
 }
 
 /* Unregister a ThrottleGroupMember from its group, removing it from the list,
@@ -594,25 +592,25 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
 /* Wait for throttle_group_restart_queue_entry() coroutines to finish */
 AIO_WAIT_WHILE(tgm->aio_context, qatomic_read(&tgm->restart_pending) > 0);
 
-qemu_mutex_lock(&tg->lock);
-for (i = 0; i < 2; i++) {
-assert(tgm->pending_reqs[i] == 0);
-assert(qemu_co_queue_empty(&tgm->throttled_reqs[i]));
-assert(!timer_pending(tgm->throttle_timers.timers[i]));
-if (tg->tokens[i] == tgm) {
-token = throttle_group_next_tgm(tgm);
-/* Take care of the case where this is the last tgm in the group */
-if (token == tgm) {
-token = NULL;
+WITH_QEMU_LOCK_GUARD(&tg->lock) {
+for (i = 0; i < 2; i++) {
+assert(tgm->pending_reqs[i] == 0);
+assert(qemu_co_queue_empty(&tgm->throttled_reqs[i]));
+assert(!timer_pending(tgm->throttle_timers.timers[i]));
+if (tg->tokens[i] == tgm) {
+token = throttle_group_next_tgm(tgm);
+/* Take care of the case where this is the last tgm in the group */
+if (token == tgm) {
+token = NULL;
+}
+tg->tokens[i] = token;
 }
-tg->tokens[i] = token;
 }
-}
 
-/* remove the current tgm from the list */
-QLIST_REMOVE(tgm, round_robin);
-throttle_timers_destroy(&tgm->throttle_timers);
-qemu_mutex_unlock(&tg->lock);
+/* remove the current tgm from the list */
+QLIST_REMOVE(tgm, round_robin);
+throttle_timers_destroy(&tgm->throttle_timers);
+}
 
 throttle_group_unref(&tg->ts);
 tgm->throttle_state = NULL;
@@ -638,14 +636,14 @@ void throttle_group_detach_aio_context(ThrottleGroupMember *tgm)
 assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
 
 /* Kick off next ThrottleGroupMember, if necessary */
-qemu_mutex_lock(&tg->lock);
-for (i = 0; i < 2; i++) {
-if (timer_pending(tt->timers[i])) {
-tg->any_timer_armed[i] = false;
-schedule_next_request(tgm, i);
+WITH_QEMU_LOCK_GUARD(&tg->lock) {
+for (i = 0; i < 2; i++) {
+if (timer_pending(tt->timers[i])) {
+tg->any_timer_armed[i] = false;
+schedule_next_request(tgm, i);
+}
 }
 }
-qemu_mutex_unlock(&tg->lock);
 
 throttle_timers_detach_aio_context(tt);
 tgm->aio_context = NULL;
-- 
2.29.2

[PULL 04/34] block/iscsi: Use lock guard macros

2020-12-11 Thread Kevin Wolf

From: Gan Qixin 

Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi.
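
The block-scoped variant is what lets a function return from the middle of
a guarded section without an explicit unlock. Here is a minimal sketch of
that shape in plain C, assuming the GCC/Clang cleanup attribute; the
SKETCH_WITH_LOCK_GUARD macro, struct sketch_req and the helpers are
hypothetical and are not QEMU's definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static void sketch_unlock(pthread_mutex_t **m)
{
    if (*m) {
        pthread_mutex_unlock(*m);
    }
}

static pthread_mutex_t *sketch_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);

    assert(rc == 0); /* sketch only: a real caller would handle errors */
    return m;
}

/*
 * Run the following block exactly once with `m` held.  Leaving the block
 * normally unlocks in the loop's step expression; leaving it early (for
 * example with return) unlocks through the cleanup handler.
 */
#define SKETCH_WITH_LOCK_GUARD(m) \
    for (pthread_mutex_t *sketch_g_ \
             __attribute__((cleanup(sketch_unlock))) = sketch_lock(m); \
         sketch_g_; \
         pthread_mutex_unlock(sketch_g_), sketch_g_ = NULL)

struct sketch_req {
    pthread_mutex_t lock;
    bool cancelled;
    int aborts_sent;
};

static void sketch_cancel(struct sketch_req *r)
{
    SKETCH_WITH_LOCK_GUARD(&r->lock) {
        if (r->cancelled) {
            return; /* the lock is dropped by the cleanup handler */
        }
        r->cancelled = true;
        r->aborts_sent++;
    }
    /* Anything placed here runs without the lock held. */
}
```

Calling sketch_cancel() twice on the same request sends one abort and
leaves the mutex unlocked both times.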

Signed-off-by: Gan Qixin 
Message-Id: <20201203075055.127773-5-ganqi...@huawei.com>
Signed-off-by: Kevin Wolf 
---
 block/iscsi.c | 50 --
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index e30a7e3606..7d4b3b56d5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -322,25 +322,23 @@ iscsi_aio_cancel(BlockAIOCB *blockacb)
 IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
 IscsiLun *iscsilun = acb->iscsilun;
 
-qemu_mutex_lock(&iscsilun->mutex);
+WITH_QEMU_LOCK_GUARD(&iscsilun->mutex) {
 
-/* If it was cancelled or completed already, our work is done here */
-if (acb->cancelled || acb->status != -EINPROGRESS) {
-qemu_mutex_unlock(&iscsilun->mutex);
-return;
-}
+/* If it was cancelled or completed already, our work is done here */
+if (acb->cancelled || acb->status != -EINPROGRESS) {
+return;
+}
 
-acb->cancelled = true;
+acb->cancelled = true;
 
-qemu_aio_ref(acb); /* released in iscsi_abort_task_cb() */
+qemu_aio_ref(acb); /* released in iscsi_abort_task_cb() */
 
-/* send a task mgmt call to the target to cancel the task on the target */
-if (iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
- iscsi_abort_task_cb, acb) < 0) {
-qemu_aio_unref(acb); /* since iscsi_abort_task_cb() won't be called */
+/* send a task mgmt call to the target to cancel the task on the target */
+if (iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
+ iscsi_abort_task_cb, acb) < 0) {
+qemu_aio_unref(acb); /* since iscsi_abort_task_cb() won't be called */
+}
 }
-
-qemu_mutex_unlock(&iscsilun->mutex);
 }
 
 static const AIOCBInfo iscsi_aiocb_info = {
@@ -375,22 +373,22 @@ static void iscsi_timed_check_events(void *opaque)
 {
 IscsiLun *iscsilun = opaque;
 
-qemu_mutex_lock(&iscsilun->mutex);
+WITH_QEMU_LOCK_GUARD(&iscsilun->mutex) {
+/* check for timed out requests */
+iscsi_service(iscsilun->iscsi, 0);
 
-/* check for timed out requests */
-iscsi_service(iscsilun->iscsi, 0);
+if (iscsilun->request_timed_out) {
+iscsilun->request_timed_out = false;
+iscsi_reconnect(iscsilun->iscsi);
+}
 
-if (iscsilun->request_timed_out) {
-iscsilun->request_timed_out = false;
-iscsi_reconnect(iscsilun->iscsi);
+/*
+ * newer versions of libiscsi may return zero events. Ensure we are
+ * able to return to service once this situation changes.
+ */
+iscsi_set_events(iscsilun);
 }
 
-/* newer versions of libiscsi may return zero events. Ensure we are able
- * to return to service once this situation changes. */
-iscsi_set_events(iscsilun);
-
-qemu_mutex_unlock(&iscsilun->mutex);
-
 timer_mod(iscsilun->event_timer,
   qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + EVENT_INTERVAL);
 }
-- 
2.29.2

[PULL 01/34] block/accounting: Use lock guard macros

2020-12-11 Thread Kevin Wolf

From: Gan Qixin 

Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting.

Signed-off-by: Gan Qixin 
Reviewed-by: Paolo Bonzini 
Message-Id: <20201203075055.127773-2-ganqi...@huawei.com>
Signed-off-by: Kevin Wolf 
---
 block/accounting.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 8d41c8a83a..2030851d79 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -199,29 +199,27 @@ static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie *cookie,
 return;
 }
 
-qemu_mutex_lock(&stats->lock);
-
-if (failed) {
-stats->failed_ops[cookie->type]++;
-} else {
-stats->nr_bytes[cookie->type] += cookie->bytes;
-stats->nr_ops[cookie->type]++;
-}
+WITH_QEMU_LOCK_GUARD(&stats->lock) {
+if (failed) {
+stats->failed_ops[cookie->type]++;
+} else {
+stats->nr_bytes[cookie->type] += cookie->bytes;
+stats->nr_ops[cookie->type]++;
+}
 
-block_latency_histogram_account(&stats->latency_histogram[cookie->type],
-latency_ns);
+block_latency_histogram_account(&stats->latency_histogram[cookie->type],
+latency_ns);
 
-if (!failed || stats->account_failed) {
-stats->total_time_ns[cookie->type] += latency_ns;
-stats->last_access_time_ns = time_ns;
+if (!failed || stats->account_failed) {
+stats->total_time_ns[cookie->type] += latency_ns;
+stats->last_access_time_ns = time_ns;
 
-QSLIST_FOREACH(s, &stats->intervals, entries) {
-timed_average_account(&s->latency[cookie->type], latency_ns);
+QSLIST_FOREACH(s, &stats->intervals, entries) {
+timed_average_account(&s->latency[cookie->type], latency_ns);
+}
 }
 }
 
-qemu_mutex_unlock(&stats->lock);
-
 cookie->type = BLOCK_ACCT_NONE;
 }
 
-- 
2.29.2

[PULL 17/34] iotests: Restrict some Python tests to file

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Most Python tests are restricted to the file protocol (without
explicitly saying so), but these are the ones that would break
./check -fuse -qcow2.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-14-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/206 | 3 ++-
 tests/qemu-iotests/242 | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
index 11bc51f256..0a3ee5ef00 100755
--- a/tests/qemu-iotests/206
+++ b/tests/qemu-iotests/206
@@ -23,7 +23,8 @@
 import iotests
 from iotests import imgfmt
 
-iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_protocols=['file'])
 iotests.verify_working_luks()
 
 with iotests.FilePath('t.qcow2') as disk_path, \
diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
index 64f1bd95e4..a16de3085f 100755
--- a/tests/qemu-iotests/242
+++ b/tests/qemu-iotests/242
@@ -24,7 +24,8 @@ import struct
 from iotests import qemu_img_create, qemu_io, qemu_img_pipe, \
 file_path, img_info_log, log, filter_qemu_io
 
-iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_protocols=['file'])
 
 disk = file_path('disk')
 chunk = 256 * 1024
-- 
2.29.2

[PULL 12/34] iotests: Do not pipe _make_test_img

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

Executing _make_test_img as part of a pipe will undo all variable
changes it has done.  As such, this could not work with FUSE (because
we want to remember all of our exports and their qemu instances).

Replace the pipe by a temporary file in 071 and 174 (the two tests that
can run on FUSE).
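
The effect is the usual one for child processes and can be sketched in C.
The sketch_* names are hypothetical, and the sketch assumes the shell runs
the pipeline member in a forked subshell: whatever the helper changes in
the child is invisible to the parent afterwards.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* State that the helper wants to update, like a list of exports. */
static int sketch_exports;

static void sketch_make_img(void)
{
    sketch_exports++;
}

/* Run the helper in a forked child, as happens inside a pipeline. */
static int sketch_run_in_child(void)
{
    int status;
    pid_t reaped;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        sketch_make_img(); /* changes only the child's copy */
        _exit(0);
    }
    reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    return sketch_exports; /* the parent's copy is unchanged */
}

/* Run the helper in the calling process, as with a temporary file. */
static int sketch_run_directly(void)
{
    sketch_make_img();
    return sketch_exports;
}
```

Only the direct call leaves the counter incremented in the caller, which
is why the output is redirected to a file and filtered in a separate step.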

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-Id: <20201027190600.192171-9-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/071 | 19 +++
 tests/qemu-iotests/174 | 10 +-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
index 88faebcc1d..18fe9054b0 100755
--- a/tests/qemu-iotests/071
+++ b/tests/qemu-iotests/071
@@ -61,8 +61,17 @@ echo
 echo "=== Testing blkverify through filename ==="
 echo
 
-TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\
-_filter_imgfmt
+# _make_test_img may set variables that we need to retain.  Everything
+# in a pipe is executed in a subshell, so doing so would throw away
+# all changes.  Therefore, we have to store the output in some temp
+# file and filter that.
+scratch_out="$TEST_DIR/img-create.out"
+
+TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE \
+>"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+rm -f "$scratch_out"
+
 _make_test_img $IMG_SIZE
 $QEMU_IO -c "open -o driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base $TEST_IMG" \
  -c 'read 0 512' -c 'write -P 42 0x38000 512' -c 'read -P 42 0x38000 512' | _filter_qemu_io
@@ -76,8 +85,10 @@ echo
 echo "=== Testing blkverify through file blockref ==="
 echo
 
-TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\
-_filter_imgfmt
+TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE \
+>"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+
 _make_test_img $IMG_SIZE
 $QEMU_IO -c "open -o driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base,file.test.driver=$IMGFMT,file.test.file.filename=$TEST_IMG" \
  -c 'read 0 512' -c 'write -P 42 0x38000 512' -c 'read -P 42 0x38000 512' | _filter_qemu_io
diff --git a/tests/qemu-iotests/174 b/tests/qemu-iotests/174
index e2f14a38c6..1b0dd2e8b7 100755
--- a/tests/qemu-iotests/174
+++ b/tests/qemu-iotests/174
@@ -40,7 +40,15 @@ _unsupported_fmt raw
 
 
 size=256K
-IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size | _filter_imgfmt
+
+# _make_test_img may set variables that we need to retain.  Everything
+# in a pipe is executed in a subshell, so doing so would throw away
+# all changes.  Therefore, we have to store the output in some temp
+# file and filter that.
+scratch_out="$TEST_DIR/img-create.out"
+IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size >"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+rm -f "$scratch_out"
 
 echo
 echo "== reading wrong format should fail =="
-- 
2.29.2

[PULL 07/34] fuse: Implement standard FUSE operations

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-4-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 block/export/fuse.c | 242 
 1 file changed, 242 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 0553bcd630..d995829ab7 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -282,8 +282,250 @@ static void fuse_init(void *userdata, struct fuse_conn_info *conn)
 conn->max_write = MIN_NON_ZERO(BDRV_REQUEST_MAX_BYTES, conn->max_write);
 }
 
+/**
+ * Let clients look up files.  Always return ENOENT because we only
+ * care about the mountpoint itself.
+ */
+static void fuse_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
+{
+fuse_reply_err(req, ENOENT);
+}
+
+/**
+ * Let clients get file attributes (i.e., stat() the file).
+ */
+static void fuse_getattr(fuse_req_t req, fuse_ino_t inode,
+ struct fuse_file_info *fi)
+{
+struct stat statbuf;
+int64_t length, allocated_blocks;
+time_t now = time(NULL);
+FuseExport *exp = fuse_req_userdata(req);
+mode_t mode;
+
+length = blk_getlength(exp->common.blk);
+if (length < 0) {
+fuse_reply_err(req, -length);
+return;
+}
+
+allocated_blocks = bdrv_get_allocated_file_size(blk_bs(exp->common.blk));
+if (allocated_blocks <= 0) {
+allocated_blocks = DIV_ROUND_UP(length, 512);
+} else {
+allocated_blocks = DIV_ROUND_UP(allocated_blocks, 512);
+}
+
+mode = S_IFREG | S_IRUSR;
+if (exp->writable) {
+mode |= S_IWUSR;
+}
+
+statbuf = (struct stat) {
+.st_ino = inode,
+.st_mode= mode,
+.st_nlink   = 1,
+.st_uid = getuid(),
+.st_gid = getgid(),
+.st_size= length,
+.st_blksize = blk_bs(exp->common.blk)->bl.request_alignment,
+.st_blocks  = allocated_blocks,
+.st_atime   = now,
+.st_mtime   = now,
+.st_ctime   = now,
+};
+
+fuse_reply_attr(req, &statbuf, 1.);
+}
+
+static int fuse_do_truncate(const FuseExport *exp, int64_t size,
+bool req_zero_write, PreallocMode prealloc)
+{
+uint64_t blk_perm, blk_shared_perm;
+BdrvRequestFlags truncate_flags = 0;
+int ret;
+
+if (req_zero_write) {
+truncate_flags |= BDRV_REQ_ZERO_WRITE;
+}
+
+blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
+
+ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
+   blk_shared_perm, NULL);
+if (ret < 0) {
+return ret;
+}
+
+ret = blk_truncate(exp->common.blk, size, true, prealloc,
+   truncate_flags, NULL);
+
+/* Must succeed, because we are only giving up the RESIZE permission */
+blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, &error_abort);
+
+return ret;
+}
+
+/**
+ * Let clients set file attributes.  Only resizing is supported.
+ */
+static void fuse_setattr(fuse_req_t req, fuse_ino_t inode, struct stat *statbuf,
+ int to_set, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int ret;
+
+if (!exp->writable) {
+fuse_reply_err(req, EACCES);
+return;
+}
+
+if (to_set & ~FUSE_SET_ATTR_SIZE) {
+fuse_reply_err(req, ENOTSUP);
+return;
+}
+
+ret = fuse_do_truncate(exp, statbuf->st_size, true, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+
+fuse_getattr(req, inode, fi);
+}
+
+/**
+ * Let clients open a file (i.e., the exported image).
+ */
+static void fuse_open(fuse_req_t req, fuse_ino_t inode,
+  struct fuse_file_info *fi)
+{
+fuse_reply_open(req, fi);
+}
+
+/**
+ * Handle client reads from the exported image.
+ */
+static void fuse_read(fuse_req_t req, fuse_ino_t inode,
+  size_t size, off_t offset, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int64_t length;
+void *buf;
+int ret;
+
+/* Limited by max_read, should not happen */
+if (size > FUSE_MAX_BOUNCE_BYTES) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+/**
+ * Clients will expect short reads at EOF, so we have to limit
+ * offset+size to the image length.
+ */
+length = blk_getlength(exp->common.blk);
+if (length < 0) {
+fuse_reply_err(req, -length);
+return;
+}
+
+if (offset + size > length) {
+size = length - offset;
+}
+
+buf = qemu_try_blockalign(blk_bs(exp->common.blk), size);
+if (!buf) {
+fuse_reply_err(req, ENOMEM);
+return;
+}
+
+ret = blk_pread(exp->common.blk, offset, buf, size);
+if (ret >= 0) {
+fuse_reply_buf(req, buf, size);
+} else {
+

[PULL 06/34] fuse: Allow exporting BDSs via FUSE

2020-12-11 Thread Kevin Wolf

From: Max Reitz 

block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appear like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.
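
The bookkeeping can be sketched as follows. This is a simplified
illustration only: it uses a small fixed array where the patch uses a hash
table, and all sketch_* names are hypothetical. A path is looked up before
it is claimed, so a second export on the same path is refused before any
stat() call would be made on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_EXPORTS 16

/* Mount point paths currently in use (the caller keeps them alive). */
static const char *sketch_mounts[SKETCH_MAX_EXPORTS];
static size_t sketch_nmounts;

static bool sketch_in_use(const char *path)
{
    for (size_t i = 0; i < sketch_nmounts; i++) {
        if (strcmp(sketch_mounts[i], path) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns 0 on success, -1 if the path is taken or the table is full. */
static int sketch_claim(const char *path)
{
    assert(path != NULL);

    if (sketch_in_use(path) || sketch_nmounts == SKETCH_MAX_EXPORTS) {
        return -1;
    }
    sketch_mounts[sketch_nmounts++] = path;
    return 0;
}
```

Like the approach described above, a plain string comparison only catches
the obvious case; two different spellings of the same file would not be
detected.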

Signed-off-by: Max Reitz 
Message-Id: <20201027190600.192171-3-mre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 qapi/block-export.json   |  23 ++-
 include/block/fuse.h |  30 
 block.c  |   1 +
 block/export/export.c|   4 +
 block/export/fuse.c  | 295 +++
 MAINTAINERS  |   6 +
 block/export/meson.build |   2 +
 7 files changed, 359 insertions(+), 2 deletions(-)
 create mode 100644 include/block/fuse.h
 create mode 100644 block/export/fuse.c

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 4eeac7842d..430bc69f35 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -120,6 +120,21 @@
'*logical-block-size': 'size',
 '*num-queues': 'uint16'} }
 
+##
+# @BlockExportOptionsFuse:
+#
+# Options for exporting a block graph node on some (file) mountpoint
+# as a raw image.
+#
+# @mountpoint: Path on which to export the block device via FUSE.
+#  This must point to an existing regular file.
+#
+# Since: 6.0
+##
+{ 'struct': 'BlockExportOptionsFuse',
+  'data': { 'mountpoint': 'str' },
+  'if': 'defined(CONFIG_FUSE)' }
+
 ##
 # @NbdServerAddOptions:
 #
@@ -222,11 +237,13 @@
 #
 # @nbd: NBD export
 # @vhost-user-blk: vhost-user-blk export (since 5.2)
+# @fuse: FUSE export (since: 6.0)
 #
 # Since: 4.2
 ##
 { 'enum': 'BlockExportType',
-  'data': [ 'nbd', 'vhost-user-blk' ] }
+  'data': [ 'nbd', 'vhost-user-blk',
+{ 'name': 'fuse', 'if': 'defined(CONFIG_FUSE)' } ] }
 
 ##
 # @BlockExportOptions:
@@ -267,7 +284,9 @@
   'discriminator': 'type',
   'data': {
   'nbd': 'BlockExportOptionsNbd',
-  'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
+  'vhost-user-blk': 'BlockExportOptionsVhostUserBlk',
+  'fuse': { 'type': 'BlockExportOptionsFuse',
+'if': 'defined(CONFIG_FUSE)' }
} }
 
 ##
diff --git a/include/block/fuse.h b/include/block/fuse.h
new file mode 100644
index 00..ffa91fe364
--- /dev/null
+++ b/include/block/fuse.h
@@ -0,0 +1,30 @@
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2020 Max Reitz 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; under version 2 or later of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef BLOCK_FUSE_H
+#define BLOCK_FUSE_H
+
+#ifdef CONFIG_FUSE
+
+#include "block/export.h"
+
+extern const BlockExportDriver blk_exp_fuse;
+
+#endif /* CONFIG_FUSE */
+
+#endif
diff --git a/block.c b/block.c
index f1cedac362..eb16fb48c6 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@
 #include "block/trace.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "block/qdict.h"
 #include "qemu/error-report.h"
diff --git a/block/export/export.c b/block/export/export.c
index bad6f21b1c..b716c1522c 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -17,6 +17,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/iothread.h"
 #include "block/export.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-export.h"
@@ -31,6 +32,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 &blk_exp_vhost_user_blk,
 #endif
+#ifdef CONFIG_FUSE
+&blk_exp_fuse,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/fuse.c b/block/export/fuse.c
new file mode 1

[PULL 00/34] Block layer patches

2020-12-11 Thread Kevin Wolf

The following changes since commit b785d25e91718a660546a6550f64b3c543af7754:

  Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging (2020-12-11 13:50:35 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 960d5fb3e8ee09bc5f1a5c84f66dce42a6cef920:

  block: Fix deadlock in bdrv_co_yield_to_drain() (2020-12-11 17:52:40 +0100)


Block layer patches:

- Support for FUSE exports
- Fix deadlock in bdrv_co_yield_to_drain()
- Use lock guard macros
- Some preparational patches for 64 bit block layer
- file-posix: Fix request extension to INT64_MAX in raw_do_pwrite_zeroes()


Gan Qixin (4):
  block/accounting: Use lock guard macros
  block/curl: Use lock guard macros
  block/throttle-groups: Use lock guard macros
  block/iscsi: Use lock guard macros

Kevin Wolf (4):
  can-host: Fix crash when 'canbus' property is not set
  block: Simplify qmp_block_resize() error paths
  block: Fix locking in qmp_block_resize()
  block: Fix deadlock in bdrv_co_yield_to_drain()

Li Feng (1):
  file-posix: check the use_lock before setting the file lock

Max Reitz (21):
  meson: Detect libfuse
  fuse: Allow exporting BDSs via FUSE
  fuse: Implement standard FUSE operations
  fuse: Allow growable exports
  fuse: (Partially) implement fallocate()
  fuse: Implement hole detection through lseek
  iotests: Do not needlessly filter _make_test_img
  iotests: Do not pipe _make_test_img
  iotests: Use convert -n in some cases
  iotests/046: Avoid renaming images
  iotests: Derive image names from $TEST_IMG
  iotests/091: Use _cleanup_qemu instead of "wait"
  iotests: Restrict some Python tests to file
  iotests: Let _make_test_img guess $TEST_IMG_FILE
  iotests/287: Clean up subshell test image
  storage-daemon: Call bdrv_close_all() on exit
  iotests: Give access to the qemu-storage-daemon
  iotests: Allow testing FUSE exports
  iotests: Enable fuse for many tests
  iotests/308: Add test for FUSE exports
  iotests/221: Discard image before qemu-img map

Vladimir Sementsov-Ogievskiy (4):
  block/file-posix: fix workaround in raw_do_pwrite_zeroes()
  block/io: bdrv_refresh_limits(): use ERRP_GUARD
  block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()
  block: introduce BDRV_MAX_LENGTH

 qapi/block-export.json   |  27 +-
 meson_options.txt|   4 +
 configure|  13 +
 include/block/block.h|  10 +
 include/block/block_int.h|   8 +
 include/block/fuse.h |  30 ++
 block.c  |  18 +-
 block/accounting.c   |  32 +-
 block/curl.c |  28 +-
 block/export/export.c|   4 +
 block/export/fuse.c  | 726 +++
 block/file-posix.c   |   9 +-
 block/io.c   | 110 --
 block/iscsi.c|  50 ++-
 block/throttle-groups.c  |  48 ++-
 blockdev.c   |  14 +-
 net/can/can_host.c   |   5 +
 storage-daemon/qemu-storage-daemon.c |   3 +
 tests/test-write-threshold.c |   4 +
 MAINTAINERS  |   6 +
 block/export/meson.build |   2 +
 meson.build  |  26 ++
 tests/qemu-iotests/025   |   2 +-
 tests/qemu-iotests/026   |   2 +-
 tests/qemu-iotests/028   |  16 +-
 tests/qemu-iotests/028.out   |   3 +
 tests/qemu-iotests/031   |   2 +-
 tests/qemu-iotests/034   |   2 +-
 tests/qemu-iotests/036   |   2 +-
 tests/qemu-iotests/037   |   2 +-
 tests/qemu-iotests/038   |   2 +-
 tests/qemu-iotests/039   |   2 +-
 tests/qemu-iotests/046   |   7 +-
 tests/qemu-iotests/046.out   |   2 +-
 tests/qemu-iotests/050   |   2 +-
 tests/qemu-iotests/054   |   2 +-
 tests/qemu-iotests/060   |   2 +-
 tests/qemu-iotests/071   |  21 +-
 tests/qemu-iotests/079   |   2 +-
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/089   |   5 +-
 tests/qemu-iotests/089.out   |   1 +
 tests/qemu-iotests/090   |   2 +-
 tests/qemu-iotests/091   |   5 +-
 tests/qemu-iotests/095   |   2 +-
 tests/qemu-iotests/097   |   2 +-
 tests/qemu-iotests/098   |   2 +-
 tests/qemu-iotests/102   |   2 +-
 tests/qemu-iotests/103   |   2 +-
 tests/qemu-iotests/106   |   2 +-
 tests/qemu-iotests/107   |   2 +-
 t

Re: [PATCH v14 10/13] qapi: block-stream: add "bottom" argument

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 19:05, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

The code already doesn't freeze the base node, and we try to make it prepared
for the situation when the base node is changed during the operation. In
other words, block-stream doesn't own the base node.

Let's introduce a new interface which should replace the current one and
which fits the code better. Specifying the bottom node
instead of base, and requiring it to be a non-filter, gives us the
following benefits:

  - drop the difference between above_base and base_overlay, which will be
    renamed to just bottom when the old interface is dropped

  - a clean way to work with parallel streams/commits on the same backing
    chain, which otherwise becomes a problem when we introduce a filter
    for the stream job

  - a cleaner interface. Nobody will be surprised by the fact that the base
    node may disappear during block-stream when there is no word about
    "base" in the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json   |  8 +++--
  include/block/block_int.h  |  1 +
  block/monitor/block-hmp-cmds.c |  3 +-
  block/stream.c | 50 +++-
  blockdev.c | 61 --
  5 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04055ef50c..5d6681a35d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2522,6 +2522,10 @@
  # @base-node: the node name of the backing file.
  # It cannot be set if @base is also set. (Since 2.8)
  #
+# @bottom: the last node in the chain that should be streamed into
+#  top. It cannot be set any of @base, @base-node or @backing-file


s/set any/set if any/

But what’s the problem with backing-file?  The fact that specifying 
backing-file means that stream will look for that filename in the backing chain 
when the job is done (so if you use @bottom, we generally don’t want to rely on 
the presence of any nodes below it)?


I just wanted to deprecate 'backing-file' together with base and base-node as a 
next step. If the user wants to set a backing file unrelated to the current 
backing chain, is that correct at all? It's a direct violation of what's going 
on, and I doubt that other parts of QEMU working with backing-file are prepared 
for such a situation. The user can do it by hand later. Anyway, we'll have a 
three-release deprecation period for people to come and cry that this is a really 
needed option, so we can support it later on demand.



(If so, I would have thought that we actually want the user to specify 
backing-file so we don’t have to look down below @bottom to look for a 
filename.  Perhaps a @backing-fmt parameter would help.)


If we decide that 'backing-file' is really needed, then yes, we should require backing-fmt 
to be specified together with backing-file when using the new "bottom" interface.
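
To make the two options under discussion concrete, here is a minimal sketch of the argument check. It is not QEMU code: the helper name, the error strings, and the allow_backing_file switch are made up for illustration, and @backing-fmt is only a parameter suggested in this thread, not an existing one. With the switch off it follows the rule documented in the patch (@bottom excludes @base, @base-node and @backing-file); with it on it follows the alternative where backing-file is allowed but must come with backing-fmt.

```python
# Hypothetical helper, not part of QEMU: models the argument rules that
# are being discussed for the proposed @bottom argument of block-stream.
def check_stream_args(base=None, base_node=None, bottom=None,
                      backing_file=None, backing_fmt=None,
                      allow_backing_file=False):
    """Return an error string, or None if the combination is acceptable."""
    # Existing documented rule: @base-node cannot be set if @base is set.
    if base is not None and base_node is not None:
        return "'base' and 'base-node' cannot be specified at the same time"
    if bottom is None:
        return None
    if base is not None or base_node is not None:
        return "'bottom' cannot be set together with 'base' or 'base-node'"
    if backing_file is not None:
        if not allow_backing_file:
            # Rule as documented in the patch under review.
            return "'bottom' cannot be set together with 'backing-file'"
        if backing_fmt is None:
            # Alternative from the thread: require the format explicitly.
            return "'backing-file' requires 'backing-fmt' with 'bottom'"
    return None
```

Calling check_stream_args(bottom='node2', backing_file='copy.img') reports an error under the patch's rule, while the same call with backing_fmt='qcow2' and allow_backing_file=True is accepted.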



[...]


diff --git a/blockdev.c b/blockdev.c
index 70900f4f77..e0e19db88b 100644
--- a/blockdev.c
+++ b/blockdev.c


[...]


@@ -2551,8 +2567,33 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
  bdrv_refresh_filename(base_bs);
  }
-    /* Check for op blockers in the whole chain between bs and base */
-    for (iter = bs; iter && iter != base_bs;
+    if (has_bottom) {
+    bottom_bs = bdrv_lookup_bs(NULL, bottom, errp);
+    if (!bottom_bs) {
+    goto out;
+    }
+    if (!bottom_bs->drv) {
+    error_setg(errp, "Node '%s' is not open", bottom);
+    goto out;
+    }
+    if (bottom_bs->drv->is_filter) {
+    error_setg(errp, "Node '%s' is filter, use non-filter node"
+   "as 'bottom'", bottom);


Missing a space between “node” and “as”.  (Also, probably two articles, i.e. 
“Node '%s' is a filter, use a non-filter node...”.)

The rest looks good to me, but I’m withholding my R-b because I haven’t 
understood why using @bottom precludes giving @backing-file.

Max




--
Best regards,
Vladimir

Re: [PATCH v14 12/13] block/stream: add s->target_bs

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

Add a direct link to the target bs for convenience and to simplify the
following commit, which will insert a COR filter above the target bs.

This is a part of the original commit written by Andrey.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/stream.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)


Reviewed-by: Max Reitz

Re: [PATCH] hw/block: m25p80: Fix fast read for SST flashes

2020-12-11 Thread Francisco Iglesias

Hello Bin,

On [2020 Dec 11] Fri 23:29:16, Bin Meng wrote:
> Hi Francisco,
> 
> On Fri, Dec 11, 2020 at 11:16 PM Francisco Iglesias
>  wrote:
> >
> > Hello Bin,
> >
> > On [2020 Dec 11] Fri 14:07:21, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Fri, Dec 4, 2020 at 7:28 PM Francisco Iglesias
> > >  wrote:
> > > >
> > > > Hello Bin,
> > > >
> > > > On [2020 Dec 04] Fri 18:52:50, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Fri, Dec 4, 2020 at 6:46 PM Francisco Iglesias
> > > > >  wrote:
> > > > > >
> > > > > > Hello Bin,
> > > > > >
> > > > > > On [2020 Dec 04] Fri 15:52:12, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Thu, Dec 3, 2020 at 4:38 PM Francisco Iglesias
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Bin and Alistair,
> > > > > > > >
> > > > > > > > On [2020 Dec 02] Wed 11:40:11, Alistair Francis wrote:
> > > > > > > > > On Sun, Nov 29, 2020 at 6:55 PM Bin Meng  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > From: Bin Meng 
> > > > > > > > > >
> > > > > > > > > > SST flashes require a dummy byte after the address bits.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Bin Meng 
> > > > > > > > >
> > > > > > > > > I couldn't find a datasheet that says this... But the actual 
> > > > > > > > > code
> > > > > > > > > change looks fine, so:
> > > > > > > > >
> > > > > > > > > Acked-by: Alistair Francis 
> > > > > > > > >
> > > > > > > > > Alistair
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > >  hw/block/m25p80.c | 3 +++
> > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> > > > > > > > > > index 483925f..9b36762 100644
> > > > > > > > > > --- a/hw/block/m25p80.c
> > > > > > > > > > +++ b/hw/block/m25p80.c
> > > > > > > > > > @@ -825,6 +825,9 @@ static void decode_fast_read_cmd(Flash 
> > > > > > > > > > *s)
> > > > > > > > > >  s->needed_bytes = get_addr_length(s);
> > > > > > > > > >  switch (get_man(s)) {
> > > > > > > > > >  /* Dummy cycles - modeled with bytes writes instead of 
> > > > > > > > > > bits */
> > > > > > > > > > +case MAN_SST:
> > > > > > > > > > +s->needed_bytes += 1;
> > > > > > > >
> > > > > > > > 1 dummy clk cycle is modelled as 1 byte write (see the comment 
> > > > > > > > above), so 1
> > > > > > > > dummy byte (8 dummy clk cycles) will need +8 above.
> > > > > > >
> > > > > > > I think you were confused by the WINBOND codes. The comments are
> > > > > > > correct. It is modeled with bytes instead of bits, so we should 
> > > > > > > +=1.
> > > > > >
> > > > > > > What the comment says is (perhaps not super clear) that 1 dummy clock cycle
> > > > > > > is modeled as one 1-byte write into the flash (meaning that 8 byte writes
> > > > > > > are needed for 1 dummy byte). Perhaps it is easier to understand
> > > > > > looking into how the controllers issue the command towards the 
> > > > > > flash model
> > > > > > (for example the xilinx_spips), the start of the FAST_READ cmd is 
> > > > > > issued
> > > > > > as writing the following into the flash: 1 byte (cmd), 3 bytes 
> > > > > > (address),
> > > > > > 8 bytes (8 dummy cycles -> 1 dummy byte).
> > > > > >
> > > > >
> > > > > My interpretation of the comments are opposite: one cycle is a bit,
> > > > > but we are not using bits, instead we are using bytes.
> > > >
> > > > > Yes, the mention of 'bits' in the comment makes it not very clear at first read.
> > > > > Maybe just the below would have been better:
> > > >
> > > > /* Dummy clock cycles - modeled with bytes writes */
> > > >
> > > > >
> > > > > Testing shows that +=1 is the correct way with the imx_spi controller,
> > > > > and with my SiFive SPI model in my local tree (not upstreamed yet)
> > > >
> > > > > Perhaps an option could be to look into how the aspeed_smc, xilinx_spips or
> > > > > the npcm7xx_fiu generate dummy clock cycles and see if a similar solution to
> > > > > one of those could work as well for the imx_spi?
> > > >
> > >
> > > Thanks for pointing this out. So there is some inconsistency among
> > > different SPI controller modeling.
> >
> > I'm not sure I understand you correctly but the controllers supporting
> > commands with dummy clock cycles can only do it following the modeled
> > approach, so I would rather say it is pretty consistent across the
> > controllers (not all controllers support these commands though).
> 
> I mean there are 2 approaches to emulate the dummy cycles for

There is currently only 1 way of modeling dummy clock cycles. All commands that
require / support them in m25p80 go with that approach. And the controllers
that support dummy clock cycles use that approach.

> different SPI controller models, yet we only have one m25p80 flash
> model to work with both of them. Some controllers may choose 1 byte to
> emulate 1 dummy clock cycle, but some others

Re: [PATCH v14 11/13] iotests: 30: prepare to COR filter insertion by stream job

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

test_stream_parallel runs parallel stream jobs, intersecting so that the top
of one is the base of another. That is OK now, but it would be a problem if
we insert the filter, as one job would want to use another job's filter as
its above_base node.

The correct thing to do is to move to the new interface: the "bottom" argument
instead of base. This guarantees that the jobs' actions don't intersect.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/030 | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)


Reviewed-by: Max Reitz

Re: [PATCH v14 10/13] qapi: block-stream: add "bottom" argument

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

The code already doesn't freeze the base node, and we try to make it prepared
for the situation when the base node is changed during the operation. In
other words, block-stream doesn't own the base node.

Let's introduce a new interface which should replace the current one and
which fits the code better. Specifying the bottom node
instead of base, and requiring it to be a non-filter, gives us the
following benefits:

  - drop the difference between above_base and base_overlay, which will be
    renamed to just bottom when the old interface is dropped

  - a clean way to work with parallel streams/commits on the same backing
    chain, which otherwise becomes a problem when we introduce a filter
    for the stream job

  - a cleaner interface. Nobody will be surprised by the fact that the base
    node may disappear during block-stream when there is no word about
    "base" in the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json   |  8 +++--
  include/block/block_int.h  |  1 +
  block/monitor/block-hmp-cmds.c |  3 +-
  block/stream.c | 50 +++-
  blockdev.c | 61 --
  5 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04055ef50c..5d6681a35d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2522,6 +2522,10 @@
  # @base-node: the node name of the backing file.
  # It cannot be set if @base is also set. (Since 2.8)
  #
+# @bottom: the last node in the chain that should be streamed into
+#  top. It cannot be set any of @base, @base-node or @backing-file


s/set any/set if any/

But what’s the problem with backing-file?  The fact that specifying 
backing-file means that stream will look for that filename in the 
backing chain when the job is done (so if you use @bottom, we generally 
don’t want to rely on the presence of any nodes below it)?


(If so, I would have thought that we actually want the user to specify 
backing-file so we don’t have to look down below @bottom to look for a 
filename.  Perhaps a @backing-fmt parameter would help.)


[...]


diff --git a/blockdev.c b/blockdev.c
index 70900f4f77..e0e19db88b 100644
--- a/blockdev.c
+++ b/blockdev.c


[...]


@@ -2551,8 +2567,33 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
  bdrv_refresh_filename(base_bs);
  }
  
-/* Check for op blockers in the whole chain between bs and base */

-for (iter = bs; iter && iter != base_bs;
+if (has_bottom) {
+bottom_bs = bdrv_lookup_bs(NULL, bottom, errp);
+if (!bottom_bs) {
+goto out;
+}
+if (!bottom_bs->drv) {
+error_setg(errp, "Node '%s' is not open", bottom);
+goto out;
+}
+if (bottom_bs->drv->is_filter) {
+error_setg(errp, "Node '%s' is filter, use non-filter node"
+   "as 'bottom'", bottom);


Missing a space between “node” and “as”.  (Also, probably two articles, 
i.e. “Node '%s' is a filter, use a non-filter node...”.)


The rest looks good to me, but I’m withholding my R-b because I haven’t 
understood why using @bottom precludes giving @backing-file.


Max

Re: [PATCH] hw/block: m25p80: Fix fast read for SST flashes

2020-12-11 Thread Bin Meng

Hi Francisco,

On Fri, Dec 11, 2020 at 11:16 PM Francisco Iglesias
 wrote:
>
> Hello Bin,
>
> On [2020 Dec 11] Fri 14:07:21, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Fri, Dec 4, 2020 at 7:28 PM Francisco Iglesias
> >  wrote:
> > >
> > > Hello Bin,
> > >
> > > On [2020 Dec 04] Fri 18:52:50, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Fri, Dec 4, 2020 at 6:46 PM Francisco Iglesias
> > > >  wrote:
> > > > >
> > > > > Hello Bin,
> > > > >
> > > > > On [2020 Dec 04] Fri 15:52:12, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Thu, Dec 3, 2020 at 4:38 PM Francisco Iglesias
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Bin and Alistair,
> > > > > > >
> > > > > > > On [2020 Dec 02] Wed 11:40:11, Alistair Francis wrote:
> > > > > > > > On Sun, Nov 29, 2020 at 6:55 PM Bin Meng  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > From: Bin Meng 
> > > > > > > > >
> > > > > > > > > SST flashes require a dummy byte after the address bits.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Bin Meng 
> > > > > > > >
> > > > > > > > I couldn't find a datasheet that says this... But the actual 
> > > > > > > > code
> > > > > > > > change looks fine, so:
> > > > > > > >
> > > > > > > > Acked-by: Alistair Francis 
> > > > > > > >
> > > > > > > > Alistair
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > >  hw/block/m25p80.c | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> > > > > > > > > index 483925f..9b36762 100644
> > > > > > > > > --- a/hw/block/m25p80.c
> > > > > > > > > +++ b/hw/block/m25p80.c
> > > > > > > > > @@ -825,6 +825,9 @@ static void decode_fast_read_cmd(Flash *s)
> > > > > > > > >  s->needed_bytes = get_addr_length(s);
> > > > > > > > >  switch (get_man(s)) {
> > > > > > > > >  /* Dummy cycles - modeled with bytes writes instead of 
> > > > > > > > > bits */
> > > > > > > > > +case MAN_SST:
> > > > > > > > > +s->needed_bytes += 1;
> > > > > > >
> > > > > > > 1 dummy clk cycle is modelled as 1 byte write (see the comment 
> > > > > > > above), so 1
> > > > > > > dummy byte (8 dummy clk cycles) will need +8 above.
> > > > > >
> > > > > > I think you were confused by the WINBOND codes. The comments are
> > > > > > correct. It is modeled with bytes instead of bits, so we should +=1.
> > > > >
> > > > > > What the comment says is (perhaps not super clear) that 1 dummy clock cycle
> > > > > > is modeled as one 1-byte write into the flash (meaning that 8 byte writes
> > > > > > are needed for 1 dummy byte). Perhaps it is easier to understand
> > > > > looking into how the controllers issue the command towards the flash 
> > > > > model
> > > > > (for example the xilinx_spips), the start of the FAST_READ cmd is 
> > > > > issued
> > > > > as writing the following into the flash: 1 byte (cmd), 3 bytes 
> > > > > (address),
> > > > > 8 bytes (8 dummy cycles -> 1 dummy byte).
> > > > >
> > > >
> > > > My interpretation of the comments are opposite: one cycle is a bit,
> > > > but we are not using bits, instead we are using bytes.
> > >
> > > > Yes, the mention of 'bits' in the comment makes it not very clear at first read.
> > > > Maybe just the below would have been better:
> > >
> > > /* Dummy clock cycles - modeled with bytes writes */
> > >
> > > >
> > > > Testing shows that +=1 is the correct way with the imx_spi controller,
> > > > and with my SiFive SPI model in my local tree (not upstreamed yet)
> > >
> > > > Perhaps an option could be to look into how the aspeed_smc, xilinx_spips or
> > > > the npcm7xx_fiu generate dummy clock cycles and see if a similar solution to
> > > > one of those could work as well for the imx_spi?
> > >
> >
> > Thanks for pointing this out. So there is some inconsistency among
> > different SPI controller modeling.
>
> I'm not sure I understand you correctly but the controllers supporting
> commands with dummy clock cycles can only do it following the modeled
> approach, so I would rather say it is pretty consistent across the
> controllers (not all controllers support these commands though).

I mean there are 2 approaches to emulate the dummy cycles for
different SPI controller models, yet we only have one m25p80 flash
model to work with both of them. Some controllers may choose 1 byte to
emulate 1 dummy clock cycle, but some others choose 1 bit to emulate 1
dummy cycle. This is inconsistent.

>
> >
> > Or maybe fixing aspeed_smc, xilinx_spips and npcm7xx_fiu to work like
> > imx_spi?
>
> For me I would say no to above (it makes more sense to let new controllers
> implement the currently modeled approach).

Yes, we can certainly make them consistent. But the question is which
one is the correct one? I tried to search in the doc but in vain.

>
> > Which one is the expected behavior for dummy cycles?
>
> Dummy clock cycles are modeled as 1 byte written to the flash pe

Re: [PATCH v14 09/13] stream: skip filters when writing backing file name to QCOW2 header

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

Avoid writing a filter JSON file name and a filter format name to QCOW2
image when the backing file is being changed after the block stream
job. It can occur due to a concurrent commit job on the same backing
chain.
A user is still able to assign the 'backing-file' parameter for a
block-stream job keeping in mind the possible issue mentioned above.
If the user does not specify the 'backing-file' parameter, QEMU will
assign it automatically.

Signed-off-by: Andrey Shinkevich 
  [vsementsov: use unfiltered_bs for bdrv_find_backing_image()]
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/stream.c | 21 +++--
  blockdev.c |  8 +---
  2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 6e281c71ac..c208393c34 100644
--- a/block/stream.c
+++ b/block/stream.c


[...]


@@ -75,8 +78,22 @@ static int stream_prepare(Job *job)
  const char *base_id = NULL, *base_fmt = NULL;
  if (base) {
  base_id = s->backing_file_str;
-if (base->drv) {
-base_fmt = base->drv->format_name;
+if (base_id) {
+backing_bs = bdrv_find_backing_image(unfiltered_bs, base_id);
+if (backing_bs && backing_bs->drv) {
+base_fmt = backing_bs->drv->format_name;
+} else {
+error_report("Format not found for backing file %s",
+ s->backing_file_str);


I think it’s actually going to be rather likely that we’re not going to 
find the backing file here.  If the user were to use a filename that 
just appears as-such in the backing chain, they wouldn’t need to specify 
a backing-file parameter at all, because the one figured out 
automatically would be just fine.


But then again, what are we supposed to do then.  We can continue as 
before, which is to just use the base node’s format.  But if the user 
wants to perhaps use a backing file that isn’t even open in qemu (a copy 
of the base on some different storage), we have no idea what format 
it’s in.


So printing an error here, but continuing on with setting a backing_fmt 
is probably the most reasonable thing to do indeed.



+}
+} else {
+base_unfiltered = bdrv_skip_filters(base);
+if (base_unfiltered) {


@base_unfiltered cannot be NULL here (because @base is sure not to be 
NULL).  Of course, double-checking isn’t wrong, it just looks a bit 
weird, because it seems to imply that we might end up with a case where 
base != NULL, but base_id == NULL.  Anyway:


Reviewed-by: Max Reitz 


+base_id = base_unfiltered->filename;
+if (base_unfiltered->drv) {
+base_fmt = base_unfiltered->drv->format_name;
+}
+}
  }
  }
  bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
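
The selection logic in the hunk above can be sketched in a toy model, assuming nodes are plain dicts rather than BlockDriverState objects. None of the names below are QEMU API; the filename lookup is a plain string comparison, which is simpler than what bdrv_find_backing_image() does, and the error report for the "format not found" case is left out.

```python
# Toy model: a node is a dict with 'filename', 'format', 'is_filter' and
# 'backing' (the next node down, or None). This is not QEMU API.
def skip_filters(node):
    """Walk down the chain until a non-filter node is reached."""
    while node is not None and node["is_filter"]:
        node = node["backing"]
    return node


def find_backing_image(node, filename):
    """Look for a node with the given filename below 'node'."""
    node = node["backing"]
    while node is not None:
        if node["filename"] == filename:
            return node
        node = node["backing"]
    return None


def pick_backing(unfiltered_bs, base, backing_file_str):
    """Return (base_id, base_fmt) to record in the image header."""
    if base is None:
        return None, None
    if backing_file_str is not None:
        # User-supplied name: the format is only known if that file is
        # actually part of the backing chain.
        found = find_backing_image(unfiltered_bs, backing_file_str)
        fmt = found["format"] if found is not None else None
        return backing_file_str, fmt
    # No user-supplied name: never record a filter node in the header.
    unfiltered = skip_filters(base)
    return unfiltered["filename"], unfiltered["format"]
```

With a chain top -> filter -> base and the filter passed as base, pick_backing() records base's filename and format; with a backing_file_str that is not in the chain, the name is kept and the format stays unknown, which is the case discussed in the review.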

Re: [PATCH] hw/block: m25p80: Fix fast read for SST flashes

2020-12-11 Thread Francisco Iglesias

Hello Bin,

On [2020 Dec 11] Fri 14:07:21, Bin Meng wrote:
> Hi Francisco,
> 
> On Fri, Dec 4, 2020 at 7:28 PM Francisco Iglesias
>  wrote:
> >
> > Hello Bin,
> >
> > On [2020 Dec 04] Fri 18:52:50, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Fri, Dec 4, 2020 at 6:46 PM Francisco Iglesias
> > >  wrote:
> > > >
> > > > Hello Bin,
> > > >
> > > > On [2020 Dec 04] Fri 15:52:12, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Thu, Dec 3, 2020 at 4:38 PM Francisco Iglesias
> > > > >  wrote:
> > > > > >
> > > > > > Hi Bin and Alistair,
> > > > > >
> > > > > > On [2020 Dec 02] Wed 11:40:11, Alistair Francis wrote:
> > > > > > > On Sun, Nov 29, 2020 at 6:55 PM Bin Meng  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > From: Bin Meng 
> > > > > > > >
> > > > > > > > SST flashes require a dummy byte after the address bits.
> > > > > > > >
> > > > > > > > Signed-off-by: Bin Meng 
> > > > > > >
> > > > > > > I couldn't find a datasheet that says this... But the actual code
> > > > > > > change looks fine, so:
> > > > > > >
> > > > > > > Acked-by: Alistair Francis 
> > > > > > >
> > > > > > > Alistair
> > > > > > >
> > > > > > > > ---
> > > > > > > >
> > > > > > > >  hw/block/m25p80.c | 3 +++
> > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> > > > > > > > index 483925f..9b36762 100644
> > > > > > > > --- a/hw/block/m25p80.c
> > > > > > > > +++ b/hw/block/m25p80.c
> > > > > > > > @@ -825,6 +825,9 @@ static void decode_fast_read_cmd(Flash *s)
> > > > > > > >  s->needed_bytes = get_addr_length(s);
> > > > > > > >  switch (get_man(s)) {
> > > > > > > >  /* Dummy cycles - modeled with bytes writes instead of 
> > > > > > > > bits */
> > > > > > > > +case MAN_SST:
> > > > > > > > +s->needed_bytes += 1;
> > > > > >
> > > > > > 1 dummy clk cycle is modelled as 1 byte write (see the comment 
> > > > > > above), so 1
> > > > > > dummy byte (8 dummy clk cycles) will need +8 above.
> > > > >
> > > > > I think you were confused by the WINBOND codes. The comments are
> > > > > correct. It is modeled with bytes instead of bits, so we should +=1.
> > > >
> > > > > What the comment says is (perhaps not super clear) that 1 dummy clock cycle
> > > > > is modeled as one 1-byte write into the flash (meaning that 8 byte writes
> > > > > are needed for 1 dummy byte). Perhaps it is easier to understand
> > > > looking into how the controllers issue the command towards the flash 
> > > > model
> > > > (for example the xilinx_spips), the start of the FAST_READ cmd is issued
> > > > as writing the following into the flash: 1 byte (cmd), 3 bytes 
> > > > (address),
> > > > 8 bytes (8 dummy cycles -> 1 dummy byte).
> > > >
> > >
> > > My interpretation of the comments are opposite: one cycle is a bit,
> > > but we are not using bits, instead we are using bytes.
> >
> > > Yes, the mention of 'bits' in the comment makes it not very clear at first read.
> > > Maybe just the below would have been better:
> >
> > /* Dummy clock cycles - modeled with bytes writes */
> >
> > >
> > > Testing shows that +=1 is the correct way with the imx_spi controller,
> > > and with my SiFive SPI model in my local tree (not upstreamed yet)
> >
> > > Perhaps an option could be to look into how the aspeed_smc, xilinx_spips or
> > > the npcm7xx_fiu generate dummy clock cycles and see if a similar solution to
> > > one of those could work as well for the imx_spi?
> >
> 
> Thanks for pointing this out. So there is some inconsistency among
> different SPI controller modeling.

I'm not sure I understand you correctly but the controllers supporting
commands with dummy clock cycles can only do it following the modeled
approach, so I would rather say it is pretty consistent across the
controllers (not all controllers support these commands though).

> 
> Or maybe fixing aspeed_smc, xilinx_spips and npcm7xx_fiu to work like
> imx_spi?

For me I would say no to above (it makes more sense to let new controllers
implement the currently modeled approach).

> Which one is the expected behavior for dummy cycles?

Dummy clock cycles are modeled as 1 byte written to the flash per dummy clock
cycle (expected behavior).
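
As a small illustration of that convention (a sketch only; the function and constant names are made up and do not come from hw/block/m25p80.c), the number of bytes the flash model waits for after the command byte can be computed as below. It assumes a single data line, so one dummy byte on the wire takes 8 clock cycles.

```python
# Illustration only: these names are not taken from hw/block/m25p80.c.
CYCLES_PER_DUMMY_BYTE = 8  # assumes one data line, 1 bit per clock cycle


def needed_bytes(addr_len, dummy_bytes_on_wire):
    """Bytes expected after the command byte when every dummy clock
    cycle is modeled as one byte written to the flash model."""
    dummy_cycles = dummy_bytes_on_wire * CYCLES_PER_DUMMY_BYTE
    return addr_len + dummy_cycles
```

For a 3-byte address and 1 dummy byte, needed_bytes(3, 1) gives 11, which matches the sequence described earlier in the thread: 3 address bytes followed by 8 byte writes for the 8 dummy cycles.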

Best regards,
Francisco Iglesias

> 
> > Regarding this patch, with += 8 it looks correct to me (and will work with
> > above controllers as far as I can see).
> >
> 
> Regards,
> Bin

Re: [PATCH v14 08/13] copy-on-read: skip non-guest reads if no copy needed

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

If the flag BDRV_REQ_PREFETCH is set, skip idling read/write
operations in the COR driver. This can be taken into account when
optimizing the COR algorithms. At the moment, that check is made during
the block stream job.

Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR-filter.

block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are
going to use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block.h |  8 +---
  block/copy-on-read.c  | 14 ++
  2 files changed, 15 insertions(+), 7 deletions(-)


Reviewed-by: Max Reitz

Re: [PATCH v14 07/13] block: include supported_read_flags into BDS structure

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 16:20, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block_int.h |  4 
  block/io.c    | 12 ++--
  2 files changed, 14 insertions(+), 2 deletions(-)


[...]


diff --git a/block/io.c b/block/io.c
index ec5e152bb7..e28b11c42b 100644
--- a/block/io.c
+++ b/block/io.c


[...]


@@ -1426,9 +1429,13 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
  goto out;
  }
+    if (flags & ~bs->supported_read_flags) {
+    abort();
+    }


I’d prefer an assert(!(flags & ~bs->supported_read_flags)), so in case we do 
abort, there’s going to be an error message that immediately tells what the problem is.


Agreed. And a one-line check is shorter than a three-line one.
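
For illustration, the same mask check written as a single assertion looks like this in a toy Python model. The flag values and the helper name are invented for the sketch and are not QEMU's constants; the point is only that the assertion names the failed condition, where a bare abort() does not.

```python
# Flag values are made up for this sketch; they are not QEMU's constants.
REQ_COPY_ON_READ = 0x1
REQ_PREFETCH = 0x2


def check_read_flags(flags, supported_read_flags):
    """Fail loudly if the caller passes a flag the driver does not support."""
    # Bits set in flags but cleared in supported_read_flags are a caller bug.
    unsupported = flags & ~supported_read_flags
    assert not unsupported, f"unsupported read flags: {unsupported:#x}"
    return flags
```

Passing REQ_PREFETCH to a driver that advertises REQ_PREFETCH goes through unchanged, while passing REQ_COPY_ON_READ | REQ_PREFETCH to the same driver raises an AssertionError that reports the offending bit.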



Apart from that:

Reviewed-by: Max Reitz 


+
  max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
  if (bytes <= max_bytes && bytes <= max_transfer) {
-    ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+    ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, flags);
  goto out;
  }





--
Best regards,
Vladimir

Re: [PATCH v14 06/13] iotests: add #310 to test bottom node in COR driver

2020-12-11 Thread Max Reitz


On 11.12.20 14:10, Vladimir Sementsov-Ogievskiy wrote:

11.12.2020 15:49, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

The test case #310 is similar to #216 by Max Reitz. The difference is
that the test #310 involves a bottom node to the COR filter driver.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/310 | 114 +
  tests/qemu-iotests/310.out |  15 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 130 insertions(+)
  create mode 100755 tests/qemu-iotests/310
  create mode 100644 tests/qemu-iotests/310.out

diff --git a/tests/qemu-iotests/310 b/tests/qemu-iotests/310
new file mode 100755
index 00..c8b34cd887
--- /dev/null
+++ b/tests/qemu-iotests/310
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+#
+# Copy-on-read tests using a COR filter with a bottom node
+#
+# Copyright (C) 2018 Red Hat, Inc.
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+from iotests import log, qemu_img, qemu_io_silent
+
+# Need backing file support
+iotests.script_initialize(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk'],
+  supported_platforms=['linux'])
+
+log('')
+log('=== Copy-on-read across nodes ===')
+log('')
+
+# This test is similar to the 216 one by Max Reitz 
+# The difference is that this test case involves a bottom node to the
+# COR filter driver.
+
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('mid.img') as mid_img_path, \
+ iotests.FilePath('top.img') as top_img_path, \
+ iotests.VM() as vm:
+
+    log('--- Setting up images ---')
+    log('')
+
+    assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+    assert qemu_io_silent(base_img_path, '-c', 'write -P 1 0M 1M') == 0
+    assert qemu_io_silent(base_img_path, '-c', 'write -P 1 3M 1M') == 0
+    assert qemu_img('create', '-f', iotests.imgfmt, '-b', 
base_img_path,

+    '-F', iotests.imgfmt, mid_img_path) == 0
+    assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 2M 1M') == 0
+    assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 4M 1M') == 0
+    assert qemu_img('create', '-f', iotests.imgfmt, '-b', mid_img_path,
+    '-F', iotests.imgfmt, top_img_path) == 0
+    assert qemu_io_silent(top_img_path,  '-c', 'write -P 2 1M 1M') == 0
+
+#  0 1 2 3 4
+# top    2
+# mid  3   3
+# base 1 1
+
+    log('Done')
+
+    log('')
+    log('--- Doing COR ---')
+    log('')
+
+    vm.launch()
+
+    log(vm.qmp('blockdev-add',
+   node_name='node0',
+   driver='copy-on-read',
+   bottom='node2',
+   file={
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': top_img_path
+   },
+   'backing': {
+   'node-name': 'node2',
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': mid_img_path
+   },
+   'backing': {
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': base_img_path
+   }
+   },
+   }
+   }))
+
+    # Trigger COR
+    log(vm.qmp('human-monitor-command',
+   command_line='qemu-io node0 "read 0 5M"'))
+
+    vm.shutdown()
+
+    log('')
+    log('--- Checking COR result ---')
+    log('')
+
+    assert qemu_io_silent(base_img_path, '-c', 'discard 0 4M') == 0
+    assert qemu_io_silent(mid_img_path, '-c', 'discard 0M 5M') == 0


The data discard leaves behind is undefined, so this may not result in 
zeroes.  (In fact, the test does fail for me with vmdk, qed, and 
qcow.)  'write -z' would work better, although perhaps you 
intentionally chose discard to just drop the data from the backing 
images.


In that case, you could also recreate the middle image, so it’s empty 
then – the only problem with that is that it’ll break VMDK because it 
st

Re: [PATCH v14 07/13] block: include supported_read_flags into BDS structure

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block_int.h |  4 
  block/io.c| 12 ++--
  2 files changed, 14 insertions(+), 2 deletions(-)


[...]


diff --git a/block/io.c b/block/io.c
index ec5e152bb7..e28b11c42b 100644
--- a/block/io.c
+++ b/block/io.c


[...]


@@ -1426,9 +1429,13 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
         goto out;
     }
 
+    if (flags & ~bs->supported_read_flags) {
+        abort();
+    }


I’d prefer an assert(!(flags & ~bs->supported_read_flags)), so in case 
we do abort, there’s going to be an error message that immediately tells 
what the problem is.
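
To illustrate the difference, here is a minimal Python sketch of a mask check that reports the offending bits when it fails. The flag value and the function name are hypothetical and only model the check; they are not QEMU's constants or API.

```python
# Hypothetical flag bit for illustration only; not QEMU's real value.
REQ_PREFETCH = 0x1


def checked_read_flags(flags: int, supported: int) -> int:
    """Return flags unchanged if every set bit is covered by the supported mask."""
    unsupported = flags & ~supported
    # The failing condition becomes part of the error report,
    # which a bare abort() would not provide.
    assert unsupported == 0, f"unsupported read flags: {unsupported:#x}"
    return flags
```

In C, the assert() form gets the same effect for free, because the failed expression is printed together with the file and line.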


Apart from that:

Reviewed-by: Max Reitz 


+
     max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
     if (bytes <= max_bytes && bytes <= max_transfer) {
-        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, flags);
         goto out;
     }

Re: [PATCH v14 06/13] iotests: add #310 to test bottom node in COR driver

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 15:49, Max Reitz wrote:

On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

The test case #310 is similar to #216 by Max Reitz. The difference is
that the test #310 involves a bottom node to the COR filter driver.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/310 | 114 +
  tests/qemu-iotests/310.out |  15 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 130 insertions(+)
  create mode 100755 tests/qemu-iotests/310
  create mode 100644 tests/qemu-iotests/310.out

diff --git a/tests/qemu-iotests/310 b/tests/qemu-iotests/310
new file mode 100755
index 00..c8b34cd887
--- /dev/null
+++ b/tests/qemu-iotests/310
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+#
+# Copy-on-read tests using a COR filter with a bottom node
+#
+# Copyright (C) 2018 Red Hat, Inc.
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+from iotests import log, qemu_img, qemu_io_silent
+
+# Need backing file support
+iotests.script_initialize(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk'],
+  supported_platforms=['linux'])
+
+log('')
+log('=== Copy-on-read across nodes ===')
+log('')
+
+# This test is similar to the 216 one by Max Reitz 
+# The difference is that this test case involves a bottom node to the
+# COR filter driver.
+
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('mid.img') as mid_img_path, \
+ iotests.FilePath('top.img') as top_img_path, \
+ iotests.VM() as vm:
+
+    log('--- Setting up images ---')
+    log('')
+
+    assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+    assert qemu_io_silent(base_img_path, '-c', 'write -P 1 0M 1M') == 0
+    assert qemu_io_silent(base_img_path, '-c', 'write -P 1 3M 1M') == 0
+    assert qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
+    '-F', iotests.imgfmt, mid_img_path) == 0
+    assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 2M 1M') == 0
+    assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 4M 1M') == 0
+    assert qemu_img('create', '-f', iotests.imgfmt, '-b', mid_img_path,
+    '-F', iotests.imgfmt, top_img_path) == 0
+    assert qemu_io_silent(top_img_path,  '-c', 'write -P 2 1M 1M') == 0
+
+#  0 1 2 3 4
+# top    2
+# mid  3   3
+# base 1 1
+
+    log('Done')
+
+    log('')
+    log('--- Doing COR ---')
+    log('')
+
+    vm.launch()
+
+    log(vm.qmp('blockdev-add',
+   node_name='node0',
+   driver='copy-on-read',
+   bottom='node2',
+   file={
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': top_img_path
+   },
+   'backing': {
+   'node-name': 'node2',
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': mid_img_path
+   },
+   'backing': {
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': base_img_path
+   }
+   },
+   }
+   }))
+
+    # Trigger COR
+    log(vm.qmp('human-monitor-command',
+   command_line='qemu-io node0 "read 0 5M"'))
+
+    vm.shutdown()
+
+    log('')
+    log('--- Checking COR result ---')
+    log('')
+
+    assert qemu_io_silent(base_img_path, '-c', 'discard 0 4M') == 0
+    assert qemu_io_silent(mid_img_path, '-c', 'discard 0M 5M') == 0


The data discard leaves behind is undefined, so this may not result in zeroes.  
(In fact, the test does fail for me with vmdk, qed, and qcow.)  'write -z' 
would work better, although perhaps you intentionally chose discard to just 
drop the data from the backing images.

In that case, you could also recreate the middle image, so it’s empty then – 
the only problem with that is that it’ll break VMDK because it stores this 
reference to its backing image, and if the backing ima

Re: [PATCH v14 06/13] iotests: add #310 to test bottom node in COR driver

2020-12-11 Thread Max Reitz


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

The test case #310 is similar to #216 by Max Reitz. The difference is
that the test #310 involves a bottom node to the COR filter driver.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/310 | 114 +
  tests/qemu-iotests/310.out |  15 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 130 insertions(+)
  create mode 100755 tests/qemu-iotests/310
  create mode 100644 tests/qemu-iotests/310.out

diff --git a/tests/qemu-iotests/310 b/tests/qemu-iotests/310
new file mode 100755
index 00..c8b34cd887
--- /dev/null
+++ b/tests/qemu-iotests/310
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+#
+# Copy-on-read tests using a COR filter with a bottom node
+#
+# Copyright (C) 2018 Red Hat, Inc.
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+from iotests import log, qemu_img, qemu_io_silent
+
+# Need backing file support
+iotests.script_initialize(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk'],
+  supported_platforms=['linux'])
+
+log('')
+log('=== Copy-on-read across nodes ===')
+log('')
+
+# This test is similar to the 216 one by Max Reitz 
+# The difference is that this test case involves a bottom node to the
+# COR filter driver.
+
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('mid.img') as mid_img_path, \
+ iotests.FilePath('top.img') as top_img_path, \
+ iotests.VM() as vm:
+
+log('--- Setting up images ---')
+log('')
+
+assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+assert qemu_io_silent(base_img_path, '-c', 'write -P 1 0M 1M') == 0
+assert qemu_io_silent(base_img_path, '-c', 'write -P 1 3M 1M') == 0
+assert qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
+'-F', iotests.imgfmt, mid_img_path) == 0
+assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 2M 1M') == 0
+assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 4M 1M') == 0
+assert qemu_img('create', '-f', iotests.imgfmt, '-b', mid_img_path,
+'-F', iotests.imgfmt, top_img_path) == 0
+assert qemu_io_silent(top_img_path,  '-c', 'write -P 2 1M 1M') == 0
+
+#  0 1 2 3 4
+# top2
+# mid  3   3
+# base 1 1
+
+log('Done')
+
+log('')
+log('--- Doing COR ---')
+log('')
+
+vm.launch()
+
+log(vm.qmp('blockdev-add',
+   node_name='node0',
+   driver='copy-on-read',
+   bottom='node2',
+   file={
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': top_img_path
+   },
+   'backing': {
+   'node-name': 'node2',
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': mid_img_path
+   },
+   'backing': {
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': base_img_path
+   }
+   },
+   }
+   }))
+
+# Trigger COR
+log(vm.qmp('human-monitor-command',
+   command_line='qemu-io node0 "read 0 5M"'))
+
+vm.shutdown()
+
+log('')
+log('--- Checking COR result ---')
+log('')
+
+assert qemu_io_silent(base_img_path, '-c', 'discard 0 4M') == 0
+assert qemu_io_silent(mid_img_path, '-c', 'discard 0M 5M') == 0


The data discard leaves behind is undefined, so this may not result in 
zeroes.  (In fact, the test does fail for me with vmdk, qed, and qcow.) 
 'write -z' would work better, although perhaps you intentionally chose 
discard to just drop the data from the backing images.


In that case, you could also recreate the middle image, so it’s empty 
then – the only problem with that is that it’ll break VMDK because it 
stores this reference to its backing image, and if the backing image is 
changed, you’ll get EINVAL w

Re: RFC: don't store backing filename in qcow2 image

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 12:44, Peter Krempa wrote:

On Thu, Dec 10, 2020 at 17:26:52 +0300, Vladimir Sementsov-Ogievskiy wrote:

Hi all!


Hi,



I have an idea: not storing the backing filename in the qcow2 image at all may be 
a good thing. I'll give some reasons, and I'd like to know what you think about 
it.

1. Libvirt has to manage and keep in mind backing chains anyway.

This means that storing this information in the qcow2 header is a source of bugs when we 
update it in one place but fail or forget to update it in another. Of course, Libvirt is not 
the only user of qemu. But we are moving to "blockdev" anyway, where the management 
tool should control at least all node-names. It would be strange not to control the 
relations between images at the same time.


At the same time many users depend on this. If you move in images from
another host, you'd have to remember the dependencies/order.


2. The backing file name specified in qcow2 metadata doesn't relate to anything 
else, and nothing relies on it.

3. Calculating and updating the backing file name in Qemu is a headache:
- with some options specified, or with filters, we risk writing json filenames 
into qcow2 metadata, which is almost never what the user wants. Also, json may exceed 
the qcow2 limitation of backing_file_size to be <= 1023


As long as it works (libvirt and qemu have parsers for json:) I don't
think the user cares.


- updating it in a transactional way for a read-only image during reopen, while another 
transactional permission update is ongoing, is difficult (who knows how to do it?) 
(remember the recent d669ed6ab02849 "block: make bdrv_drop_intermediate() less 
wrong")

4. Moving qcow2 files to another directory is a problem: you have to take care to 
update backing file names in all dependent qcow2 images.


Or alternatively use relative names.


So, what about moving libvirt (at least) to not rely on the backing file name 
stored in the qcow2 image? Should the backing chain then be in the xml? Is that hard or not? 
Finally, will it make the code simpler, or more difficult?


Then, if the idea is good in general, what to do on the Qemu side? If we want to 
finally get rid of the problematic code (see [3.]) we should deprecate something. Just 
deprecate support for qcow2 images with a backing file specified, requiring the user 
to always specify the backing chain by hand? I don't see anything that should be 
changed in the qcow2 format itself: no reason to add some kind of restricted bits, 
etc.


I think this will create headaches for many users. Libvirt does support
specification of the chain manually, but doesn't mandate it.

It's also a fairly recent addition to libvirt so I doubt that any other
project which uses libvirt only for a part of the functionality (such as
oVirt or openstack) picked up the full specification of chain in the
XML. The problem here is that libvirt isn't used for the whole knowledge
state here. Rather projects like oVirt feed us a new XML every single
time. This means that they'd need to start keeping the chain info
internally too.

Rather they currently rely on our detection code and the proper setting
of paths in the image, and thus removing it would be a rather serious
regression in behaviour, which would be visible beyond libvirt without
any way for us to make it opaque to higher levels.



Thanks for explanation.

Hmm, yes, it sounds like we'll never drop support for filename-based backing 
chains. And if we can't drop the support, there's no reason to deprecate it.

--
Best regards,
Vladimir

Re: [PATCH v14 05/13] qapi: create BlockdevOptionsCor structure for COR driver

2020-12-11 Thread Vladimir Sementsov-Ogievskiy


11.12.2020 11:54, Max Reitz wrote:

On 10.12.20 19:30, Vladimir Sementsov-Ogievskiy wrote:

10.12.2020 20:43, Max Reitz wrote:

I don’t like this patch’s subject very much, because I find the implementation 
of the @bottom option to be more noteworthy than the addition of the QAPI 
structure.


On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

Create the BlockdevOptionsCor structure for COR driver specific options,
splitting it off from the BlockdevOptionsGenericFormat. The only option
'bottom' node in the structure denotes an image file that limits the
COR operations in the backing chain.
We are going to use the COR-filter for a block-stream job and will pass
a bottom node name to the COR driver. The bottom node is the first
non-filter overlay of the base. It was introduced because the base node
itself may change due to possible concurrent jobs.

Suggested-by: Max Reitz 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
   [vsementsov: fix bdrv_is_allocated_above() usage]
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json | 21 +++-
  block/copy-on-read.c | 57 ++--
  2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8ef3df6767..04055ef50c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3942,6 +3942,25 @@
    'data': { 'throttle-group': 'str',
  'file' : 'BlockdevRef'
   } }
+
+##
+# @BlockdevOptionsCor:
+#
+# Driver specific block device options for the copy-on-read driver.
+#
+# @bottom: the name of a non-filter node (allocation-bearing layer) that limits
+#  the COR operations in the backing chain (inclusive).


This seems to me like something’s missing.  Perhaps technically there isn’t, 
but “limits the COR operations” begs the question (to me) “Limits them in what 
way?” (to which the answer is: No data below @bottom is copied).

Could you make it more verbose?  Perhaps something like “The name of a 
non-filter node (allocation-bearing layer) that limits the COR operations in 
the backing chain (inclusive), so that no data below this node will be copied 
by this filter”?


Sounds good to me.




+#  For the block-stream job, it will be the first non-filter overlay of
+#  the base node. We do not involve the base node into the COR
+#  operations because the base may change due to a concurrent
+#  block-commit job on the same backing chain.




I now see that this paragraph conflicts with the later introduction of "bottom" for 
the stream job itself. I think it may be safely dropped. It's the wrong place to describe how 
block-stream works.


I think the default behavior should be mentioned here somewhere, i.e. that no 
limit is applied, so that data from all backing layers may be copied.


agree
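
The semantics discussed above (nothing below @bottom is copied; without @bottom, data from all backing layers may be copied) can be modelled with a small Python sketch. This is only a toy model under stated assumptions, not the driver's implementation: each layer is a dict mapping a cluster index to its data, the chain is ordered from top to base, and the function name `cor_read` is made up for illustration.

```python
def cor_read(chain, clusters, bottom=None):
    """Read clusters through a toy COR filter and copy data into the top layer.

    chain:  list of (name, {cluster: data}) pairs ordered from top to base.
    bottom: name of the lowest layer (inclusive) whose data may be copied;
            None means no limit, so any backing layer may be copied from.
    """
    names = [name for name, _ in chain]
    limit = names.index(bottom) if bottom is not None else len(chain) - 1
    top = chain[0][1]
    result = {}
    for cluster in clusters:
        for depth, (_, layer) in enumerate(chain):
            if cluster in layer:
                result[cluster] = layer[cluster]
                # Copy only data found in a backing layer at or above bottom.
                if 0 < depth <= limit:
                    top[cluster] = layer[cluster]
                break
        else:
            # Unallocated everywhere: reads as zeroes, nothing to copy.
            result[cluster] = 0
    return result


# Layout from iotest 310: top has 1M, mid has 2M and 4M, base has 0M and 3M.
chain = [('top', {1: 2}), ('mid', {2: 3, 4: 3}), ('base', {0: 1, 3: 1})]
data = cor_read(chain, range(5), bottom='mid')
```

With `bottom='mid'` the read still returns the data from all three layers, but only the clusters stored in mid end up copied into top; the clusters that live in base stay where they are.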




+#
+# Since: 5.2


*6.0


+##
+{ 'struct': 'BlockdevOptionsCor',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { '*bottom': 'str' } }
+
  ##
  # @BlockdevOptions:
  #


[...]


diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 618c4c4f43..67f61983c0 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c


[...]


@@ -51,7 +56,17 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
+    if (bottom_node) {
+        bottom_bs = bdrv_lookup_bs(NULL, bottom_node, errp);
+        if (!bottom_bs) {
+            error_setg(errp, "Bottom node '%s' not found", bottom_node);
+            qdict_del(options, "bottom");
+            return -EINVAL;
+        }


Should we verify that bottom_bs is not a filter, as required by the schema?



yes, thanks for the catch!


Hmm.. Interesting, we don't freeze the backing chain in cor filter open. And I 
think we shouldn't. But then, the bottom node may disappear. We should handle that 
without a crash.

I suggest:

1. document that if the bottom node disappears from the backing chain of the filter, it 
continues to work as if no "bottom" node had been specified

2. do bdrv_ref/bdrv_unref of bottom_bs, so as not to work with a dead pointer

3. check in cor_co_preadv_part() whether bottom_bs is still in the backing chain or not
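
Points 1 and 3 of this suggestion could be sketched roughly as follows. This is a toy Python model, not QEMU code: the chain is represented as a list of node names, and `effective_bottom` is a hypothetical helper name. Point 2 (reference counting) has no counterpart in this model.

```python
def effective_bottom(chain_names, bottom):
    """Return the bottom node to honour for the next read, or None.

    chain_names: node names currently in the filter's backing chain.
    bottom:      the configured bottom node name, or None if unset.
    """
    # If the configured node has left the chain, behave as if no
    # bottom node had been specified (no limit on copying).
    if bottom is not None and bottom in chain_names:
        return bottom
    return None
```

The check would have to run on every read request, since the chain may change between requests.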


Hm, right.

Alternatively, we could also freeze the chain until the bottom node and then 
allow changing the @bottom node through reopen.  Then it would have to be 
manually unset before the bottom node is allowed to disappear from the chain.

Would freezing the chain pose a problem?



Hmm. Then we'll just need to not freeze it in block-stream, so the freezing is done by 
the filter.

It's just more restrictive. But I can't imagine reasonable cases where the user specifies 
a bottom node and then removes it. Forcing the user to reopen the filter to change the bottom 
node may be cleaner than "just don't care". OK, I think we can freeze the 
chain in the filter.


--
Best regard

Re: RFC: don't store backing filename in qcow2 image

2020-12-11 Thread Peter Krempa

On Thu, Dec 10, 2020 at 17:26:52 +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!

Hi,

> 
> I have an idea: not storing the backing filename in the qcow2 image at all may 
> be a good thing. I'll give some reasons, and I'd like to know what you think 
> about it.
> 
> 1. Libvirt has to manage and keep in mind backing chains anyway.
> 
> This means that storing this information in the qcow2 header is a source of bugs 
> when we update it in one place but fail or forget to update it in another. Of 
> course, Libvirt is not the only user of qemu. But we are moving to 
> "blockdev" anyway, where the management tool should control at least all 
> node-names. It would be strange not to control the relations between images at 
> the same time.

At the same time many users depend on this. If you move in images from
another host, you'd have to remember the dependencies/order.

> 2. The backing file name specified in qcow2 metadata doesn't relate to anything 
> else, and nothing relies on it.
> 
> 3. Calculating and updating the backing file name in Qemu is a headache:
>- with some options specified, or with filters, we risk writing json 
> filenames into qcow2 metadata, which is almost never what the user wants. Also, 
> json may exceed the qcow2 limitation of backing_file_size to be <= 1023

As long as it works (libvirt and qemu have parsers for json:) I don't
think the user cares.

>- updating it in a transactional way for a read-only image during reopen, while 
> another transactional permission update is ongoing, is difficult (who knows 
> how to do it?) (remember the recent d669ed6ab02849 "block: make 
> bdrv_drop_intermediate() less wrong")
> 
> 4. Moving qcow2 files to another directory is a problem: you have to take care to 
> update backing file names in all dependent qcow2 images.

Or alternatively use relative names.
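
As a rough illustration of why relative names survive a move, here is a minimal Python sketch. It assumes that a relative backing reference is resolved against the directory of the overlay image; the helper name `resolve_backing` and the paths are made up, and protocol prefixes and json: filenames are ignored.

```python
import posixpath


def resolve_backing(overlay_path: str, backing_ref: str) -> str:
    """Resolve a backing reference relative to the overlay's directory."""
    if posixpath.isabs(backing_ref):
        # Absolute references keep pointing at the old location.
        return backing_ref
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(overlay_path), backing_ref))


# Same stored reference, two different locations of the overlay.
before = resolve_backing('/vm/a/top.qcow2', 'base.qcow2')
after = resolve_backing('/mnt/b/top.qcow2', 'base.qcow2')
```

As long as the whole chain is moved together, the stored reference does not need to be rewritten.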

> So, what about moving libvirt (at least) to not rely on the backing file name 
> stored in the qcow2 image? Should the backing chain then be in the xml? Is that hard or 
> not? Finally, will it make the code simpler, or more difficult?
> 
> 
> Then, if the idea is good in general, what to do on the Qemu side? If we want to 
> finally get rid of the problematic code (see [3.]) we should deprecate something. 
> Just deprecate support for qcow2 images with a backing file specified, 
> requiring the user to always specify the backing chain by hand? I don't see anything 
> that should be changed in the qcow2 format itself: no reason to add some kind of 
> restricted bits, etc.

I think this will create headaches for many users. Libvirt does support
specification of the chain manually, but doesn't mandate it.

It's also a fairly recent addition to libvirt so I doubt that any other
project which uses libvirt only for a part of the functionality (such as
oVirt or openstack) picked up the full specification of chain in the
XML. The problem here is that libvirt isn't used for the whole knowledge
state here. Rather projects like oVirt feed us a new XML every single
time. This means that they'd need to start keeping the chain info
internally too.

Rather they currently rely on our detection code and the proper setting
of paths in the image, and thus removing it would be a rather serious
regression in behaviour, which would be visible beyond libvirt without
any way for us to make it opaque to higher levels.

Re: [PATCH v14 05/13] qapi: create BlockdevOptionsCor structure for COR driver

2020-12-11 Thread Max Reitz


On 10.12.20 19:30, Vladimir Sementsov-Ogievskiy wrote:

10.12.2020 20:43, Max Reitz wrote:
I don’t like this patch’s subject very much, because I find the 
implementation of the @bottom option to be more noteworthy than the 
addition of the QAPI structure.



On 04.12.20 23:07, Vladimir Sementsov-Ogievskiy wrote:

From: Andrey Shinkevich 

Create the BlockdevOptionsCor structure for COR driver specific options,
splitting it off from the BlockdevOptionsGenericFormat. The only option
'bottom' node in the structure denotes an image file that limits the
COR operations in the backing chain.
We are going to use the COR-filter for a block-stream job and will pass
a bottom node name to the COR driver. The bottom node is the first
non-filter overlay of the base. It was introduced because the base node
itself may change due to possible concurrent jobs.

Suggested-by: Max Reitz 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
   [vsementsov: fix bdrv_is_allocated_above() usage]
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json | 21 +++-
  block/copy-on-read.c | 57 ++--
  2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8ef3df6767..04055ef50c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3942,6 +3942,25 @@
    'data': { 'throttle-group': 'str',
  'file' : 'BlockdevRef'
   } }
+
+##
+# @BlockdevOptionsCor:
+#
+# Driver specific block device options for the copy-on-read driver.
+#
+# @bottom: the name of a non-filter node (allocation-bearing layer) that limits
+#  the COR operations in the backing chain (inclusive).


This seems to me like something’s missing.  Perhaps technically there 
isn’t, but “limits the COR operations” begs the question (to me) 
“Limits them in what way?” (to which the answer is: No data below 
@bottom is copied).


Could you make it more verbose?  Perhaps something like “The name of a 
non-filter node (allocation-bearing layer) that limits the COR 
operations in the backing chain (inclusive), so that no data below 
this node will be copied by this filter”?


Sounds good to me.



+#  For the block-stream job, it will be the first non-filter overlay of
+#  the base node. We do not involve the base node into the COR
+#  operations because the base may change due to a concurrent
+#  block-commit job on the same backing chain.




I now see that this paragraph conflicts with the later introduction of "bottom" 
for the stream job itself. I think it may be safely dropped. It's the wrong 
place to describe how block-stream works.


I think the default behavior should be mentioned here somewhere, i.e. 
that no limit is applied, so that data from all backing layers may be 
copied.


agree




+#
+# Since: 5.2


*6.0


+##
+{ 'struct': 'BlockdevOptionsCor',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { '*bottom': 'str' } }
+
  ##
  # @BlockdevOptions:
  #


[...]


diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 618c4c4f43..67f61983c0 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c


[...]

@@ -51,7 +56,17 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
+    if (bottom_node) {
+        bottom_bs = bdrv_lookup_bs(NULL, bottom_node, errp);
+        if (!bottom_bs) {
+            error_setg(errp, "Bottom node '%s' not found", bottom_node);
+            qdict_del(options, "bottom");
+            return -EINVAL;
+        }


Should we verify that bottom_bs is not a filter, as required by the 
schema?




yes, thanks for the catch!


Hmm.. Interesting, we don't freeze the backing chain in cor filter open. 
And I think we shouldn't. But then, the bottom node may disappear. We should 
handle that without a crash.


I suggest:

1. document that if the bottom node disappears from the backing chain of the 
filter, it continues to work as if no "bottom" node had been specified


2. do bdrv_ref/bdrv_unref of bottom_bs, so as not to work with a dead pointer

3. check in cor_co_preadv_part() whether bottom_bs is still in the backing chain 
or not


Hm, right.

Alternatively, we could also freeze the chain until the bottom node and 
then allow changing the @bottom node through reopen.  Then it would have 
to be manually unset before the bottom node is allowed to disappear from 
the chain.


Would freezing the chain pose a problem?

Max
