[Qemu-block] [PATCH 0/2] nbd: add zero-init parameter

2016-10-05 Thread Denis V. Lunev
When using a nbd block device, the info about necessity of prior disk
zeroing could significantly improve the speed of certain operations
(e.g. backups).

This patch also will allow to preserve QCOW2 images during migration.
Management software now may specify zero-init option and thus abscent
areas in the QCOW2 areas will not be marked as zeroes and will remain
empty. This is tight distiction but it is here.

Signed-off-by: Denis Plotnikov 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Max Reitz 

Denis Plotnikov (2):
  nbd: change option parsing scheme
  nbd: add zero-init parameter

 block/nbd.c | 204 ++--
 1 file changed, 184 insertions(+), 20 deletions(-)

-- 
2.7.4




[Qemu-block] [PATCH 1/2] nbd: change option parsing scheme

2016-10-05 Thread Denis V. Lunev
From: Denis Plotnikov 

This is a preparatory commit to make code more generic. We are going to
add more options in the next patch.

Signed-off-by: Denis Plotnikov 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/nbd.c | 143 
 1 file changed, 124 insertions(+), 19 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6bc06d6..3b133ed 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -38,7 +38,9 @@
 #include "qapi/qmp/qstring.h"
 #include "qemu/cutils.h"
 
-#define EN_OPTSTR ":exportname="
+#define EN_OPTSTR "exportname"
+
+#define PATH_PARAM  (1u << 0)
 
 typedef struct BDRVNBDState {
 NbdClientSession client;
@@ -47,6 +49,46 @@ typedef struct BDRVNBDState {
 char *path, *host, *port, *export, *tlscredsid;
 } BDRVNBDState;
 
+/*
+ * helpers for dealing with option parsing
+ * to ease futher params adding and managing
+ */
+
+/*
+ *  @param_flags - bit flags defining a set of param names to be parsed
+ */
+static bool parse_query_params(QueryParams *qp, QDict *options,
+   unsigned int param_flags)
+{
+int i;
+for (i = 0; i < qp->n; i++) {
+QueryParam *param = >p[i];
+
+if ((PATH_PARAM & param_flags) &&
+strcmp(param->name, "socket") == 0) {
+qdict_put(options, "path", qstring_from_str(param->value));
+continue;
+}
+
+}
+return true;
+}
+
+static bool find_prohibited_params(QueryParams *qp, unsigned int param_flags)
+{
+int i;
+for (i = 0; i < qp->n; i++) {
+QueryParam *param = >p[i];
+
+if ((PATH_PARAM & param_flags) &&
+strcmp(param->name, "socket") == 0) {
+return true;
+}
+}
+return false;
+}
+
+
 static int nbd_parse_uri(const char *filename, QDict *options)
 {
 URI *uri;
@@ -79,18 +121,18 @@ static int nbd_parse_uri(const char *filename, QDict 
*options)
 }
 
 qp = query_params_parse(uri->query);
-if (qp->n > 1 || (is_unix && !qp->n) || (!is_unix && qp->n)) {
-ret = -EINVAL;
-goto out;
-}
 
 if (is_unix) {
 /* nbd+unix:///export?socket=path */
-if (uri->server || uri->port || strcmp(qp->p[0].name, "socket")) {
+if (uri->server || uri->port) {
+ret = -EINVAL;
+goto out;
+}
+
+if (!parse_query_params(qp, options, PATH_PARAM)) {
 ret = -EINVAL;
 goto out;
 }
-qdict_put(options, "path", qstring_from_str(qp->p[0].value));
 } else {
 QString *host;
 /* nbd[+tcp]://host[:port]/export */
@@ -113,6 +155,11 @@ static int nbd_parse_uri(const char *filename, QDict 
*options)
 qdict_put(options, "port", qstring_from_str(port_str));
 g_free(port_str);
 }
+
+if (find_prohibited_params(qp, PATH_PARAM)) {
+ret = -EINVAL;
+goto out;
+}
 }
 
 out:
@@ -127,7 +174,7 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
Error **errp)
 {
 char *file;
-char *export_name;
+char *opt_str;
 const char *host_spec;
 const char *unixpath;
 
@@ -150,17 +197,6 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 
 file = g_strdup(filename);
 
-export_name = strstr(file, EN_OPTSTR);
-if (export_name) {
-if (export_name[strlen(EN_OPTSTR)] == 0) {
-goto out;
-}
-export_name[0] = 0; /* truncate 'file' */
-export_name += strlen(EN_OPTSTR);
-
-qdict_put(options, "export", qstring_from_str(export_name));
-}
-
 /* extract the host_spec - fail if it's not nbd:... */
 if (!strstart(file, "nbd:", _spec)) {
 error_setg(errp, "File name string for NBD must start with 'nbd:'");
@@ -173,8 +209,40 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 
 /* are we a UNIX or TCP socket? */
 if (strstart(host_spec, "unix:", )) {
+opt_str = (char *) unixpath;
+
+/* do we have any options? */
+/* unixpath could be unix: or unix:something:options */
+opt_str = strchr(opt_str, ':');
+
+/* if we have any options then "divide" */
+/* the path and the options by replacing the last colon with "\0" */
+if (opt_str != NULL) {
+/* truncate 'unixpath' replacing the last ":" */
+char *colon_pos = opt_str;
+colon_pos[0] = '\0';
+opt_str++;
+}
 qdict_put(options, "path", qstring_from_str(unixpath));
 } else {
+/* host_spec could be ip:port or ip:port:options */
+int i;
+opt_str = (char *)host_spec;
+for (i = 0; i < 2; i++) {
+opt_str = strchr(opt_str, ':');
+

Re: [Qemu-block] [PATCH 0/2] nbd: add zero-init parameter

2016-10-05 Thread Denis V. Lunev
On 10/05/2016 12:55 PM, Paolo Bonzini wrote:
>
> On 05/10/2016 11:33, Denis V. Lunev wrote:
>> When using a nbd block device, the info about necessity of prior disk
>> zeroing could significantly improve the speed of certain operations
>> (e.g. backups).
>>
>> This patch also will allow to preserve QCOW2 images during migration.
>> Management software now may specify zero-init option and thus abscent
>> areas in the QCOW2 areas will not be marked as zeroes and will remain
>> empty. This is tight distiction but it is here.
> I think Kevin said he wanted to avoid the URI shortcut, and instead
> preferred to use JSON specifications?
>
> Paolo
yep. filling myself stupid. Will resend, the patch will look MUSH shorter.

Den



[Qemu-block] [PATCH 1/3] block: Add node name to BLOCK_IO_ERROR event

2016-10-05 Thread Kevin Wolf
The event currently only contains the BlockBackend name. However, with
anonymous BlockBackends, this is always the empty string. Add the node
name so that the user can still see which block device caused the event.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 5 +++--
 docs/qmp-events.txt   | 6 +-
 qapi/block-core.json  | 8 ++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 639294b..911254a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1206,8 +1206,9 @@ static void send_qmp_error_event(BlockBackend *blk,
 IoOperationType optype;
 
 optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
-qapi_event_send_block_io_error(blk_name(blk), optype, action,
-   blk_iostatus_is_enabled(blk),
+qapi_event_send_block_io_error(blk_name(blk),
+   bdrv_get_node_name(blk_bs(blk)), optype,
+   action, blk_iostatus_is_enabled(blk),
error == ENOSPC, strerror(error),
_abort);
 }
diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index 7967ec4..25307ab 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -65,7 +65,10 @@ Emitted when a disk I/O error occurs.
 
 Data:
 
-- "device": device name (json-string)
+- "device": device name. This is always present for compatibility
+reasons, but it can be empty ("") if the image does not
+have a device name associated. (json-string)
+- "node-name": node name (json-string)
 - "operation": I/O operation (json-string, "read" or "write")
 - "action": action that has been taken, it's one of the following 
(json-string):
 "ignore": error has been ignored
@@ -76,6 +79,7 @@ Example:
 
 { "event": "BLOCK_IO_ERROR",
 "data": { "device": "ide0-hd1",
+  "node-name": "#block212",
   "operation": "write",
   "action": "stop" },
 "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9d797b8..c4538d6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2545,7 +2545,11 @@
 #
 # Emitted when a disk I/O error occurs
 #
-# @device: device name
+# @device: device name. This is always present for compatibility
+#  reasons, but it can be empty ("") if the image does not
+#  have a device name associated.
+#
+# @node-name: node name (Since: 2.8)
 #
 # @operation: I/O operation
 #
@@ -2566,7 +2570,7 @@
 # Since: 0.13.0
 ##
 { 'event': 'BLOCK_IO_ERROR',
-  'data': { 'device': 'str', 'operation': 'IoOperationType',
+  'data': { 'device': 'str', 'node-name': 'str', 'operation': 
'IoOperationType',
 'action': 'BlockErrorAction', '*nospace': 'bool',
 'reason': 'str' } }
 
-- 
1.8.3.1




[Qemu-block] [PATCH 2/3] block-backend: Remember if attached device is non-qdev

2016-10-05 Thread Kevin Wolf
Almost all block devices are qdevified by now. This allows us to go back
from the BlockBackend to the DeviceState. xen_disk is the last device
that is missing. We'll remember in the BlockBackend if a xen_disk is
attached and can then disable any features that require going from a BB
to the DeviceState.

While at it, clearly mark the function used by xen_disk as legacy even
in its name, not just in TODO comments.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 26 +++---
 hw/block/xen_disk.c|  2 +-
 include/sysemu/block-backend.h |  4 ++--
 3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 911254a..39c5613 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -38,6 +38,7 @@ struct BlockBackend {
 BlockBackendPublic public;
 
 void *dev;  /* attached device model, if any */
+bool legacy_dev;/* true if dev is not a DeviceState */
 /* TODO change to DeviceState when all users are qdevified */
 const BlockDevOps *dev_ops;
 void *dev_opaque;
@@ -507,32 +508,38 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState 
*bs)
 }
 }
 
-/*
- * Attach device model @dev to @blk.
- * Return 0 on success, -EBUSY when a device model is attached already.
- */
-int blk_attach_dev(BlockBackend *blk, void *dev)
-/* TODO change to DeviceState *dev when all users are qdevified */
+static int blk_do_attach_dev(BlockBackend *blk, void *dev)
 {
 if (blk->dev) {
 return -EBUSY;
 }
 blk_ref(blk);
 blk->dev = dev;
+blk->legacy_dev = false;
 blk_iostatus_reset(blk);
 return 0;
 }
 
 /*
  * Attach device model @dev to @blk.
+ * Return 0 on success, -EBUSY when a device model is attached already.
+ */
+int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
+{
+return blk_do_attach_dev(blk, dev);
+}
+
+/*
+ * Attach device model @dev to @blk.
  * @blk must not have a device model attached already.
  * TODO qdevified devices don't use this, remove when devices are qdevified
  */
-void blk_attach_dev_nofail(BlockBackend *blk, void *dev)
+void blk_attach_dev_legacy(BlockBackend *blk, void *dev)
 {
 if (blk_attach_dev(blk, dev) < 0) {
 abort();
 }
+blk->legacy_dev = true;
 }
 
 /*
@@ -586,6 +593,11 @@ BlockBackend *blk_by_dev(void *dev)
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
  void *opaque)
 {
+/* All drivers that use blk_set_dev_ops() are qdevified and we want to keep
+ * it that way, so we can assume blk->dev is a DeviceState if blk->dev_ops
+ * is set. */
+assert(!blk->legacy_dev);
+
 blk->dev_ops = ops;
 blk->dev_opaque = opaque;
 }
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 5aa350a..1292a4b 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1079,7 +1079,7 @@ static int blk_connect(struct XenDevice *xendev)
  * so we can blk_unref() unconditionally */
 blk_ref(blkdev->blk);
 }
-blk_attach_dev_nofail(blkdev->blk, blkdev);
+blk_attach_dev_legacy(blkdev->blk, blkdev);
 blkdev->file_size = blk_getlength(blkdev->blk);
 if (blkdev->file_size < 0) {
 BlockDriverState *bs = blk_bs(blkdev->blk);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index a7993af..b07159b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -107,8 +107,8 @@ BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
 void blk_iostatus_disable(BlockBackend *blk);
 void blk_iostatus_reset(BlockBackend *blk);
 void blk_iostatus_set_err(BlockBackend *blk, int error);
-int blk_attach_dev(BlockBackend *blk, void *dev);
-void blk_attach_dev_nofail(BlockBackend *blk, void *dev);
+int blk_attach_dev(BlockBackend *blk, DeviceState *dev);
+void blk_attach_dev_legacy(BlockBackend *blk, void *dev);
 void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
 BlockBackend *blk_by_dev(void *dev);
-- 
1.8.3.1




[Qemu-block] [PATCH 3/3] block: Add qdev ID to DEVICE_TRAY_MOVED

2016-10-05 Thread Kevin Wolf
The event currently only contains the BlockBackend name. However, with
anonymous BlockBackends, this is always the empty string. Add the qdev
ID (or if none was given, the QOM path) so that the user can still see
which device caused the event.

Event generation has to be moved from bdrv_eject() to the BlockBackend
because the BDS doesn't know the attached device, but that's easy
because blk_eject() is the only user of it.

Signed-off-by: Kevin Wolf 
---
 block.c   |  7 ---
 block/block-backend.c | 33 -
 docs/qmp-commands.txt |  3 +++
 docs/qmp-events.txt   |  6 +-
 qapi/block.json   |  8 ++--
 5 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index bb1f1ec..b0bdad6 100644
--- a/block.c
+++ b/block.c
@@ -3360,17 +3360,10 @@ int bdrv_media_changed(BlockDriverState *bs)
 void bdrv_eject(BlockDriverState *bs, bool eject_flag)
 {
 BlockDriver *drv = bs->drv;
-const char *device_name;
 
 if (drv && drv->bdrv_eject) {
 drv->bdrv_eject(bs, eject_flag);
 }
-
-device_name = bdrv_get_device_name(bs);
-if (device_name[0] != '\0') {
-qapi_event_send_device_tray_moved(device_name,
-  eject_flag, _abort);
-}
 }
 
 /**
diff --git a/block/block-backend.c b/block/block-backend.c
index 39c5613..6859791 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -566,6 +566,23 @@ void *blk_get_attached_dev(BlockBackend *blk)
 return blk->dev;
 }
 
+/* Return the qdev ID, or if no ID is assigned the QOM path, of the block
+ * device attached to the BlockBackend. */
+static char *blk_get_attached_dev_id(BlockBackend *blk)
+{
+DeviceState *dev;
+
+assert(!blk->legacy_dev);
+dev = blk->dev;
+
+if (!dev) {
+return g_strdup("");
+} else if (dev->id) {
+return g_strdup(dev->id);
+}
+return object_get_canonical_path(OBJECT(dev));
+}
+
 /*
  * Return the BlockBackend which has the device model @dev attached if it
  * exists, else null.
@@ -613,13 +630,17 @@ void blk_dev_change_media_cb(BlockBackend *blk, bool load)
 if (blk->dev_ops && blk->dev_ops->change_media_cb) {
 bool tray_was_open, tray_is_open;
 
+assert(!blk->legacy_dev);
+
 tray_was_open = blk_dev_is_tray_open(blk);
 blk->dev_ops->change_media_cb(blk->dev_opaque, load);
 tray_is_open = blk_dev_is_tray_open(blk);
 
 if (tray_was_open != tray_is_open) {
-qapi_event_send_device_tray_moved(blk_name(blk), tray_is_open,
+char *id = blk_get_attached_dev_id(blk);
+qapi_event_send_device_tray_moved(blk_name(blk), id, tray_is_open,
   _abort);
+g_free(id);
 }
 }
 }
@@ -1325,9 +1346,19 @@ void blk_lock_medium(BlockBackend *blk, bool locked)
 void blk_eject(BlockBackend *blk, bool eject_flag)
 {
 BlockDriverState *bs = blk_bs(blk);
+char *id;
+
+/* blk_eject is only called by qdevified devices */
+assert(!blk->legacy_dev);
 
 if (bs) {
 bdrv_eject(bs, eject_flag);
+
+id = blk_get_attached_dev_id(blk);
+qapi_event_send_device_tray_moved(blk_name(blk), id,
+  eject_flag, _abort);
+g_free(id);
+
 }
 }
 
diff --git a/docs/qmp-commands.txt b/docs/qmp-commands.txt
index e0adceb..e044029 100644
--- a/docs/qmp-commands.txt
+++ b/docs/qmp-commands.txt
@@ -3239,6 +3239,7 @@ Example:
 "microseconds": 716996 },
  "event": "DEVICE_TRAY_MOVED",
  "data": { "device": "ide1-cd0",
+   "id": "ide0-1-0",
"tray-open": true } }
 
 <- { "return": {} }
@@ -3267,6 +3268,7 @@ Example:
 "microseconds": 272147 },
  "event": "DEVICE_TRAY_MOVED",
  "data": { "device": "ide1-cd0",
+   "id": "ide0-1-0",
"tray-open": false } }
 
 <- { "return": {} }
@@ -3303,6 +3305,7 @@ Example:
 "microseconds": 549958 },
  "event": "DEVICE_TRAY_MOVED",
  "data": { "device": "ide1-cd0",
+   "id": "ide0-1-0",
"tray-open": true } }
 
 <- { "return": {} }
diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index 25307ab..4d56022 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -218,12 +218,16 @@ or by HMP/QMP commands.
 
 Data:
 
-- "device": device name (json-string)
+- "device": Block device name. This is always present for compatibility
+reasons, but it can be empty ("") if the image does not have a
+device name associated. (json-string)
+- "id": The name or QOM path of the guest device (json-string)
 - "tray-open": true if the tray has been opened or false if it has been closed
(json-bool)
 
 { "event": "DEVICE_TRAY_MOVED",
   "data": { "device": "ide1-cd0",
+"id": "/machine/unattached/device[22]",
   

[Qemu-block] [PATCH v2 1/1] nbd: add zero-init parameter

2016-10-05 Thread Denis V. Lunev
When using a nbd block device, the info about necessity of prior disk
zeroing could significantly improve the speed of certain operations
(e.g. backups).

This patch also will allow to preserve QCOW2 images during migration.
Management software now may specify zero-init option and thus abscent
areas in the original QCOW2 image will not be marked as zeroes in the
target image. This is tight distiction but it is here.

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/nbd.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 6bc06d6..eed06d1 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -45,6 +45,7 @@ typedef struct BDRVNBDState {
 
 /* For nbd_refresh_filename() */
 char *path, *host, *port, *export, *tlscredsid;
+bool zero_init;
 } BDRVNBDState;
 
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -194,6 +195,7 @@ out:
 static SocketAddress *nbd_config(BDRVNBDState *s, QemuOpts *opts, Error **errp)
 {
 SocketAddress *saddr;
+const char *zero_init;
 
 s->path = g_strdup(qemu_opt_get(opts, "path"));
 s->host = g_strdup(qemu_opt_get(opts, "host"));
@@ -232,6 +234,11 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QemuOpts 
*opts, Error **errp)
 
 s->export = g_strdup(qemu_opt_get(opts, "export"));
 
+zero_init = qemu_opt_get(opts, "zero-init");
+if (zero_init != NULL) {
+s->zero_init = strcmp(zero_init, "on") == 0;
+}
+
 return saddr;
 }
 
@@ -322,6 +329,11 @@ static QemuOptsList nbd_runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "ID of the TLS credentials to use",
 },
+{
+.name = "zero-init",
+.type = QEMU_OPT_BOOL,
+.help = "Zero-initialized image flag",
+},
 },
 };
 
@@ -483,6 +495,12 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 bs->full_open_options = opts;
 }
 
+static int nbd_has_zero_init(BlockDriverState *bs)
+{
+BDRVNBDState *s = bs->opaque;
+return s->zero_init;
+}
+
 static BlockDriver bdrv_nbd = {
 .format_name= "nbd",
 .protocol_name  = "nbd",
@@ -499,6 +517,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_has_zero_init = nbd_has_zero_init,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -517,6 +536,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_has_zero_init = nbd_has_zero_init,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -535,6 +555,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_has_zero_init = nbd_has_zero_init,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.5.0




[Qemu-block] [PATCH 2/2] nbd: add zero-init parameter

2016-10-05 Thread Denis V. Lunev
From: Denis Plotnikov 

When using a nbd block device, the info about necessity of prior disk
zeroing could significantly improve the speed of certain operations
(e.g. backups).

This patch also will allow to preserve QCOW2 images during migration.
Management software now may specify zero-init option and thus abscent
areas in the original QCOW2 image will not be marked as zeroes in the
target image. This is tight distiction but it is here.

Signed-off-by: Denis Plotnikov 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/nbd.c | 67 +
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 3b133ed..f9dfe57 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -36,19 +36,35 @@
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qint.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qbool.h"
 #include "qemu/cutils.h"
 
 #define EN_OPTSTR "exportname"
+#define ZI_OPTSTR "zero-init"
 
-#define PATH_PARAM  (1u << 0)
+#define PATH_PARAM  (1u << 0)
+#define ZERO_INIT_PARAM (1u << 1)
 
 typedef struct BDRVNBDState {
 NbdClientSession client;
 
 /* For nbd_refresh_filename() */
 char *path, *host, *port, *export, *tlscredsid;
+bool zero_init;
 } BDRVNBDState;
 
+static bool set_option_zero_init(QDict *options, const char *value)
+{
+if (strcmp(value, "on") == 0) {
+qdict_put(options, "zero-init", qbool_from_bool(true));
+} else if (strcmp(value, "off") == 0) {
+qdict_put(options, "zero-init", qbool_from_bool(false));
+} else {
+return false;
+}
+return true;
+}
+
 /*
  * helpers for dealing with option parsing
  * to ease futher params adding and managing
@@ -69,7 +85,13 @@ static bool parse_query_params(QueryParams *qp, QDict 
*options,
 qdict_put(options, "path", qstring_from_str(param->value));
 continue;
 }
-
+if ((ZERO_INIT_PARAM & param_flags) &&
+strcmp(param->name, "zero-init") == 0) {
+if (!set_option_zero_init(options, param->value)) {
+return false;
+}
+continue;
+}
 }
 return true;
 }
@@ -123,13 +145,13 @@ static int nbd_parse_uri(const char *filename, QDict 
*options)
 qp = query_params_parse(uri->query);
 
 if (is_unix) {
-/* nbd+unix:///export?socket=path */
+/* nbd+unix:///export?socket=path{,zero-init=[on|off]} */
 if (uri->server || uri->port) {
 ret = -EINVAL;
 goto out;
 }
 
-if (!parse_query_params(qp, options, PATH_PARAM)) {
+if (!parse_query_params(qp, options, PATH_PARAM | ZERO_INIT_PARAM)) {
 ret = -EINVAL;
 goto out;
 }
@@ -160,6 +182,11 @@ static int nbd_parse_uri(const char *filename, QDict 
*options)
 ret = -EINVAL;
 goto out;
 }
+
+if (!parse_query_params(qp, options, ZERO_INIT_PARAM)) {
+ret = -EINVAL;
+goto out;
+}
 }
 
 out:
@@ -187,6 +214,8 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 return;
 }
 
+set_option_zero_init(options, "off");
+
 if (strstr(filename, "://")) {
 int ret = nbd_parse_uri(filename, options);
 if (ret < 0) {
@@ -266,6 +295,11 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 .type = QEMU_OPT_STRING,
 .help = "Name of the NBD export to open",
 },
+{
+.name = ZI_OPTSTR,
+.type = QEMU_OPT_BOOL,
+.help = "Zero-initialized image flag",
+},
 },
 };
 
@@ -289,6 +323,11 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 qdict_put(options, "export", qstring_from_str(value));
 }
 
+value = qemu_opt_get(opts, ZI_OPTSTR);
+if (value) {
+qdict_put(options, ZI_OPTSTR,
+  qbool_from_bool(strcmp(value, "on") == 0));
+}
 qemu_opts_del(opts);
 }
 
@@ -337,6 +376,8 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QemuOpts 
*opts, Error **errp)
 
 s->export = g_strdup(qemu_opt_get(opts, "export"));
 
+s->zero_init = strcmp(qemu_opt_get(opts, "zero-init"), "on") == 0;
+
 return saddr;
 }
 
@@ -427,6 +468,11 @@ static QemuOptsList nbd_runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "ID of the TLS credentials to use",
 },
+{
+.name = "zero-init",
+.type = QEMU_OPT_BOOL,
+.help = "Zero-initialized image flag",
+},
 },
 };
 
@@ -585,9 +631,19 @@ static void 

Re: [Qemu-block] [PATCH] block: use bdrv_add_before_write_notifier

2016-10-05 Thread Kevin Wolf
Am 04.10.2016 um 10:49 hat Paolo Bonzini geschrieben:
> Register the notifier using the specific API for block devices.
> 
> Signed-off-by: Paolo Bonzini 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH 0/2] nbd: add zero-init parameter

2016-10-05 Thread Paolo Bonzini


On 05/10/2016 11:33, Denis V. Lunev wrote:
> When using a nbd block device, the info about necessity of prior disk
> zeroing could significantly improve the speed of certain operations
> (e.g. backups).
> 
> This patch also will allow to preserve QCOW2 images during migration.
> Management software now may specify zero-init option and thus abscent
> areas in the QCOW2 areas will not be marked as zeroes and will remain
> empty. This is tight distiction but it is here.

I think Kevin said he wanted to avoid the URI shortcut, and instead
preferred to use JSON specifications?

Paolo



[Qemu-block] [PATCH] Put the copyright information on a separate line

2016-10-05 Thread Thomas Huth
The output string QEMU with "--version" is very long, it does
not fit into a normal line of a terminal window anymore. By
putting the copyright information on a separate line instead,
the output looks much nicer.

Signed-off-by: Thomas Huth 
---
 Note: I'm not sure whether there is a technical or legal reason
 that the copyright information has to be on the same line as the
 version information, but if there is no such reason, I think
 a separate line looks much nicer here.

 bsd-user/main.c   | 2 +-
 linux-user/main.c | 2 +-
 qemu-img.c| 2 +-
 vl.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index d803d3e..cd8ba64 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -651,7 +651,7 @@ void cpu_loop(CPUSPARCState *env)
 static void usage(void)
 {
 printf("qemu-" TARGET_NAME " version " QEMU_VERSION QEMU_PKGVERSION
-   ", " QEMU_COPYRIGHT "\n"
+   "\n" QEMU_COPYRIGHT "\n"
"usage: qemu-" TARGET_NAME " [options] program [arguments...]\n"
"BSD CPU emulator (compiled for %s emulation)\n"
"\n"
diff --git a/linux-user/main.c b/linux-user/main.c
index 9e4b430..1be24d9 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3936,7 +3936,7 @@ static void handle_arg_strace(const char *arg)
 static void handle_arg_version(const char *arg)
 {
 printf("qemu-" TARGET_NAME " version " QEMU_VERSION QEMU_PKGVERSION
-   ", " QEMU_COPYRIGHT "\n");
+   "\n" QEMU_COPYRIGHT "\n");
 exit(EXIT_SUCCESS);
 }
 
diff --git a/qemu-img.c b/qemu-img.c
index ceffefe..7e05349 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -44,7 +44,7 @@
 #include 
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION QEMU_PKGVERSION \
-  ", " QEMU_COPYRIGHT "\n"
+  "\n" QEMU_COPYRIGHT "\n"
 
 typedef struct img_cmd_t {
 const char *name;
diff --git a/vl.c b/vl.c
index f3abd99..4dd6399 100644
--- a/vl.c
+++ b/vl.c
@@ -1954,7 +1954,7 @@ static void main_loop(void)
 
 static void version(void)
 {
-printf("QEMU emulator version " QEMU_VERSION QEMU_PKGVERSION ", "
+printf("QEMU emulator version " QEMU_VERSION QEMU_PKGVERSION "\n"
QEMU_COPYRIGHT "\n");
 }
 
-- 
1.8.3.1




[Qemu-block] [PATCH 0/3] block: Extend QMP events for anonymous BlockBackends

2016-10-05 Thread Kevin Wolf
We have two QMP events that include only a BlockBackend name and that are
therefore useless with anonymous BBs. This series adds a node-name/qdev ID to
them.

Kevin Wolf (3):
  block: Add node name to BLOCK_IO_ERROR event
  block-backend: Remember if attached device is non-qdev
  block: Add qdev ID to DEVICE_TRAY_MOVED

 block.c|  7 -
 block/block-backend.c  | 64 +++---
 docs/qmp-commands.txt  |  3 ++
 docs/qmp-events.txt| 12 ++--
 hw/block/xen_disk.c|  2 +-
 include/sysemu/block-backend.h |  4 +--
 qapi/block-core.json   |  8 --
 qapi/block.json|  8 --
 8 files changed, 82 insertions(+), 26 deletions(-)

-- 
1.8.3.1




Re: [Qemu-block] [PATCH] Put the copyright information on a separate line

2016-10-05 Thread Kevin Wolf
Am 05.10.2016 um 11:54 hat Thomas Huth geschrieben:
> The output string QEMU with "--version" is very long, it does
> not fit into a normal line of a terminal window anymore. By
> putting the copyright information on a separate line instead,
> the output looks much nicer.
> 
> Signed-off-by: Thomas Huth 

Acked-by: Kevin Wolf 

Is this something for qemu-trivial?

Kevin



Re: [Qemu-block] [PATCH v2 1/1] nbd: add zero-init parameter

2016-10-05 Thread Kevin Wolf
Am 05.10.2016 um 12:57 hat Denis V. Lunev geschrieben:
> When using a nbd block device, the info about necessity of prior disk
> zeroing could significantly improve the speed of certain operations
> (e.g. backups).
> 
> This patch also will allow to preserve QCOW2 images during migration.
> Management software now may specify zero-init option and thus abscent
> areas in the original QCOW2 image will not be marked as zeroes in the
> target image. This is tight distiction but it is here.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/nbd.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 6bc06d6..eed06d1 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -45,6 +45,7 @@ typedef struct BDRVNBDState {
>  
>  /* For nbd_refresh_filename() */
>  char *path, *host, *port, *export, *tlscredsid;
> +bool zero_init;
>  } BDRVNBDState;
>  
>  static int nbd_parse_uri(const char *filename, QDict *options)
> @@ -194,6 +195,7 @@ out:
>  static SocketAddress *nbd_config(BDRVNBDState *s, QemuOpts *opts, Error 
> **errp)
>  {
>  SocketAddress *saddr;
> +const char *zero_init;
>  
>  s->path = g_strdup(qemu_opt_get(opts, "path"));
>  s->host = g_strdup(qemu_opt_get(opts, "host"));
> @@ -232,6 +234,11 @@ static SocketAddress *nbd_config(BDRVNBDState *s, 
> QemuOpts *opts, Error **errp)
>  
>  s->export = g_strdup(qemu_opt_get(opts, "export"));
>  
> +zero_init = qemu_opt_get(opts, "zero-init");
> +if (zero_init != NULL) {
> +s->zero_init = strcmp(zero_init, "on") == 0;
> +}

We don't need a local char* zero_init and parse it manually:

s->zero_init = qemu_opt_get_bool(opts, "zero-init", false);

Much easier.

> +
>  return saddr;
>  }
>  
> @@ -322,6 +329,11 @@ static QemuOptsList nbd_runtime_opts = {
>  .type = QEMU_OPT_STRING,
>  .help = "ID of the TLS credentials to use",
>  },
> +{
> +.name = "zero-init",
> +.type = QEMU_OPT_BOOL,
> +.help = "Zero-initialized image flag",

We can be a bit more descriptive here:

"If enabled, assume the image to be completely zeroed on startup"

(Or maybe you can think of something better than that.)

> +},
>  },
>  };
>  
> @@ -483,6 +495,12 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
> QDict *options)
>  bs->full_open_options = opts;
>  }
>  
> +static int nbd_has_zero_init(BlockDriverState *bs)
> +{
> +BDRVNBDState *s = bs->opaque;
> +return s->zero_init;
> +}
> +
>  static BlockDriver bdrv_nbd = {
>  .format_name= "nbd",
>  .protocol_name  = "nbd",
> @@ -499,6 +517,7 @@ static BlockDriver bdrv_nbd = {
>  .bdrv_detach_aio_context= nbd_detach_aio_context,
>  .bdrv_attach_aio_context= nbd_attach_aio_context,
>  .bdrv_refresh_filename  = nbd_refresh_filename,
> +.bdrv_has_zero_init = nbd_has_zero_init,
>  };
>  
>  static BlockDriver bdrv_nbd_tcp = {
> @@ -517,6 +536,7 @@ static BlockDriver bdrv_nbd_tcp = {
>  .bdrv_detach_aio_context= nbd_detach_aio_context,
>  .bdrv_attach_aio_context= nbd_attach_aio_context,
>  .bdrv_refresh_filename  = nbd_refresh_filename,
> +.bdrv_has_zero_init = nbd_has_zero_init,
>  };
>  
>  static BlockDriver bdrv_nbd_unix = {
> @@ -535,6 +555,7 @@ static BlockDriver bdrv_nbd_unix = {
>  .bdrv_detach_aio_context= nbd_detach_aio_context,
>  .bdrv_attach_aio_context= nbd_attach_aio_context,
>  .bdrv_refresh_filename  = nbd_refresh_filename,
> +.bdrv_has_zero_init = nbd_has_zero_init,
>  };

I believe nbd_refresh_filename() needs an update, too.

Kevin



Re: [Qemu-block] [PATCH 1/2] nbd: change option parsing scheme

2016-10-05 Thread Paolo Bonzini
I reviewed this patch before noticing that the overall idea is not what
Kevin suggested, so I'm sending it out anyway.  Further comments from
Kevin and Max might come since I am not familiar with the current
conventions on parsing block device options.

On 05/10/2016 11:33, Denis V. Lunev wrote:
> From: Denis Plotnikov 
> 
> This is a preparatory commit to make code more generic. We are going to
> add more options in the next patch.
> 
> Signed-off-by: Denis Plotnikov 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/nbd.c | 143 
> 
>  1 file changed, 124 insertions(+), 19 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 6bc06d6..3b133ed 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -38,7 +38,9 @@
>  #include "qapi/qmp/qstring.h"
>  #include "qemu/cutils.h"
>  
> -#define EN_OPTSTR ":exportname="
> +#define EN_OPTSTR "exportname"

Please replace the #define with the "exportname" string; same for
ZI_OPTSTR in patch 2.

> +#define PATH_PARAM  (1u << 0)
>  
>  typedef struct BDRVNBDState {
>  NbdClientSession client;
> @@ -47,6 +49,46 @@ typedef struct BDRVNBDState {
>  char *path, *host, *port, *export, *tlscredsid;
>  } BDRVNBDState;
>  
> +/*
> + * helpers for dealing with option parsing
> + * to ease futher params adding and managing
> + */
> +
> +/*
> + *  @param_flags - bit flags defining a set of param names to be parsed
> + */
> +static bool parse_query_params(QueryParams *qp, QDict *options,
> +   unsigned int param_flags)
> +{
> +int i;
> +for (i = 0; i < qp->n; i++) {
> +QueryParam *param = >p[i];
> +
> +if ((PATH_PARAM & param_flags) &&
> +strcmp(param->name, "socket") == 0) {
> +qdict_put(options, "path", qstring_from_str(param->value));
> +continue;
> +}
> +
> +}
> +return true;
> +}

Please remove the param_flags argument.  In patch 2 you do:

if (find_prohibited_params(qp, PATH_PARAM)) {
ret = -EINVAL;
goto out;
}
if (!parse_query_params(qp, options, ZERO_INIT_PARAM)) {
ret = -EINVAL;
goto out;
}

Because you've filtered out the socket parameter, it's okay if
parse_query_params parses it always.

Also, please change nbd_parse_uri to return errors via Error**.  Then
parse_query_params can do the same, and we get better error messages.

> +
> +static bool find_prohibited_params(QueryParams *qp, unsigned int param_flags)
> +{
> +int i;
> +for (i = 0; i < qp->n; i++) {
> +QueryParam *param = >p[i];
> +
> +if ((PATH_PARAM & param_flags) &&
> +strcmp(param->name, "socket") == 0) {
> +return true;
> +}
> +}
> +return false;
> +}

Please change this function to something like

static QueryParam *find_query_param(QueryParams *qp, const char *name)

> +if (!parse_query_params(qp, options, PATH_PARAM)) {
>  ret = -EINVAL;
>  goto out;
>  }
> -qdict_put(options, "path", qstring_from_str(qp->p[0].value));
>  } else {
>  QString *host;
>  /* nbd[+tcp]://host[:port]/export */
> @@ -113,6 +155,11 @@ static int nbd_parse_uri(const char *filename, QDict 
> *options)
>  qdict_put(options, "port", qstring_from_str(port_str));
>  g_free(port_str);
>  }
> +
> +if (find_prohibited_params(qp, PATH_PARAM)) {
> +ret = -EINVAL;
> +goto out;
> +}

As mentioned above, using Error** will let you return a better error
message, such as "The 'socket' parameter is only valid for NBD over Unix
domain sockets".

>  }
>  
>  out:
> @@ -127,7 +174,7 @@ static void nbd_parse_filename(const char *filename, 
> QDict *options,
> Error **errp)
>  {
>  char *file;
> -char *export_name;
> +char *opt_str;
>  const char *host_spec;
>  const char *unixpath;
>  
> @@ -150,17 +197,6 @@ static void nbd_parse_filename(const char *filename, 
> QDict *options,
>  
>  file = g_strdup(filename);
>  
> -export_name = strstr(file, EN_OPTSTR);
> -if (export_name) {
> -if (export_name[strlen(EN_OPTSTR)] == 0) {
> -goto out;
> -}
> -export_name[0] = 0; /* truncate 'file' */
> -export_name += strlen(EN_OPTSTR);
> -
> -qdict_put(options, "export", qstring_from_str(export_name));
> -}
> -
>  /* extract the host_spec - fail if it's not nbd:... */
>  if (!strstart(file, "nbd:", _spec)) {
>  error_setg(errp, "File name string for NBD must start with 'nbd:'");
> @@ -173,8 +209,40 @@ static void nbd_parse_filename(const char *filename, 
> QDict *options,
>  
>  /* are 

Re: [Qemu-block] [Qemu-devel] [PATCH] Put the copyright information on a separate line

2016-10-05 Thread Eric Blake
On 10/05/2016 04:54 AM, Thomas Huth wrote:
> The output string QEMU with "--version" is very long, it does
> not fit into a normal line of a terminal window anymore. By
> putting the copyright information on a separate line instead,
> the output looks much nicer.
> 
> Signed-off-by: Thomas Huth 
> ---
>  Note: I'm not sure whether there is a technical or legal reason
>  that the copyright information has to be on the same line as the
>  version information, but if there is no such reason, I think
>  a separate line looks much nicer here.

There's precedent for separate lines of information (to wit, 'ls
--version' is the first thing I compared against), so I doubt there's
any legal reason (at any rate, if legality were an issue, I'm sure that
more programs would have run into that reason and adjusted to be
single-line output).  The only technical reason for a long line would be
for machine parsing, but we recommend (and libvirt uses) QMP for
machine-parseable version number information, rather than scraping
--version.  So I'm in favor of this.

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/1] nbd: add zero-init parameter

2016-10-05 Thread Eric Blake
On 10/05/2016 05:57 AM, Denis V. Lunev wrote:
> When using a nbd block device, the info about necessity of prior disk
> zeroing could significantly improve the speed of certain operations
> (e.g. backups).
> 
> This patch also will allow to preserve QCOW2 images during migration.

'allow to' is not idiomatic English; and your placement of 'also' before
'will' is awkward. You want either of:

This patch will also allow preservation of QCOW2 images...
This patch will also preserve QCOW2 images...

> Management software now may specify zero-init option and thus abscent

s/now may/may now/
s/abscent/absent/

> areas in the original QCOW2 image will not be marked as zeroes in the
> target image. This is tight distiction but it is here.

s/distiction/distinction/

> 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/nbd.c | 21 +
>  1 file changed, 21 insertions(+)
> 

> @@ -232,6 +234,11 @@ static SocketAddress *nbd_config(BDRVNBDState *s, 
> QemuOpts *opts, Error **errp)
>  
>  s->export = g_strdup(qemu_opt_get(opts, "export"));
>  
> +zero_init = qemu_opt_get(opts, "zero-init");
> +if (zero_init != NULL) {
> +s->zero_init = strcmp(zero_init, "on") == 0;

As Kevin pointed out, a manual parse makes the command line less
consistent; reusing the common parser also means that you will support
things like zero-init=true for free.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 02/11] blockjob: centralize QMP event emissions

2016-10-05 Thread Kevin Wolf
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> There's no reason to leave this to blockdev; we can do it in blockjobs
> directly and get rid of an extra callback for most users.
> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c | 37 ++---
>  blockjob.c | 16 ++--
>  2 files changed, 20 insertions(+), 33 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 29c6561..03200e7 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2957,31 +2957,6 @@ out:
>  aio_context_release(aio_context);
>  }
>  
> -static void block_job_cb(void *opaque, int ret)
> -{
> -/* Note that this function may be executed from another AioContext 
> besides
> - * the QEMU main loop.  If you need to access anything that assumes the
> - * QEMU global mutex, use a BH or introduce a mutex.
> - */
> -
> -BlockDriverState *bs = opaque;
> -const char *msg = NULL;
> -
> -trace_block_job_cb(bs, bs->job, ret);

This trace event is removed from the code, but not from trace-events.

> -
> -assert(bs->job);
> -
> -if (ret < 0) {
> -msg = strerror(-ret);
> -}
> -
> -if (block_job_is_cancelled(bs->job)) {
> -block_job_event_cancelled(bs->job);
> -} else {
> -block_job_event_completed(bs->job, msg);

block_job_event_cancelled/completed can become static now.

> -}
> -}
> -
>  void qmp_block_stream(bool has_job_id, const char *job_id, const char 
> *device,
>bool has_base, const char *base,
>bool has_backing_file, const char *backing_file,
> @@ -3033,7 +3008,7 @@ void qmp_block_stream(bool has_job_id, const char 
> *job_id, const char *device,
>  base_name = has_backing_file ? backing_file : base_name;
>  
>  stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
> - has_speed ? speed : 0, on_error, block_job_cb, bs, 
> _err);
> + has_speed ? speed : 0, on_error, NULL, bs, _err);

Passing cb == NULL, but opaque != NULL is harmless, but feels odd.

And actually this is the only caller of stream_start, so the parameters
could just be dropped.

>  if (local_err) {
>  error_propagate(errp, local_err);
>  goto out;
> @@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char 
> *job_id, const char *device,
>  goto out;
>  }
>  commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
> -on_error, block_job_cb, bs, _err, false);
> +on_error, NULL, bs, _err, false);

Here we have an additional caller in block/replication.c and qemu-img,
so the parameters must stay. For qemu-img, nothing changes. For
replication, the block job events are added as a side effect.

Not sure if we want to emit such events for an internal block job, but
if we do want the change, it should be explicit.

>  } else {
>  commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
> - on_error, block_job_cb, bs,
> + on_error, NULL, bs,
>   has_backing_file ? backing_file : NULL, _err);

Like stream_start, drop the parameters.

>  }
>  if (local_err != NULL) {
> @@ -3260,7 +3235,7 @@ static void do_drive_backup(DriveBackup *backup, 
> BlockJobTxn *txn, Error **errp)
>  
>  backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
>   bmap, backup->compress, backup->on_source_error,
> - backup->on_target_error, block_job_cb, bs, txn, _err);
> + backup->on_target_error, NULL, bs, txn, _err);
>  bdrv_unref(target_bs);
>  if (local_err != NULL) {
>  error_propagate(errp, local_err);
> @@ -3330,7 +3305,7 @@ void do_blockdev_backup(BlockdevBackup *backup, 
> BlockJobTxn *txn, Error **errp)
>  }
>  backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
>   NULL, backup->compress, backup->on_source_error,
> - backup->on_target_error, block_job_cb, bs, txn, _err);
> + backup->on_target_error, NULL, bs, txn, _err);

Backup is another job used by replication, too. Same question as above.

>  if (local_err != NULL) {
>  error_propagate(errp, local_err);
>  }
> @@ -3410,7 +3385,7 @@ static void blockdev_mirror_common(const char *job_id, 
> BlockDriverState *bs,
>   has_replaces ? replaces : NULL,
>   speed, granularity, buf_size, sync, backing_mode,
>   on_source_error, on_target_error, unmap,
> - block_job_cb, bs, errp);
> + NULL, bs, errp);

And again, the parameters can be dropped.

Kevin



Re: [Qemu-block] [PATCH v2 01/11] blockjob: fix dead pointer in txn list

2016-10-05 Thread Kevin Wolf
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> From: Vladimir Sementsov-Ogievskiy 
> 
> Though it is not intended to be reached through normal circumstances,
> if we do not gracefully deconstruct the transaction QLIST, we may wind
> up with stale pointers in the list.
> 
> The rest of this series attempts to address the underlying issues,
> but this should fix list inconsistencies.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Tested-by: John Snow 
> Reviewed-by: John Snow 
> [Rewrote commit message. --js]
> Signed-off-by: John Snow 
> Reviewed-by: Eric Blake 
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf 



Re: [Qemu-block] [PATCH] block: use bdrv_add_before_write_notifier

2016-10-05 Thread Stefan Hajnoczi
On Tue, Oct 04, 2016 at 10:49:43AM +0200, Paolo Bonzini wrote:
> Register the notifier using the specific API for block devices.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/write-threshold.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] backup notifier fail policy

2016-10-05 Thread Stefan Hajnoczi
On Wed, Oct 05, 2016 at 10:12:57AM +0200, Kevin Wolf wrote:
> Am 04.10.2016 um 18:02 hat Stefan Hajnoczi geschrieben:
> > On Tue, Oct 04, 2016 at 01:55:30PM +0200, Kevin Wolf wrote:
> > > Am 04.10.2016 um 12:41 hat Denis V. Lunev geschrieben:
> > > > On 10/04/2016 12:34 PM, Kevin Wolf wrote:
> > > > > Am 04.10.2016 um 11:23 hat Stefan Hajnoczi geschrieben:
> > > > >> On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote:
> > > > >>> Eh, regardless: If we're not using a STOP policy, it seems like the 
> > > > >>> right
> > > > >>> thing to do is definitely to just fail the backup instead of 
> > > > >>> failing the
> > > > >>> write.
> > > > >> Even with a -drive werror=stop policy the user probably doesn't want
> > > > >> guest downtime if writing to the backup target fails.
> > > > > That's a policy decision that ultimately only the user can make. For 
> > > > > one
> > > > > user, it might be preferable to cancel the backup and keep the VM
> > > > > running, but for another user it may be more important to keep a
> > > > > consistent snapshot of the point in time when the backup job was 
> > > > > started
> > > > > than keeping the VM running.
> > > > >
> > > > > Kevin
> > > > In this case policy for guest error and policy for backup
> > > > error should be different policies or I have missed something.
> > > 
> > > I guess so.
> > 
> > There are separate error policies for -device and the blockjob.  Perhaps
> > the blockjob error policy can be used in the write notifier code path if
> > the failure occurs while writing to the backup target.
> 
> Isn't the block job policy used for the background copy? Or do you think
> it's okay to use the same for both cases? That would mean that we stop
> the VM even if it's just a background copy that fails.

You're right, strictly speaking the guest notifier code path is a
separate case and should have a dedicated parameter.  At least if we
want to allow users to select from all possible policy combinations...

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v2 05/11] blockjobs: split interface into public/private

2016-10-05 Thread John Snow



On 10/05/2016 10:17 AM, Kevin Wolf wrote:

Am 01.10.2016 um 00:00 hat John Snow geschrieben:

To make it a little more obvious which functions are intended to be
public interface and which are intended to be for use only by jobs
themselves, split the interface into "public" and "private" files.

Convert blockjobs (e.g. block/backup) to using the private interface.
Leave blockdev and others on the public interface.

Give up and let qemu-img use the internal interface, though it doesn't
strictly need to be using it.

As a side-effect, hide the BlockJob and BlockJobDriver implementation
from most of the QEMU codebase.

Signed-off-by: John Snow 



--- a/block/replication.c
+++ b/block/replication.c
@@ -15,7 +15,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/nbd.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/block_int.h"
 #include "block/block_backup.h"
 #include "sysemu/block-backend.h"


This one is wrong. replication.c is a block job user, not part of the
implementation. After removing this hunk, the result still compiles.



Sorry, this is a mistake from an intermediate form of the patch.


--- a/include/block/block.h
+++ b/include/block/block.h
@@ -7,16 +7,15 @@
 #include "qemu/coroutine.h"
 #include "block/accounting.h"
 #include "block/dirty-bitmap.h"
+#include "block/blockjob.h"
 #include "qapi/qmp/qobject.h"
 #include "qapi-types.h"
 #include "qemu/hbitmap.h"


Hm... This header file doesn't need any of the definitions from
blockjob.h, so I guess .c files should include it explicitly rather than
getting it from here. That would be a separate patch, though.



OK. The reason this happened was because we used to have the struct 
definitions in block_int.h; and by moving those to a Blockjob-specific 
header, I needed to reintroduce those definitions somehow. I reasoned 
that BlockJobs were part of the public interface for the Block layer, so 
I added it here.


I can remove it and include the public blockjob interface only where 
specifically required if desired, though.



 /* block.c */
 typedef struct BlockDriver BlockDriver;
-typedef struct BlockJob BlockJob;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildRole BdrvChildRole;
-typedef struct BlockJobTxn BlockJobTxn;

 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */



--- a/qemu-img.c
+++ b/qemu-img.c
@@ -37,7 +37,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/qapi.h"
 #include "crypto/init.h"
 #include "trace/control.h"


qemu-img doesn't compile without including blockjob_int.h, but that's
just a sign that the split isn't completely right yet. Maybe it needs to
use the pulic function block_job_query() to get the information it takes
directly from the BlockJob struct.

Kevin



Yes, you caught me. I'd need to spend a little more time shining up 
qemu-img, which just takes time away from that FDC rev-- I'm kidding. 
I'll fix this up at your request.


--js



[Qemu-block] [PATCH 2/2] quorum: do not allocate multiple iovecs for FIFO strategy

2016-10-05 Thread Paolo Bonzini
In FIFO mode there are no parallel reads, hence there is no need to
allocate separate buffers and clone the iovecs.

The two cases of quorum_aio_cb are now even more different, and
most of quorum_aio_finalize is only needed in one of them, so split
them in separate functions.

Signed-off-by: Paolo Bonzini 
---
 block/quorum.c | 79 +-
 1 file changed, 40 insertions(+), 39 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 435296e..d122299 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -156,21 +156,7 @@ static AIOCBInfo quorum_aiocb_info = {
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-int i, ret = 0;
-
-if (acb->vote_ret) {
-ret = acb->vote_ret;
-}
-
-acb->common.cb(acb->common.opaque, ret);
-
-if (acb->is_read) {
-for (i = 0; i < acb->children_read; i++) {
-qemu_vfree(acb->qcrs[i].buf);
-qemu_iovec_destroy(>qcrs[i].qiov);
-}
-}
-
+acb->common.cb(acb->common.opaque, acb->vote_ret);
 g_free(acb->qcrs);
 qemu_aio_unref(acb);
 }
@@ -282,38 +268,52 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
QEMUIOVector *source)
 }
 }
 
-static void quorum_aio_cb(void *opaque, int ret)
+static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
+{
+QuorumAIOCB *acb = sacb->parent;
+QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : 
QUORUM_OP_TYPE_WRITE;
+quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
+  sacb->aiocb->bs->node_name, ret);
+}
+
+static void quorum_fifo_aio_cb(void *opaque, int ret)
 {
 QuorumChildRequest *sacb = opaque;
 QuorumAIOCB *acb = sacb->parent;
 BDRVQuorumState *s = acb->common.bs->opaque;
-bool rewrite = false;
 
-if (ret == 0) {
-acb->success_count++;
-} else {
-QuorumOpType type;
-type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
-quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
-  sacb->aiocb->bs->node_name, ret);
-}
+assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
+
+if (ret < 0) {
+quorum_report_bad_acb(sacb, ret);
 
-if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
 /* We try to read next child in FIFO order if we fail to read */
-if (ret < 0 && acb->children_read < s->num_children) {
+if (acb->children_read < s->num_children) {
 read_fifo_child(acb);
 return;
 }
-
-if (ret == 0) {
-quorum_copy_qiov(acb->qiov, >qcrs[acb->children_read - 
1].qiov);
-}
-acb->vote_ret = ret;
-quorum_aio_finalize(acb);
-return;
 }
 
+acb->vote_ret = ret;
+
+/* FIXME: rewrite failed children if acb->children_read > 1? */
+quorum_aio_finalize(acb);
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+QuorumChildRequest *sacb = opaque;
+QuorumAIOCB *acb = sacb->parent;
+BDRVQuorumState *s = acb->common.bs->opaque;
+bool rewrite = false;
+int i;
+
 sacb->ret = ret;
+if (ret == 0) {
+acb->success_count++;
+} else {
+quorum_report_bad_acb(sacb, ret);
+}
 acb->count++;
 assert(acb->count <= s->num_children);
 assert(acb->success_count <= s->num_children);
@@ -324,6 +324,10 @@ static void quorum_aio_cb(void *opaque, int ret)
 /* Do the vote on read */
 if (acb->is_read) {
 rewrite = quorum_vote(acb);
+for (i = 0; i < s->num_children; i++) {
+qemu_vfree(acb->qcrs[i].buf);
+qemu_iovec_destroy(>qcrs[i].qiov);
+}
 } else {
 quorum_has_too_much_io_failed(acb);
 }
@@ -672,12 +676,9 @@ static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
 BDRVQuorumState *s = acb->common.bs->opaque;
 int n = acb->children_read++;
 
-acb->qcrs[n].buf = qemu_blockalign(s->children[n]->bs, acb->qiov->size);
-qemu_iovec_init(>qcrs[n].qiov, acb->qiov->niov);
-qemu_iovec_clone(>qcrs[n].qiov, acb->qiov, acb->qcrs[n].buf);
 acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
->qcrs[n].qiov, acb->nb_sectors,
-quorum_aio_cb, >qcrs[n]);
+acb->qiov, acb->nb_sectors,
+quorum_fifo_aio_cb, >qcrs[n]);
 
 return >common;
 }
-- 
2.7.4




[Qemu-block] [PATCH] blockjob: introduce .drain callback for jobs

2016-10-05 Thread Paolo Bonzini
This is required to decouple block jobs from running in an
AioContext.  With multiqueue block devices, a BlockDriverState
does not really belong to a single AioContext.

The solution is to first wait until all I/O operations are
complete; then loop in the main thread for the block job to
complete entirely.

Signed-off-by: Paolo Bonzini 
---
This patch comes from the multiqueue series.  It is self-contained
so I'm sending it now; it achieves the purpose of getting rid of
block_job_get_aio_context, so that any other use of it would
be more visible.  But if you all think it's premature, I'm okay
with keeping it out some more.

 block/backup.c   | 16 
 block/mirror.c   | 34 ++
 blockjob.c   | 37 -
 include/block/blockjob.h |  7 +++
 4 files changed, 69 insertions(+), 25 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 582bd0f..0350cfc 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -300,6 +300,20 @@ void backup_cow_request_end(CowRequest *req)
 cow_request_end(req);
 }
 
+static void backup_drain(BlockJob *job)
+{
+BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+
+/* Need to keep a reference in case blk_drain triggers execution
+ * of backup_complete...
+ */
+if (s->target) {
+blk_ref(s->target);
+blk_drain(s->target);
+blk_unref(s->target);
+}
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
@@ -307,6 +321,7 @@ static const BlockJobDriver backup_job_driver = {
 .commit = backup_commit,
 .abort  = backup_abort,
 .attached_aio_context   = backup_attached_aio_context,
+.drain  = backup_drain,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
@@ -331,6 +346,7 @@ static void backup_complete(BlockJob *job, void *opaque)
 BackupCompleteData *data = opaque;
 
 blk_unref(s->target);
+s->target = NULL;
 
 block_job_completed(job, data->ret);
 g_free(data);
diff --git a/block/mirror.c b/block/mirror.c
index a0ac81a..4cd6bab2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -471,7 +471,11 @@ static void mirror_free_init(MirrorBlockJob *s)
 }
 }
 
-static void mirror_drain(MirrorBlockJob *s)
+/* This is also used for the .pause callback. There is no matching
+ * mirror_resume() because mirror_run() will begin iterating again
+ * when the job is resumed.
+ */
+static void mirror_wait_for_all_io(MirrorBlockJob *s)
 {
 while (s->in_flight > 0) {
 mirror_wait_for_io(s);
@@ -530,6 +534,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 g_free(s->replaces);
 bdrv_op_unblock_all(target_bs, s->common.blocker);
 blk_unref(s->target);
+s->target = NULL;
 block_job_completed(>common, data->ret);
 g_free(data);
 bdrv_drained_end(src);
@@ -584,7 +589,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 sector_num += nb_sectors;
 }
 
-mirror_drain(s);
+mirror_wait_for_all_io(s);
 }
 
 /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -802,7 +807,7 @@ immediate_exit:
  */
 assert(ret < 0 || (!s->synced && block_job_is_cancelled(>common)));
 assert(need_drain);
-mirror_drain(s);
+mirror_wait_for_all_io(s);
 }
 
 assert(s->in_flight == 0);
@@ -887,14 +892,11 @@ static void mirror_complete(BlockJob *job, Error **errp)
 block_job_enter(>common);
 }
 
-/* There is no matching mirror_resume() because mirror_run() will begin
- * iterating again when the job is resumed.
- */
-static void coroutine_fn mirror_pause(BlockJob *job)
+static void mirror_pause(BlockJob *job)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 
-mirror_drain(s);
+mirror_wait_for_all_io(s);
 }
 
 static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
@@ -904,6 +906,20 @@ static void mirror_attached_aio_context(BlockJob *job, 
AioContext *new_context)
 blk_set_aio_context(s->target, new_context);
 }
 
+static void mirror_drain(BlockJob *job)
+{
+MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+/* Need to keep a reference in case blk_drain triggers execution
+ * of mirror_complete...
+ */
+if (s->target) {
+blk_ref(s->target);
+blk_drain(s->target);
+blk_unref(s->target);
+}
+}
+
 static const BlockJobDriver mirror_job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
 .job_type   = BLOCK_JOB_TYPE_MIRROR,
@@ -911,6 +927,7 @@ static const BlockJobDriver mirror_job_driver = {
 .complete   = mirror_complete,
 .pause  = 

[Qemu-block] [PATCH] block: change drain to look only at one child at a time

2016-10-05 Thread Paolo Bonzini
bdrv_requests_pending is checking children to also wait until internal
requests (such as metadata writes) have completed.  However, checking
children is in general overkill.  Children requests can be of two kinds:

- requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
causing a write to bs->file->bs.  In this case, the parent's in_flight
count will always be incremented by at least one for every request in
the child.

- asynchronous metadata writes or flushes.  Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.

This patch therefore changes bdrv_drain to finish I/O in the parent
(after which the parent's in_flight will be locked to zero), call
bdrv_drain (after which the parent will not generate I/O on the child
anymore), and then wait for internal I/O in the children to complete.

Signed-off-by: Paolo Bonzini 
---
 block/io.c  | 54 --
 block/qed.c | 16 +++-
 2 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/block/io.c b/block/io.c
index f81c884..b94edec 100644
--- a/block/io.c
+++ b/block/io.c
@@ -156,16 +156,33 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 return false;
 }
 
-static void bdrv_drain_recurse(BlockDriverState *bs)
+static bool bdrv_drain_poll(BlockDriverState *bs)
+{
+bool waited = false;
+
+while (atomic_read(>in_flight) > 0) {
+aio_poll(bdrv_get_aio_context(bs), true);
+waited = true;
+}
+return waited;
+}
+
+static bool bdrv_drain_io_recurse(BlockDriverState *bs)
 {
 BdrvChild *child;
+bool waited;
+
+waited = bdrv_drain_poll(bs);
 
 if (bs->drv && bs->drv->bdrv_drain) {
 bs->drv->bdrv_drain(bs);
 }
+
 QLIST_FOREACH(child, >children, next) {
-bdrv_drain_recurse(child->bs);
+waited |= bdrv_drain_io_recurse(child->bs);
 }
+
+return waited;
 }
 
 typedef struct {
@@ -174,14 +191,6 @@ typedef struct {
 bool done;
 } BdrvCoDrainData;
 
-static void bdrv_drain_poll(BlockDriverState *bs)
-{
-while (bdrv_requests_pending(bs)) {
-/* Keep iterating */
-aio_poll(bdrv_get_aio_context(bs), true);
-}
-}
-
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
 BdrvCoDrainData *data = opaque;
@@ -215,6 +224,20 @@ static void coroutine_fn 
bdrv_co_yield_to_drain(BlockDriverState *bs)
 assert(data.done);
 }
 
+static void bdrv_co_drain_io_recurse(BlockDriverState *bs)
+{
+BdrvChild *child;
+
+bdrv_co_yield_to_drain(bs);
+if (bs->drv && bs->drv->bdrv_drain) {
+bs->drv->bdrv_drain(bs);
+}
+
+QLIST_FOREACH(child, >children, next) {
+bdrv_co_drain_io_recurse(child->bs);
+}
+}
+
 void bdrv_drained_begin(BlockDriverState *bs)
 {
 if (!bs->quiesce_counter++) {
@@ -223,11 +246,10 @@ void bdrv_drained_begin(BlockDriverState *bs)
 }
 
 bdrv_io_unplugged_begin(bs);
-bdrv_drain_recurse(bs);
 if (qemu_in_coroutine()) {
-bdrv_co_yield_to_drain(bs);
+bdrv_co_drain_io_recurse(bs);
 } else {
-bdrv_drain_poll(bs);
+bdrv_drain_io_recurse(bs);
 }
 bdrv_io_unplugged_end(bs);
 }
@@ -296,7 +318,6 @@ void bdrv_drain_all(void)
 aio_context_acquire(aio_context);
 bdrv_parent_drained_begin(bs);
 bdrv_io_unplugged_begin(bs);
-bdrv_drain_recurse(bs);
 aio_context_release(aio_context);
 
 if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -319,10 +340,7 @@ void bdrv_drain_all(void)
 aio_context_acquire(aio_context);
 for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 if (aio_context == bdrv_get_aio_context(bs)) {
-if (bdrv_requests_pending(bs)) {
-aio_poll(aio_context, true);
-waited = true;
-}
+waited |= bdrv_drain_io_recurse(bs);
 }
 }
 aio_context_release(aio_context);
diff --git a/block/qed.c b/block/qed.c
index 3ee879b..1a7ef0a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -336,7 +336,7 @@ static void qed_need_check_timer_cb(void *opaque)
 qed_plug_allocating_write_reqs(s);
 
 /* Ensure writes are on disk before clearing flag */
-bdrv_aio_flush(s->bs, qed_clear_need_check, s);
+bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
 }
 
 static void qed_start_need_check_timer(BDRVQEDState *s)
@@ -378,6 +378,19 @@ static void bdrv_qed_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 
+static void bdrv_qed_drain(BlockDriverState *bs)
+{
+BDRVQEDState *s = bs->opaque;
+
+/* Fire the timer immediately in order to start doing I/O as soon as the
+ * header is flushed.
+ */
+if (s->need_check_timer && timer_pending(s->need_check_timer)) {
+qed_cancel_need_check_timer(s);
+qed_need_check_timer_cb(s);
+}
+}
+
 static int 

Re: [Qemu-block] [PATCH v9 09/10] tests: Add test code for hbitmap serialization

2016-10-05 Thread Max Reitz
On 05.10.2016 00:52, John Snow wrote:
> From: Fam Zheng 
> 
> Signed-off-by: Fam Zheng 
> [Fixed minor constant issue. --js]
> Signed-off-by: John Snow 
> 
> Signed-off-by: John Snow 
> ---
>  tests/test-hbitmap.c | 155 
> +++
>  1 file changed, 155 insertions(+)
> 
> diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
> index e3abde1..4fc5612 100644
> --- a/tests/test-hbitmap.c
> +++ b/tests/test-hbitmap.c

[...]

> +static void test_hbitmap_serialize_zeroes(TestHBitmapData *data,
> +  const void *unused)
> +{
> +int i;
> +HBitmapIter iter;
> +int64_t next;
> +uint64_t positions[] = { 0, L1, L2, L3 - L1};
> +int num_positions = sizeof(positions) / sizeof(positions[0]);
> +
> +hbitmap_test_init(data, L3, 0);
> +
> +for (i = 0; i < num_positions; i++) {
> +hbitmap_set(data->hb, positions[i], L1);
> +}
> +
> +for (i = 0; i < num_positions; i++) {
> +hbitmap_deserialize_zeroes(data->hb, positions[i], L1, true);

This tries to deserialize L1 (= BITS_PER_LONG) bits; this is bad on 32
bit hosts, because the serialization granularity is (at least) 64 bits,
and the hbitmap code really doesn't like start or end positions (and
therefore also lengths) which aren't aligned to the granularity.

(Therefore, make check fails with -m32.)

Max

> +hbitmap_iter_init(, data->hb, 0);
> +next = hbitmap_iter_next();
> +if (i == num_positions - 1) {
> +g_assert_cmpint(next, ==, -1);
> +} else {
> +g_assert_cmpint(next, ==, positions[i + 1]);
> +}
> +}
> +}
> +
>  static void hbitmap_test_add(const char *testpath,
> void (*test_func)(TestHBitmapData *data, 
> const void *user_data))
>  {
> @@ -799,6 +945,15 @@ int main(int argc, char **argv)
>  hbitmap_test_add("/hbitmap/meta/byte", test_hbitmap_meta_byte);
>  hbitmap_test_add("/hbitmap/meta/word", test_hbitmap_meta_word);
>  hbitmap_test_add("/hbitmap/meta/sector", test_hbitmap_meta_sector);
> +
> +hbitmap_test_add("/hbitmap/serialize/granularity",
> + test_hbitmap_serialize_granularity);
> +hbitmap_test_add("/hbitmap/serialize/basic",
> + test_hbitmap_serialize_basic);
> +hbitmap_test_add("/hbitmap/serialize/part",
> + test_hbitmap_serialize_part);
> +hbitmap_test_add("/hbitmap/serialize/zeroes",
> + test_hbitmap_serialize_zeroes);
>  g_test_run();
>  
>  return 0;
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 06/11] blockjobs: fix documentation

2016-10-05 Thread John Snow



On 10/05/2016 11:03 AM, Kevin Wolf wrote:

Am 01.10.2016 um 00:00 hat John Snow geschrieben:

Wrong function names in documentation.

Signed-off-by: John Snow 
---
 include/block/blockjob_int.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 0a2d41e..c6da7e4 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -305,7 +305,7 @@ void block_job_enter(BlockJob *job);
 void block_job_event_cancelled(BlockJob *job);

 /**
- * block_job_ready:
+ * block_job_event_completed:


Yes...


  * @job: The job which is now ready to complete.


...that was half of the fix.



:(


  * @msg: Error message. Only present on failure.
  *


Kevin



I'm sorry for not double-checking this one.



[Qemu-block] [PATCH 1/2] quorum: change child_iter to children_read

2016-10-05 Thread Paolo Bonzini
This simplifies a bit the code by using the usual C "inclusive start,
exclusive end" pattern for ranges.

Signed-off-by: Paolo Bonzini 
---
 block/quorum.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 9cf876f..435296e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -130,7 +130,7 @@ struct QuorumAIOCB {
 
 bool is_read;
 int vote_ret;
-int child_iter; /* which child to read in fifo pattern */
+int children_read;  /* how many children have been read from */
 };
 
 static bool quorum_vote(QuorumAIOCB *acb);
@@ -165,8 +165,7 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 acb->common.cb(acb->common.opaque, ret);
 
 if (acb->is_read) {
-/* on the quorum case acb->child_iter == s->num_children - 1 */
-for (i = 0; i <= acb->child_iter; i++) {
+for (i = 0; i < acb->children_read; i++) {
 qemu_vfree(acb->qcrs[i].buf);
 qemu_iovec_destroy(>qcrs[i].qiov);
 }
@@ -301,14 +300,13 @@ static void quorum_aio_cb(void *opaque, int ret)
 
 if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
 /* We try to read next child in FIFO order if we fail to read */
-if (ret < 0 && (acb->child_iter + 1) < s->num_children) {
-acb->child_iter++;
+if (ret < 0 && acb->children_read < s->num_children) {
 read_fifo_child(acb);
 return;
 }
 
 if (ret == 0) {
-quorum_copy_qiov(acb->qiov, >qcrs[acb->child_iter].qiov);
+quorum_copy_qiov(acb->qiov, >qcrs[acb->children_read - 
1].qiov);
 }
 acb->vote_ret = ret;
 quorum_aio_finalize(acb);
@@ -653,6 +651,7 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
 BDRVQuorumState *s = acb->common.bs->opaque;
 int i;
 
+acb->children_read = s->num_children;
 for (i = 0; i < s->num_children; i++) {
 acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, 
acb->qiov->size);
 qemu_iovec_init(>qcrs[i].qiov, acb->qiov->niov);
@@ -671,16 +670,14 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
 static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
 {
 BDRVQuorumState *s = acb->common.bs->opaque;
+int n = acb->children_read++;
 
-acb->qcrs[acb->child_iter].buf =
-qemu_blockalign(s->children[acb->child_iter]->bs, acb->qiov->size);
-qemu_iovec_init(>qcrs[acb->child_iter].qiov, acb->qiov->niov);
-qemu_iovec_clone(>qcrs[acb->child_iter].qiov, acb->qiov,
- acb->qcrs[acb->child_iter].buf);
-acb->qcrs[acb->child_iter].aiocb =
-bdrv_aio_readv(s->children[acb->child_iter], acb->sector_num,
-   >qcrs[acb->child_iter].qiov, acb->nb_sectors,
-   quorum_aio_cb, >qcrs[acb->child_iter]);
+acb->qcrs[n].buf = qemu_blockalign(s->children[n]->bs, acb->qiov->size);
+qemu_iovec_init(>qcrs[n].qiov, acb->qiov->niov);
+qemu_iovec_clone(>qcrs[n].qiov, acb->qiov, acb->qcrs[n].buf);
+acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
+>qcrs[n].qiov, acb->nb_sectors,
+quorum_aio_cb, >qcrs[n]);
 
 return >common;
 }
@@ -696,13 +693,12 @@ static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
 QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
   nb_sectors, cb, opaque);
 acb->is_read = true;
+acb->children_read = 0;
 
 if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
-acb->child_iter = s->num_children - 1;
 return read_quorum_children(acb);
 }
 
-acb->child_iter = 0;
 return read_fifo_child(acb);
 }
 
-- 
2.7.4





[Qemu-block] [PATCH 0/2] quorum: cleanup and optimization for FIFO case

2016-10-05 Thread Paolo Bonzini
A couple patches I had lying in the huge multiqueue series but are
not really related.  So let's flush 'em...

Paolo Bonzini (2):
  quorum: change child_iter to children_read
  quorum: do not allocate multiple iovecs for FIFO strategy

 block/quorum.c | 93 --
 1 file changed, 45 insertions(+), 48 deletions(-)

-- 
2.7.4




Re: [Qemu-block] [PATCH] block: change drain to look only at one child at a time

2016-10-05 Thread Paolo Bonzini
Please disregard this, I'll send it soon as part of the correct series.

Paolo

On 05/10/2016 18:47, Paolo Bonzini wrote:
> bdrv_requests_pending is checking children to also wait until internal
> requests (such as metadata writes) have completed.  However, checking
> children is in general overkill.  Children requests can be of two kinds:
> 
> - requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
> causing a write to bs->file->bs.  In this case, the parent's in_flight
> count will always be incremented by at least one for every request in
> the child.
> 
> - asynchronous metadata writes or flushes.  Such writes can be started
> even if bs's in_flight count is zero, but not after the .bdrv_drain
> callback has been invoked.
> 
> This patch therefore changes bdrv_drain to finish I/O in the parent
> (after which the parent's in_flight will be locked to zero), call
> bdrv_drain (after which the parent will not generate I/O on the child
> anymore), and then wait for internal I/O in the children to complete.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c  | 54 --
>  block/qed.c | 16 +++-
>  2 files changed, 51 insertions(+), 19 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index f81c884..b94edec 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -156,16 +156,33 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>  return false;
>  }
>  
> -static void bdrv_drain_recurse(BlockDriverState *bs)
> +static bool bdrv_drain_poll(BlockDriverState *bs)
> +{
> +bool waited = false;
> +
> +while (atomic_read(>in_flight) > 0) {
> +aio_poll(bdrv_get_aio_context(bs), true);
> +waited = true;
> +}
> +return waited;
> +}
> +
> +static bool bdrv_drain_io_recurse(BlockDriverState *bs)
>  {
>  BdrvChild *child;
> +bool waited;
> +
> +waited = bdrv_drain_poll(bs);
>  
>  if (bs->drv && bs->drv->bdrv_drain) {
>  bs->drv->bdrv_drain(bs);
>  }
> +
>  QLIST_FOREACH(child, >children, next) {
> -bdrv_drain_recurse(child->bs);
> +waited |= bdrv_drain_io_recurse(child->bs);
>  }
> +
> +return waited;
>  }
>  
>  typedef struct {
> @@ -174,14 +191,6 @@ typedef struct {
>  bool done;
>  } BdrvCoDrainData;
>  
> -static void bdrv_drain_poll(BlockDriverState *bs)
> -{
> -while (bdrv_requests_pending(bs)) {
> -/* Keep iterating */
> -aio_poll(bdrv_get_aio_context(bs), true);
> -}
> -}
> -
>  static void bdrv_co_drain_bh_cb(void *opaque)
>  {
>  BdrvCoDrainData *data = opaque;
> @@ -215,6 +224,20 @@ static void coroutine_fn 
> bdrv_co_yield_to_drain(BlockDriverState *bs)
>  assert(data.done);
>  }
>  
> +static void bdrv_co_drain_io_recurse(BlockDriverState *bs)
> +{
> +BdrvChild *child;
> +
> +bdrv_co_yield_to_drain(bs);
> +if (bs->drv && bs->drv->bdrv_drain) {
> +bs->drv->bdrv_drain(bs);
> +}
> +
> +QLIST_FOREACH(child, >children, next) {
> +bdrv_co_drain_io_recurse(child->bs);
> +}
> +}
> +
>  void bdrv_drained_begin(BlockDriverState *bs)
>  {
>  if (!bs->quiesce_counter++) {
> @@ -223,11 +246,10 @@ void bdrv_drained_begin(BlockDriverState *bs)
>  }
>  
>  bdrv_io_unplugged_begin(bs);
> -bdrv_drain_recurse(bs);
>  if (qemu_in_coroutine()) {
> -bdrv_co_yield_to_drain(bs);
> +bdrv_co_drain_io_recurse(bs);
>  } else {
> -bdrv_drain_poll(bs);
> +bdrv_drain_io_recurse(bs);
>  }
>  bdrv_io_unplugged_end(bs);
>  }
> @@ -296,7 +318,6 @@ void bdrv_drain_all(void)
>  aio_context_acquire(aio_context);
>  bdrv_parent_drained_begin(bs);
>  bdrv_io_unplugged_begin(bs);
> -bdrv_drain_recurse(bs);
>  aio_context_release(aio_context);
>  
>  if (!g_slist_find(aio_ctxs, aio_context)) {
> @@ -319,10 +340,7 @@ void bdrv_drain_all(void)
>  aio_context_acquire(aio_context);
>  for (bs = bdrv_first(); bs; bs = bdrv_next()) {
>  if (aio_context == bdrv_get_aio_context(bs)) {
> -if (bdrv_requests_pending(bs)) {
> -aio_poll(aio_context, true);
> -waited = true;
> -}
> +waited |= bdrv_drain_io_recurse(bs);
>  }
>  }
>  aio_context_release(aio_context);
> diff --git a/block/qed.c b/block/qed.c
> index 3ee879b..1a7ef0a 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -336,7 +336,7 @@ static void qed_need_check_timer_cb(void *opaque)
>  qed_plug_allocating_write_reqs(s);
>  
>  /* Ensure writes are on disk before clearing flag */
> -bdrv_aio_flush(s->bs, qed_clear_need_check, s);
> +bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
>  }
>  
>  static void qed_start_need_check_timer(BDRVQEDState *s)
> @@ -378,6 +378,19 @@ static void 

Re: [Qemu-block] [PATCH 1/2] quorum: change child_iter to children_read

2016-10-05 Thread Max Reitz
On 05.10.2016 18:35, Paolo Bonzini wrote:
> This simplifies a bit the code by using the usual C "inclusive start,
> exclusive end" pattern for ranges.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/quorum.c | 30 +-
>  1 file changed, 13 insertions(+), 17 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context

2016-10-05 Thread Kevin Wolf
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> There are a few places where we're fishing it out for ourselves.
> Let's not do that and instead use the helper.
> 
> Signed-off-by: John Snow 

That change makes a difference when the block job is running its
completion part after block_job_defer_to_main_loop(). The commit message
could be more explicit about whether this is intentional or whether this
case is expected to happen at all.

I suspect that if it can happen, this is a bug fix. Please check and
update the commit message accordingly.

Kevin



Re: [Qemu-block] [PATCH] qcow2: Optimize L2 table cache size based on image and cluster sizes

2016-10-05 Thread Max Reitz
On 05.10.2016 16:57, Alberto Garcia wrote:
> On Tue 04 Oct 2016 05:51:26 PM CEST, Max Reitz wrote:
> 
>>> At least giving users a way to skip the math would be an improvement.
>>> Would you be okay with an explicitly-set option like
>>> l2_cache_size=auto or =max that optimizes for performance at the
>>> expense of memory?
>>
>> That (cache-size=max) is actually something Stefan Hajnoczi has
>> proposed at KVM Forum 2015.
>>
>> I agree that implementing the formula in qemu's qcow2 driver itself
>> makes sense and will help users; however, I do think it is appropriate
>> to expect the user to pass cache-size=max if they need it.
> 
> Frank Myhr's suggestion (in bugzilla) is that we allow specifying a % of
> the disk size, so
> 
> l2-cache-size=100%  (that would be cache-size=max)
> l2-cache-size=80%
> ...
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1377735#c3
> 
> Either one looks good to me. They would change however the data type of
> 'l2-cache-size' from int to string in BlockdevOptionsQcow2, can we
> actually do that?

I think we can do that with an alternate data type which accepts both an
integer and a string. If we only had "max", we'd probably want to make
it an enumeration instead of a free-form string, though. But with a %
suffix, we'd probably need a string.

For me, either works fine.

Apart from that, I have to say I think it would be a bit more useful if
one would specify the area covered by the metadata caches as an absolute
number instead of a relative one (I guess it's generally easier to know
what area your applications will perform random accesses on than the
relative size, but maybe that's just me).

Supporting that with cache-size is difficult, though, because how are
you going to decide between whether the user is specifying the actual
size of the cache or the area it covers?

So maybe it would make more sense to introduce a whole new set of
options called {,l2-,refcount-}cache-coverage or something. These
options would then accept both an absolute and a relative number.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] qcow2: Optimize L2 table cache size based on image and cluster sizes

2016-10-05 Thread Kevin Wolf
Am 05.10.2016 um 17:13 hat Max Reitz geschrieben:
> On 05.10.2016 16:57, Alberto Garcia wrote:
> > On Tue 04 Oct 2016 05:51:26 PM CEST, Max Reitz wrote:
> > 
> >>> At least giving users a way to skip the math would be an improvement.
> >>> Would you be okay with an explicitly-set option like
> >>> l2_cache_size=auto or =max that optimizes for performance at the
> >>> expense of memory?
> >>
> >> That (cache-size=max) is actually something Stefan Hajnoczi has
> >> proposed at KVM Forum 2015.
> >>
> >> I agree that implementing the formula in qemu's qcow2 driver itself
> >> makes sense and will help users; however, I do think it is appropriate
> >> to expect the user to pass cache-size=max if they need it.
> > 
> > Frank Myhr's suggestion (in bugzilla) is that we allow specifying a % of
> > the disk size, so
> > 
> > l2-cache-size=100%  (that would be cache-size=max)
> > l2-cache-size=80%
> > ...
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1377735#c3
> > 
> > Either one looks good to me. They would change however the data type of
> > 'l2-cache-size' from int to string in BlockdevOptionsQcow2, can we
> > actually do that?
> 
> I think we can do that with an alternate data type which accepts both an
> integer and a string. If we only had "max", we'd probably want to make
> it an enumeration instead of a free-form string, though. But with a %
> suffix, we'd probably need a string.
> 
> For me, either works fine.

I'm not sure if we want to support such syntax in QMP. It sounds like a
typical convenience option for human users.

> Apart from that, I have to say I think it would be a bit more useful if
> one would specify the area covered by the metadata caches as an absolute
> number instead of a relative one (I guess it's generally easier to know
> what area your applications will perform random accesses on than the
> relative size, but maybe that's just me).
> 
> Supporting that with cache-size is difficult, though, because how are
> you going to decide between whether the user is specifying the actual
> size of the cache or the area it covers?
> 
> So maybe it would make more sense to introduce a whole new set of
> options called {,l2-,refcount-}cache-coverage or something. These
> options would then accept both an absolute and a relative number.

If we want easy, make it easy: cache-coverage and apply it to both.

Kevin


pgpvoadS14oHN.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH] qcow2: Optimize L2 table cache size based on image and cluster sizes

2016-10-05 Thread Alberto Garcia
On Tue 04 Oct 2016 05:51:26 PM CEST, Max Reitz wrote:

>> At least giving users a way to skip the math would be an improvement.
>> Would you be okay with an explicitly-set option like
>> l2_cache_size=auto or =max that optimizes for performance at the
>> expense of memory?
>
> That (cache-size=max) is actually something Stefan Hajnoczi has
> proposed at KVM Forum 2015.
>
> I agree that implementing the formula in qemu's qcow2 driver itself
> makes sense and will help users; however, I do think it is appropriate
> to expect the user to pass cache-size=max if they need it.

Frank Myhr's suggestion (in bugzilla) is that we allow specifying a % of
the disk size, so

l2-cache-size=100%  (that would be cache-size=max)
l2-cache-size=80%
...

https://bugzilla.redhat.com/show_bug.cgi?id=1377735#c3

Either one looks good to me. They would change however the data type of
'l2-cache-size' from int to string in BlockdevOptionsQcow2, can we
actually do that?

Berto



Re: [Qemu-block] [PATCH] qcow2: Optimize L2 table cache size based on image and cluster sizes

2016-10-05 Thread Alberto Garcia
On Wed 05 Oct 2016 05:13:39 PM CEST, Max Reitz wrote:

 At least giving users a way to skip the math would be an
 improvement.  Would you be okay with an explicitly-set option like
 l2_cache_size=auto or =max that optimizes for performance at the
 expense of memory?
>>>
>> Frank Myhr's suggestion (in bugzilla) is that we allow specifying a %
>> of the disk size, so
>> 
>> l2-cache-size=100%  (that would be cache-size=max)
>> l2-cache-size=80%

> For me, either works fine.
>
> Apart from that, I have to say I think it would be a bit more useful
> if one would specify the area covered by the metadata caches as an
> absolute number instead of a relative one (I guess it's generally
> easier to know what area your applications will perform random
> accesses on than the relative size, but maybe that's just me).

I'm not sure if I'm following you, can you give an example of what the
area covered by the cache exactly means?

Berto



Re: [Qemu-block] [PATCH v2 06/11] blockjobs: fix documentation

2016-10-05 Thread Kevin Wolf
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> Wrong function names in documentation.
> 
> Signed-off-by: John Snow 
> ---
>  include/block/blockjob_int.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 0a2d41e..c6da7e4 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -305,7 +305,7 @@ void block_job_enter(BlockJob *job);
>  void block_job_event_cancelled(BlockJob *job);
>  
>  /**
> - * block_job_ready:
> + * block_job_event_completed:

Yes...

>   * @job: The job which is now ready to complete.

...that was half of the fix.

>   * @msg: Error message. Only present on failure.
>   *

Kevin



Re: [Qemu-block] [PATCH v2 09/11] blockjob: add block_job_start

2016-10-05 Thread Kevin Wolf
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> Instead of automatically starting jobs at creation time via backup_start
> et al, we'd like to return a job object pointer that can be started
> manually at later point in time.
> 
> For now, add the block_job_start mechanism and start the jobs
> automatically as we have been doing, with conversions job-by-job coming
> in later patches.
> 
> Of note: cancellation of unstarted jobs will perform all the normal
> cleanup as if the job had started, particularly abort and clean. The
> only difference is that we will not emit any events, because the job
> never actually started.
> 
> Signed-off-by: John Snow 

Should we make sure that jobs are only added to the block_jobs list once
they are started? It doesn't sound like a good idea to make a job
without a coroutine user-accessible.

Kevin



Re: [Qemu-block] [PATCH 1/3] block: Add node name to BLOCK_IO_ERROR event

2016-10-05 Thread Max Reitz
On 05.10.2016 11:26, Kevin Wolf wrote:
> The event currently only contains the BlockBackend name. However, with
> anonymous BlockBackends, this is always the empty string. Add the node
> name so that the user can still see which block device caused the event.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/block-backend.c | 5 +++--
>  docs/qmp-events.txt   | 6 +-
>  qapi/block-core.json  | 8 ++--
>  3 files changed, 14 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 

Although it is a bit weird to report the root node's name, because I'd
expect the I/O error to occur on the protocol layer and thus the event
to report the protocol node's name.

OTOH, I think it'd be rather difficult to either keep the information
from which node the error originated or to trace it back when we want to
report it (and the decision to report it is generally made on the root
node, so...).

Therefore, reporting it on the root node is probably the best we can do.

Would it make sense to add a short note to the documentation
(qmp-events.txt and block-core.json) that the node name reported is not
necessarily the node the I/O error originated from but may (and
generally is) instead be some node above it?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 3/3] block: Add qdev ID to DEVICE_TRAY_MOVED

2016-10-05 Thread Max Reitz
On 05.10.2016 11:26, Kevin Wolf wrote:
> The event currently only contains the BlockBackend name. However, with
> anonymous BlockBackends, this is always the empty string. Add the qdev
> ID (or if none was given, the QOM path) so that the user can still see
> which device caused the event.
> 
> Event generation has to be moved from bdrv_eject() to the BlockBackend
> because the BDS doesn't know the attached device, but that's easy
> because blk_eject() is the only user of it.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   |  7 ---
>  block/block-backend.c | 33 -
>  docs/qmp-commands.txt |  3 +++
>  docs/qmp-events.txt   |  6 +-
>  qapi/block.json   |  8 ++--
>  5 files changed, 46 insertions(+), 11 deletions(-)

I didn't even know we use the same event for both virtual and physical
trays. Fun, fun.

Even more fun in that blk_eject() always issues the event, regardless of
whether there actually is a physical tray. Therefore, I assume it's
effectively just an event for the virtual tray which has to hook up in
blk_eject() because a guest eject won't go through
blk_dev_change_media_cb(), but it will go through blk_eject() (which QMP
ejects do not). Totally makes sense.

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 2/3] block-backend: Remember if attached device is non-qdev

2016-10-05 Thread Max Reitz
On 05.10.2016 11:26, Kevin Wolf wrote:
> Almost all block devices are qdevified by now. This allows us to go back
> from the BlockBackend to the DeviceState. xen_disk is the last device
> that is missing. We'll remember in the BlockBackend if a xen_disk is
> attached and can then disable any features that require going from a BB
> to the DeviceState.
> 
> While at it, clearly mark the function used by xen_disk as legacy even
> in its name, not just in TODO comments.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/block-backend.c  | 26 +++---
>  hw/block/xen_disk.c|  2 +-
>  include/sysemu/block-backend.h |  4 ++--
>  3 files changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 911254a..39c5613 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c

[...]

> @@ -507,32 +508,38 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState 
> *bs)
>  }
>  }
>  
> -/*
> - * Attach device model @dev to @blk.
> - * Return 0 on success, -EBUSY when a device model is attached already.
> - */
> -int blk_attach_dev(BlockBackend *blk, void *dev)
> -/* TODO change to DeviceState *dev when all users are qdevified */
> +static int blk_do_attach_dev(BlockBackend *blk, void *dev)
>  {
>  if (blk->dev) {
>  return -EBUSY;
>  }
>  blk_ref(blk);
>  blk->dev = dev;
> +blk->legacy_dev = false;
>  blk_iostatus_reset(blk);
>  return 0;
>  }
>  
>  /*
>   * Attach device model @dev to @blk.
> + * Return 0 on success, -EBUSY when a device model is attached already.
> + */
> +int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
> +{
> +return blk_do_attach_dev(blk, dev);
> +}
> +
> +/*
> + * Attach device model @dev to @blk.
>   * @blk must not have a device model attached already.
>   * TODO qdevified devices don't use this, remove when devices are qdevified
>   */
> -void blk_attach_dev_nofail(BlockBackend *blk, void *dev)
> +void blk_attach_dev_legacy(BlockBackend *blk, void *dev)
>  {
>  if (blk_attach_dev(blk, dev) < 0) {

I'd make this a blk_do_attach_dev(), but it's not syntactically wrong,
so the choice is up to you.

Thus, either way:

Reviewed-by: Max Reitz 

>  abort();
>  }
> +blk->legacy_dev = true;
>  }
>  
>  /*



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 2/2] quorum: do not allocate multiple iovecs for FIFO strategy

2016-10-05 Thread Max Reitz
On 05.10.2016 18:35, Paolo Bonzini wrote:
> In FIFO mode there are no parallel reads, hence there is no need to
> allocate separate buffers and clone the iovecs.
> 
> The two cases of quorum_aio_cb are now even more different, and
> most of quorum_aio_finalize is only needed in one of them, so split
> them in separate functions.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/quorum.c | 79 
> +-
>  1 file changed, 40 insertions(+), 39 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 2/3] block-backend: Remember if attached device is non-qdev

2016-10-05 Thread Eric Blake
On 10/05/2016 01:01 PM, Max Reitz wrote:

>> +static int blk_do_attach_dev(BlockBackend *blk, void *dev)

>> + */
>> +int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
>> +{
>> +return blk_do_attach_dev(blk, dev);

>> +void blk_attach_dev_legacy(BlockBackend *blk, void *dev)
>>  {
>>  if (blk_attach_dev(blk, dev) < 0) {
> 
> I'd make this a blk_do_attach_dev(), but it's not syntactically wrong,
> so the choice is up to you.

It's technically undefined C behavior to cast from void* to an unrelated
pointer and back to void* (you are only guaranteed a round trip from
type to void* and back to original type, not with unrelated types).  So
syntactically valid but semantically undefined.

On the other hand, ABI-wise, I'd be shocked if our compilers are able to
exploit our use of undefined behavior.

At any rate, using blk_do_attach_dev() would avoid the definedness
problem, so I'd go ahead and make the change.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions

2016-10-05 Thread Eric Blake
On 10/05/2016 01:49 PM, John Snow wrote:

>> Here we have an additional caller in block/replication.c and qemu-img,
>> so the parameters must stay. For qemu-img, nothing changes. For
>> replication, the block job events are added as a side effect.
>>
>> Not sure if we want to emit such events for an internal block job, but
>> if we do want the change, it should be explicit.
>>
> 
> Hmm, do we want to make it so some jobs are invisible and others are
> not? Because as it stands right now, neither case is strictly true. We
> only emit cancelled/completed events if it was started via QMP, however
> we do emit events for error and ready regardless of who started the job.

Libvirt tries to mirror any block job event it receives to upper layers.
 But if it cannot figure out which upper-layer disk the event is
associated with, it just drops the event.  So I think that from the
libvirt perspective, you are okay if if you always report job events,
even for internal jobs.  (Do we have a quick and easy way to set up an
internal job event, to double-check if I can produce some sort of
libvirt scenario to trigger the event and see that it gets safely ignored?)

> 
> That didn't seem particularly consistent to me; either all events should
> be controlled by the job layer itself or none of them should be.
> 
> I opted for "all."
> 
> For "internal" jobs that did not previously emit any events, is it not
> true that these jobs still appear in the block job list and are
> effectively public regardless? I'd argue that these messages may be of
> value for management utilities who are still blocked by these jobs
> whether or not they are 'internal' or not.
> 
> I'll push for keeping it mandatory and explicit. If it becomes a
> problem, we can always add a 'silent' job property that silences ALL qmp
> events, including all completion, error, and ready notices.

Completely silencing an internal job seems a little cleaner than having
events for the job but not being able to query it.  But if nothing
breaks by exposing the internal jobs, that seems even easier than trying
to decide which jobs are internal and hidden vs. user-requested and public.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions

2016-10-05 Thread John Snow



On 10/05/2016 09:43 AM, Kevin Wolf wrote:

Am 01.10.2016 um 00:00 hat John Snow geschrieben:

There's no reason to leave this to blockdev; we can do it in blockjobs
directly and get rid of an extra callback for most users.

Signed-off-by: John Snow 
---
 blockdev.c | 37 ++---
 blockjob.c | 16 ++--
 2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 29c6561..03200e7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2957,31 +2957,6 @@ out:
 aio_context_release(aio_context);
 }

-static void block_job_cb(void *opaque, int ret)
-{
-/* Note that this function may be executed from another AioContext besides
- * the QEMU main loop.  If you need to access anything that assumes the
- * QEMU global mutex, use a BH or introduce a mutex.
- */
-
-BlockDriverState *bs = opaque;
-const char *msg = NULL;
-
-trace_block_job_cb(bs, bs->job, ret);


This trace event is removed from the code, but not from trace-events.


-
-assert(bs->job);
-
-if (ret < 0) {
-msg = strerror(-ret);
-}
-
-if (block_job_is_cancelled(bs->job)) {
-block_job_event_cancelled(bs->job);
-} else {
-block_job_event_completed(bs->job, msg);


block_job_event_cancelled/completed can become static now.


-}
-}
-
 void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
   bool has_base, const char *base,
   bool has_backing_file, const char *backing_file,
@@ -3033,7 +3008,7 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 base_name = has_backing_file ? backing_file : base_name;

 stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
- has_speed ? speed : 0, on_error, block_job_cb, bs, 
_err);
+ has_speed ? speed : 0, on_error, NULL, bs, _err);


Passing cb == NULL, but opaque != NULL is harmless, but feels odd.



Yes.


And actually this is the only caller of stream_start, so the parameters
could just be dropped.



OK. I left the parameters in on purpose, but they can be re-added in the 
future if desired.


(Hm, maybe as part of a CommonJobOpts parameter someday?)


 if (local_err) {
 error_propagate(errp, local_err);
 goto out;
@@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
 goto out;
 }
 commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
-on_error, block_job_cb, bs, _err, false);
+on_error, NULL, bs, _err, false);


Here we have an additional caller in block/replication.c and qemu-img,
so the parameters must stay. For qemu-img, nothing changes. For
replication, the block job events are added as a side effect.

Not sure if we want to emit such events for an internal block job, but
if we do want the change, it should be explicit.



Hmm, do we want to make it so some jobs are invisible and others are 
not? Because as it stands right now, neither case is strictly true. We 
only emit cancelled/completed events if it was started via QMP, however 
we do emit events for error and ready regardless of who started the job.


That didn't seem particularly consistent to me; either all events should 
be controlled by the job layer itself or none of them should be.


I opted for "all."

For "internal" jobs that did not previously emit any events, is it not 
true that these jobs still appear in the block job list and are 
effectively public regardless? I'd argue that these messages may be of 
value for management utilities who are still blocked by these jobs 
whether or not they are 'internal' or not.


I'll push for keeping it mandatory and explicit. If it becomes a 
problem, we can always add a 'silent' job property that silences ALL qmp 
events, including all completion, error, and ready notices.


I've CC'd Wen Congyang and Eric Blake to talk me down if they wish.


 } else {
 commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
- on_error, block_job_cb, bs,
+ on_error, NULL, bs,
  has_backing_file ? backing_file : NULL, _err);


Like stream_start, drop the parameters.


 }
 if (local_err != NULL) {
@@ -3260,7 +3235,7 @@ static void do_drive_backup(DriveBackup *backup, 
BlockJobTxn *txn, Error **errp)

 backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
  bmap, backup->compress, backup->on_source_error,
- backup->on_target_error, block_job_cb, bs, txn, _err);
+ backup->on_target_error, NULL, bs, txn, _err);
 bdrv_unref(target_bs);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
@@ -3330,7 +3305,7 @@ void do_blockdev_backup(BlockdevBackup *backup, 
BlockJobTxn *txn, 

Re: [Qemu-block] [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions

2016-10-05 Thread John Snow



On 10/05/2016 03:24 PM, Eric Blake wrote:

On 10/05/2016 01:49 PM, John Snow wrote:


Here we have an additional caller in block/replication.c and qemu-img,
so the parameters must stay. For qemu-img, nothing changes. For
replication, the block job events are added as a side effect.

Not sure if we want to emit such events for an internal block job, but
if we do want the change, it should be explicit.



Hmm, do we want to make it so some jobs are invisible and others are
not? Because as it stands right now, neither case is strictly true. We
only emit cancelled/completed events if it was started via QMP, however
we do emit events for error and ready regardless of who started the job.


Libvirt tries to mirror any block job event it receives to upper layers.
 But if it cannot figure out which upper-layer disk the event is
associated with, it just drops the event.  So I think that from the
libvirt perspective, you are okay if if you always report job events,
even for internal jobs.  (Do we have a quick and easy way to set up an
internal job event, to double-check if I can produce some sort of
libvirt scenario to trigger the event and see that it gets safely ignored?)



Not in a QEMU release yet, I think.



That didn't seem particularly consistent to me; either all events should
be controlled by the job layer itself or none of them should be.

I opted for "all."

For "internal" jobs that did not previously emit any events, is it not
true that these jobs still appear in the block job list and are
effectively public regardless? I'd argue that these messages may be of
value for management utilities who are still blocked by these jobs
whether or not they are 'internal' or not.

I'll push for keeping it mandatory and explicit. If it becomes a
problem, we can always add a 'silent' job property that silences ALL qmp
events, including all completion, error, and ready notices.


Completely silencing an internal job seems a little cleaner than having
events for the job but not being able to query it.  But if nothing
breaks by exposing the internal jobs, that seems even easier than trying
to decide which jobs are internal and hidden vs. user-requested and public.



Well, at the moment anything requested directly via blockdev.c is 
"public." Before 2.8, all jobs were public ones, with the exception of 
those in qemu-img which is a bit of a different/special case.


We have this block/replication.c beast now, though, and it uses 
backup_start and commit_active_start as it sees fit without direct user 
intervention.


As it stands, I believe the jobs that replication creates are 
user-visible via query, will not issue cancellation or completion 
events, but WILL emit error events. It may emit ready events for the 
mirror job it uses, but I haven't traced that. (It could, at least.)