Re: [Qemu-block] [Question] How can we confirm hot-plug disk succesfully?

2017-06-21 Thread Xie Changlong

在 6/19/2017 6:49 PM, Kevin Wolf 写道:

'info block' shows nothing, but we can't add drive who's id
is'drive-virtio-disk1' too.

Yes, the old BlockBackend is only fully freed when the guest actually
unplugs the device. Specifically, we would have to free the QemuOpts
in DriveInfo that keeps the ID reserved. Currently, it is only freed
when the BlockBackend is destroyed:

 blockdev_auto_del()
 blk_unref()
 blk_delete()
 drive_info_del()

We can't free the DriveInfo earlier because it's still needed while the
guest device is still alive.

I'm not sure, but it may be possible to free just the QemuOpts in
monitor_remove_blk(), so that the ID can immediately be reused.

Markus, would you know?



Hi, all. Any ideas?


There is a more serious situation, we
could*never*  destory device memory with 'device_del', it's memory
leak to me if the guest os never support hot-plug and the user don't
know this information.

The user sees that they never get a DEVICE_REMOVED event, so in some way
the do know about it.


But the useless memory is always there and no way to free it, although 
we known that.

BTW, is it possible to force destroy the BlockBackend in this situation?

--
Thanks
-Xie



Re: [Qemu-block] [PATCH v2 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Stefano Stabellini
On Wed, 21 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Stefano Stabellini 


> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/xen_disk.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>  
>  blkdev->file_blk  = BLOCK_SIZE;
>  
> +blkdev->feature_grant_copy =
> +(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 
> 0);
> +
> +xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
> +  blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
>  /* fill info
>   * blk_connect supplies sector-size and sectors
>   */
>  xenstore_write_be_int(>xendev, "feature-flush-cache", 1);
> -xenstore_write_be_int(>xendev, "feature-persistent", 1);
> +xenstore_write_be_int(>xendev, "feature-persistent",
> +  !blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
>  blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>  
>  xen_be_bind_evtchn(>xendev);
>  
> -blkdev->feature_grant_copy =
> -(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 
> 0);
> -
> -xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
> -  blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
>  xen_pv_printf(>xendev, 1, "ok: proto %s, ring-ref %d, "
>"remote port %d, local port %d\n",
>blkdev->xendev.protocol, blkdev->ring_ref,
> -- 
> 2.11.0
> 



Re: [Qemu-block] [PATCH v2 2/3] xen-disk: add support for multi-page shared rings

2017-06-21 Thread Stefano Stabellini
On Wed, 21 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OS have support in their frontend
> drivers.
> 
> This patch makes the necessary modifications to xen-disk support a shared
> ring up to order 4 (i.e. 16 pages).
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Stefano Stabellini 

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> 
> v2:
>  - Fix memory leak in error path
>  - Print warning if ring-page-order exceeds limits
> ---
>  hw/block/xen_disk.c | 144 
> +---
>  1 file changed, 113 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..0e6513708e 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>  
>  static int batch_maps   = 0;
>  
> -static int max_requests = 32;
> -
>  /* - */
>  
>  #define BLOCK_SIZE  512
> @@ -84,6 +82,8 @@ struct ioreq {
>  BlockAcctCookie acct;
>  };
>  
> +#define MAX_RING_PAGE_ORDER 4
> +
>  struct XenBlkDev {
>  struct XenDevicexendev;  /* must be first */
>  char*params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
>  booldirectiosafe;
>  const char  *fileproto;
>  const char  *filename;
> -int ring_ref;
> +unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
> +unsigned intnr_ring_ref;
>  void*sring;
>  int64_t file_blk;
>  int64_t file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
>  int requests_total;
>  int requests_inflight;
>  int requests_finished;
> +unsigned intmax_requests;
>  
>  /* Persistent grants extension */
>  gbooleanfeature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
>  struct ioreq *ioreq = NULL;
>  
>  if (QLIST_EMPTY(>freelist)) {
> -if (blkdev->requests_total >= max_requests) {
> +if (blkdev->requests_total >= blkdev->max_requests) {
>  goto out;
>  }
>  /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
>  ioreq_runio_qemu_aio(ioreq);
>  }
>  
> -if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> +if (blkdev->more_work && blkdev->requests_inflight < 
> blkdev->max_requests) {
>  qemu_bh_schedule(blkdev->bh);
>  }
>  }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
>  blk_handle_requests(blkdev);
>  }
>  
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - * 2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> -if (xengnttab_set_max_grants(xendev->gnttabdev,
> -MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> -xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> -  strerror(errno));
> -}
>  }
>  
>  static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
>!blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
> +xenstore_write_be_int(>xendev, "max-ring-page-order",
> +  MAX_RING_PAGE_ORDER);
> +
>  blk_parse_discard(blkdev);
>  
>  g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
>  return -1;
>  }
>  
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case number would be
> + * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> + * but in order to keep things simple just use
> + * 2 * max_req * max_seg.
> + */
> +#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> +
>  static int blk_connect(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  int pers, index, qflags;
>  bool readonly = true;
>  bool writethrough = true;
> +int order, ring_ref;
> +unsigned int 

Re: [Qemu-block] [PATCH v2 3/4] qcow2: add shrink image support

2017-06-21 Thread Max Reitz
On 2017-06-13 14:16, Pavel Butsykin wrote:
> This patch add shrinking of the image file for qcow2. As a result, this allows
> us to reduce the virtual image size and free up space on the disk without
> copying the image. Image can be fragmented and shrink is done by punching 
> holes
> in the image file.
> 
> Signed-off-by: Pavel Butsykin 
> ---
>  block/qcow2-cluster.c  | 42 
>  block/qcow2-refcount.c | 65 
> ++
>  block/qcow2.c  | 40 +++
>  block/qcow2.h  |  2 ++
>  qapi/block-core.json   |  3 ++-
>  5 files changed, 141 insertions(+), 11 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d779ea19cf..a84b7e607e 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -32,6 +32,48 @@
>  #include "qemu/bswap.h"
>  #include "trace.h"
>  
> +int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t max_size)

It's not really a max_size but always an exact size. You don't want it
to be any smaller than this.

> +{
> +BDRVQcow2State *s = bs->opaque;
> +int new_l1_size, i, ret;
> +
> +if (max_size >= s->l1_size) {
> +return 0;
> +}
> +
> +new_l1_size = max_size;
> +
> +#ifdef DEBUG_ALLOC2
> +fprintf(stderr, "shrink l1_table from %d to %" PRId64 "\n",
> +s->l1_size, new_l1_size);

new_l1_size is of type int, not int64_t.

> +#endif
> +
> +BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
> +ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
> +   sizeof(uint64_t) * new_l1_size,
> + (s->l1_size - new_l1_size) * sizeof(uint64_t), 
> 0);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +ret = bdrv_flush(bs->file->bs);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
> +for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
> +if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
> +continue;
> +}
> +qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
> +s->l2_size * sizeof(uint64_t),

I'm more of a fan of s->cluster_size instead of s->l2_size *
sizeof(uint64_t) but it's not like it matters...

> +QCOW2_DISCARD_ALWAYS);
> +s->l1_table[i] = 0;

I'd probably clear the overhanging s->l1_table entries before
bdrv_flush() (before you shouldn't really use them after
bdrv_pwrite_zeroes() has returned, even if bdrv_flush() has failed), but
it's not absolutely necessary. As long as they still have a refcount of
at least one, writing to them will just be useless but not destroy any data.

> +}
> +return 0;
> +}
> +
>  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
>  bool exact_size)
>  {
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 576ab551d6..e98306acd8 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -29,6 +29,7 @@
>  #include "block/qcow2.h"
>  #include "qemu/range.h"
>  #include "qemu/bswap.h"
> +#include "qemu/cutils.h"
>  
>  static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
>  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> @@ -2936,3 +2937,67 @@ done:
>  qemu_vfree(new_refblock);
>  return ret;
>  }
> +
> +int qcow2_shrink_reftable(BlockDriverState *bs)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +uint64_t *reftable_tmp =
> +g_try_malloc(sizeof(uint64_t) * s->refcount_table_size);
> +int i, ret;
> +
> +if (s->refcount_table_size && reftable_tmp == NULL) {
> +return -ENOMEM;
> +}
> +
> +for (i = 0; i < s->refcount_table_size; i++) {
> +int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
> +void *refblock;
> +bool unused_block;
> +
> +if (refblock_offs == 0) {
> +reftable_tmp[i] = 0;
> +continue;
> +}
> +ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
> +  );
> +if (ret < 0) {
> +goto out;
> +}
> +
> +/* the refblock has own reference */
> +if (i == refblock_offs >> (s->refcount_block_bits + 
> s->cluster_bits)) {
> +uint64_t blk_index = (refblock_offs >> s->cluster_bits) &
> + (s->refcount_block_size - 1);
> +uint64_t refcount = s->get_refcount(refblock, blk_index);
> +
> +s->set_refcount(refblock, blk_index, 0);
> +
> +unused_block = buffer_is_zero(refblock, s->refcount_block_size);

s/refcount_block_size/cluster_size/

> +
> +s->set_refcount(refblock, blk_index, refcount);
> +} else {
> +unused_block = buffer_is_zero(refblock, 

Re: [Qemu-block] [Qemu-devel] [PATCH v3] live-block-ops.txt: Rename, rewrite, and improve it

2017-06-21 Thread John Snow


On 06/21/2017 06:19 AM, Kashyap Chamarthy wrote:
> This edition documents (including their QMP invocations) all four
> operations:
> 
>   - `block-stream`
>   - `block-commit`
>   - `drive-mirror` (& `blockdev-mirror`)
>   - `drive-backup` (& `blockdev-backup`)
> 
> Things considered while writing this document:
> 
>   - Use reStructuredText as markup language (with the goal of generating
> the HTML output using the Sphinx Documentation Generator).  It is
> gentler on the eye, and can be trivially converted to different
> formats.  (Another reason: upstream QEMU is considering to switch to
> Sphinx, which uses reStructuredText as its markup language.)
> 
>   - Raw QMP JSON output vs. 'qmp-shell'.  I debated with myself whether
> to only show raw QMP JSON output (as that is the canonical
> representation), or use 'qmp-shell', which takes key-value pairs.  I
> settled on the approach of: for the first occurence of a command,
> use raw JSON; for subsequent occurences, use 'qmp-shell', with an
> occasional exception.
> 
>   - Usage of `-blockdev` command-line.
> 
>   - Usage of 'node-name' vs. file path to refer to disks.  While we have
> `blockdev-{mirror, backup}` as 'node-name'-alternatives for
> `drive-{mirror, backup}`, the `block-commit` command still operate
> on file names for parameters 'base' and 'top'.  So I added a caveat
> at the beginning to that effect.
> 
> Refer this related thread that I started (where I learnt
> `block-stream` was recently reworked to accept 'node-name' for 'top'
> and 'base' parameters):
> https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html
> "[RFC] Making 'block-stream', and 'block-commit' accept node-name"
> 
> All commands showed in this document were tested while documenting.
> 
> Thanks: Eric Blake for the section: "A note on points-in-time vs file
> names".  This useful bit was originally articulated by Eric in his
> KVMForum 2015 presentation, so I included that specific bit in this
> document.
> 
> Signed-off-by: Kashyap Chamarthy 
> ---
> * A Sphinx-rendered HTML version is here:
>   
> https://kashyapc.fedorapeople.org/v3-QEMU-Docs/_build/html/docs/live-block-operations.html
> 
> 

[snip]

> 
> * TODO (after feedback from John Snow):
>- Eric Blake suggested to consider documenting incremental backup
>  policies as part of the section: "Live disk backup ---
>  `drive-backup` and `blockdev-backup`"

Perhaps it could be mentioned, but hopefully I've covered it in some
sufficient detail in the (now) docs/devel/bitmaps.md file; I'm a little
wary of duplicating efforts in this area, but you've covered everything
*else* in good detail here, so now my file is the odd one out.

I will leave this up to you, really. Perhaps it could be paid some lip
service with a link to the other document? The detail in bitmaps.md is a
little more verbose than the rest of this file, so if you include it
wholesale it'd dwarf the rest of this document.

What do you think?



Re: [Qemu-block] [PATCH v2 2/4] qcow2: add qcow2_cache_discard

2017-06-21 Thread Max Reitz
On 2017-06-13 14:16, Pavel Butsykin wrote:
> Whenever l2/refcount table clusters are discarded from the file we can
> automatically drop unnecessary content of the cache tables. This reduces
> the chance of eviction useful cache data and eliminates inconsistent data
> in thecache with the data in the file.
> 
> Signed-off-by: Pavel Butsykin 
> ---
>  block/qcow2-cache.c| 21 +
>  block/qcow2-refcount.c |  5 +
>  block/qcow2.h  |  1 +
>  3 files changed, 27 insertions(+)
> 
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index 1d25147392..7931edf237 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -411,3 +411,24 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, 
> Qcow2Cache *c,
>  assert(c->entries[i].offset != 0);
>  c->entries[i].dirty = true;
>  }
> +
> +void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, uint64_t 
> offset)
> +{
> +int i;
> +
> +for (i = 0; i < c->size; i++) {
> +if (c->entries[i].offset == offset) {
> +goto found; /* table offset */
> +}
> +}
> +return;
> +
> +found:
> +assert(c->entries[i].ref == 0);
> +
> +c->entries[i].offset = 0;
> +c->entries[i].lru_counter = 0;
> +c->entries[i].dirty = false;
> +
> +qcow2_cache_table_release(bs, c, i, 1);
> +}
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 7c06061aae..576ab551d6 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -767,6 +767,11 @@ static int QEMU_WARN_UNUSED_RESULT 
> update_refcount(BlockDriverState *bs,
>  s->set_refcount(refcount_block, block_index, refcount);
>  
>  if (refcount == 0 && s->discard_passthrough[type]) {
> +qcow2_cache_put(bs, s->refcount_block_cache, _block);

I don't like this very much. It works, but it feels bad.

Would it be possible to store this refblock's offset somewhere and only
put it back if @offset is equal to that?

> +qcow2_cache_discard(bs, s->refcount_block_cache, offset);
> +
> +qcow2_cache_discard(bs, s->l2_table_cache, offset);
> +

So you're blindly calling qcow2_cache_discard() on @offset because
@offset may be pointing to a refblock or an L2 table? Right, that works,
but it still deserves a comment, I think (that we have no idea whether
@offset actually points to any of these refcount structures, but that we
also just don't have to care).

Looks OK to me, functionally, but I'd at least like to have that comment.

Max

>  update_refcount_discard(bs, cluster_offset, s->cluster_size);
>  }
>  }
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 1801dc30dc..07faa6dc78 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -597,5 +597,6 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, 
> uint64_t offset,
>  int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t 
> offset,
>  void **table);
>  void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
> +void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, uint64_t 
> offset);
>  
>  #endif
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 1/4] qemu-img: add --shrink flag for resize

2017-06-21 Thread Max Reitz
On 2017-06-13 14:16, Pavel Butsykin wrote:
> The flag as additional precaution of data loss. Perhaps in the future the
> operation shrink without this flag will be banned, but while we need to
> maintain compatibility.
> 
> Signed-off-by: Pavel Butsykin 
> ---
>  qemu-img-cmds.hx   |  4 ++--
>  qemu-img.c | 15 +++
>  qemu-img.texi  |  5 -
>  tests/qemu-iotests/102 |  4 ++--
>  4 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index a39fcdba71..3b2eab9d20 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -76,9 +76,9 @@ STEXI
>  ETEXI
>  
>  DEF("resize", img_resize,
> -"resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
> +"resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ 
> | -]size")
>  STEXI
> -@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} 
> [+ | -]@var{size}
> +@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] 
> @var{filename} [+ | -]@var{size}
>  ETEXI
>  
>  DEF("amend", img_amend,
> diff --git a/qemu-img.c b/qemu-img.c
> index 0ad698d7f1..bfe5f61b0b 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -61,6 +61,7 @@ enum {
>  OPTION_FLUSH_INTERVAL = 261,
>  OPTION_NO_DRAIN = 262,
>  OPTION_TARGET_IMAGE_OPTS = 263,
> +OPTION_SHRINK = 264,
>  };
>  
>  typedef enum OutputFormat {
> @@ -3452,6 +3453,7 @@ static int img_resize(int argc, char **argv)
>  },
>  };
>  bool image_opts = false;
> +bool shrink = false;
>  
>  /* Remove size from argv manually so that negative numbers are not 
> treated
>   * as options by getopt. */
> @@ -3469,6 +3471,7 @@ static int img_resize(int argc, char **argv)
>  {"help", no_argument, 0, 'h'},
>  {"object", required_argument, 0, OPTION_OBJECT},
>  {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
> +{"shrink", no_argument, 0, OPTION_SHRINK},
>  {0, 0, 0, 0}
>  };
>  c = getopt_long(argc, argv, ":f:hq",
> @@ -3503,6 +3506,9 @@ static int img_resize(int argc, char **argv)
>  case OPTION_IMAGE_OPTS:
>  image_opts = true;
>  break;
> +case OPTION_SHRINK:
> +shrink = true;
> +break;
>  }
>  }
>  if (optind != argc - 1) {
> @@ -3562,6 +3568,15 @@ static int img_resize(int argc, char **argv)
>  goto out;
>  }
>  
> +if (total_size < blk_getlength(blk) && !shrink) {
> +qprintf(quiet, "Warning: shrinking of the image can lead to data 
> loss. "

I think this should always be printed to stderr, even if quiet is true;
especially considering we (or at least I) plan to make it mandatory.

> +   "Before performing shrink operation you must make 
> sure "

*Before performaing a shrink operation

Also, I'd rather use the imperative: "Before ... operation, make sure
that the..."

(English isn't my native language either, but as far as I remember
"must" is generally used for external influences. But it's not like a
force of nature is forcing you to confirm there's no important data; you
can just ignore this advice and see all of your non-backed-up childhood
pictures go to /dev/null.)

> +   "that the shrink part of image doesn't contain 
> important"

Hmm... I don't think "shrink part" really works.

Maybe the following would work better:

  Warning: Shrinking an image will delete all data beyond the shrunken
  image's end. Before performing such an operation, make sure there is
  no important data there.

> +   " data.\n");
> +qprintf(quiet,
> +"If you don't want to see this message use --shrink 
> option.\n");

You should make a note that --shrink may (and I hope it will) become
necessary in the future, like:

  Using the --shrink option will suppress this message. Note that future
  versions of qemu-img may refuse to shrink images without this option!

> +}
> +
>  ret = blk_truncate(blk, total_size, );
>  if (!ret) {
>  qprintf(quiet, "Image resized.\n");
> diff --git a/qemu-img.texi b/qemu-img.texi
> index 5b925ecf41..c2b694cd00 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -499,7 +499,7 @@ qemu-img rebase -b base.img diff.qcow2
>  At this point, @code{modified.img} can be discarded, since
>  @code{base.img + diff.qcow2} contains the same information.
>  
> -@item resize @var{filename} [+ | -]@var{size}
> +@item resize [--shrink] @var{filename} [+ | -]@var{size}
>  
>  Change the disk image as if it had been created with @var{size}.
>  
> @@ -507,6 +507,9 @@ Before using this command to shrink a disk image, you 
> MUST use file system and
>  partitioning tools inside the VM to reduce allocated file systems and 
> partition
>  sizes accordingly.  Failure to do so will result in data loss!
>  
> +If @code{--shrink} is 

Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/4] qapi: Add qobject_is_equal()

2017-06-21 Thread Markus Armbruster
Max Reitz  writes:

> This generic function (along with its implementations for different
> types) determines whether two QObjects are equal.
>
> Signed-off-by: Max Reitz 
> ---
>  include/qapi/qmp/qbool.h   |  1 +
>  include/qapi/qmp/qdict.h   |  1 +
>  include/qapi/qmp/qfloat.h  |  1 +
>  include/qapi/qmp/qint.h|  1 +
>  include/qapi/qmp/qlist.h   |  1 +
>  include/qapi/qmp/qnull.h   |  2 ++
>  include/qapi/qmp/qobject.h |  9 +
>  include/qapi/qmp/qstring.h |  1 +
>  qobject/qbool.c|  8 
>  qobject/qdict.c| 28 
>  qobject/qfloat.c   |  8 
>  qobject/qint.c |  8 
>  qobject/qlist.c| 30 ++
>  qobject/qnull.c|  5 +
>  qobject/qobject.c  | 30 ++
>  qobject/qstring.c  |  9 +
>  16 files changed, 143 insertions(+)

No unit test?

> diff --git a/include/qapi/qmp/qbool.h b/include/qapi/qmp/qbool.h
> index a4c..f77ea86 100644
> --- a/include/qapi/qmp/qbool.h
> +++ b/include/qapi/qmp/qbool.h
> @@ -24,6 +24,7 @@ typedef struct QBool {
>  QBool *qbool_from_bool(bool value);
>  bool qbool_get_bool(const QBool *qb);
>  QBool *qobject_to_qbool(const QObject *obj);
> +bool qbool_is_equal(const QObject *x, const QObject *y);
>  void qbool_destroy_obj(QObject *obj);
>  
>  #endif /* QBOOL_H */
> diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
> index 188440a..134a526 100644
> --- a/include/qapi/qmp/qdict.h
> +++ b/include/qapi/qmp/qdict.h
> @@ -41,6 +41,7 @@ void qdict_del(QDict *qdict, const char *key);
>  int qdict_haskey(const QDict *qdict, const char *key);
>  QObject *qdict_get(const QDict *qdict, const char *key);
>  QDict *qobject_to_qdict(const QObject *obj);
> +bool qdict_is_equal(const QObject *x, const QObject *y);
>  void qdict_iter(const QDict *qdict,
>  void (*iter)(const char *key, QObject *obj, void *opaque),
>  void *opaque);
> diff --git a/include/qapi/qmp/qfloat.h b/include/qapi/qmp/qfloat.h
> index b5d1583..318e91e 100644
> --- a/include/qapi/qmp/qfloat.h
> +++ b/include/qapi/qmp/qfloat.h
> @@ -24,6 +24,7 @@ typedef struct QFloat {
>  QFloat *qfloat_from_double(double value);
>  double qfloat_get_double(const QFloat *qi);
>  QFloat *qobject_to_qfloat(const QObject *obj);
> +bool qfloat_is_equal(const QObject *x, const QObject *y);
>  void qfloat_destroy_obj(QObject *obj);
>  
>  #endif /* QFLOAT_H */
> diff --git a/include/qapi/qmp/qint.h b/include/qapi/qmp/qint.h
> index 3aaff76..20975da 100644
> --- a/include/qapi/qmp/qint.h
> +++ b/include/qapi/qmp/qint.h
> @@ -23,6 +23,7 @@ typedef struct QInt {
>  QInt *qint_from_int(int64_t value);
>  int64_t qint_get_int(const QInt *qi);
>  QInt *qobject_to_qint(const QObject *obj);
> +bool qint_is_equal(const QObject *x, const QObject *y);
>  void qint_destroy_obj(QObject *obj);
>  
>  #endif /* QINT_H */
> diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
> index 5dc4ed9..4380a5b 100644
> --- a/include/qapi/qmp/qlist.h
> +++ b/include/qapi/qmp/qlist.h
> @@ -57,6 +57,7 @@ QObject *qlist_peek(QList *qlist);
>  int qlist_empty(const QList *qlist);
>  size_t qlist_size(const QList *qlist);
>  QList *qobject_to_qlist(const QObject *obj);
> +bool qlist_is_equal(const QObject *x, const QObject *y);
>  void qlist_destroy_obj(QObject *obj);
>  
>  static inline const QListEntry *qlist_first(const QList *qlist)
> diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
> index 69555ac..9299683 100644
> --- a/include/qapi/qmp/qnull.h
> +++ b/include/qapi/qmp/qnull.h
> @@ -23,4 +23,6 @@ static inline QObject *qnull(void)
>  return _;
>  }
>  
> +bool qnull_is_equal(const QObject *x, const QObject *y);
> +
>  #endif /* QNULL_H */
> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
> index ef1d1a9..cac72e3 100644
> --- a/include/qapi/qmp/qobject.h
> +++ b/include/qapi/qmp/qobject.h
> @@ -68,6 +68,15 @@ static inline void qobject_incref(QObject *obj)
>  }
>  
>  /**
> + * qobject_is_equal(): Returns whether the two objects are equal.

Imperative mood, please: Return whether...

> + *
> + * Any of the pointers may be NULL; will return true if both are. Will always
> + * return false if only one is (therefore a QNull object is not considered 
> equal
> + * to a NULL pointer).
> + */

Humor me: wrap comment lines around column 70, and put two spaces
between sentences.

Same for the other function comments.

> +bool qobject_is_equal(const QObject *x, const QObject *y);
> +
> +/**
>   * qobject_destroy(): Free resources used by the object
>   */
>  void qobject_destroy(QObject *obj);
> diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
> index 10076b7..65c05a9 100644
> --- a/include/qapi/qmp/qstring.h
> +++ b/include/qapi/qmp/qstring.h
> @@ -31,6 +31,7 @@ void qstring_append_int(QString *qstring, int64_t value);
>  

Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/4] qapi/qnull: Add own header

2017-06-21 Thread Markus Armbruster
Max Reitz  writes:

> Reviewed-by: Kevin Wolf 
> Signed-off-by: Max Reitz 
> ---
>  include/qapi/qmp/qnull.h   | 26 ++
>  include/qapi/qmp/qobject.h |  8 
>  include/qapi/qmp/types.h   |  1 +
>  qobject/qnull.c|  1 +
>  target/i386/cpu.c  |  6 +-
>  tests/check-qnull.c|  2 +-
>  6 files changed, 30 insertions(+), 14 deletions(-)
>  create mode 100644 include/qapi/qmp/qnull.h
>
> diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
> new file mode 100644
> index 000..69555ac
> --- /dev/null
> +++ b/include/qapi/qmp/qnull.h
> @@ -0,0 +1,26 @@
> +/*
> + * QNull Module
> + *
> + * Copyright (C) 2009, 2017 Red Hat Inc.
> + *
> + * Authors:
> + *  Luiz Capitulino 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
> later.
> + * See the COPYING.LIB file in the top-level directory.

Copy the boilerplate from qnull.c instead, factual correctness.

> + */
> +
> +#ifndef QNULL_H
> +#define QNULL_H
> +
> +#include "qapi/qmp/qobject.h"
> +
> +extern QObject qnull_;
> +
> +static inline QObject *qnull(void)
> +{
> +qobject_incref(_);
> +return _;
> +}
> +
> +#endif /* QNULL_H */

Meh, another tiny header.  Are our compiles too fast?

> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
> index b8ddbca..ef1d1a9 100644
> --- a/include/qapi/qmp/qobject.h
> +++ b/include/qapi/qmp/qobject.h
> @@ -93,12 +93,4 @@ static inline QType qobject_type(const QObject *obj)
>  return obj->type;
>  }
>  
> -extern QObject qnull_;
> -
> -static inline QObject *qnull(void)
> -{
> -qobject_incref(_);
> -return _;
> -}
> -
>  #endif /* QOBJECT_H */
> diff --git a/include/qapi/qmp/types.h b/include/qapi/qmp/types.h
> index 27cfbd8..4c87182 100644
> --- a/include/qapi/qmp/types.h
> +++ b/include/qapi/qmp/types.h
> @@ -20,5 +20,6 @@
>  #include "qapi/qmp/qstring.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qlist.h"
> +#include "qapi/qmp/qnull.h"
>  
>  #endif /* QAPI_QMP_TYPES_H */
> diff --git a/qobject/qnull.c b/qobject/qnull.c
> index c124d05..b3cc85e 100644
> --- a/qobject/qnull.c
> +++ b/qobject/qnull.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
>  #include "qapi/qmp/qobject.h"

Let's drop this include, like you do in check-qnull.c below.

> +#include "qapi/qmp/qnull.h"
>  
>  QObject qnull_ = {
>  .type = QTYPE_QNULL,
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index b2b1d20..f118a54 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -29,11 +29,7 @@
>  #include "qemu/option.h"
>  #include "qemu/config-file.h"
>  #include "qapi/qmp/qerror.h"
> -#include "qapi/qmp/qstring.h"
> -#include "qapi/qmp/qdict.h"
> -#include "qapi/qmp/qbool.h"
> -#include "qapi/qmp/qint.h"
> -#include "qapi/qmp/qfloat.h"
> +#include "qapi/qmp/types.h"

qapi/qmp/types.h is a lazy way to increase compile times by including
more than you need.  One day I'll kill it.  Until then, I tolerate it in
.c, but not in .h.

>  
>  #include "qapi-types.h"
>  #include "qapi-visit.h"
> diff --git a/tests/check-qnull.c b/tests/check-qnull.c
> index 8dd1c96..4a67b9a 100644
> --- a/tests/check-qnull.c
> +++ b/tests/check-qnull.c
> @@ -8,7 +8,7 @@
>   */
>  #include "qemu/osdep.h"
>  
> -#include "qapi/qmp/qobject.h"
> +#include "qapi/qmp/qnull.h"
>  #include "qemu-common.h"
>  #include "qapi/qobject-input-visitor.h"
>  #include "qapi/qobject-output-visitor.h"

With the boilerplate corrected and the superfluous include in qnull.c
deleted
Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 3/4] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-06-21 Thread Markus Armbruster
Max Reitz  writes:

> Currently, bdrv_reopen_prepare() assumes that all BDS options are
> strings. However, this is not the case if the BDS has been created
> through the json: pseudo-protocol or blockdev-add.
>
> Note that the user-invokable reopen command is an HMP command, so you
> can only specify strings there. Therefore, specifying a non-string
> option with the "same" value as it was when originally created will now
> return an error because the values are supposedly similar (and there is
> no way for the user to circumvent this but to just not specify the
> option again -- however, this is still strictly better than just
> crashing).
>
> Reviewed-by: Kevin Wolf 
> Signed-off-by: Max Reitz 
> ---
>  block.c | 15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/block.c b/block.c
> index fa1d06d..23424d5 100644
> --- a/block.c
> +++ b/block.c
> @@ -2950,19 +2950,10 @@ int bdrv_reopen_prepare(BDRVReopenState 
> *reopen_state, BlockReopenQueue *queue,
>  const QDictEntry *entry = qdict_first(reopen_state->options);
>  
>  do {
> -QString *new_obj = qobject_to_qstring(entry->value);
> -const char *new = qstring_get_str(new_obj);
> -/*
> - * Caution: while qdict_get_try_str() is fine, getting
> - * non-string types would require more care.  When
> - * bs->options come from -blockdev or blockdev_add, its
> - * members are typed according to the QAPI schema, but
> - * when they come from -drive, they're all QString.
> - */

Your commit message makes me suspect this comment still applies in some
form.  Does it?

> -const char *old = qdict_get_try_str(reopen_state->bs->options,
> -entry->key);
> +QObject *new = entry->value;
> +QObject *old = qdict_get(reopen_state->bs->options, entry->key);
>  
> -if (!old || strcmp(new, old)) {
> +if (!qobject_is_equal(new, old)) {
>  error_setg(errp, "Cannot change the option '%s'", 
> entry->key);
>  ret = -EINVAL;
>  goto error;



Re: [Qemu-block] [PATCH v9 13/20] qcow2: add support for LUKS encryption format

2017-06-21 Thread Max Reitz
On 2017-06-21 16:46, Max Reitz wrote:
> On 2017-06-21 16:42, Max Reitz wrote:
>> On 2017-06-19 19:34, Daniel P. Berrange wrote:
>>> This adds support for using LUKS as an encryption format
>>> with the qcow2 file, using the new encrypt.format parameter
>>> to request "luks" format. e.g.
>>>
>>>   # qemu-img create --object secret,data=123456,id=sec0 \
>>>-f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \
>>>test.qcow2 10G
>>>
>>> The legacy "encryption=on" parameter still results in
>>> creation of the old qcow2 AES format (and is equivalent
>>> to the new 'encryption-format=aes'). e.g. the following are
>>> equivalent:
>>>
>>>   # qemu-img create --object secret,data=123456,id=sec0 \
>>>-f qcow2 -o encryption=on,encrypt.key-secret=sec0 \
>>>test.qcow2 10G
>>>
>>>  # qemu-img create --object secret,data=123456,id=sec0 \
>>>-f qcow2 -o encryption-format=aes,encrypt.key-secret=sec0 \
>>>test.qcow2 10G
>>>
>>> With the LUKS format it is necessary to store the LUKS
>>> partition header and key material in the QCow2 file. This
>>> data can be many MB in size, so cannot go into the QCow2
>>> header region directly. Thus the spec defines a FDE
>>> (Full Disk Encryption) header extension that specifies
>>> the offset of a set of clusters to hold the FDE headers,
>>> as well as the length of that region. The LUKS header is
>>> thus stored in these extra allocated clusters before the
>>> main image payload.
>>>
>>> Aside from all the cryptographic differences implied by
>>> use of the LUKS format, there is one further key difference
>>> between the use of legacy AES and LUKS encryption in qcow2.
>>> For LUKS, the initialiazation vectors are generated using
>>> the host physical sector as the input, rather than the
>>> guest virtual sector. This guarantees unique initialization
>>> vectors for all sectors when qcow2 internal snapshots are
>>> used, thus giving stronger protection against watermarking
>>> attacks.
>>>
>>> Reviewed-by: Eric Blake 
>>> Reviewed-by: Alberto Garcia 
>>> Signed-off-by: Daniel P. Berrange 
>>> ---
>>>  block/qcow2-cluster.c  |   4 +-
>>>  block/qcow2-refcount.c |  10 ++
>>>  block/qcow2.c  | 268 
>>> ++--
>>>  block/qcow2.h  |   9 ++
>>>  qapi/block-core.json   |   5 +-
>>>  tests/qemu-iotests/082.out | 270 
>>> -
>>>  6 files changed, 478 insertions(+), 88 deletions(-)
>>
>> Reviewed-by: Max Reitz 
> 
> (But due to the split of do_perform_cow(), this will need the same
> changes that patch 10 and 11 will need (in this case only contextual, in
> the cases of 10 and 11 there are also functional changes required).)

(Turns out it will need a functional change, too, because
do_perform_cow_encrypt() doesn't take the host offset yet.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v9 13/20] qcow2: add support for LUKS encryption format

2017-06-21 Thread Max Reitz
On 2017-06-19 19:34, Daniel P. Berrange wrote:
> This adds support for using LUKS as an encryption format
> with the qcow2 file, using the new encrypt.format parameter
> to request "luks" format. e.g.
> 
>   # qemu-img create --object secret,data=123456,id=sec0 \
>-f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \
>test.qcow2 10G
> 
> The legacy "encryption=on" parameter still results in
> creation of the old qcow2 AES format (and is equivalent
> to the new 'encryption-format=aes'). e.g. the following are
> equivalent:
> 
>   # qemu-img create --object secret,data=123456,id=sec0 \
>-f qcow2 -o encryption=on,encrypt.key-secret=sec0 \
>test.qcow2 10G
> 
>  # qemu-img create --object secret,data=123456,id=sec0 \
>-f qcow2 -o encryption-format=aes,encrypt.key-secret=sec0 \
>test.qcow2 10G
> 
> With the LUKS format it is necessary to store the LUKS
> partition header and key material in the QCow2 file. This
> data can be many MB in size, so cannot go into the QCow2
> header region directly. Thus the spec defines a FDE
> (Full Disk Encryption) header extension that specifies
> the offset of a set of clusters to hold the FDE headers,
> as well as the length of that region. The LUKS header is
> thus stored in these extra allocated clusters before the
> main image payload.
> 
> Aside from all the cryptographic differences implied by
> use of the LUKS format, there is one further key difference
> between the use of legacy AES and LUKS encryption in qcow2.
> For LUKS, the initialiazation vectors are generated using
> the host physical sector as the input, rather than the
> guest virtual sector. This guarantees unique initialization
> vectors for all sectors when qcow2 internal snapshots are
> used, thus giving stronger protection against watermarking
> attacks.
> 
> Reviewed-by: Eric Blake 
> Reviewed-by: Alberto Garcia 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/qcow2-cluster.c  |   4 +-
>  block/qcow2-refcount.c |  10 ++
>  block/qcow2.c  | 268 ++--
>  block/qcow2.h  |   9 ++
>  qapi/block-core.json   |   5 +-
>  tests/qemu-iotests/082.out | 270 
> -
>  6 files changed, 478 insertions(+), 88 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v9 09/20] qcow: convert QCow to use QCryptoBlock for encryption

2017-06-21 Thread Max Reitz
On 2017-06-19 19:34, Daniel P. Berrange wrote:
> This converts the qcow driver to make use of the QCryptoBlock
> APIs for encrypting image content. This is only wired up to
> permit use of the legacy QCow encryption format. Users who wish
> to have the strong LUKS format should switch to qcow2 instead.
> 
> With this change it is now required to use the QCryptoSecret
> object for providing passwords, instead of the current block
> password APIs / interactive prompting.
> 
>   $QEMU \
> -object secret,id=sec0,filename=/home/berrange/encrypted.pw \
> -drive file=/home/berrange/encrypted.qcow,encrypt.format=qcow,\

Still should be encrypt.format=aes, but, well... Let's just give it a

Reviewed-by: Max Reitz 

regardless.

>encrypt.key-secret=sec0
> 
> Though note that running QEMU system emulators with the AES
> encryption is no longer supported, so while the above syntax
> is valid, QEMU will refuse to actually run the VM in this
> particular example.
> 
> Likewise when creating images with the legacy AES-CBC format
> 
>   qemu-img create -f qcow \
> --object secret,id=sec0,filename=/home/berrange/encrypted.pw \
> -o encrypt.format=aes,encrypt.key-secret=sec0 \
> /home/berrange/encrypted.qcow 64M
> 
> Reviewed-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c   |  10 +++
>  block/crypto.h   |  20 --
>  block/qcow.c | 198 
> +--
>  qapi/block-core.json |  38 +-
>  4 files changed, 158 insertions(+), 108 deletions(-)



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v2 2/4] qapi: Add qobject_is_equal()

2017-06-21 Thread Max Reitz
This generic function (along with its implementations for different
types) determines whether two QObjects are equal.

Signed-off-by: Max Reitz 
---
 include/qapi/qmp/qbool.h   |  1 +
 include/qapi/qmp/qdict.h   |  1 +
 include/qapi/qmp/qfloat.h  |  1 +
 include/qapi/qmp/qint.h|  1 +
 include/qapi/qmp/qlist.h   |  1 +
 include/qapi/qmp/qnull.h   |  2 ++
 include/qapi/qmp/qobject.h |  9 +
 include/qapi/qmp/qstring.h |  1 +
 qobject/qbool.c|  8 
 qobject/qdict.c| 28 
 qobject/qfloat.c   |  8 
 qobject/qint.c |  8 
 qobject/qlist.c| 30 ++
 qobject/qnull.c|  5 +
 qobject/qobject.c  | 30 ++
 qobject/qstring.c  |  9 +
 16 files changed, 143 insertions(+)

diff --git a/include/qapi/qmp/qbool.h b/include/qapi/qmp/qbool.h
index a4c..f77ea86 100644
--- a/include/qapi/qmp/qbool.h
+++ b/include/qapi/qmp/qbool.h
@@ -24,6 +24,7 @@ typedef struct QBool {
 QBool *qbool_from_bool(bool value);
 bool qbool_get_bool(const QBool *qb);
 QBool *qobject_to_qbool(const QObject *obj);
+bool qbool_is_equal(const QObject *x, const QObject *y);
 void qbool_destroy_obj(QObject *obj);
 
 #endif /* QBOOL_H */
diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 188440a..134a526 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -41,6 +41,7 @@ void qdict_del(QDict *qdict, const char *key);
 int qdict_haskey(const QDict *qdict, const char *key);
 QObject *qdict_get(const QDict *qdict, const char *key);
 QDict *qobject_to_qdict(const QObject *obj);
+bool qdict_is_equal(const QObject *x, const QObject *y);
 void qdict_iter(const QDict *qdict,
 void (*iter)(const char *key, QObject *obj, void *opaque),
 void *opaque);
diff --git a/include/qapi/qmp/qfloat.h b/include/qapi/qmp/qfloat.h
index b5d1583..318e91e 100644
--- a/include/qapi/qmp/qfloat.h
+++ b/include/qapi/qmp/qfloat.h
@@ -24,6 +24,7 @@ typedef struct QFloat {
 QFloat *qfloat_from_double(double value);
 double qfloat_get_double(const QFloat *qi);
 QFloat *qobject_to_qfloat(const QObject *obj);
+bool qfloat_is_equal(const QObject *x, const QObject *y);
 void qfloat_destroy_obj(QObject *obj);
 
 #endif /* QFLOAT_H */
diff --git a/include/qapi/qmp/qint.h b/include/qapi/qmp/qint.h
index 3aaff76..20975da 100644
--- a/include/qapi/qmp/qint.h
+++ b/include/qapi/qmp/qint.h
@@ -23,6 +23,7 @@ typedef struct QInt {
 QInt *qint_from_int(int64_t value);
 int64_t qint_get_int(const QInt *qi);
 QInt *qobject_to_qint(const QObject *obj);
+bool qint_is_equal(const QObject *x, const QObject *y);
 void qint_destroy_obj(QObject *obj);
 
 #endif /* QINT_H */
diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
index 5dc4ed9..4380a5b 100644
--- a/include/qapi/qmp/qlist.h
+++ b/include/qapi/qmp/qlist.h
@@ -57,6 +57,7 @@ QObject *qlist_peek(QList *qlist);
 int qlist_empty(const QList *qlist);
 size_t qlist_size(const QList *qlist);
 QList *qobject_to_qlist(const QObject *obj);
+bool qlist_is_equal(const QObject *x, const QObject *y);
 void qlist_destroy_obj(QObject *obj);
 
 static inline const QListEntry *qlist_first(const QList *qlist)
diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
index 69555ac..9299683 100644
--- a/include/qapi/qmp/qnull.h
+++ b/include/qapi/qmp/qnull.h
@@ -23,4 +23,6 @@ static inline QObject *qnull(void)
 return _;
 }
 
+bool qnull_is_equal(const QObject *x, const QObject *y);
+
 #endif /* QNULL_H */
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index ef1d1a9..cac72e3 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -68,6 +68,15 @@ static inline void qobject_incref(QObject *obj)
 }
 
 /**
+ * qobject_is_equal(): Returns whether the two objects are equal.
+ *
+ * Any of the pointers may be NULL; will return true if both are. Will always
+ * return false if only one is (therefore a QNull object is not considered 
equal
+ * to a NULL pointer).
+ */
+bool qobject_is_equal(const QObject *x, const QObject *y);
+
+/**
  * qobject_destroy(): Free resources used by the object
  */
 void qobject_destroy(QObject *obj);
diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
index 10076b7..65c05a9 100644
--- a/include/qapi/qmp/qstring.h
+++ b/include/qapi/qmp/qstring.h
@@ -31,6 +31,7 @@ void qstring_append_int(QString *qstring, int64_t value);
 void qstring_append(QString *qstring, const char *str);
 void qstring_append_chr(QString *qstring, int c);
 QString *qobject_to_qstring(const QObject *obj);
+bool qstring_is_equal(const QObject *x, const QObject *y);
 void qstring_destroy_obj(QObject *obj);
 
 #endif /* QSTRING_H */
diff --git a/qobject/qbool.c b/qobject/qbool.c
index 0606bbd..f1d1719 100644
--- a/qobject/qbool.c
+++ b/qobject/qbool.c
@@ -52,6 +52,14 @@ QBool *qobject_to_qbool(const QObject 

[Qemu-block] [PATCH v2 4/4] iotests: Add test for non-string option reopening

2017-06-21 Thread Max Reitz
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/133 | 9 +
 tests/qemu-iotests/133.out | 5 +
 2 files changed, 14 insertions(+)

diff --git a/tests/qemu-iotests/133 b/tests/qemu-iotests/133
index 9d35a6a..af6b3e1 100755
--- a/tests/qemu-iotests/133
+++ b/tests/qemu-iotests/133
@@ -83,6 +83,15 @@ $QEMU_IO -c 'reopen -o driver=qcow2' $TEST_IMG
 $QEMU_IO -c 'reopen -o file.driver=file' $TEST_IMG
 $QEMU_IO -c 'reopen -o backing.driver=qcow2' $TEST_IMG
 
+echo
+echo "=== Check that reopening works with non-string options ==="
+echo
+
+# Using the json: pseudo-protocol we can create non-string options
+# (Invoke 'info' just so we get some output afterwards)
+IMGOPTSSYNTAX=false $QEMU_IO -f null-co -c 'reopen' -c 'info' \
+"json:{'driver': 'null-co', 'size': 65536}"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/133.out b/tests/qemu-iotests/133.out
index cc86b94..f4a85ae 100644
--- a/tests/qemu-iotests/133.out
+++ b/tests/qemu-iotests/133.out
@@ -19,4 +19,9 @@ Cannot change the option 'driver'
 
 === Check that unchanged driver is okay ===
 
+
+=== Check that reopening works with non-string options ===
+
+format name: null-co
+format name: null-co
 *** done
-- 
2.9.4




[Qemu-block] [PATCH v2 3/4] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-06-21 Thread Max Reitz
Currently, bdrv_reopen_prepare() assumes that all BDS options are
strings. However, this is not the case if the BDS has been created
through the json: pseudo-protocol or blockdev-add.

Note that the user-invokable reopen command is an HMP command, so you
can only specify strings there. Therefore, specifying a non-string
option with the "same" value as it was when originally created will now
return an error because the values are supposedly similar (and there is
no way for the user to circumvent this but to just not specify the
option again -- however, this is still strictly better than just
crashing).

Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 block.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index fa1d06d..23424d5 100644
--- a/block.c
+++ b/block.c
@@ -2950,19 +2950,10 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 const QDictEntry *entry = qdict_first(reopen_state->options);
 
 do {
-QString *new_obj = qobject_to_qstring(entry->value);
-const char *new = qstring_get_str(new_obj);
-/*
- * Caution: while qdict_get_try_str() is fine, getting
- * non-string types would require more care.  When
- * bs->options come from -blockdev or blockdev_add, its
- * members are typed according to the QAPI schema, but
- * when they come from -drive, they're all QString.
- */
-const char *old = qdict_get_try_str(reopen_state->bs->options,
-entry->key);
+QObject *new = entry->value;
+QObject *old = qdict_get(reopen_state->bs->options, entry->key);
 
-if (!old || strcmp(new, old)) {
+if (!qobject_is_equal(new, old)) {
 error_setg(errp, "Cannot change the option '%s'", entry->key);
 ret = -EINVAL;
 goto error;
-- 
2.9.4




[Qemu-block] [PATCH v2 0/4] block: Don't compare strings in bdrv_reopen_prepare()

2017-06-21 Thread Max Reitz
bdrv_reopen_prepare() assumes that all BDS options are strings, which is
not necessarily correct. This series introduces a new qobject_is_equal()
function which can be used to test whether any options have changed,
independently of their type.


v2:
- Add comments detailing when QDicts and QLists are considered equal
  [Kevin]


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[] [--] 'qapi/qnull: Add own header'
002/4:[0009] [FC] 'qapi: Add qobject_is_equal()'
003/4:[] [--] 'block: qobject_is_equal() in bdrv_reopen_prepare()'
004/4:[] [--] 'iotests: Add test for non-string option reopening'


Max Reitz (4):
  qapi/qnull: Add own header
  qapi: Add qobject_is_equal()
  block: qobject_is_equal() in bdrv_reopen_prepare()
  iotests: Add test for non-string option reopening

 include/qapi/qmp/qbool.h   |  1 +
 include/qapi/qmp/qdict.h   |  1 +
 include/qapi/qmp/qfloat.h  |  1 +
 include/qapi/qmp/qint.h|  1 +
 include/qapi/qmp/qlist.h   |  1 +
 include/qapi/qmp/qnull.h   | 28 
 include/qapi/qmp/qobject.h | 17 +
 include/qapi/qmp/qstring.h |  1 +
 include/qapi/qmp/types.h   |  1 +
 block.c| 15 +++
 qobject/qbool.c|  8 
 qobject/qdict.c| 28 
 qobject/qfloat.c   |  8 
 qobject/qint.c |  8 
 qobject/qlist.c| 30 ++
 qobject/qnull.c|  6 ++
 qobject/qobject.c  | 30 ++
 qobject/qstring.c  |  9 +
 target/i386/cpu.c  |  6 +-
 tests/check-qnull.c|  2 +-
 tests/qemu-iotests/133 |  9 +
 tests/qemu-iotests/133.out |  5 +
 22 files changed, 190 insertions(+), 26 deletions(-)
 create mode 100644 include/qapi/qmp/qnull.h

-- 
2.9.4




[Qemu-block] [PATCH v2 1/4] qapi/qnull: Add own header

2017-06-21 Thread Max Reitz
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 include/qapi/qmp/qnull.h   | 26 ++
 include/qapi/qmp/qobject.h |  8 
 include/qapi/qmp/types.h   |  1 +
 qobject/qnull.c|  1 +
 target/i386/cpu.c  |  6 +-
 tests/check-qnull.c|  2 +-
 6 files changed, 30 insertions(+), 14 deletions(-)
 create mode 100644 include/qapi/qmp/qnull.h

diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
new file mode 100644
index 000..69555ac
--- /dev/null
+++ b/include/qapi/qmp/qnull.h
@@ -0,0 +1,26 @@
+/*
+ * QNull Module
+ *
+ * Copyright (C) 2009, 2017 Red Hat Inc.
+ *
+ * Authors:
+ *  Luiz Capitulino 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#ifndef QNULL_H
+#define QNULL_H
+
+#include "qapi/qmp/qobject.h"
+
+extern QObject qnull_;
+
+static inline QObject *qnull(void)
+{
+qobject_incref(_);
+return _;
+}
+
+#endif /* QNULL_H */
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index b8ddbca..ef1d1a9 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -93,12 +93,4 @@ static inline QType qobject_type(const QObject *obj)
 return obj->type;
 }
 
-extern QObject qnull_;
-
-static inline QObject *qnull(void)
-{
-qobject_incref(_);
-return _;
-}
-
 #endif /* QOBJECT_H */
diff --git a/include/qapi/qmp/types.h b/include/qapi/qmp/types.h
index 27cfbd8..4c87182 100644
--- a/include/qapi/qmp/types.h
+++ b/include/qapi/qmp/types.h
@@ -20,5 +20,6 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnull.h"
 
 #endif /* QAPI_QMP_TYPES_H */
diff --git a/qobject/qnull.c b/qobject/qnull.c
index c124d05..b3cc85e 100644
--- a/qobject/qnull.c
+++ b/qobject/qnull.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qnull.h"
 
 QObject qnull_ = {
 .type = QTYPE_QNULL,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b2b1d20..f118a54 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -29,11 +29,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/qmp/qstring.h"
-#include "qapi/qmp/qdict.h"
-#include "qapi/qmp/qbool.h"
-#include "qapi/qmp/qint.h"
-#include "qapi/qmp/qfloat.h"
+#include "qapi/qmp/types.h"
 
 #include "qapi-types.h"
 #include "qapi-visit.h"
diff --git a/tests/check-qnull.c b/tests/check-qnull.c
index 8dd1c96..4a67b9a 100644
--- a/tests/check-qnull.c
+++ b/tests/check-qnull.c
@@ -8,7 +8,7 @@
  */
 #include "qemu/osdep.h"
 
-#include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qnull.h"
 #include "qemu-common.h"
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
-- 
2.9.4




[Qemu-block] [PATCH v4 1/2] iotests: Use absolute paths for executables

2017-06-21 Thread Max Reitz
A user may specify a relative path for accessing qemu, qemu-img, etc.
through environment variables ($QEMU_PROG and friends) or a symlink.

If a test decides to change its working directory, relative paths will
cease to work, however. Work around this by making all of the paths to
programs that should undergo testing absolute. Besides "realpath", we
also have to use "which" to support programs in $PATH.

As a side effect, this fixes specifying these programs as symlinks for
out-of-tree builds: Before, you would have to create two symlinks, one
in the build and one in the source tree (the first one for common.config
to find, the second one for the iotest to use). Now it is sufficient to
create one in the build tree because common.config will resolve it.

Reported-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common.config | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index d1b45f5..c1dc425 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -103,6 +103,12 @@ if [ -z "$QEMU_VXHS_PROG" ]; then
 export QEMU_VXHS_PROG="`set_prog_path qnio_server`"
 fi
 
+export QEMU_PROG=$(realpath -- "$(type -p "$QEMU_PROG")")
+export QEMU_IMG_PROG=$(realpath -- "$(type -p "$QEMU_IMG_PROG")")
+export QEMU_IO_PROG=$(realpath -- "$(type -p "$QEMU_IO_PROG")")
+export QEMU_NBD_PROG=$(realpath -- "$(type -p "$QEMU_NBD_PROG")")
+export QEMU_VXHS_PROG=$(realpath -- "$(type -p "$QEMU_VXHS_PROG")")
+
 _qemu_wrapper()
 {
 (
-- 
2.9.4




[Qemu-block] [PATCH v4 0/2] iotests: Add test for colon handling

2017-06-21 Thread Max Reitz
v2 of v3 of "iotests: Add test for colon handling"; thus, basically, v4.

This adds an iotest for the original series "block: Fix backing paths
for filenames with colons" and fixes common.config so it works if you
have specified the qemu binaries through relative paths. As a bonus, it
makes symlinked binaries work for out-of-tree builds.


v4: Use `type -p` instead of `which`, and use -- before the realpath
FILE argument [Eric]


git-backport-diff against v3:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/2:[0010] [FC] 'iotests: Use absolute paths for executables'
002/2:[] [--] 'iotests: Add test for colon handling'


Max Reitz (2):
  iotests: Use absolute paths for executables
  iotests: Add test for colon handling

 tests/qemu-iotests/126   | 105 +++
 tests/qemu-iotests/126.out   |  23 +
 tests/qemu-iotests/common.config |   6 +++
 tests/qemu-iotests/group |   1 +
 4 files changed, 135 insertions(+)
 create mode 100755 tests/qemu-iotests/126
 create mode 100644 tests/qemu-iotests/126.out

-- 
2.9.4




[Qemu-block] [PATCH v4 2/2] iotests: Add test for colon handling

2017-06-21 Thread Max Reitz
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/126 | 105 +
 tests/qemu-iotests/126.out |  23 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 129 insertions(+)
 create mode 100755 tests/qemu-iotests/126
 create mode 100644 tests/qemu-iotests/126.out

diff --git a/tests/qemu-iotests/126 b/tests/qemu-iotests/126
new file mode 100755
index 000..a2d4d6c
--- /dev/null
+++ b/tests/qemu-iotests/126
@@ -0,0 +1,105 @@
+#!/bin/bash
+#
+# Tests handling of colons in filenames (which may be confused with protocol
+# prefixes)
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# Needs backing file support
+_supported_fmt qcow qcow2 qed vmdk
+# This is the default protocol (and we want to test the difference between
+# colons which separate a protocol prefix from the rest and colons which are
+# just part of the filename, so we cannot test protocols which require a 
prefix)
+_supported_proto file
+_supported_os Linux
+
+echo
+echo '=== Testing plain files ==='
+echo
+
+# A colon after a slash is not a protocol prefix separator
+TEST_IMG="$TEST_DIR/a:b.$IMGFMT" _make_test_img 64M
+_rm_test_img "$TEST_DIR/a:b.$IMGFMT"
+
+# But if you want to be really sure, you can do this
+TEST_IMG="file:$TEST_DIR/a:b.$IMGFMT" _make_test_img 64M
+_rm_test_img "$TEST_DIR/a:b.$IMGFMT"
+
+
+echo
+echo '=== Testing relative backing filename resolution ==='
+echo
+
+BASE_IMG="$TEST_DIR/image:base.$IMGFMT"
+TOP_IMG="$TEST_DIR/image:top.$IMGFMT"
+
+TEST_IMG=$BASE_IMG _make_test_img 64M
+TEST_IMG=$TOP_IMG _make_test_img -b ./image:base.$IMGFMT
+
+# The default cluster size depends on the image format
+TEST_IMG=$TOP_IMG _img_info | grep -v 'cluster_size'
+
+_rm_test_img "$BASE_IMG"
+_rm_test_img "$TOP_IMG"
+
+
+# Do another test where we access both top and base without any slash in them
+echo
+pushd "$TEST_DIR" >/dev/null
+
+BASE_IMG="base.$IMGFMT"
+TOP_IMG="file:image:top.$IMGFMT"
+
+TEST_IMG=$BASE_IMG _make_test_img 64M
+TEST_IMG=$TOP_IMG _make_test_img -b "$BASE_IMG"
+
+TEST_IMG=$TOP_IMG _img_info | grep -v 'cluster_size'
+
+_rm_test_img "$BASE_IMG"
+_rm_test_img "image:top.$IMGFMT"
+
+popd >/dev/null
+
+# Note that we could also do the same test with 
BASE_IMG=file:image:base.$IMGFMT
+# -- but behavior for that case is a bit strange. Protocol-prefixed paths are
+# in a sense always absolute paths, so such paths will never be combined with
+# the path of the overlay. But since "image:base.$IMGFMT" is actually a
+# relative path, it will always be evaluated relative to qemu's CWD (but not
+# relative to the overlay!). While this is more or less intended, it is still
+# pretty strange and thus not something that is tested here.
+# (The root of the issue is the use of a relative path with a protocol prefix.
+#  This may always give you weird results because in one sense, qemu considers
+#  such paths absolute, whereas in another, they are still relative.)
+
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/126.out b/tests/qemu-iotests/126.out
new file mode 100644
index 000..50d7308
--- /dev/null
+++ b/tests/qemu-iotests/126.out
@@ -0,0 +1,23 @@
+QA output created by 126
+
+=== Testing plain files ===
+
+Formatting 'TEST_DIR/a:b.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/a:b.IMGFMT', fmt=IMGFMT size=67108864
+
+=== Testing relative backing filename resolution ===
+
+Formatting 'TEST_DIR/image:base.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/image:top.IMGFMT', fmt=IMGFMT size=67108864 
backing_file=./image:base.IMGFMT
+image: TEST_DIR/image:top.IMGFMT
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+backing file: ./image:base.IMGFMT (actual path: TEST_DIR/./image:base.IMGFMT)
+
+Formatting 'base.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'file:image:top.IMGFMT', fmt=IMGFMT size=67108864 
backing_file=base.IMGFMT
+image: ./image:top.IMGFMT
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+backing file: base.IMGFMT (actual path: ./base.IMGFMT)
+*** done
diff 

[Qemu-block] [PATCH] iotests: 181 does not work for all formats

2017-06-21 Thread Max Reitz
Test 181 only works for formats which support live migration (naturally,
as it is a live migration test). Disable it for all formats which do
not.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/181 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/181 b/tests/qemu-iotests/181
index e969a2a..f73ad5a 100755
--- a/tests/qemu-iotests/181
+++ b/tests/qemu-iotests/181
@@ -43,6 +43,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.qemu
 
 _supported_fmt generic
+# Formats that do not support live migration
+_unsupported_fmt qcow vdi vhdx vmdk vpc vvfat
 _supported_proto generic
 _supported_os Linux
 
-- 
2.9.4




[Qemu-block] [PATCH v2 3/3] xen-disk: use an IOThread per instance

2017-06-21 Thread Paul Durrant
This patch allocates an IOThread object for each xen_disk instance and
sets the AIO context appropriately on connect. This allows processing
of I/O to proceed in parallel.

The patch also adds tracepoints into xen_disk to make it possible to
follow the state transtions of an instance in the log.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 

v2:
 - explicitly acquire and release AIO context in qemu_aio_complete() and
   blk_bh()
---
 hw/block/trace-events |  7 ++
 hw/block/xen_disk.c   | 69 ---
 2 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index 65e83dc258..608b24ba66 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
num_reqs, uint64_t offset,
 # hw/block/hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS 
%d %d %d"
 hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int 
trans) "blk %p CHS %u %u %u trans %d"
+
+# hw/block/xen_disk.c
+xen_disk_alloc(char *name) "%s"
+xen_disk_init(char *name) "%s"
+xen_disk_connect(char *name) "%s"
+xen_disk_disconnect(char *name) "%s"
+xen_disk_free(char *name) "%s"
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 0e6513708e..8548195195 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -27,10 +27,13 @@
 #include "hw/xen/xen_backend.h"
 #include "xen_blkif.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qom/object_interfaces.h"
+#include "trace.h"
 
 /* - */
 
@@ -128,6 +131,9 @@ struct XenBlkDev {
 DriveInfo   *dinfo;
 BlockBackend*blk;
 QEMUBH  *bh;
+
+IOThread*iothread;
+AioContext  *ctx;
 };
 
 /* - */
@@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
 static void qemu_aio_complete(void *opaque, int ret)
 {
 struct ioreq *ioreq = opaque;
+struct XenBlkDev *blkdev = ioreq->blkdev;
+
+aio_context_acquire(blkdev->ctx);
 
 if (ret != 0) {
-xen_pv_printf(>blkdev->xendev, 0, "%s I/O error\n",
+xen_pv_printf(>xendev, 0, "%s I/O error\n",
   ioreq->req.operation == BLKIF_OP_READ ? "read" : 
"write");
 ioreq->aio_errors++;
 }
@@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int ret)
 if (ioreq->presync) {
 ioreq->presync = 0;
 ioreq_runio_qemu_aio(ioreq);
-return;
+goto done;
 }
 if (ioreq->aio_inflight > 0) {
-return;
+goto done;
 }
 
-if (ioreq->blkdev->feature_grant_copy) {
+if (blkdev->feature_grant_copy) {
 switch (ioreq->req.operation) {
 case BLKIF_OP_READ:
 /* in case of failure ioreq->aio_errors is increased */
@@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int ret)
 }
 
 ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
-if (!ioreq->blkdev->feature_grant_copy) {
+if (!blkdev->feature_grant_copy) {
 ioreq_unmap(ioreq);
 }
 ioreq_finish(ioreq);
@@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int ret)
 }
 case BLKIF_OP_READ:
 if (ioreq->status == BLKIF_RSP_OKAY) {
-block_acct_done(blk_get_stats(ioreq->blkdev->blk), >acct);
+block_acct_done(blk_get_stats(blkdev->blk), >acct);
 } else {
-block_acct_failed(blk_get_stats(ioreq->blkdev->blk), >acct);
+block_acct_failed(blk_get_stats(blkdev->blk), >acct);
 }
 break;
 case BLKIF_OP_DISCARD:
 default:
 break;
 }
-qemu_bh_schedule(ioreq->blkdev->bh);
+qemu_bh_schedule(blkdev->bh);
+
+done:
+aio_context_release(blkdev->ctx);
 }
 
 static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t 
sector_number,
@@ -917,17 +929,40 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
 static void blk_bh(void *opaque)
 {
 struct XenBlkDev *blkdev = opaque;
+
+aio_context_acquire(blkdev->ctx);
 blk_handle_requests(blkdev);
+aio_context_release(blkdev->ctx);
 }
 
 static void blk_alloc(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
+Object *obj;
+char *name;
+Error *err = NULL;
+
+trace_xen_disk_alloc(xendev->name);
 
 QLIST_INIT(>inflight);
 QLIST_INIT(>finished);
 QLIST_INIT(>freelist);
-blkdev->bh = qemu_bh_new(blk_bh, 

[Qemu-block] [PATCH v2 2/3] xen-disk: add support for multi-page shared rings

2017-06-21 Thread Paul Durrant
The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now and many guest OS have support in their frontend
drivers.

This patch makes the necessary modifications to xen-disk support a shared
ring up to order 4 (i.e. 16 pages).

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 

v2:
 - Fix memory leak in error path
 - Print warning if ring-page-order exceeds limits
---
 hw/block/xen_disk.c | 144 +---
 1 file changed, 113 insertions(+), 31 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 9b06e3aa81..0e6513708e 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -36,8 +36,6 @@
 
 static int batch_maps   = 0;
 
-static int max_requests = 32;
-
 /* - */
 
 #define BLOCK_SIZE  512
@@ -84,6 +82,8 @@ struct ioreq {
 BlockAcctCookie acct;
 };
 
+#define MAX_RING_PAGE_ORDER 4
+
 struct XenBlkDev {
 struct XenDevicexendev;  /* must be first */
 char*params;
@@ -94,7 +94,8 @@ struct XenBlkDev {
 booldirectiosafe;
 const char  *fileproto;
 const char  *filename;
-int ring_ref;
+unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
+unsigned intnr_ring_ref;
 void*sring;
 int64_t file_blk;
 int64_t file_size;
@@ -110,6 +111,7 @@ struct XenBlkDev {
 int requests_total;
 int requests_inflight;
 int requests_finished;
+unsigned intmax_requests;
 
 /* Persistent grants extension */
 gbooleanfeature_discard;
@@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
 struct ioreq *ioreq = NULL;
 
 if (QLIST_EMPTY(>freelist)) {
-if (blkdev->requests_total >= max_requests) {
+if (blkdev->requests_total >= blkdev->max_requests) {
 goto out;
 }
 /* allocate new struct */
@@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
 ioreq_runio_qemu_aio(ioreq);
 }
 
-if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
+if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) 
{
 qemu_bh_schedule(blkdev->bh);
 }
 }
@@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
 blk_handle_requests(blkdev);
 }
 
-/*
- * We need to account for the grant allocations requiring contiguous
- * chunks; the worst case number would be
- * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
- * but in order to keep things simple just use
- * 2 * max_req * max_seg.
- */
-#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
-
 static void blk_alloc(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
 if (xen_mode != XEN_EMULATE) {
 batch_maps = 1;
 }
-if (xengnttab_set_max_grants(xendev->gnttabdev,
-MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
-xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
-  strerror(errno));
-}
 }
 
 static void blk_parse_discard(struct XenBlkDev *blkdev)
@@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
   !blkdev->feature_grant_copy);
 xenstore_write_be_int(>xendev, "info", info);
 
+xenstore_write_be_int(>xendev, "max-ring-page-order",
+  MAX_RING_PAGE_ORDER);
+
 blk_parse_discard(blkdev);
 
 g_free(directiosafe);
@@ -1058,12 +1049,25 @@ out_error:
 return -1;
 }
 
+/*
+ * We need to account for the grant allocations requiring contiguous
+ * chunks; the worst case number would be
+ * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
+ * but in order to keep things simple just use
+ * 2 * max_req * max_seg.
+ */
+#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
+
 static int blk_connect(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 int pers, index, qflags;
 bool readonly = true;
 bool writethrough = true;
+int order, ring_ref;
+unsigned int ring_size, max_grants;
+unsigned int i;
+uint32_t *domids;
 
 /* read-only ? */
 if (blkdev->directiosafe) {
@@ -1138,9 +1142,42 @@ static int blk_connect(struct XenDevice *xendev)
 xenstore_write_be_int64(>xendev, "sectors",
 blkdev->file_size / blkdev->file_blk);
 
-if (xenstore_read_fe_int(>xendev, "ring-ref", >ring_ref) 
== -1) {
+

[Qemu-block] [PATCH v2 0/3] xen-disk: performance improvements

2017-06-21 Thread Paul Durrant
Paul Durrant (3):
  xen-disk: only advertize feature-persistent if grant copy is not
available
  xen-disk: add support for multi-page shared rings
  xen-disk: use an IOThread per instance

 hw/block/trace-events |   7 ++
 hw/block/xen_disk.c   | 228 +++---
 2 files changed, 188 insertions(+), 47 deletions(-)

-- 
2.11.0




[Qemu-block] [PATCH v5 25/25] block/null: Generate filename even with latency-ns

2017-06-21 Thread Max Reitz
While we cannot represent the latency-ns option in a filename, it is not
a significant option so not being able to should not stop us from
generating a filename nonetheless.

Signed-off-by: Max Reitz 
---
 block/null.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/null.c b/block/null.c
index fb03fb9..f665bcf 100644
--- a/block/null.c
+++ b/block/null.c
@@ -231,7 +231,8 @@ static void null_refresh_filename(BlockDriverState *bs)
 {
 /* These options can be ignored */
 if (strcmp(qdict_entry_key(e), "filename") &&
-strcmp(qdict_entry_key(e), "driver"))
+strcmp(qdict_entry_key(e), "driver") &&
+strcmp(qdict_entry_key(e), NULL_OPT_LATENCY))
 {
 return;
 }
-- 
2.9.4




[Qemu-block] [PATCH v5 23/25] block: Fix FIXME from "Add BDS.backing_overridden"

2017-06-21 Thread Max Reitz
Said commit introduced a FIXME stating that bdrv_open_backing_file()
should set bs->backing_overridden to true not only if the file.filename
option was set, but if the "options" QDict contained any option that is
significant for any node in the BDS tree emerging from the backing BDS.
This behavior is implemented by this patch.

Signed-off-by: Max Reitz 
---
 block.c | 81 -
 1 file changed, 75 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 2542ff5..2c3c983 100644
--- a/block.c
+++ b/block.c
@@ -74,6 +74,9 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
const BdrvChildRole *child_role,
Error **errp);
 
+static bool has_significant_runtime_options(BlockDriverState *bs,
+const QDict *d);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -2124,6 +2127,42 @@ out:
 bdrv_refresh_limits(bs, NULL);
 }
 
+/**
+ * Checks whether @options contains any significant option for any of the nodes
+ * in the BDS tree emerging from @bs.
+ */
+static bool is_significant_option_tree(BlockDriverState *bs, QDict *options)
+{
+BdrvChild *child;
+
+if (!qdict_size(options)) {
+/* No need to recurse */
+return false;
+}
+
+if (has_significant_runtime_options(bs, options)) {
+return true;
+}
+
+QLIST_FOREACH(child, >children, next) {
+QDict *child_options;
+char *option_prefix;
+
+option_prefix = g_strdup_printf("%s.", child->name);
+qdict_extract_subqdict(options, _options, option_prefix);
+g_free(option_prefix);
+
+if (is_significant_option_tree(child->bs, child_options)) {
+QDECREF(child_options);
+return true;
+}
+
+QDECREF(child_options);
+}
+
+return false;
+}
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -2142,7 +2181,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 const char *reference = NULL;
 int ret = 0;
 BlockDriverState *backing_hd;
-QDict *options;
+QDict *options, *cloned_options = NULL;
 QDict *tmp_parent_options = NULL;
 Error *local_err = NULL;
 
@@ -2172,11 +2211,6 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
 /* keep backing_filename NULL */
-
-/* FIXME: Should also be set to true if @options contains other runtime
- *options which control the data that is read from the backing
- *BDS */
-bs->backing_overridden = true;
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
 goto free_exit;
@@ -2197,6 +2231,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 goto free_exit;
 }
 
+cloned_options = qdict_clone_shallow(options);
+
 if (bs->backing_format[0] != '\0' && !qdict_haskey(options, "driver")) {
 qdict_put_str(options, "driver", bs->backing_format);
 }
@@ -2210,6 +2246,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 goto free_exit;
 }
 
+if (reference || is_significant_option_tree(backing_hd, cloned_options)) {
+bs->backing_overridden = true;
+}
+
 /* Hook up the backing file link; drop our reference, bs owns the
  * backing_hd reference now */
 bdrv_set_backing_hd(bs, backing_hd, _err);
@@ -2225,6 +2265,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 free_exit:
 g_free(backing_filename);
 QDECREF(tmp_parent_options);
+QDECREF(cloned_options);
 return ret;
 }
 
@@ -4802,6 +4843,34 @@ static const char *const 
*significant_options(BlockDriverState *bs,
 }
 
 /**
+ * Returns true if @d contains any options the block driver of @bs considers to
+ * be significant runtime options.
+ */
+static bool has_significant_runtime_options(BlockDriverState *bs,
+const QDict *d)
+{
+const char *const *option_name = NULL;
+
+while ((option_name = significant_options(bs, option_name))) {
+assert(strlen(*option_name) > 0);
+if ((*option_name)[strlen(*option_name) - 1] != '.') {
+if (qdict_haskey(d, *option_name)) {
+return true;
+}
+} else {
+const QDictEntry *entry;
+for (entry = qdict_first(d); entry; entry = qdict_next(d, entry)) {
+if (strstart(qdict_entry_key(entry), *option_name, NULL)) {
+return true;
+}
+}
+}
+}
+
+return false;
+}
+

[Qemu-block] [PATCH v5 24/25] block/curl: Implement bdrv_refresh_filename()

2017-06-21 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 block/curl.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/curl.c b/block/curl.c
index 11318a9..fe57223 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -957,6 +957,20 @@ static int64_t curl_getlength(BlockDriverState *bs)
 return s->len;
 }
 
+static void curl_refresh_filename(BlockDriverState *bs)
+{
+BDRVCURLState *s = bs->opaque;
+
+if (!s->sslverify || s->cookie ||
+s->username || s->password || s->proxyusername || s->proxypassword)
+{
+return;
+}
+
+pstrcpy(bs->exact_filename, sizeof(bs->exact_filename), s->url);
+}
+
+
 static const char *const curl_sgfnt_runtime_opts[] = {
 CURL_BLOCK_OPT_URL,
 CURL_BLOCK_OPT_SSLVERIFY,
@@ -985,6 +999,7 @@ static BlockDriver bdrv_http = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
@@ -1003,6 +1018,7 @@ static BlockDriver bdrv_https = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
@@ -1021,6 +1037,7 @@ static BlockDriver bdrv_ftp = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
@@ -1039,6 +1056,7 @@ static BlockDriver bdrv_ftps = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
-- 
2.9.4




[Qemu-block] [PATCH v5 21/25] block: Purify .bdrv_refresh_filename()

2017-06-21 Thread Max Reitz
Currently, BlockDriver.bdrv_refresh_filename() is supposed to both
refresh the filename (BDS.exact_filename) and set BDS.full_open_options.
Now that we have generic code in the central bdrv_refresh_filename() for
creating BDS.full_open_options, we can drop the latter part from all
BlockDriver.bdrv_refresh_filename() implementations.

This also means that we can drop all of the existing default code for
this from the global bdrv_refresh_filename() itself.

Furthermore, we now have to call BlockDriver.bdrv_refresh_filename()
after having set BDS.full_open_options, because the block driver's
implementation should now be allowed to depend on BDS.full_open_options
being set correctly.

Finally, with this patch we can drop the @options parameter from
BlockDriver.bdrv_refresh_filename(); also, add a comment on this
function's purpose in block/block_int.h while touching its interface.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |   6 +-
 block.c   | 145 +++---
 block/blkdebug.c  |  52 ++---
 block/blkverify.c |  16 +
 block/commit.c|   2 +-
 block/mirror.c|   2 +-
 block/nbd.c   |  23 +---
 block/nfs.c   |  36 +---
 block/null.c  |  23 +---
 block/quorum.c|  30 --
 10 files changed, 63 insertions(+), 272 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index ec37ab3..5283509 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -129,7 +129,11 @@ struct BlockDriver {
 int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
-void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
+/*
+ * Refreshes the bs->exact_filename field. If that is impossible,
+ * bs->exact_filename has to be left empty.
+ */
+void (*bdrv_refresh_filename)(BlockDriverState *bs);
 
 /*
  * Gathers the open options for all children into @target. A simple format
diff --git a/block.c b/block.c
index af5d1ef..e22771b 100644
--- a/block.c
+++ b/block.c
@@ -4859,47 +4859,6 @@ static bool append_significant_runtime_options(QDict *d, 
BlockDriverState *bs)
 return found_any;
 }
 
-static bool append_open_options(QDict *d, BlockDriverState *bs)
-{
-const QDictEntry *entry;
-QemuOptDesc *desc;
-BdrvChild *child;
-bool found_any = false;
-const char *p;
-
-for (entry = qdict_first(bs->options); entry;
- entry = qdict_next(bs->options, entry))
-{
-/* Exclude options for children */
-QLIST_FOREACH(child, >children, next) {
-if (strstart(qdict_entry_key(entry), child->name, )
-&& (!*p || *p == '.'))
-{
-break;
-}
-}
-if (child) {
-continue;
-}
-
-/* And exclude all non-driver-specific options */
-for (desc = bdrv_runtime_opts.desc; desc->name; desc++) {
-if (!strcmp(qdict_entry_key(entry), desc->name)) {
-break;
-}
-}
-if (desc->name) {
-continue;
-}
-
-qobject_incref(qdict_entry_value(entry));
-qdict_put_obj(d, qdict_entry_key(entry), qdict_entry_value(entry));
-found_any = true;
-}
-
-return found_any;
-}
-
 /* Updates the following BDS fields:
  *  - exact_filename: A filename which may be used for opening a block device
  *which (mostly) equals the given BDS (even without any
@@ -4917,6 +4876,8 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 BlockDriver *drv = bs->drv;
 BdrvChild *child;
 QDict *opts;
+bool generate_json_filename; /* Whether our default implementation should
+fill exact_filename (false) or not (true) 
*/
 
 if (!drv) {
 return;
@@ -4932,90 +4893,10 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 }
 }
 
-if (drv->bdrv_refresh_filename) {
-/* Obsolete information is of no use here, so drop the old file name
- * information before refreshing it */
-bs->exact_filename[0] = '\0';
-if (bs->full_open_options) {
-QDECREF(bs->full_open_options);
-bs->full_open_options = NULL;
-}
-
-opts = qdict_new();
-append_open_options(opts, bs);
-drv->bdrv_refresh_filename(bs, opts);
-QDECREF(opts);
-} else if (bs->file) {
-/* Try to reconstruct valid information from the underlying file */
-bool has_open_options;
-
-bs->exact_filename[0] = '\0';
-if (bs->full_open_options) {
-QDECREF(bs->full_open_options);
-bs->full_open_options = NULL;
-}
-
-opts = qdict_new();
-has_open_options = append_open_options(opts, bs);
-

[Qemu-block] [PATCH v5 15/25] block: Use bdrv_dirname() for relative filenames

2017-06-21 Thread Max Reitz
bdrv_get_full_backing_filename_from_filename() breaks down when it comes
to JSON filenames. Using bdrv_dirname() as the basis is better because
since we have BDS, we can descend through the BDS tree to the protocol
layer, which gives us a greater probability of finding a non-JSON name;
also, bdrv_dirname() is more correct as it allows block drivers to
override the generation of that directory name in a protocol-specific
way.

We still need to keep bdrv_get_full_backing_filename_from_filename(),
though, because it has valid callers which need it during image creation
when no BDS is available yet.

This makes a test case in qemu-iotest 110, which was supposed to fail,
work. That is actually good, but we need to change the reference output
(and the comment in 110) accordingly.

Signed-off-by: Max Reitz 
---
 block.c| 20 +++-
 tests/qemu-iotests/110 |  3 ++-
 tests/qemu-iotests/110.out |  2 +-
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 0838c12..a5dd6bb 100644
--- a/block.c
+++ b/block.c
@@ -298,12 +298,22 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
  const char *filename, Error **errp)
 {
-char *bs_filename = relative_to->exact_filename[0]
-? relative_to->exact_filename
-: relative_to->filename;
+char *dir, *full_name;
 
-return bdrv_get_full_backing_filename_from_filename(bs_filename, filename,
-errp);
+if (filename[0] == '\0' || path_has_protocol(filename) ||
+path_is_absolute(filename))
+{
+return g_strdup(filename);
+}
+
+dir = bdrv_dirname(relative_to, errp);
+if (!dir) {
+return NULL;
+}
+
+full_name = g_strconcat(dir, filename, NULL);
+g_free(dir);
+return full_name;
 }
 
 char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index 9de7369..ba1b3c6 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -61,7 +61,8 @@ echo '=== Non-reconstructable filename ==='
 echo
 
 # Across blkdebug without a config file, you cannot reconstruct filenames, so
-# qemu is incapable of knowing the directory of the top image
+# qemu is incapable of knowing the directory of the top image from the filename
+# alone. However, using bdrv_dirname(), it should still work.
 TEST_IMG="json:{
 'driver': '$IMGFMT',
 'file': {
diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out
index b3584ff..5370bc1 100644
--- a/tests/qemu-iotests/110.out
+++ b/tests/qemu-iotests/110.out
@@ -14,7 +14,7 @@ backing file: t.IMGFMT.base (actual path: 
TEST_DIR/t.IMGFMT.base)
 image: json:{"driver": "IMGFMT", "file": {"set-state.0.event": "read_aio", 
"image": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}, "driver": 
"blkdebug", "set-state.0.new_state": 42}}
 file format: IMGFMT
 virtual size: 64M (67108864 bytes)
-backing file: t.IMGFMT.base (cannot determine actual path)
+backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base)
 
 === Backing name is always relative to the backed image ===
 
-- 
2.9.4




[Qemu-block] [PATCH v5 22/25] block: Do not copy exact_filename from format file

2017-06-21 Thread Max Reitz
If the a format BDS's file BDS is in turn a format BDS, we cannot simply
use the same filename, because when opening a BDS tree based on a
filename alone, qemu will create only one format node on top of one
protocol node (disregarding a potential backing file).

Signed-off-by: Max Reitz 
---
 block.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index e22771b..2542ff5 100644
--- a/block.c
+++ b/block.c
@@ -4933,9 +4933,20 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 
 bs->exact_filename[0] = '\0';
 
-/* If no specific options have been given for this BDS, the filename of
- * the underlying file should suffice for this one as well */
-if (bs->file->bs->exact_filename[0] && !generate_json_filename) {
+/* We can use the underlying file's filename if:
+ * - it has a filename,
+ * - the file is a protocol BDS, and
+ * - opening that file (as this BDS's format) will automatically create
+ *   the BDS tree we have right now, that is:
+ *   - the user did not significantly change this BDS's behavior with
+ * some explicit options
+ *   - no non-file child of this BDS has been overridden by the user
+ *   Both of these conditions are represented by 
generate_json_filename.
+ */
+if (bs->file->bs->exact_filename[0] &&
+bs->file->bs->drv->bdrv_file_open &&
+!generate_json_filename)
+{
 strcpy(bs->exact_filename, bs->file->bs->exact_filename);
 }
 }
-- 
2.9.4




[Qemu-block] [PATCH v5 20/25] block: Generically refresh runtime options

2017-06-21 Thread Max Reitz
Instead of having every block driver which implements
bdrv_refresh_filename() copy all of the significant runtime options over
to bs->full_open_options, implement this process generically in
bdrv_refresh_filename().

This patch only adds this new generic implementation, it does not remove
the old functionality. This is done in a follow-up patch.

With this patch, some superfluous information (that should never have
been there) may be removed from some JSON filenames, as can be seen in
the change to iotest 110's reference output.

Signed-off-by: Max Reitz 
---
 block.c| 116 -
 tests/qemu-iotests/110.out |   4 +-
 2 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 8e03cca..af5d1ef 100644
--- a/block.c
+++ b/block.c
@@ -4773,6 +4773,92 @@ out:
 return to_replace_bs;
 }
 
+/**
+ * Iterates through the list of runtime option keys that are said to be
+ * "significant" for a BDS. An option is called "significant" if it changes a
+ * BDS's data. For example, the null block driver's "size" and "read-zeroes"
+ * options are significant, but its "latency-ns" option is not.
+ *
+ * If a key returned by this function ends with a dot, all options starting 
with
+ * that prefix are significant.
+ */
+static const char *const *significant_options(BlockDriverState *bs,
+  const char *const *curopt)
+{
+static const char *const global_options[] = {
+"driver", "filename", "base-directory", NULL
+};
+
+if (!curopt) {
+return _options[0];
+}
+
+curopt++;
+if (curopt == _options[ARRAY_SIZE(global_options) - 1] && bs->drv) {
+curopt = bs->drv->sgfnt_runtime_opts;
+}
+
+return (curopt && *curopt) ? curopt : NULL;
+}
+
+/**
+ * Copies all significant runtime options from bs->options to the given QDict.
+ * The set of significant option keys is determined by invoking
+ * significant_options().
+ *
+ * Returns true iff any significant option was present in bs->options (and thus
+ * copied to the target QDict) with the exception of "filename" and "driver".
+ * The caller is expected to use this value to decide whether the existence of
+ * significant options prevents the generation of a plain filename.
+ */
+static bool append_significant_runtime_options(QDict *d, BlockDriverState *bs)
+{
+bool found_any = false;
+const char *const *option_name = NULL;
+
+if (!bs->drv) {
+return false;
+}
+
+while ((option_name = significant_options(bs, option_name))) {
+bool option_given = false;
+
+assert(strlen(*option_name) > 0);
+if ((*option_name)[strlen(*option_name) - 1] != '.') {
+QObject *entry = qdict_get(bs->options, *option_name);
+if (!entry) {
+continue;
+}
+
+qobject_incref(entry);
+qdict_put_obj(d, *option_name, entry);
+option_given = true;
+} else {
+const QDictEntry *entry;
+for (entry = qdict_first(bs->options); entry;
+ entry = qdict_next(bs->options, entry))
+{
+if (strstart(qdict_entry_key(entry), *option_name, NULL)) {
+qobject_incref(qdict_entry_value(entry));
+qdict_put_obj(d, qdict_entry_key(entry),
+  qdict_entry_value(entry));
+option_given = true;
+}
+}
+}
+
+/* While "driver" and "filename" need to be included in a JSON 
filename,
+ * their existence does not prohibit generation of a plain filename. */
+if (!found_any && option_given &&
+strcmp(*option_name, "driver") && strcmp(*option_name, "filename"))
+{
+found_any = true;
+}
+}
+
+return found_any;
+}
+
 static bool append_open_options(QDict *d, BlockDriverState *bs)
 {
 const QDictEntry *entry;
@@ -4927,9 +5013,37 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 bs->full_open_options = opts;
 }
 
+/* Gather the options QDict */
+opts = qdict_new();
+append_significant_runtime_options(opts, bs);
+
+if (drv->bdrv_gather_child_options) {
+/* Some block drivers may not want to present all of their children's
+ * options, or name them differently from BdrvChild.name */
+drv->bdrv_gather_child_options(bs, opts);
+} else {
+QLIST_FOREACH(child, >children, next) {
+if (child->role == _backing && !bs->backing_overridden) {
+/* We can skip the backing BDS if it has not been overridden */
+continue;
+}
+
+QINCREF(child->bs->full_open_options);
+qdict_put(opts, child->name, child->bs->full_open_options);
+}
+
+if (bs->backing_overridden && !bs->backing) {
+/* Force no 

[Qemu-block] [PATCH v5 18/25] block: Add sgfnt_runtime_opts to BlockDriver

2017-06-21 Thread Max Reitz
This new field can be set by block drivers to list the runtime options
they accept that may influence the contents of the respective BDS. As of
a follow-up patch, this list will be used by the common
bdrv_refresh_filename() implementation to decide which options to put
into BDS.full_open_options (and consequently whether a JSON filename has
to be created), thus freeing the drivers of having to implement that
logic themselves.

Additionally, this patch adds the field to all of the block drivers that
need it and sets it accordingly.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |  7 +++
 block/blkdebug.c  |  6 ++
 block/crypto.c|  5 +
 block/curl.c  | 21 +
 block/gluster.c   | 19 +++
 block/iscsi.c | 18 ++
 block/nbd.c   | 14 ++
 block/nfs.c   |  4 
 block/null.c  |  9 +
 block/quorum.c|  8 
 block/rbd.c   |  5 +
 block/replication.c   |  5 +
 block/sheepdog.c  | 12 
 block/ssh.c   |  5 +
 block/vpc.c   |  4 
 block/vvfat.c |  4 
 block/vxhs.c  |  8 
 17 files changed, 154 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8a7fc2d..1ec1d70 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -388,6 +388,13 @@ struct BlockDriver {
  uint64_t *nperm, uint64_t *nshared);
 
 QLIST_ENTRY(BlockDriver) list;
+
+/* Pointer to a NULL-terminated array of names of significant options that
+ * can be specified for bdrv_open(). A significant option is one that
+ * changes the data of a BDS.
+ * If this pointer is NULL, the array is considered empty.
+ * "filename" and "driver" are always considered significant. */
+const char *const *sgfnt_runtime_opts;
 };
 
 typedef struct BlockLimits {
diff --git a/block/blkdebug.c b/block/blkdebug.c
index f508ede..06c7924 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -923,6 +923,12 @@ static BlockDriver bdrv_blkdebug = {
 = blkdebug_debug_remove_breakpoint,
 .bdrv_debug_resume  = blkdebug_debug_resume,
 .bdrv_debug_is_suspended= blkdebug_debug_is_suspended,
+
+.sgfnt_runtime_opts = (const char *const[]) { "config",
+  "inject-error.",
+  "set-state.",
+  "suspend.",
+  NULL },
 };
 
 static void bdrv_blkdebug_init(void)
diff --git a/block/crypto.c b/block/crypto.c
index 10e5ddc..295b83e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -639,6 +639,11 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_getlength = block_crypto_getlength,
 .bdrv_get_info  = block_crypto_get_info_luks,
 .bdrv_get_specific_info = block_crypto_get_specific_info_luks,
+
+.sgfnt_runtime_opts = (const char *const[]) {
+  BLOCK_CRYPTO_OPT_LUKS_KEY_SECRET,
+  NULL
+  },
 };
 
 static void block_crypto_init(void)
diff --git a/block/curl.c b/block/curl.c
index 2a244e2..11318a9 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -957,6 +957,19 @@ static int64_t curl_getlength(BlockDriverState *bs)
 return s->len;
 }
 
+static const char *const curl_sgfnt_runtime_opts[] = {
+CURL_BLOCK_OPT_URL,
+CURL_BLOCK_OPT_SSLVERIFY,
+CURL_BLOCK_OPT_COOKIE,
+CURL_BLOCK_OPT_COOKIE_SECRET,
+CURL_BLOCK_OPT_USERNAME,
+CURL_BLOCK_OPT_PASSWORD_SECRET,
+CURL_BLOCK_OPT_PROXY_USERNAME,
+CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET,
+
+NULL
+};
+
 static BlockDriver bdrv_http = {
 .format_name= "http",
 .protocol_name  = "http",
@@ -971,6 +984,8 @@ static BlockDriver bdrv_http = {
 
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
+
+.sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
 static BlockDriver bdrv_https = {
@@ -987,6 +1002,8 @@ static BlockDriver bdrv_https = {
 
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
+
+.sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
 static BlockDriver bdrv_ftp = {
@@ -1003,6 +1020,8 @@ static BlockDriver bdrv_ftp = {
 
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
+
+.sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
 static BlockDriver bdrv_ftps = {
@@ -1019,6 +1038,8 @@ static BlockDriver bdrv_ftps = {
 
 .bdrv_detach_aio_context= 

[Qemu-block] [PATCH v5 17/25] iotests: Add quorum case to test 110

2017-06-21 Thread Max Reitz
Test 110 tests relative backing filenames for complex BDS trees. Add
quorum as an example that can never work automatically (without
special-casing if all child nodes have the same base directory), and an
example on how to make it work manually (using the base-directory
option).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/110 | 48 ++
 tests/qemu-iotests/110.out | 12 
 2 files changed, 60 insertions(+)

diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index ba1b3c6..d96b656 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -30,6 +30,7 @@ status=1  # failure is the default!
 _cleanup()
 {
_cleanup_test_img
+rm -f "$TEST_IMG.copy"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -87,6 +88,53 @@ echo
 # omit the image size; it should work anyway
 _make_test_img -b "$TEST_IMG_REL.base"
 
+echo
+echo '=== Nodes without a common directory ==='
+echo
+
+cp "$TEST_IMG" "$TEST_IMG.copy"
+
+# Should inform us that the actual path of the backing file cannot be 
determined
+TEST_IMG="json:{
+'driver': '$IMGFMT',
+'file': {
+'driver': 'quorum',
+'vote-threshold': 1,
+'children': [
+{
+'driver': 'file',
+'filename': '$TEST_IMG'
+},
+{
+'driver': 'file',
+'filename': '$TEST_IMG.copy'
+}
+]
+}
+}" _img_info | _filter_img_info
+
+echo
+
+# Should work fine
+TEST_IMG="json:{
+'driver': '$IMGFMT',
+'file': {
+'driver': 'quorum',
+'base-directory': '$TEST_DIR/',
+'vote-threshold': 1,
+'children': [
+{
+'driver': 'file',
+'filename': '$TEST_IMG'
+},
+{
+'driver': 'file',
+'filename': '$TEST_IMG.copy'
+}
+]
+}
+}" _img_info | _filter_img_info
+
 
 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out
index 5370bc1..e1845d8 100644
--- a/tests/qemu-iotests/110.out
+++ b/tests/qemu-iotests/110.out
@@ -19,4 +19,16 @@ backing file: t.IMGFMT.base (actual path: 
TEST_DIR/t.IMGFMT.base)
 === Backing name is always relative to the backed image ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
backing_file=t.IMGFMT.base
+
+=== Nodes without a common directory ===
+
+image: json:{"driver": "IMGFMT", "file": {"children": [{"driver": "file", 
"filename": "TEST_DIR/t.IMGFMT"}, {"driver": "file", "filename": 
"TEST_DIR/t.IMGFMT.copy"}], "driver": "quorum", "blkverify": false, 
"rewrite-corrupted": false, "vote-threshold": 1}}
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+backing file: t.IMGFMT.base (cannot determine actual path)
+
+image: json:{"driver": "IMGFMT", "file": {"children": [{"driver": "file", 
"filename": "TEST_DIR/t.IMGFMT"}, {"driver": "file", "filename": 
"TEST_DIR/t.IMGFMT.copy"}], "driver": "quorum", "blkverify": false, 
"rewrite-corrupted": false, "vote-threshold": 1}}
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base)
 *** done
-- 
2.9.4




[Qemu-block] [PATCH v5 11/25] blkverify: Make bdrv_dirname() return NULL

2017-06-21 Thread Max Reitz
blkverify's BDSs have a file BDS, but we do not want this to be
preferred over the raw node. There is no way to decide between the two
(and not really a reason to, either), so just return NULL in blkverify's
implementation of bdrv_dirname().

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/blkverify.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index b2ed8cd..d5233ee 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -309,6 +309,15 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs, QDict *options)
 }
 }
 
+static char *blkverify_dirname(BlockDriverState *bs, Error **errp)
+{
+/* In general, there are two BDSs with different dirnames below this one;
+ * so there is no unique dirname we could return (unless both are equal by
+ * chance). Therefore, to be consistent, just always return NULL. */
+error_setg(errp, "Cannot generate a base directory for blkverify nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_blkverify = {
 .format_name  = "blkverify",
 .protocol_name= "blkverify",
@@ -320,6 +329,7 @@ static BlockDriver bdrv_blkverify = {
 .bdrv_child_perm  = bdrv_filter_default_perms,
 .bdrv_getlength   = blkverify_getlength,
 .bdrv_refresh_filename= blkverify_refresh_filename,
+.bdrv_dirname = blkverify_dirname,
 
 .bdrv_co_preadv   = blkverify_co_preadv,
 .bdrv_co_pwritev  = blkverify_co_pwritev,
-- 
2.9.4




[Qemu-block] [PATCH v5 16/25] block: Add 'base-directory' BDS option

2017-06-21 Thread Max Reitz
Using this option, one can directly override what bdrv_dirname() will
return. This is useful if one uses e.g. qcow2 on top of quorum (with
only protocol BDSs under the quorum BDS) and wants to be able to use
relative backing filenames.

Signed-off-by: Max Reitz 
---
 qapi/block-core.json  |  9 +
 include/block/block_int.h |  2 ++
 block.c   | 14 ++
 3 files changed, 25 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f85c223..4090c62 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2940,6 +2940,14 @@
 # (default: off)
 # @force-share:   force share all permission on added nodes.
 # Requires read-only=true. (Since 2.10)
+# @base-directory May be specified for any node. Normally, whenever a filename
+# is specified which is supposed to be relative to this node
+# (such as relative backing filenames), the base directory to 
be
+# used is the directory the image file of this node is in, 
which
+# is simply prepended to the relative filename. Using this
+# option, the string which is prepended (i.e. the base
+# directory) can be overridden.
+# (Since 2.10)
 #
 # Remaining options are determined by the block driver.
 #
@@ -2948,6 +2956,7 @@
 { 'union': 'BlockdevOptions',
   'base': { 'driver': 'BlockdevDriver',
 '*node-name': 'str',
+'*base-directory': 'str',
 '*discard': 'BlockdevDiscardOptions',
 '*cache': 'BlockdevCacheOptions',
 '*read-only': 'bool',
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 61e7d7a..8a7fc2d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -559,6 +559,8 @@ struct BlockDriverState {
 QDict *full_open_options;
 char exact_filename[PATH_MAX];
 
+char *dirname;
+
 BdrvChild *backing;
 BdrvChild *file;
 
diff --git a/block.c b/block.c
index a5dd6bb..8e03cca 100644
--- a/block.c
+++ b/block.c
@@ -1233,6 +1233,12 @@ QemuOptsList bdrv_runtime_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "always accept other writers (default: off)",
 },
+{
+.name = "base-directory",
+.type = QEMU_OPT_STRING,
+.help = "String to prepend to filenames relative to this BDS for "
+"making them absolute",
+},
 { /* end of list */ }
 },
 };
@@ -1305,6 +1311,8 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 trace_bdrv_open_common(bs, filename ?: "", bs->open_flags,
drv->format_name);
 
+bs->dirname = g_strdup(qemu_opt_get(opts, "base-directory"));
+
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
 
 if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
@@ -3249,6 +3257,8 @@ static void bdrv_delete(BlockDriverState *bs)
 
 bdrv_close(bs);
 
+g_free(bs->dirname);
+
 /* remove from list, if necessary */
 if (bs->node_name[0] != '\0') {
 QTAILQ_REMOVE(_bdrv_states, bs, node_list);
@@ -4931,6 +4941,10 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
 
+if (bs->dirname) {
+return g_strdup(bs->dirname);
+}
+
 if (!drv) {
 error_setg(errp, "Node '%s' is ejected", bs->node_name);
 return NULL;
-- 
2.9.4




[Qemu-block] [PATCH v5 14/25] block/nfs: Implement bdrv_dirname()

2017-06-21 Thread Max Reitz
While the basic idea is obvious and could be handled by the default
bdrv_dirname() implementation, we cannot generate a directory name if
the gid or uid are set, so we have to explicitly return NULL in those
cases.

Signed-off-by: Max Reitz 
---
 block/nfs.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index fee1f87..976d96f 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -862,6 +862,19 @@ static void nfs_refresh_filename(BlockDriverState *bs, 
QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *nfs_dirname(BlockDriverState *bs, Error **errp)
+{
+NFSClient *client = bs->opaque;
+
+if (client->uid || client->gid) {
+error_setg(errp, "Cannot generate a base directory for NBD node '%s'",
+   bs->filename);
+return NULL;
+}
+
+return g_strdup_printf("nfs://%s%s/", client->server->host, client->path);
+}
+
 #ifdef LIBNFS_FEATURE_PAGECACHE
 static void nfs_invalidate_cache(BlockDriverState *bs,
  Error **errp)
@@ -895,6 +908,7 @@ static BlockDriver bdrv_nfs = {
 .bdrv_detach_aio_context= nfs_detach_aio_context,
 .bdrv_attach_aio_context= nfs_attach_aio_context,
 .bdrv_refresh_filename  = nfs_refresh_filename,
+.bdrv_dirname   = nfs_dirname,
 
 #ifdef LIBNFS_FEATURE_PAGECACHE
 .bdrv_invalidate_cache  = nfs_invalidate_cache,
-- 
2.9.4




[Qemu-block] [PATCH v5 08/25] block: Add bdrv_make_absolute_filename()

2017-06-21 Thread Max Reitz
This is a general function for making a filename that is relative to a
certain BDS absolute.

It calls bdrv_get_full_backing_filename_from_filename() for now, but
that will be changed in a follow-up patch.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 14beb6b..6f19816 100644
--- a/block.c
+++ b/block.c
@@ -304,15 +304,22 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 }
 }
 
-char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
+static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
+ const char *filename, Error **errp)
 {
-char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+char *bs_filename = relative_to->exact_filename[0]
+? relative_to->exact_filename
+: relative_to->filename;
 
-return bdrv_get_full_backing_filename_from_filename(backed,
-bs->backing_file,
+return bdrv_get_full_backing_filename_from_filename(bs_filename, filename,
 errp);
 }
 
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
+{
+return bdrv_make_absolute_filename(bs, bs->backing_file, errp);
+}
+
 void bdrv_register(BlockDriver *bdrv)
 {
 QLIST_INSERT_HEAD(_drivers, bdrv, list);
-- 
2.9.4




[Qemu-block] [PATCH v5 07/25] block: bdrv_get_full_backing_filename's ret. val.

2017-06-21 Thread Max Reitz
Make bdrv_get_full_backing_filename() return an allocated string instead
of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz 
---
 include/block/block.h |  3 +--
 block.c   | 42 +-
 block/qapi.c  | 12 ++--
 3 files changed, 20 insertions(+), 37 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index e9df983..dfae9b4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -492,8 +492,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
-void bdrv_get_full_backing_filename(BlockDriverState *bs,
-char *dest, size_t sz, Error **errp);
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp);
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp);
diff --git a/block.c b/block.c
index b70a9a8..14beb6b 100644
--- a/block.c
+++ b/block.c
@@ -304,19 +304,13 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 }
 }
 
-void bdrv_get_full_backing_filename(BlockDriverState *bs, char *dest, size_t 
sz,
-Error **errp)
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
 {
 char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
-char *full_name;
 
-full_name = bdrv_get_full_backing_filename_from_filename(backed,
- bs->backing_file,
- errp);
-if (full_name) {
-pstrcpy(dest, sz, full_name);
-g_free(full_name);
-}
+return bdrv_get_full_backing_filename_from_filename(backed,
+bs->backing_file,
+errp);
 }
 
 void bdrv_register(BlockDriver *bdrv)
@@ -2127,7 +2121,7 @@ out:
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
const char *bdref_key, Error **errp)
 {
-char *backing_filename = g_malloc0(PATH_MAX);
+char *backing_filename = NULL;
 char *bdref_key_dot;
 const char *reference = NULL;
 int ret = 0;
@@ -2161,7 +2155,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
  */
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
-backing_filename[0] = '\0';
+/* keep backing_filename NULL */
 
 /* FIXME: Should also be set to true if @options contains other runtime
  *options which control the data that is read from the backing
@@ -2171,8 +2165,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 QDECREF(options);
 goto free_exit;
 } else {
-bdrv_get_full_backing_filename(bs, backing_filename, PATH_MAX,
-   _err);
+backing_filename = bdrv_get_full_backing_filename(bs, _err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
@@ -2192,9 +2185,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 qdict_put_str(options, "driver", bs->backing_format);
 }
 
-backing_hd = bdrv_open_inherit(*backing_filename ? backing_filename : NULL,
-   reference, options, 0, bs, _backing,
-   errp);
+backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
+   _backing, errp);
 if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
@@ -3944,7 +3936,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 int is_protocol = 0;
 BlockDriverState *curr_bs = NULL;
 BlockDriverState *retval = NULL;
-Error *local_error = NULL;
 
 if (!bs || !bs->drv || !backing_file) {
 return NULL;
@@ -3961,21 +3952,22 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 /* If either of the filename paths is actually a protocol, then
  * compare unmodified paths; otherwise make paths relative */
 if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+char *backing_file_full_ret;
+
 if (strcmp(backing_file, curr_bs->backing_file) == 0) {
 retval = curr_bs->backing->bs;
 break;
 }
 /* Also check against the full 

[Qemu-block] [PATCH v5 09/25] block: Fix bdrv_find_backing_image()

2017-06-21 Thread Max Reitz
bdrv_find_backing_image() should use bdrv_get_full_backing_filename() or
bdrv_make_absolute_filename() instead of trying to do what those
functions do by itself.

path_combine_deprecated() can now be dropped, so let's do that.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 6f19816..108da69 100644
--- a/block.c
+++ b/block.c
@@ -197,15 +197,6 @@ char *path_combine(const char *base_path, const char 
*filename)
 return result;
 }
 
-static void path_combine_deprecated(char *dest, int dest_size,
-const char *base_path,
-const char *filename)
-{
-char *combined = path_combine(base_path, filename);
-pstrcpy(dest, dest_size, combined);
-g_free(combined);
-}
-
 /*
  * Helper function for bdrv_parse_filename() implementations to remove optional
  * protocol prefixes (especially "file:") from a filename and for putting the
@@ -3950,7 +3941,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 filename_full = g_malloc(PATH_MAX);
 backing_file_full = g_malloc(PATH_MAX);
-filename_tmp  = g_malloc(PATH_MAX);
 
 is_protocol = path_has_protocol(backing_file);
 
@@ -3979,22 +3969,31 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 } else {
 /* If not an absolute filename path, make it relative to the 
current
  * image's filename path */
-path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
-backing_file);
+filename_tmp = bdrv_make_absolute_filename(curr_bs, backing_file,
+   NULL);
+if (!filename_tmp) {
+continue;
+}
 
 /* We are going to compare absolute pathnames */
 if (!realpath(filename_tmp, filename_full)) {
+g_free(filename_tmp);
 continue;
 }
+g_free(filename_tmp);
 
 /* We need to make sure the backing filename we are comparing 
against
  * is relative to the current image filename (or absolute) */
-path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
-curr_bs->backing_file);
+filename_tmp = bdrv_get_full_backing_filename(curr_bs, NULL);
+if (!filename_tmp) {
+continue;
+}
 
 if (!realpath(filename_tmp, backing_file_full)) {
+g_free(filename_tmp);
 continue;
 }
+g_free(filename_tmp);
 
 if (strcmp(backing_file_full, filename_full) == 0) {
 retval = curr_bs->backing->bs;
@@ -4005,7 +4004,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 g_free(filename_full);
 g_free(backing_file_full);
-g_free(filename_tmp);
 return retval;
 }
 
-- 
2.9.4




[Qemu-block] [PATCH v5 06/25] block: bdrv_get_full_backing_filename_from_...'s ret. val.

2017-06-21 Thread Max Reitz
Make bdrv_get_full_backing_filename_from_filename() return an allocated
string instead of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 include/block/block.h |  7 +++
 block.c   | 32 +++-
 block/vmdk.c  |  8 +++-
 3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 9f89fa7..e9df983 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -494,10 +494,9 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
 char *dest, size_t sz, Error **errp);
-void bdrv_get_full_backing_filename_from_filename(const char *backed,
-  const char *backing,
-  char *dest, size_t sz,
-  Error **errp);
+char *bdrv_get_full_backing_filename_from_filename(const char *backed,
+   const char *backing,
+   Error **errp);
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
diff --git a/block.c b/block.c
index 80e4805..b70a9a8 100644
--- a/block.c
+++ b/block.c
@@ -287,20 +287,20 @@ int bdrv_set_read_only(BlockDriverState *bs, bool 
read_only, Error **errp)
 return 0;
 }
 
-void bdrv_get_full_backing_filename_from_filename(const char *backed,
-  const char *backing,
-  char *dest, size_t sz,
-  Error **errp)
+char *bdrv_get_full_backing_filename_from_filename(const char *backed,
+   const char *backing,
+   Error **errp)
 {
 if (backing[0] == '\0' || path_has_protocol(backing) ||
 path_is_absolute(backing))
 {
-pstrcpy(dest, sz, backing);
+return g_strdup(backing);
 } else if (backed[0] == '\0' || strstart(backed, "json:", NULL)) {
 error_setg(errp, "Cannot use relative backing file names for '%s'",
backed);
+return NULL;
 } else {
-path_combine_deprecated(dest, sz, backed, backing);
+return path_combine(backed, backing);
 }
 }
 
@@ -308,9 +308,15 @@ void bdrv_get_full_backing_filename(BlockDriverState *bs, 
char *dest, size_t sz,
 Error **errp)
 {
 char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+char *full_name;
 
-bdrv_get_full_backing_filename_from_filename(backed, bs->backing_file,
- dest, sz, errp);
+full_name = bdrv_get_full_backing_filename_from_filename(backed,
+ bs->backing_file,
+ errp);
+if (full_name) {
+pstrcpy(dest, sz, full_name);
+g_free(full_name);
+}
 }
 
 void bdrv_register(BlockDriver *bdrv)
@@ -4431,16 +4437,16 @@ void bdrv_img_create(const char *filename, const char 
*fmt,
 if (size == -1) {
 if (backing_file) {
 BlockDriverState *bs;
-char *full_backing = g_new0(char, PATH_MAX);
+char *full_backing;
 int64_t size;
 int back_flags;
 QDict *backing_options = NULL;
 
-bdrv_get_full_backing_filename_from_filename(filename, 
backing_file,
- full_backing, 
PATH_MAX,
- _err);
+full_backing =
+bdrv_get_full_backing_filename_from_filename(filename,
+ backing_file,
+ _err);
 if (local_err) {
-g_free(full_backing);
 goto out;
 }
 
diff --git a/block/vmdk.c b/block/vmdk.c
index 4389a69..4e92ce4 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1983,12 +1983,10 @@ static int vmdk_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 if (backing_file) {
 BlockBackend *blk;
-char *full_backing = g_new0(char, PATH_MAX);
-bdrv_get_full_backing_filename_from_filename(filename, backing_file,
- full_backing, PATH_MAX,
- _err);
+char *full_backing =
+bdrv_get_full_backing_filename_from_filename(filename, 

[Qemu-block] [PATCH v5 12/25] quorum: Make bdrv_dirname() return NULL

2017-06-21 Thread Max Reitz
While the common implementation for bdrv_dirname() should return NULL
for quorum BDSs already (because they do not have a file node and their
exact_filename field should be empty), there is no reason not to make
that explicit.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/quorum.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index f39b9e5..1e40b9e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1108,6 +1108,16 @@ static void quorum_refresh_filename(BlockDriverState 
*bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *quorum_dirname(BlockDriverState *bs, Error **errp)
+{
+/* In general, there are multiple BDSs with different dirnames below this
+ * one; so there is no unique dirname we could return (unless all are equal
+ * by chance, or there is only one). Therefore, to be consistent, just
+ * always return NULL. */
+error_setg(errp, "Cannot generate a base directory for quorum nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1117,6 +1127,7 @@ static BlockDriver bdrv_quorum = {
 .bdrv_file_open = quorum_open,
 .bdrv_close = quorum_close,
 .bdrv_refresh_filename  = quorum_refresh_filename,
+.bdrv_dirname   = quorum_dirname,
 
 .bdrv_co_flush_to_disk  = quorum_co_flush,
 
-- 
2.9.4




[Qemu-block] [PATCH v5 03/25] block: Add BDS.backing_overridden

2017-06-21 Thread Max Reitz
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS. Therefore, we will need to consider this in
bdrv_refresh_filename().

Adding a new field to the BDS is not nice, but it is very simple and
exactly keeps track of whether the backing file has been overridden.

This commit adds a FIXME which will be remedied by a follow-up commit.
Until then, the respective piece of code will not result in any behavior
that is worse than what we currently have.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 include/block/block_int.h |  1 +
 block.c   | 13 +
 block/mirror.c|  4 
 blockdev.c| 16 
 4 files changed, 34 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6820027..bedf19b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -547,6 +547,7 @@ struct BlockDriverState {
 char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
 this file image */
 char backing_format[16]; /* if non-zero and backing_file exists */
+bool backing_overridden; /* backing file has been specified by the user */
 
 QDict *full_open_options;
 char exact_filename[PATH_MAX];
diff --git a/block.c b/block.c
index 2926f53..2addec0 100644
--- a/block.c
+++ b/block.c
@@ -2147,6 +2147,11 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
 backing_filename[0] = '\0';
+
+/* FIXME: Should also be set to true if @options contains other runtime
+ *options which control the data that is read from the backing
+ *BDS */
+bs->backing_overridden = true;
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
 goto free_exit;
@@ -2346,6 +2351,9 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 goto out;
 }
 
+bs_snapshot->backing_overridden = true;
+bdrv_refresh_filename(bs_snapshot);
+
 out:
 QDECREF(snapshot_options);
 g_free(tmp_filename);
@@ -2476,6 +2484,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 backing = qdict_get_try_str(options, "backing");
 if (backing && *backing == '\0') {
 flags |= BDRV_O_NO_BACKING;
+bs->backing_overridden = true;
 qdict_del(options, "backing");
 }
 
@@ -4799,6 +4808,10 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  * refresh those first */
 QLIST_FOREACH(child, >children, next) {
 bdrv_refresh_filename(child->bs);
+
+if (child->role == _backing && child->bs->backing_overridden) {
+bs->backing_overridden = true;
+}
 }
 
 if (drv->bdrv_refresh_filename) {
diff --git a/block/mirror.c b/block/mirror.c
index df6c8bb..6ac3956 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -536,6 +536,10 @@ static void mirror_exit(BlockJob *job, void *opaque)
 error_report_err(local_err);
 data->ret = -EPERM;
 }
+
+/* The target image's file already has been created with the backing
+ * file we just set, so there is no need to set backing_overridden or
+ * call bdrv_refresh_filename(). */
 }
 
 if (s->to_replace) {
diff --git a/blockdev.c b/blockdev.c
index 6472548..c15308a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1787,6 +1787,8 @@ static void external_snapshot_commit(BlkActionState 
*common)
 {
 ExternalSnapshotState *state =
  DO_UPCAST(ExternalSnapshotState, common, common);
+TransactionAction *action = common->action;
+bool image_was_existing = false;
 
 /* We don't need (or want) to use the transactional
  * bdrv_reopen_multiple() across all the entries at once, because we
@@ -1795,6 +1797,20 @@ static void external_snapshot_commit(BlkActionState 
*common)
 bdrv_reopen(state->old_bs, state->old_bs->open_flags & ~BDRV_O_RDWR,
 NULL);
 }
+
+if (action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
+BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
+if (s->has_mode && s->mode == NEW_IMAGE_MODE_EXISTING) {
+image_was_existing = true;
+}
+} else {
+image_was_existing = true;
+}
+
+if (image_was_existing) {
+state->new_bs->backing_overridden = true;
+bdrv_refresh_filename(state->new_bs);
+}
 }
 
 static void external_snapshot_abort(BlkActionState *common)
-- 
2.9.4




[Qemu-block] [PATCH v5 13/25] block/nbd: Make bdrv_dirname() return NULL

2017-06-21 Thread Max Reitz
The generic bdrv_dirname() implementation would be able to generate some
form of directory name for many NBD nodes, but it would be always wrong.
Therefore, we have to explicitly make it an error (until NBD has some
form of specification for export paths, if it ever will).

Signed-off-by: Max Reitz 
---
 block/nbd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index f91ac63..f896fc4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -558,6 +558,16 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *nbd_dirname(BlockDriverState *bs, Error **errp)
+{
+/* The generic bdrv_dirname() implementation is able to work out some
+ * directory name for NBD nodes, but that would be wrong. So far there is 
no
+ * specification for how "export paths" would work, so NBD does not have
+ * directory names. */
+error_setg(errp, "Cannot generate a base directory for NBD nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_nbd = {
 .format_name= "nbd",
 .protocol_name  = "nbd",
@@ -575,6 +585,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -594,6 +605,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -613,6 +625,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.9.4




[Qemu-block] [PATCH v5 10/25] block: Add bdrv_dirname()

2017-06-21 Thread Max Reitz
This function may be implemented by block drivers to derive a directory
name from a BDS. Concatenating this g_free()-able string with a relative
filename must result in a valid (not necessarily existing) filename, so
this is a function that should generally be not implemented by format
drivers, because this is protocol-specific.

If a BDS's driver does not implement this function, bdrv_dirname() will
fall through to the BDS's file if it exists. If it does not, the
exact_filename field will be used to generate a directory name.

Signed-off-by: Max Reitz 
---
 include/block/block.h |  1 +
 include/block/block_int.h |  7 +++
 block.c   | 26 ++
 3 files changed, 34 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index dfae9b4..8327e4a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -496,6 +496,7 @@ char *bdrv_get_full_backing_filename(BlockDriverState *bs, 
Error **errp);
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp);
+char *bdrv_dirname(BlockDriverState *bs, Error **errp);
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index bedf19b..61e7d7a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -131,6 +131,13 @@ struct BlockDriver {
 
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
 
+/*
+ * Returns an allocated string which is the directory name of this BDS: It
+ * will be used to make relative filenames absolute by prepending this
+ * function's return value to them.
+ */
+char *(*bdrv_dirname)(BlockDriverState *bs, Error **errp);
+
 /* aio */
 BlockAIOCB *(*bdrv_aio_readv)(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
diff --git a/block.c b/block.c
index 108da69..0838c12 100644
--- a/block.c
+++ b/block.c
@@ -4917,6 +4917,32 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 }
 }
 
+char *bdrv_dirname(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (!drv) {
+error_setg(errp, "Node '%s' is ejected", bs->node_name);
+return NULL;
+}
+
+if (drv->bdrv_dirname) {
+return drv->bdrv_dirname(bs, errp);
+}
+
+if (bs->file) {
+return bdrv_dirname(bs->file->bs, errp);
+}
+
+if (bs->exact_filename[0] != '\0') {
+return path_combine(bs->exact_filename, "");
+}
+
+error_setg(errp, "Cannot generate a base directory for %s nodes",
+   drv->format_name);
+return NULL;
+}
+
 /*
  * Hot add/remove a BDS's child. So the user can take a child offline when
  * it is broken and take a new child online
-- 
2.9.4




[Qemu-block] [PATCH v5 02/25] block: Use children list in bdrv_refresh_filename

2017-06-21 Thread Max Reitz
bdrv_refresh_filename() should invoke itself recursively on all
children, not just on file.

With that change, we can remove the manual invocations in blkverify,
quorum, commit, and mirror.

Signed-off-by: Max Reitz 
---
 block.c   | 9 +
 block/blkverify.c | 3 ---
 block/commit.c| 1 -
 block/mirror.c| 1 -
 block/quorum.c| 1 -
 5 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index fa1d06d..2926f53 100644
--- a/block.c
+++ b/block.c
@@ -4788,16 +4788,17 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 void bdrv_refresh_filename(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+BdrvChild *child;
 QDict *opts;
 
 if (!drv) {
 return;
 }
 
-/* This BDS's file name will most probably depend on its file's name, so
- * refresh that first */
-if (bs->file) {
-bdrv_refresh_filename(bs->file->bs);
+/* This BDS's file name may depend on any of its children's file names, so
+ * refresh those first */
+QLIST_FOREACH(child, >children, next) {
+bdrv_refresh_filename(child->bs);
 }
 
 if (drv->bdrv_refresh_filename) {
diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9..b2ed8cd 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -281,9 +281,6 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs, QDict *options)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-/* bs->file->bs has already been refreshed */
-bdrv_refresh_filename(s->test_file->bs);
-
 if (bs->file->bs->full_open_options
 && s->test_file->bs->full_open_options)
 {
diff --git a/block/commit.c b/block/commit.c
index 8c09c3d..7146e48 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -259,7 +259,6 @@ static int64_t coroutine_fn 
bdrv_commit_top_get_block_status(
 
 static void bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 {
-bdrv_refresh_filename(bs->backing->bs);
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
 bs->backing->bs->filename);
 }
diff --git a/block/mirror.c b/block/mirror.c
index db57bee..df6c8bb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1070,7 +1070,6 @@ static int coroutine_fn 
bdrv_mirror_top_pdiscard(BlockDriverState *bs,
 
 static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 {
-bdrv_refresh_filename(bs->backing->bs);
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
 bs->backing->bs->filename);
 }
diff --git a/block/quorum.c b/block/quorum.c
index 1b2a8c3..f39b9e5 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1087,7 +1087,6 @@ static void quorum_refresh_filename(BlockDriverState *bs, 
QDict *options)
 int i;
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_refresh_filename(s->children[i]->bs);
 if (!s->children[i]->bs->full_open_options) {
 return;
 }
-- 
2.9.4




[Qemu-block] [PATCH v5 04/25] block: Respect backing bs in bdrv_refresh_filename

2017-06-21 Thread Max Reitz
Basically, bdrv_refresh_filename() should respect all children of a
BlockDriverState. However, generally those children are driver-specific,
so this function cannot handle the general case. On the other hand,
there are only few drivers which use other children than @file and
@backing (that being vmdk, quorum, and blkverify).

Most block drivers only use @file and/or @backing (if they use any
children at all). Both can be implemented directly in
bdrv_refresh_filename.

The user overriding the file's filename is already handled, however, the
user overriding the backing file is not. If this is done, opening the
BDS with the plain filename of its file will not be correct, so we may
not set bs->exact_filename in that case.

iotest 051 contains test cases for overwriting the backing file, and so
its output changes with this patch applied (which I consider a good
thing).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block.c   | 12 +++-
 tests/qemu-iotests/051.out|  8 
 tests/qemu-iotests/051.pc.out |  8 
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 2addec0..fd10a16 100644
--- a/block.c
+++ b/block.c
@@ -4839,6 +4839,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 
 opts = qdict_new();
 has_open_options = append_open_options(opts, bs);
+has_open_options |= bs->backing_overridden;
 
 /* If no specific options have been given for this BDS, the filename of
  * the underlying file should suffice for this one as well */
@@ -4850,11 +4851,20 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  * file BDS. The full options QDict of that file BDS should somehow
  * contain a representation of the filename, therefore the following
  * suffices without querying the (exact_)filename of this BDS. */
-if (bs->file->bs->full_open_options) {
+if (bs->file->bs->full_open_options &&
+(!bs->backing || bs->backing->bs->full_open_options))
+{
 qdict_put_str(opts, "driver", drv->format_name);
 QINCREF(bs->file->bs->full_open_options);
 qdict_put(opts, "file", bs->file->bs->full_open_options);
 
+if (bs->backing) {
+QINCREF(bs->backing->bs->full_open_options);
+qdict_put(opts, "backing", bs->backing->bs->full_open_options);
+} else if (bs->backing_overridden && !bs->backing) {
+qdict_put(opts, "backing", qstring_new());
+}
+
 bs->full_open_options = opts;
 } else {
 QDECREF(opts);
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 4d3b1ff..d38b39f 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -59,7 +59,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,backing.file.filename=TEST_DIR/t.qcow2.orig,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) info block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.orig"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.orig (chain depth: 1)
@@ -148,7 +148,7 @@ QEMU_PROG: -drive driver=null-co,cache=invalid_value: 
invalid cache option
 Testing: -drive 
file=TEST_DIR/t.qcow2,cache=writeback,backing.file.filename=TEST_DIR/t.qcow2.base,backing.cache.no-flush=on,backing.node-name=backing,backing.file.node-name=backing-file,file.node-name=file,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) info block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.base"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.base (chain depth: 1)
@@ -168,7 +168,7 @@ backing-file: TEST_DIR/t.qcow2.base (file, read-only)
 Testing: -drive 
file=TEST_DIR/t.qcow2,cache=writethrough,backing.file.filename=TEST_DIR/t.qcow2.base,backing.cache.no-flush=on,backing.node-name=backing,backing.file.node-name=backing-file,file.node-name=file,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) info block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.base"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 

[Qemu-block] [PATCH v5 05/25] block: Make path_combine() return the path

2017-06-21 Thread Max Reitz
Besides being safe for arbitrary path lengths, after some follow-up
patches all callers will want a freshly allocated buffer anyway.

In the meantime, path_combine_deprecated() is added which has the same
interface as path_combine() had before this patch. All callers to that
function will be converted in follow-up patches.

Signed-off-by: Max Reitz 
---
 include/block/block.h |  4 +--
 block.c   | 85 ---
 block/vmdk.c  |  3 +-
 3 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 5bebff4..9f89fa7 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -501,9 +501,7 @@ void bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
-void path_combine(char *dest, int dest_size,
-  const char *base_path,
-  const char *filename);
+char *path_combine(const char *base_path, const char *filename);
 
 int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
diff --git a/block.c b/block.c
index fd10a16..80e4805 100644
--- a/block.c
+++ b/block.c
@@ -148,53 +148,62 @@ int path_is_absolute(const char *path)
 #endif
 }
 
-/* if filename is absolute, just copy it to dest. Otherwise, build a
+/* if filename is absolute, just return its duplicate. Otherwise, build a
path to it by considering it is relative to base_path. URL are
supported. */
-void path_combine(char *dest, int dest_size,
-  const char *base_path,
-  const char *filename)
+char *path_combine(const char *base_path, const char *filename)
 {
+const char *protocol_stripped = NULL;
 const char *p, *p1;
+char *result;
 int len;
 
-if (dest_size <= 0)
-return;
 if (path_is_absolute(filename)) {
-pstrcpy(dest, dest_size, filename);
-} else {
-const char *protocol_stripped = NULL;
+return g_strdup(filename);
+}
 
-if (path_has_protocol(base_path)) {
-protocol_stripped = strchr(base_path, ':');
-if (protocol_stripped) {
-protocol_stripped++;
-}
+if (path_has_protocol(base_path)) {
+protocol_stripped = strchr(base_path, ':');
+if (protocol_stripped) {
+protocol_stripped++;
 }
-p = protocol_stripped ?: base_path;
+}
+p = protocol_stripped ?: base_path;
 
-p1 = strrchr(base_path, '/');
+p1 = strrchr(base_path, '/');
 #ifdef _WIN32
-{
-const char *p2;
-p2 = strrchr(base_path, '\\');
-if (!p1 || p2 > p1)
-p1 = p2;
+{
+const char *p2;
+p2 = strrchr(base_path, '\\');
+if (!p1 || p2 > p1) {
+p1 = p2;
 }
+}
 #endif
-if (p1)
-p1++;
-else
-p1 = base_path;
-if (p1 > p)
-p = p1;
-len = p - base_path;
-if (len > dest_size - 1)
-len = dest_size - 1;
-memcpy(dest, base_path, len);
-dest[len] = '\0';
-pstrcat(dest, dest_size, filename);
+if (p1) {
+p1++;
+} else {
+p1 = base_path;
+}
+if (p1 > p) {
+p = p1;
 }
+len = p - base_path;
+
+result = g_malloc(len + strlen(filename) + 1);
+memcpy(result, base_path, len);
+strcpy(result + len, filename);
+
+return result;
+}
+
+static void path_combine_deprecated(char *dest, int dest_size,
+const char *base_path,
+const char *filename)
+{
+char *combined = path_combine(base_path, filename);
+pstrcpy(dest, dest_size, combined);
+g_free(combined);
 }
 
 /*
@@ -291,7 +300,7 @@ void bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 error_setg(errp, "Cannot use relative backing file names for '%s'",
backed);
 } else {
-path_combine(dest, sz, backed, backing);
+path_combine_deprecated(dest, sz, backed, backing);
 }
 }
 
@@ -3965,8 +3974,8 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 } else {
 /* If not an absolute filename path, make it relative to the 
current
  * image's filename path */
-path_combine(filename_tmp, PATH_MAX, curr_bs->filename,
- backing_file);
+path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
+backing_file);
 
 /* We are going to compare absolute pathnames */
 if (!realpath(filename_tmp, filename_full)) {
@@ -3975,8 +3984,8 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 /* 

[Qemu-block] [PATCH v5 01/25] block/mirror: Small absolute-paths simplification

2017-06-21 Thread Max Reitz
When invoking drive-mirror in absolute-paths mode, the target's backing
BDS is assigned to it in mirror_exit(). The current logic only does so
if the target does not have that backing BDS already; but it actually
cannot have a backing BDS at all (the BDS is opened with O_NO_BACKING in
qmp_drive_mirror()), so just assert that and assign the new backing BDS
unconditionally.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 4618853..db57bee 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -529,12 +529,12 @@ static void mirror_exit(BlockJob *job, void *opaque)
 _abort);
 if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
 BlockDriverState *backing = s->is_none_mode ? src : s->base;
-if (backing_bs(target_bs) != backing) {
-bdrv_set_backing_hd(target_bs, backing, _err);
-if (local_err) {
-error_report_err(local_err);
-data->ret = -EPERM;
-}
+
+assert(!target_bs->backing);
+bdrv_set_backing_hd(target_bs, backing, _err);
+if (local_err) {
+error_report_err(local_err);
+data->ret = -EPERM;
 }
 }
 
-- 
2.9.4




[Qemu-block] [PATCH v5 00/25] block: Fix some filename generation issues

2017-06-21 Thread Max Reitz
[If you have read the cover letter in x \in [v2, v4], there is nothing
 new here; feel free to skip to the bottom to read the changes from v4.]

There are some issues regarding filename generation right now:

- You always get a JSON filename if you set even a single qcow2-specific
  runtime options (as long as it does not have a dot in it, which is a
  bug, too, but here it is working in our favor...). That is not nice
  and actually breaks the usage of backing files with relative
  filenames with such qcow2 BDS.

- As hinted above, you cannot use relative backing filenames with BDS
  that have a JSON filename only, even though qemu might be able to
  obtain the directory name by walking through the BDS graph to the
  protocol level.

- Overriding the backing file at runtime should invalidate the filename
  because it actually changes the BDS's data.  Therefore, we need to
  force a JSON filename in that case, containing the backing file
  override.

- Much of our code assumes paths never to exceed PATH_MAX in length.
  This is wrong, at least because of JSON filenames. This should be
  fixed wherever the opportunity arises.

- If a driver decides to implement bdrv_refresh_filename(), that
  implementation has to not only refresh the filename (as one would
  think) but it must also refresh the runtime options
  (bs->full_open_options). That is stupid. (I'm allowed to say that
  because I'm to blame for it.)

This series is enclosed by four patches (two at the front, two at the
back) that fix more or less general issues. They are included because:
- Patch 1 is required so that in patch 3 it's obvious why we don't need
  to set backing_overriden there or call bdrv_refresh_filename()
- Patch 2 is already reviewed, so I might just as well keep it.
- Patches 24 and 25 are basically general bug fixes. Their connection to
  this series is obvious, however, I think, and they depend on the rest
  of the series, so I decided to just put them in.

Patches 3 and 4 address the third issue above, and patch 23 adds
something that's missing from patch 3. It cannot be squashed into patch
3, however, because it depends on functionality introduced by patches 18
to 22.
Consequently, patch 3 introduces a FIXME that is resolved by patch 23.

Patches 5 to 9 address the fourth issue above, and are also necessary
preparation for the following patches.

Patches 10 to 16 address the second issue above, patch 17 adds a test
case. They implement a bdrv_dirname() function that returns the base
directory of a BDS by walking through the BDS graph to the protocol
layer and then trying to obtain a path based on that BDS's
exact_filename. This obviously fails if exact_filename on the protocol
layer is not set.
This behavior can be overriden either by any block driver along the way
implementing bdrv_dirname() itself or by the user through the new
'base-directory' node option. This may allow us to resolve relative
filenames even if the reference BDS only has a JSON filename.

Patches 18 to 22 address both the first and last issues above. They add
a field called "sgfnt_runtime_opts" to the BlockDriver structure. Block
drivers may point this to an array containing all of the runtime options
they accept that may change their BDS's data (i.e. that are
"significant"). bdrv_refresh_filename() will use this list to generate
bs->full_open_options itself (with only a little help by the block
driver, if necessary, through the .bdrv_gather_child_options()
function). This not only simplifies the process significantly, but also
results in the default implementation generating JSON filenames only
when really necessary.


v5: Rebased on my block branch and addressed Eric's comments on v4
- Patch 1: Rebase conflict (bdrv_set_backing_hd() is now called in
   mirror_exit() instead of mirror_complete())
- Patch 2: Drop bdrv_refresh_filename() calls in the commit and mirror
   block drivers' implementations of that very function
- Patch 5: Rebase conflict due to 0d54a6fed3ebaf0e
- Patch 7: Rebase conflict due to 418661e0324c1c41
- Patch 10: Added comment on bdrv_dirname()'s interface [Eric]
- Patch 13: Drop NBD bdrv_dirname() support [Eric]
- Patch 16: More or less contextual rebase conflicts, bumped qemu's
version number, removed the "#optional"
- Patch 18: More block drivers to support
- Patch 19:
  - Added comment on bdrv_gather_child_options()'s interface
  - Fixed implementation for VMDK: We should check that bs->backing is
non-NULL before accessing it (and it may be NULL even if
bs->backing_overridden is true)
- Patch 21:
  - Added comment on bdrv_refresh_filename()'s interface
  - Rebase conflicts mostly because of the new qdict_put_*() helpers
  - Handle the mirror and commit "block drivers", too


git-backport-diff against v4:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Roger Pau Monne
On Wed, Jun 21, 2017 at 11:40:00AM +0100, Paul Durrant wrote:
> > -Original Message-
> > From: Qemu-devel [mailto:qemu-devel-
> > bounces+paul.durrant=citrix@nongnu.org] On Behalf Of Paul Durrant
> > Sent: 21 June 2017 10:36
> > To: Roger Pau Monne ; Stefano Stabellini
> > 
> > Cc: Kevin Wolf ; qemu-block@nongnu.org; qemu-
> > de...@nongnu.org; Max Reitz ; Anthony Perard
> > ; xen-de...@lists.xenproject.org
> > Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> > persistent if grant copy is not available
> > 
> > > -Original Message-
> > > From: Roger Pau Monne
> > > Sent: 21 June 2017 10:18
> > > To: Stefano Stabellini 
> > > Cc: Paul Durrant ; xen-
> > de...@lists.xenproject.org;
> > > qemu-de...@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > > ; Kevin Wolf ; Max
> > Reitz
> > > 
> > > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> > grant
> > > copy is not available
> > >
> > > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > > If grant copy is available then it will always be used in preference 
> > > > > to
> > > > > persistent maps. In this case feature-persistent should not be
> > advertized
> > > > > to the frontend, otherwise it may needlessly copy data into 
> > > > > persistently
> > > > > granted buffers.
> > > > >
> > > > > Signed-off-by: Paul Durrant 
> > > >
> > > > CC'ing Roger.
> > > >
> > > > It is true that using feature-persistent together with grant copies is a
> > > > a very bad idea.
> > > >
> > > > But this change enstablishes an explicit preference of
> > > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > > is not obvious to me that it should be the case.
> > > >
> > > > Why is feature_grant_copy (without feature-persistent) better than
> > > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > > avoid grant copies to copy data to persistent grants?
> > >
> > > When using persistent grants the frontend must always copy data from
> > > the buffer to the persistent grant, there's no way to avoid this.
> > >
> > > Using grant_copy we move the copy from the frontend to the backend,
> > > which means the CPU time of the copy is accounted to the backend. This
> > > is not ideal, but IMHO it's better than persistent grants because it
> > > avoids keeping a pool of mapped grants that consume memory and make
> > > the code more complex.
> > >
> > > Do you have some performance data showing the difference between
> > > persistent grants vs grant copy?
> > >
> > 
> > No, but I can get some :-)
> > 
> > For a little background... I've been trying to push throughput of fio 
> > running in
> > a debian stretch guest on my skull canyon NUC. When I started out, I was
> > getting ~100MBbs. When I finished, with this patch, the IOThreads one, the
> > multi-page ring one and a bit of hackery to turn off all the aio flushes 
> > that
> > seem to occur even if the image is opened with O_DIRECT, I was getting
> > ~960Mbps... which is about line rate for the SSD in the in NUC.
> > 
> > So, I'll force use of persistent grants on and see what sort of throughput I
> > get.
> 
> A quick test with grant copy forced off (causing persistent grants to be 
> used)... My VM is debian stretch using a 16 page shared ring from blkfront. 
> The image backing xvdb is a fully inflated 10G qcow2.
> 
> root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 
> --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 
> --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, 
> iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta 
> 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
>   write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
>   cpu  : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, 
> >=64=0.0%
>  issued: total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>   WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s, 
> mint=7908msec, maxt=7908msec
> 
> Disk stats (read/write):
>   xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068, 
> util=98.26%
> 
> The dom0 cpu 

Re: [Qemu-block] [Qemu-devel] [PATCH] qemu-img: don't shadow opts variable in img_dd()

2017-06-21 Thread Max Reitz
On 2017-06-21 10:24, Stefan Hajnoczi wrote:
> On Mon, Jun 19, 2017 at 05:18:18PM +0200, Max Reitz wrote:
>> On 2017-06-19 17:00, Stefan Hajnoczi wrote:
>>> It's confusing when two different variables have the same name in one
>>> function.
>>>
>>> Cc: Reda Sallahi 
>>> Signed-off-by: Stefan Hajnoczi 
>>> ---
>>>  qemu-img.c | 9 +++--
>>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index 0ad698d..c285c2f 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>> @@ -4249,15 +4249,12 @@ static int img_dd(int argc, char **argv)
>>>  case 'U':
>>>  force_share = true;
>>>  break;
>>> -case OPTION_OBJECT: {
>>> -QemuOpts *opts;
>>> -opts = qemu_opts_parse_noisily(_object_opts,
>>> -   optarg, true);
>>> -if (!opts) {
>>> +case OPTION_OBJECT:
>>> +if (!qemu_opts_parse_noisily(_object_opts, optarg, true)) 
>>> {
>>>  ret = -1;
>>>  goto out;
>>>  }
>>> -}   break;
>>> +break;
>>>  case OPTION_IMAGE_OPTS:
>>>  image_opts = true;
>>>  break;
>>
>> Hm, I basically reverted such a style in commit
>> 3258b91141090b05edcaab8f1d1dd355ca91b49a. I find it confusing to use the
>> same variable for two different things.
> 
> I don't follow how the commit you posted is related to this patch.  Did
> you read the patch too quickly and think it uses the outer opts
> variable?

Pleading guilty: Yes, I did. I just saw you dropped the inner variable
and thus thought you were using the outer ones to resolve shadowing.

> This patch doesn't use a variable at all - there is no need for one.

Good, then! :-)

Sorry, and thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Roger Pau Monné
On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> On Tue, 20 Jun 2017, Paul Durrant wrote:
> > If grant copy is available then it will always be used in preference to
> > persistent maps. In this case feature-persistent should not be advertized
> > to the frontend, otherwise it may needlessly copy data into persistently
> > granted buffers.
> > 
> > Signed-off-by: Paul Durrant 
> 
> CC'ing Roger.
> 
> It is true that using feature-persistent together with grant copies is a
> a very bad idea.
> 
> But this change enstablishes an explicit preference of
> feature_grant_copy over feature-persistent in the xen_disk backend. It
> is not obvious to me that it should be the case.
> 
> Why is feature_grant_copy (without feature-persistent) better than
> feature-persistent (without feature_grant_copy)? Shouldn't we simply
> avoid grant copies to copy data to persistent grants?

When using persistent grants the frontend must always copy data from
the buffer to the persistent grant, there's no way to avoid this.

Using grant_copy we move the copy from the frontend to the backend,
which means the CPU time of the copy is accounted to the backend. This
is not ideal, but IMHO it's better than persistent grants because it
avoids keeping a pool of mapped grants that consume memory and make
the code more complex.

Do you have some performance data showing the difference between
persistent grants vs grant copy?

Roger.



[Qemu-block] ping Re: [PATCH v3 0/3] qemu-img check: format allocation info

2017-06-21 Thread Vladimir Sementsov-Ogievskiy

ping

06.06.2017 19:26, Vladimir Sementsov-Ogievskiy wrote:

Hi all.

See 01 patch for the doc.

v3: - improve docs
 - rename fields
 - add 'zero' type of underlying file portions status. It as these
   areas cannot be presented as 'discarded', but they are not
   occupying real space, so we don't want to present them as
   'allocated data' too.
 - remove last patch. It is not needed after introducing new naming
   for the fields

v2: fix build error, gcc things that some variables may be used
 uninitialized (actually they didn't).

v1:

These series is a replacement for "qemu-img check: unallocated size"
series.

There was a question, should we account allocated clusters in qcow2 but
actually holes in underalying file as allocated or not. Instead of
hiding this information in one-number statistic I've decided to print
the whole information, 5 numbers:

For allocated by top-level format driver (qcow2 for ex.) clusters, 3
numbers: number of bytes, which are:
  - allocated in underlying file
  - holes in underlying file
  - after end of underlying file

To account other areas of underlying file, 2 more numbers of bytes:
  - unallocated by top-level driver but allocated in underlying file
  - unallocated by top-level driver and holes in underlying file

Vladimir Sementsov-Ogievskiy (3):
   block: add bdrv_get_format_alloc_stat format interface
   qcow2: add .bdrv_get_format_alloc_stat
   qemu-img check: add format allocation info

  block.c   |  16 +
  block/qcow2-refcount.c| 152 ++
  block/qcow2.c |   2 +
  block/qcow2.h |   2 +
  include/block/block.h |   3 +
  include/block/block_int.h |   2 +
  qapi/block-core.json  |  61 ++-
  qemu-img.c|  42 +
  8 files changed, 279 insertions(+), 1 deletion(-)



--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Paul Durrant
> -Original Message-
> From: Roger Pau Monne
> Sent: 21 June 2017 11:51
> To: Paul Durrant 
> Cc: Stefano Stabellini ; Kevin Wolf
> ; qemu-block@nongnu.org; qemu-de...@nongnu.org;
> Max Reitz ; Anthony Perard
> ; xen-de...@lists.xenproject.org
> Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> persistent if grant copy is not available
> 
> On Wed, Jun 21, 2017 at 11:40:00AM +0100, Paul Durrant wrote:
> > > -Original Message-
> > > From: Qemu-devel [mailto:qemu-devel-
> > > bounces+paul.durrant=citrix@nongnu.org] On Behalf Of Paul
> Durrant
> > > Sent: 21 June 2017 10:36
> > > To: Roger Pau Monne ; Stefano Stabellini
> > > 
> > > Cc: Kevin Wolf ; qemu-block@nongnu.org; qemu-
> > > de...@nongnu.org; Max Reitz ; Anthony Perard
> > > ; xen-de...@lists.xenproject.org
> > > Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> > > persistent if grant copy is not available
> > >
> > > > -Original Message-
> > > > From: Roger Pau Monne
> > > > Sent: 21 June 2017 10:18
> > > > To: Stefano Stabellini 
> > > > Cc: Paul Durrant ; xen-
> > > de...@lists.xenproject.org;
> > > > qemu-de...@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > > > ; Kevin Wolf ; Max
> > > Reitz
> > > > 
> > > > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> > > grant
> > > > copy is not available
> > > >
> > > > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > > > If grant copy is available then it will always be used in 
> > > > > > preference to
> > > > > > persistent maps. In this case feature-persistent should not be
> > > advertized
> > > > > > to the frontend, otherwise it may needlessly copy data into
> persistently
> > > > > > granted buffers.
> > > > > >
> > > > > > Signed-off-by: Paul Durrant 
> > > > >
> > > > > CC'ing Roger.
> > > > >
> > > > > It is true that using feature-persistent together with grant copies 
> > > > > is a
> > > > > a very bad idea.
> > > > >
> > > > > But this change enstablishes an explicit preference of
> > > > > feature_grant_copy over feature-persistent in the xen_disk backend.
> It
> > > > > is not obvious to me that it should be the case.
> > > > >
> > > > > Why is feature_grant_copy (without feature-persistent) better than
> > > > > feature-persistent (without feature_grant_copy)? Shouldn't we
> simply
> > > > > avoid grant copies to copy data to persistent grants?
> > > >
> > > > When using persistent grants the frontend must always copy data from
> > > > the buffer to the persistent grant, there's no way to avoid this.
> > > >
> > > > Using grant_copy we move the copy from the frontend to the backend,
> > > > which means the CPU time of the copy is accounted to the backend.
> This
> > > > is not ideal, but IMHO it's better than persistent grants because it
> > > > avoids keeping a pool of mapped grants that consume memory and
> make
> > > > the code more complex.
> > > >
> > > > Do you have some performance data showing the difference between
> > > > persistent grants vs grant copy?
> > > >
> > >
> > > No, but I can get some :-)
> > >
> > > For a little background... I've been trying to push throughput of fio
> running in
> > > a debian stretch guest on my skull canyon NUC. When I started out, I was
> > > getting ~100MBbs. When I finished, with this patch, the IOThreads one,
> the
> > > multi-page ring one and a bit of hackery to turn off all the aio flushes 
> > > that
> > > seem to occur even if the image is opened with O_DIRECT, I was getting
> > > ~960Mbps... which is about line rate for the SSD in the in NUC.
> > >
> > > So, I'll force use of persistent grants on and see what sort of 
> > > throughput I
> > > get.
> >
> > A quick test with grant copy forced off (causing persistent grants to be
> used)... My VM is debian stretch using a 16 page shared ring from blkfront.
> The image backing xvdb is a fully inflated 10G qcow2.
> >
> > root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --
> gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 -
> -size=10G --readwrite=randwrite --ramp_time=4
> > test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K,
> ioengine=libaio, iodepth=64
> > fio-2.16
> > Starting 1 process
> > Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops]
> [eta 00m:05s]
> > test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
> >   write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
> >   cpu  : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
> >   IO depths 

Re: [Qemu-block] [PATCH 2/3] xen-disk: add support for multi-page shared rings

2017-06-21 Thread Paul Durrant
> -Original Message-
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 20 June 2017 23:51
> To: Paul Durrant 
> Cc: xen-de...@lists.xenproject.org; qemu-de...@nongnu.org; qemu-
> bl...@nongnu.org; Stefano Stabellini ; Anthony
> Perard ; Kevin Wolf ;
> Max Reitz 
> Subject: Re: [PATCH 2/3] xen-disk: add support for multi-page shared rings
> 
> On Tue, 20 Jun 2017, Paul Durrant wrote:
> > The blkif protocol has had provision for negotiation of multi-page shared
> > rings for some time now and many guest OS have support in their frontend
> > drivers.
> >
> > This patch makes the necessary modifications to xen-disk support a shared
> > ring up to order 4 (i.e. 16 pages).
> >
> > Signed-off-by: Paul Durrant 
> 
> Thanks for the patch!
> 

You're welcome.

> > ---
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: Kevin Wolf 
> > Cc: Max Reitz 
> > ---
> >  hw/block/xen_disk.c | 141
> 
> >  1 file changed, 110 insertions(+), 31 deletions(-)
> >
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index 9b06e3aa81..a9942d32db 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -36,8 +36,6 @@
> >
> >  static int batch_maps   = 0;
> >
> > -static int max_requests = 32;
> > -
> >  /* - */
> >
> >  #define BLOCK_SIZE  512
> > @@ -84,6 +82,8 @@ struct ioreq {
> >  BlockAcctCookie acct;
> >  };
> >
> > +#define MAX_RING_PAGE_ORDER 4
> > +
> >  struct XenBlkDev {
> >  struct XenDevicexendev;  /* must be first */
> >  char*params;
> > @@ -94,7 +94,8 @@ struct XenBlkDev {
> >  booldirectiosafe;
> >  const char  *fileproto;
> >  const char  *filename;
> > -int ring_ref;
> > +unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
> > +unsigned intnr_ring_ref;
> >  void*sring;
> >  int64_t file_blk;
> >  int64_t file_size;
> > @@ -110,6 +111,7 @@ struct XenBlkDev {
> >  int requests_total;
> >  int requests_inflight;
> >  int requests_finished;
> > +unsigned intmax_requests;
> >
> >  /* Persistent grants extension */
> >  gbooleanfeature_discard;
> > @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev
> *blkdev)
> >  struct ioreq *ioreq = NULL;
> >
> >  if (QLIST_EMPTY(>freelist)) {
> > -if (blkdev->requests_total >= max_requests) {
> > +if (blkdev->requests_total >= blkdev->max_requests) {
> >  goto out;
> >  }
> >  /* allocate new struct */
> > @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev
> *blkdev)
> >  ioreq_runio_qemu_aio(ioreq);
> >  }
> >
> > -if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> > +if (blkdev->more_work && blkdev->requests_inflight < blkdev-
> >max_requests) {
> >  qemu_bh_schedule(blkdev->bh);
> >  }
> >  }
> > @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
> >  blk_handle_requests(blkdev);
> >  }
> >
> > -/*
> > - * We need to account for the grant allocations requiring contiguous
> > - * chunks; the worst case number would be
> > - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> > - * but in order to keep things simple just use
> > - * 2 * max_req * max_seg.
> > - */
> > -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> > -
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
> >  if (xen_mode != XEN_EMULATE) {
> >  batch_maps = 1;
> >  }
> > -if (xengnttab_set_max_grants(xendev->gnttabdev,
> > -MAX_GRANTS(max_requests,
> BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> > -xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> > -  strerror(errno));
> > -}
> >  }
> >
> >  static void blk_parse_discard(struct XenBlkDev *blkdev)
> > @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
> >!blkdev->feature_grant_copy);
> >  xenstore_write_be_int(>xendev, "info", info);
> >
> > +xenstore_write_be_int(>xendev, "max-ring-page-order",
> > +  MAX_RING_PAGE_ORDER);
> > +
> >  blk_parse_discard(blkdev);
> >
> >  g_free(directiosafe);
> > @@ -1058,12 +1049,25 @@ out_error:
> >  return -1;
> >  }
> >
> > +/*
> > + * We need to account for the grant 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Paul Durrant
> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-
> bounces+paul.durrant=citrix@nongnu.org] On Behalf Of Paul Durrant
> Sent: 21 June 2017 10:36
> To: Roger Pau Monne ; Stefano Stabellini
> 
> Cc: Kevin Wolf ; qemu-block@nongnu.org; qemu-
> de...@nongnu.org; Max Reitz ; Anthony Perard
> ; xen-de...@lists.xenproject.org
> Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> persistent if grant copy is not available
> 
> > -Original Message-
> > From: Roger Pau Monne
> > Sent: 21 June 2017 10:18
> > To: Stefano Stabellini 
> > Cc: Paul Durrant ; xen-
> de...@lists.xenproject.org;
> > qemu-de...@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > ; Kevin Wolf ; Max
> Reitz
> > 
> > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> grant
> > copy is not available
> >
> > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > If grant copy is available then it will always be used in preference to
> > > > persistent maps. In this case feature-persistent should not be
> advertized
> > > > to the frontend, otherwise it may needlessly copy data into persistently
> > > > granted buffers.
> > > >
> > > > Signed-off-by: Paul Durrant 
> > >
> > > CC'ing Roger.
> > >
> > > It is true that using feature-persistent together with grant copies is a
> > > a very bad idea.
> > >
> > > But this change enstablishes an explicit preference of
> > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > is not obvious to me that it should be the case.
> > >
> > > Why is feature_grant_copy (without feature-persistent) better than
> > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > avoid grant copies to copy data to persistent grants?
> >
> > When using persistent grants the frontend must always copy data from
> > the buffer to the persistent grant, there's no way to avoid this.
> >
> > Using grant_copy we move the copy from the frontend to the backend,
> > which means the CPU time of the copy is accounted to the backend. This
> > is not ideal, but IMHO it's better than persistent grants because it
> > avoids keeping a pool of mapped grants that consume memory and make
> > the code more complex.
> >
> > Do you have some performance data showing the difference between
> > persistent grants vs grant copy?
> >
> 
> No, but I can get some :-)
> 
> For a little background... I've been trying to push throughput of fio running 
> in
> a debian stretch guest on my skull canyon NUC. When I started out, I was
> getting ~100MBbs. When I finished, with this patch, the IOThreads one, the
> multi-page ring one and a bit of hackery to turn off all the aio flushes that
> seem to occur even if the image is opened with O_DIRECT, I was getting
> ~960Mbps... which is about line rate for the SSD in the in NUC.
> 
> So, I'll force use of persistent grants on and see what sort of throughput I
> get.

A quick test with grant copy forced off (causing persistent grants to be 
used)... My VM is debian stretch using a 16 page shared ring from blkfront. The 
image backing xvdb is a fully inflated 10G qcow2.

root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 
--gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 
--size=10G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, 
iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta 
00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
  write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
  cpu  : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
 issued: total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s, 
mint=7908msec, maxt=7908msec

Disk stats (read/write):
  xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068, 
util=98.26%

The dom0 cpu usage for the relevant IOThread was ~60%

The same test with grant copy...

root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 
--gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 
--size=10G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, 

[Qemu-block] [PATCH v3] live-block-ops.txt: Rename, rewrite, and improve it

2017-06-21 Thread Kashyap Chamarthy
This edition documents (including their QMP invocations) all four
operations:

  - `block-stream`
  - `block-commit`
  - `drive-mirror` (& `blockdev-mirror`)
  - `drive-backup` (& `blockdev-backup`)

Things considered while writing this document:

  - Use reStructuredText as markup language (with the goal of generating
the HTML output using the Sphinx Documentation Generator).  It is
gentler on the eye, and can be trivially converted to different
formats.  (Another reason: upstream QEMU is considering to switch to
Sphinx, which uses reStructuredText as its markup language.)

  - Raw QMP JSON output vs. 'qmp-shell'.  I debated with myself whether
to only show raw QMP JSON output (as that is the canonical
representation), or use 'qmp-shell', which takes key-value pairs.  I
settled on the approach of: for the first occurence of a command,
use raw JSON; for subsequent occurences, use 'qmp-shell', with an
occasional exception.

  - Usage of `-blockdev` command-line.

  - Usage of 'node-name' vs. file path to refer to disks.  While we have
`blockdev-{mirror, backup}` as 'node-name'-alternatives for
`drive-{mirror, backup}`, the `block-commit` command still operate
on file names for parameters 'base' and 'top'.  So I added a caveat
at the beginning to that effect.

Refer this related thread that I started (where I learnt
`block-stream` was recently reworked to accept 'node-name' for 'top'
and 'base' parameters):
https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html
"[RFC] Making 'block-stream', and 'block-commit' accept node-name"

All commands showed in this document were tested while documenting.

Thanks: Eric Blake for the section: "A note on points-in-time vs file
names".  This useful bit was originally articulated by Eric in his
KVMForum 2015 presentation, so I included that specific bit in this
document.

Signed-off-by: Kashyap Chamarthy 
---
* A Sphinx-rendered HTML version is here:
  
https://kashyapc.fedorapeople.org/v3-QEMU-Docs/_build/html/docs/live-block-operations.html

* Change in v3 [address feedback from SFinucane]:
   - Make the Copyright header only part of the source, but not the
 rendered version
   - Use the ".. note::", ".. important::" admonitions where appropriate
   - Add a TODO note to remove the ".. contents::" directive when Sphinx
 is integrated
   - Make effective use of "::", resulting in removing lots of needless
 blank lines
   - Use anchors where applicable
   - Reword content in some places; fix some spelling mistakes

* Changes in v2 [address content feedback from Eric; styling changes
  from Stephen Finucane]:
   - [Styling] Remove the ToC, as the Sphinx, ".. contents::" will take
 auto-generate it as part of the rendered version
   - [Styling] Replace ".. code-block::" with "::" as it depends on the
 external 'pygments' library and the syntaxes available vary between
 different versions. [Thanks to Stephen Finucane, who this tip on
 IRC, from experience of doing Sphinx documentation for the Open
 vSwitch project]
   - [Styling] Remove all needless hyperlinks, since ToC will take care
 of them
   - Fix commit message typos
   - Add Copyright / License boilerplate text at the top
   - Reword sentences in "Disk image backing chain notation" section
   - Fix descriptions of `block-{stream, commit}`
   - Rework `block-stream` QMP invocations to take its 'node-name'
 parameter 'base-node'
   - Add 'file.node-name=file' to the '-blockdev' command-line
   - s/shall/will/g
   - Clarify throughout the document, where appropriate,
 that we're starting afresh with the original disk image chain
   - Address mistakes in "Live block commit (`block-commit`)" and
 "QMP invocation for `block-commit`" sections
   - Describe the case of "shallow mirroring" (synchronize only the
 contents of the *top*-most disk image -- "sync": "top") for
 `drive-mirror`, as it's part of an important use case: live storage
 migration without shared storage setup.  (Add a new section: "QMP
 invocation for live storage migration with `drive-mirror` + NBD" as
 part of this)
   - Add QMP invocation example for `blockdev-{mirror, backup}`

* TODO (after feedback from John Snow):
   - Eric Blake suggested to consider documenting incremental backup
 policies as part of the section: "Live disk backup ---
 `drive-backup` and `blockdev-backup`"
---
 docs/live-block-operations.rst | 1022 
 docs/live-block-ops.txt|   72 ---
 2 files changed, 1022 insertions(+), 72 deletions(-)
 create mode 100644 docs/live-block-operations.rst
 delete mode 100644 docs/live-block-ops.txt

diff --git a/docs/live-block-operations.rst b/docs/live-block-operations.rst
new file mode 100644
index 000..7aebbd9
--- /dev/null
+++ b/docs/live-block-operations.rst
@@ -0,0 +1,1022 @@
+..
+Copyright (C) 2017 Red Hat 

Re: [Qemu-block] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Paul Durrant
> -Original Message-
> From: Roger Pau Monne
> Sent: 21 June 2017 10:18
> To: Stefano Stabellini 
> Cc: Paul Durrant ; xen-de...@lists.xenproject.org;
> qemu-de...@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> ; Kevin Wolf ; Max Reitz
> 
> Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if grant
> copy is not available
> 
> On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > If grant copy is available then it will always be used in preference to
> > > persistent maps. In this case feature-persistent should not be advertized
> > > to the frontend, otherwise it may needlessly copy data into persistently
> > > granted buffers.
> > >
> > > Signed-off-by: Paul Durrant 
> >
> > CC'ing Roger.
> >
> > It is true that using feature-persistent together with grant copies is a
> > a very bad idea.
> >
> > But this change enstablishes an explicit preference of
> > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > is not obvious to me that it should be the case.
> >
> > Why is feature_grant_copy (without feature-persistent) better than
> > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > avoid grant copies to copy data to persistent grants?
> 
> When using persistent grants the frontend must always copy data from
> the buffer to the persistent grant, there's no way to avoid this.
> 
> Using grant_copy we move the copy from the frontend to the backend,
> which means the CPU time of the copy is accounted to the backend. This
> is not ideal, but IMHO it's better than persistent grants because it
> avoids keeping a pool of mapped grants that consume memory and make
> the code more complex.
> 
> Do you have some performance data showing the difference between
> persistent grants vs grant copy?
> 

No, but I can get some :-)

For a little background... I've been trying to push throughput of fio running 
in a debian stretch guest on my skull canyon NUC. When I started out, I was 
getting ~100MBbs. When I finished, with this patch, the IOThreads one, the 
multi-page ring one and a bit of hackery to turn off all the aio flushes that 
seem to occur even if the image is opened with O_DIRECT, I was getting 
~960Mbps... which is about line rate for the SSD in the in NUC.

So, I'll force use of persistent grants on and see what sort of throughput I 
get.

Cheers,

  Paul

> Roger.



Re: [Qemu-block] [Qemu-devel] [PATCH] qemu-img: don't shadow opts variable in img_dd()

2017-06-21 Thread Stefan Hajnoczi
On Mon, Jun 19, 2017 at 05:56:13PM -0700, no-re...@patchew.org wrote:
> collect2: fatal error: ld terminated with signal 9 [Killed]
> compilation terminated.
> Makefile:201: recipe for target 'qemu-system-x86_64w.exe' failed

Spurious CI error - ld(1) out of memory?  Not related to this patch.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH] qemu-img: don't shadow opts variable in img_dd()

2017-06-21 Thread Stefan Hajnoczi
On Mon, Jun 19, 2017 at 05:18:18PM +0200, Max Reitz wrote:
> On 2017-06-19 17:00, Stefan Hajnoczi wrote:
> > It's confusing when two different variables have the same name in one
> > function.
> > 
> > Cc: Reda Sallahi 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  qemu-img.c | 9 +++--
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 0ad698d..c285c2f 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -4249,15 +4249,12 @@ static int img_dd(int argc, char **argv)
> >  case 'U':
> >  force_share = true;
> >  break;
> > -case OPTION_OBJECT: {
> > -QemuOpts *opts;
> > -opts = qemu_opts_parse_noisily(_object_opts,
> > -   optarg, true);
> > -if (!opts) {
> > +case OPTION_OBJECT:
> > +if (!qemu_opts_parse_noisily(_object_opts, optarg, true)) 
> > {
> >  ret = -1;
> >  goto out;
> >  }
> > -}   break;
> > +break;
> >  case OPTION_IMAGE_OPTS:
> >  image_opts = true;
> >  break;
> 
> Hm, I basically reverted such a style in commit
> 3258b91141090b05edcaab8f1d1dd355ca91b49a. I find it confusing to use the
> same variable for two different things.

I don't follow how the commit you posted is related to this patch.  Did
you read the patch too quickly and think it uses the outer opts
variable?

This patch doesn't use a variable at all - there is no need for one.

Stefan


signature.asc
Description: PGP signature