[Qemu-block] [PATCH for-2.6 4/7] block/vpc: Use the correct max sector count for VHD images

2016-03-22 Thread Jeff Cody
The old VHD_MAX_SECTORS value is incorrect, and is a throwback
to the CHS calculations.  The VHD specification allows images up to 2040
GiB, which (using 512 byte sectors) corresponds to a maximum number of
sectors of 0xff00, rather than the old value of 0xfe0001ff.

Update VHD_MAX_SECTORS to reflect the correct value.

Also, update comment references to the actual size limit, and correct
one compare so that we can have sizes up to the limit.

Signed-off-by: Jeff Cody 
---
 block/vpc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 6ad8406..2e023d0 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -51,7 +51,7 @@ enum vhd_type {
 #define VHD_CHS_MAX_H   16
 #define VHD_CHS_MAX_S   255
 
-#define VHD_MAX_SECTORS   (65535LL * 255 * 255)
+#define VHD_MAX_SECTORS   0xff00/* 2040 GiB max image size */
 #define VHD_MAX_GEOMETRY  (VHD_CHS_MAX_C * VHD_CHS_MAX_H * VHD_CHS_MAX_S)
 
 #define VPC_OPT_FORCE_SIZE "force_size"
@@ -316,8 +316,8 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 BDRV_SECTOR_SIZE;
 }
 
-/* Allow a maximum disk size of approximately 2 TB */
-if (bs->total_sectors >= VHD_MAX_SECTORS) {
+/* Allow a maximum disk size of 2040 GiB */
+if (bs->total_sectors > VHD_MAX_SECTORS) {
 ret = -EFBIG;
 goto fail;
 }
@@ -721,7 +721,7 @@ static int64_t coroutine_fn 
vpc_co_get_block_status(BlockDriverState *bs,
  * Note that the geometry doesn't always exactly match total_sectors but
  * may round it down.
  *
- * Returns 0 on success, -EFBIG if the size is larger than ~2 TB. Override
+ * Returns 0 on success, -EFBIG if the size is larger than 2040 GiB. Override
  * the hardware EIDE and ATA-2 limit of 16 heads (max disk size of 127 GB)
  * and instead allow up to 255 heads.
  */
@@ -927,7 +927,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 
 if ((int64_t)cyls * heads * secs_per_cyl == VHD_MAX_GEOMETRY) {
 total_sectors = total_size / BDRV_SECTOR_SIZE;
-/* Allow a maximum disk size of approximately 2 TB */
+/* Allow a maximum disk size of 2040 GiB */
 if (total_sectors > VHD_MAX_SECTORS) {
 error_setg(errp, "Disk size is too large, max size is 2040 GiB");
 ret = -EFBIG;
-- 
1.9.3




[Qemu-block] [PATCH for-2.6 5/7] block/vpc: make checks on max table size a bit more lax

2016-03-22 Thread Jeff Cody
The check on the max_table_size field not being larger than required is
valid, and in accordance with the VHD spec.  However, there have been
VHD images encountered in the wild that have an out-of-spec max table
size that is technically too large.

There is no issue in allowing this larger table size, as we also
later verify that the computed size (used for the pagetable) is
large enough to fit all sectors.  In addition, max_table_entries
is bounds checked against SIZE_MAX and INT_MAX.

Remove the strict check, so that we can accomodate these sorts of
images that are benignly out of spec.

Reported-by: Stefan Hajnoczi 
Reported-by: Grant Wu 
Signed-off-by: Jeff Cody 
---
 block/vpc.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 2e023d0..67ab376 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -350,10 +350,6 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 ret = -EINVAL;
 goto fail;
 }
-if (s->max_table_entries > (VHD_MAX_SECTORS * 512) / s->block_size) {
-ret = -EINVAL;
-goto fail;
-}
 
 computed_size = (uint64_t) s->max_table_entries * s->block_size;
 if (computed_size < bs->total_sectors * 512) {
-- 
1.9.3




[Qemu-block] [PATCH for-2.6 7/7] block/vpc: update comments to be compliant w/coding guidelines

2016-03-22 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 block/vpc.c | 70 ++---
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 5dd9950..0b48cf4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -36,7 +36,7 @@
 
 #define HEADER_SIZE 512
 
-//#define CACHE
+/* #define CACHE */
 
 enum vhd_type {
 VHD_FIXED   = 2,
@@ -44,7 +44,7 @@ enum vhd_type {
 VHD_DIFFERENCING= 4,
 };
 
-// Seconds since Jan 1, 2000 0:00:00 (UTC)
+/* Seconds since Jan 1, 2000 0:00:00 (UTC) */
 #define VHD_TIMESTAMP_BASE 946684800
 
 #define VHD_CHS_MAX_C   65535LL
@@ -56,22 +56,22 @@ enum vhd_type {
 
 #define VPC_OPT_FORCE_SIZE "force_size"
 
-// always big-endian
+/* always big-endian */
 typedef struct vhd_footer {
-charcreator[8]; // "conectix"
+charcreator[8]; /* "conectix" */
 uint32_tfeatures;
 uint32_tversion;
 
-// Offset of next header structure, 0x if none
+/* Offset of next header structure, 0x if none */
 uint64_tdata_offset;
 
-// Seconds since Jan 1, 2000 0:00:00 (UTC)
+/* Seconds since Jan 1, 2000 0:00:00 (UTC) */
 uint32_ttimestamp;
 
-charcreator_app[4]; // "vpc "
+charcreator_app[4]; /*  e.g., "vpc " */
 uint16_tmajor;
 uint16_tminor;
-charcreator_os[4]; // "Wi2k"
+charcreator_os[4]; /* "Wi2k" */
 
 uint64_torig_size;
 uint64_tcurrent_size;
@@ -82,29 +82,29 @@ typedef struct vhd_footer {
 
 uint32_ttype;
 
-// Checksum of the Hard Disk Footer ("one's complement of the sum of all
-// the bytes in the footer without the checksum field")
+/* Checksum of the Hard Disk Footer ("one's complement of the sum of all
+   the bytes in the footer without the checksum field") */
 uint32_tchecksum;
 
-// UUID used to identify a parent hard disk (backing file)
+/* UUID used to identify a parent hard disk (backing file) */
 uint8_t uuid[16];
 
 uint8_t in_saved_state;
 } QEMU_PACKED VHDFooter;
 
 typedef struct vhd_dyndisk_header {
-charmagic[8]; // "cxsparse"
+charmagic[8]; /* "cxsparse" */
 
-// Offset of next header structure, 0x if none
+/* Offset of next header structure, 0x if none */
 uint64_tdata_offset;
 
-// Offset of the Block Allocation Table (BAT)
+/* Offset of the Block Allocation Table (BAT) */
 uint64_ttable_offset;
 
 uint32_tversion;
-uint32_tmax_table_entries; // 32bit/entry
+uint32_tmax_table_entries; /* 32bit/entry */
 
-// 2 MB by default, must be a power of two
+/* 2 MB by default, must be a power of two */
 uint32_tblock_size;
 
 uint32_tchecksum;
@@ -112,7 +112,7 @@ typedef struct vhd_dyndisk_header {
 uint32_tparent_timestamp;
 uint32_treserved;
 
-// Backing file name (in UTF-16)
+/* Backing file name (in UTF-16) */
 uint8_t parent_name[512];
 
 struct {
@@ -277,9 +277,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 /* Write 'checksum' back to footer, or else will leave it with zero. */
 footer->checksum = cpu_to_be32(checksum);
 
-// The visible size of a image in Virtual PC depends on the geometry
-// rather than on the size stored in the footer (the size in the footer
-// is too large usually)
+/* The visible size of a image in Virtual PC depends on the geometry
+   rather than on the size stored in the footer (the size in the footer
+   is too large usually) */
 bs->total_sectors = (int64_t)
 be16_to_cpu(footer->cyls) * footer->heads * footer->secs_per_cyl;
 
@@ -465,16 +465,16 @@ static inline int64_t get_sector_offset(BlockDriverState 
*bs,
 pageentry_index = (offset % s->block_size) / 512;
 
 if (pagetable_index >= s->max_table_entries || 
s->pagetable[pagetable_index] == 0x)
-return -1; // not allocated
+return -1; /* not allocated */
 
 bitmap_offset = 512 * (uint64_t) s->pagetable[pagetable_index];
 block_offset = bitmap_offset + s->bitmap_size + (512 * pageentry_index);
 
-// We must ensure that we don't write to any sectors which are marked as
-// unused in the bitmap. We get away with setting all bits in the block
-// bitmap each time we write to a new block. This might cause Virtual PC to
-// miss sparse read optimization, but it's not a problem in terms of
-// correctness.
+/* We must ensure that we don't write to any sectors which are marked as
+   unused in the bitmap. We get away with setting all bits in the block
+   bitmap each time we write to a new block. This might cause Virtual PC to
+   miss sparse read optimization, but it's not a problem in terms of
+   correctness. */
 if (write && (s->last_bitmap_offset != bitmap_offset)) {
 uint8_t 

[Qemu-block] [PATCH for-2.6 3/7] block/vpc: use current_size field for XenConverter VHD images

2016-03-22 Thread Jeff Cody
XenConverter VHD images are another VHD image where current_size is
different from the CHS values in the the format header.  Use
current_size as the default, by looking at the creator_app signature
field.

Signed-off-by: Jeff Cody 
---
 block/vpc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/vpc.c b/block/vpc.c
index 8b8b9a7..6ad8406 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -299,6 +299,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
  *  'win '  :  current_size Hyper-V
  *  'd2v '  :  current_size Disk2vhd
  *  'tap\0' :  current_size XenServer
+ *  'CTXS'  :  current_size XenConverter
  *
  *  The user can override the table values via drive options, however
  *  even with an override we will still use current_size for images
@@ -307,6 +308,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 use_chs = (!!strncmp(footer->creator_app, "win ", 4) &&
!!strncmp(footer->creator_app, "qem2", 4) &&
!!strncmp(footer->creator_app, "d2v ", 4) &&
+   !!strncmp(footer->creator_app, "CTXS", 4) &&
!!memcmp(footer->creator_app, "tap", 4)) || s->force_use_chs;
 
 if (!use_chs || bs->total_sectors == VHD_MAX_GEOMETRY || s->force_use_sz) {
-- 
1.9.3




[Qemu-block] [PATCH for-2.6 2/7] vpc: use current_size field for XenServer VHD images

2016-03-22 Thread Jeff Cody
From: Stefan Hajnoczi 

The vpc driver has two methods of determining virtual disk size.  The
correct one to use depends on the software that generated the image
file.  Add the XenServer creator_app signature so that image size is
correctly detected for those images.

Reported-by: Grant Wu 
Reported-by: Spencer Baugh 
Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Jeff Cody 
---
 block/vpc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index bc3d1c6..8b8b9a7 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -298,6 +298,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
  *  'qem2'  :  current_size QEMU (uses current_size)
  *  'win '  :  current_size Hyper-V
  *  'd2v '  :  current_size Disk2vhd
+ *  'tap\0' :  current_size XenServer
  *
  *  The user can override the table values via drive options, however
  *  even with an override we will still use current_size for images
@@ -305,7 +306,8 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
  */
 use_chs = (!!strncmp(footer->creator_app, "win ", 4) &&
!!strncmp(footer->creator_app, "qem2", 4) &&
-   !!strncmp(footer->creator_app, "d2v ", 4)) || s->force_use_chs;
+   !!strncmp(footer->creator_app, "d2v ", 4) &&
+   !!memcmp(footer->creator_app, "tap", 4)) || s->force_use_chs;
 
 if (!use_chs || bs->total_sectors == VHD_MAX_GEOMETRY || s->force_use_sz) {
 bs->total_sectors = be64_to_cpu(footer->current_size) /
-- 
1.9.3




[Qemu-block] [PATCH for-2.6 1/7] block/vpc: fix VPC 'qemu-img create' regression

2016-03-22 Thread Jeff Cody
Commit 'b8f45cdf7827e39f9a1e6cc446f5972cc6144237' switched VPC
over to using blk_pwrite() instead of bdrv_pwrite_sync().  The
return value of bdrv_pwrite_sync() was always 0 for success, and
create_dynamic_disk() in one instance checked for a non-zero return
value to indicate error.  However, blk_pwrite() may return positive
values for success.

This fails silently as well, since vpc_create() did not set errp
in this failuer case.  Set errp in all instances in vpc_create().

Signed-off-by: Jeff Cody 
---
 block/vpc.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index 8435205..bc3d1c6 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -774,7 +774,7 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t 
*buf,
 num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);
 
 ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
-if (ret) {
+if (ret < 0) {
 goto fail;
 }
 
@@ -873,6 +873,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 } else if (!strcmp(disk_type_param, "fixed")) {
 disk_type = VHD_FIXED;
 } else {
+error_setg(errp, "Invalid disk type, %s", disk_type_param);
 ret = -EINVAL;
 goto out;
 }
@@ -924,6 +925,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 total_sectors = total_size / BDRV_SECTOR_SIZE;
 /* Allow a maximum disk size of approximately 2 TB */
 if (total_sectors > VHD_MAX_SECTORS) {
+error_setg(errp, "Disk size is too large, max size is 2040 GiB");
 ret = -EFBIG;
 goto out;
 }
@@ -974,6 +976,9 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 } else {
 ret = create_fixed_disk(blk, buf, total_size);
 }
+if (ret < 0) {
+error_setg(errp, "Unable to create or write VHD header");
+}
 
 out:
 blk_unref(blk);
-- 
1.9.3




[Qemu-block] [PATCH for-2.6 0/7] block: VHD format fixes

2016-03-22 Thread Jeff Cody
Fixes for a regression in vpc_create(), as well as a few issues with VHD format
compatibility.


Jeff Cody (6):
  block/vpc: fix VPC 'qemu-img create' regression
  block/vpc: use current_size field for XenConverter VHD images
  block/vpc: Use the correct max sector count for VHD images
  block/vpc: make checks on max table size a bit more lax
  block/vpc: set errp in vpc_open
  block/vpc: update comments to be compliant w/coding guidelines

Stefan Hajnoczi (1):
  vpc: use current_size field for XenServer VHD images

 block/vpc.c | 106 ++--
 1 file changed, 60 insertions(+), 46 deletions(-)

-- 
1.9.3




Re: [Qemu-block] [Qemu-devel] [PATCH 06/22] hbitmap: load/store

2016-03-22 Thread John Snow


On 03/15/2016 04:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Add functions for load/store HBitmap to BDS, using clusters table:
> Last level of the bitmap is splitted into chunks of 'cluster_size'
> size. Each cell of the table contains offset in bds, to load/store
> corresponding chunk.
> 
> Also,
> 0 in cell means all-zeroes-chunk (should not be saved)
> 1 in cell means all-ones-chunk (should not be saved)
> hbitmap_prepare_store() fills table with
>   0 for all-zeroes chunks
>   1 for all-ones chunks
>   2 for others
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/dirty-bitmap.c |  23 +
>  include/block/dirty-bitmap.h |  11 +++
>  include/qemu/hbitmap.h   |  12 +++
>  util/hbitmap.c   | 209 
> +++
>  4 files changed, 255 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index e68c177..816c6ee 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -396,3 +396,26 @@ int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
>  {
>  return hbitmap_count(bitmap->bitmap);
>  }
> +
> +int bdrv_dirty_bitmap_load(BdrvDirtyBitmap *bitmap, BlockDriverState *bs,
> +   const uint64_t *table, uint32_t table_size,
> +   uint32_t cluster_size)
> +{
> +return hbitmap_load(bitmap->bitmap, bs, table, table_size, cluster_size);
> +}
> +
> +int bdrv_dirty_bitmap_prepare_store(const BdrvDirtyBitmap *bitmap,
> +uint32_t cluster_size,
> +uint64_t *table,
> +uint32_t *table_size)
> +{
> +return hbitmap_prepare_store(bitmap->bitmap, cluster_size,
> + table, table_size);
> +}
> +
> +int bdrv_dirty_bitmap_store(const BdrvDirtyBitmap *bitmap, BlockDriverState 
> *bs,
> +const uint64_t *table, uint32_t table_size,
> +uint32_t cluster_size)
> +{
> +return hbitmap_store(bitmap->bitmap, bs, table, table_size, 
> cluster_size);
> +}
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 27515af..20cb540 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -43,4 +43,15 @@ void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t 
> offset);
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>  void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
>  
> +int bdrv_dirty_bitmap_load(BdrvDirtyBitmap *bitmap, BlockDriverState *bs,
> +   const uint64_t *table, uint32_t table_size,
> +   uint32_t cluster_size);
> +int bdrv_dirty_bitmap_prepare_store(const BdrvDirtyBitmap *bitmap,
> +uint32_t cluster_size,
> +uint64_t *table,
> +uint32_t *table_size);
> +int bdrv_dirty_bitmap_store(const BdrvDirtyBitmap *bitmap, BlockDriverState 
> *bs,
> +const uint64_t *table, uint32_t table_size,
> +uint32_t cluster_size);
> +
>  #endif
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index 6d1da4d..d83bb79 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -241,5 +241,17 @@ static inline size_t hbitmap_iter_next_word(HBitmapIter 
> *hbi, unsigned long *p_c
>  return hbi->pos;
>  }
>  
> +typedef struct BlockDriverState BlockDriverState;
> +
> +int hbitmap_load(HBitmap *bitmap, BlockDriverState *bs,
> + const uint64_t *table, uint32_t table_size,
> + uint32_t cluster_size);
> +
> +int hbitmap_prepare_store(const HBitmap *bitmap, uint32_t cluster_size,
> +  uint64_t *table, uint32_t *table_size);
> +
> +int hbitmap_store(HBitmap *bitmap, BlockDriverState *bs,
> +  const uint64_t *table, uint32_t table_size,
> +  uint32_t cluster_size);
>  
>  #endif
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 28595fb..1960e4f 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -15,6 +15,8 @@
>  #include "qemu/host-utils.h"
>  #include "trace.h"
>  
> +#include "block/block.h"
> +
>  /* HBitmaps provides an array of bits.  The bits are stored as usual in an
>   * array of unsigned longs, but HBitmap is also optimized to provide fast
>   * iteration over set bits; going from one bit to the next is O(logB n)
> @@ -499,3 +501,210 @@ char *hbitmap_md5(const HBitmap *bitmap)
>  const guchar *data = (const guchar *)bitmap->levels[HBITMAP_LEVELS - 1];
>  return g_compute_checksum_for_data(G_CHECKSUM_MD5, data, size);
>  }
> +
> +/* hb_restore_levels()
> + * Using the last level restore all other levels
> + */
> +static void hb_restore_levels(HBitmap *bitmap)
> +{
> +int64_t i, size, prev_size;
> +int lev;
> +
> +/* 

Re: [Qemu-block] [Qemu-devel] [PATCH 01/12] block: Don't disable I/O throttling on sync requests

2016-03-22 Thread Eric Blake
On 03/22/2016 09:33 AM, Kevin Wolf wrote:
> We had to disable I/O throttling with synchronous requests because we
> didn't use to run timers in nested event loops when the code was
> introduced. This isn't true any more, and throttling works just fine
> even when using the synchronous API.
> 
> The removed code is in fact dead code since commit a8823a3b ('block: Use
> blk_co_pwritev() for blk_write()') because I/O throttling can only be
> set on the top layer, but BlockBackend always uses the coroutine
> interface now instead of using the sync API emulation in block.c.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/io.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index cce508a..e4438da 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -561,17 +561,6 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
> offset,
>  .flags = flags,
>  };
>  
> -/**
> - * In sync call context, when the vcpu is blocked, this throttling timer
> - * will not fire; so the I/O throttling function has to be disabled here
> - * if it has been enabled.
> - */
> -if (bs->io_limits_enabled) {
> -fprintf(stderr, "Disabling I/O throttling on '%s' due "
> -"to synchronous I/O.\n", bdrv_get_device_name(bs));

And we get rid of an fprintf().  Nice bonus.

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 0/9] block: Remove BlockDriverState.blk

2016-03-22 Thread Paolo Bonzini


On 22/03/2016 20:36, Kevin Wolf wrote:
> This is the final patch series that is required before we can start allowing
> setups with more than one BlockBackend per BlockDriverState.
> 
> My current plan is to put the patches up to (and including) this series into
> 2.6 so that we have a relatively consistent block layer state in the release
> that isn't in the middle of an overhaul, but not to make use of the new setups
> that we could allow now before 2.7.
> 
> Depends on 'block: Move I/O throttling to BlockBackend'.

I'm not sure how "posted one month after soft freeze", "depends on a
series also posted one month after soft freeze" and "put the patches up
to (and including) this series into 2.6" could go in the same message.

Paolo



Re: [Qemu-block] [PATCH 00/12] block: Move I/O throttling to BlockBackend

2016-03-22 Thread Paolo Bonzini
On 22/03/2016 16:33, Kevin Wolf wrote:
> This is another feature that was "logically" part of the BlockBackend, but
> implemented as a BlockDriverState feature. It was always kept on top using
> swap_feature_fields().
> 
> This series moves it to be actually implemented in the BlockBackend, removing
> another obstacle for removing bs->blk and allowing multiple BBs per BDS.
> 
> Depends on 'block: Implement writethrough in BlockBackend'.

This series would mess up my own I/O throttling cleanups that have been
posted in February and have hardly seen a review for one month.

I expect the rules for soft freeze to apply to maintainers as well.
These patches and the removal of bs->blk are about one month late and
shouldn't be included in 2.6.

Thanks,

Paolo

> Kevin Wolf (12):
>   block: Don't disable I/O throttling on sync requests
>   block: Make sure throttled BDSes always have a BB
>   block: Introduce BlockBackendPublic
>   block: throttle-groups: Use BlockBackend pointers internally
>   block: Convert throttle_group_get_name() to BlockBackend
>   block: Move throttling fields from BDS to BB
>   block: Move actual I/O throttling to BlockBackend
>   block: Move I/O throttling configuration functions to BlockBackend
>   block: Introduce BdrvChild.opaque
>   block: Drain throttling queue with BdrvChild callback
>   block: Decouple throttling from BlockDriverState
>   block: Don't check throttled reqs in bdrv_requests_pending()
> 
>  block.c |  23 +
>  block/block-backend.c   | 146 +-
>  block/io.c  | 115 +++--
>  block/qapi.c|   6 +-
>  block/throttle-groups.c | 223 
> +---
>  blockdev.c  |  43 +++-
>  include/block/block.h   |   6 +-
>  include/block/block_int.h   |  23 +
>  include/block/throttle-groups.h |  12 +--
>  include/sysemu/block-backend.h  |  27 +
>  tests/test-throttle.c   |  62 ++-
>  11 files changed, 345 insertions(+), 341 deletions(-)
> 



Re: [Qemu-block] [PATCH v3 09/10] chardev: add support for ACLs for TLS clients

2016-03-22 Thread Eric Blake
On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> Currently any client which can complete the TLS handshake
> is able to use a chardev server. The server admin can turn
> on the 'verify-peer' option for the x509 creds to require
> the client to provide a x509 certificate. This means the
> client will have to acquire a certificate from the CA before
> they are permitted to use the chardev server. This is still
> a fairly weak bar.
> 
> This adds a 'tls-acl=ACL-ID' option to the socket chardev
> backend which takes the ID of a previously added 'QAuthZ'
> object instance. This ACL will be used to validate the client's
> x509 distinguished name. Clients failing the ACL will not be
> permitted to use the chardev server.
> 
> For example to setup an ACL that only allows connection from
> a client whose x509 certificate distinguished name contains
> 'CN=fred', you would use:
> 
>   $QEMU -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
> endpoint=server,verify-peer=yes \
> -object authz-simple,id=acl0,policy=deny,\
> rules.0.match=*CN=fred,rules.0.policy=allow \

Needs shell quoting for *, and also the same recurring comment about
whitespace for presentation not actually being in the command line.

Food for thought: should we enhance QemuOpts to skip all whitespace
after ',', since we _know_ that valid key names start with a letter
rather than a space?  Then, we could represent command lines as:

$QEMU -object 'name,
   param1=value,
   param2=value'

with the same semantics as:

$QEMU -object name,param1=value,param2=value

and without having to worry about backslash-newline-whitespace
formatting.  Obviously, such an enhancement would be a separate patch.

> -chardev socket,host=127.0.0.1,port=9000,server,\
>tls-creds=tls0,tls-acl=acl0 \
> ...other qemud args...
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  qapi-schema.json |  2 ++
>  qemu-char.c  | 11 ++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 

Code is fine; my only comments were on the commit message.
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH 9/9] block: Remove BlockDriverState.blk

2016-03-22 Thread Kevin Wolf
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. All checks that are
currently in place to prevent the user from creating such setups.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.

Signed-off-by: Kevin Wolf 
---
 block.c|  8 
 block/block-backend.c  |  8 
 block/mirror.c |  4 ++--
 blockdev.c | 19 ---
 include/block/block_int.h  |  2 --
 tests/qemu-iotests/085.out |  6 +++---
 6 files changed, 13 insertions(+), 34 deletions(-)

diff --git a/block.c b/block.c
index eefbcf3..05f9ad4 100644
--- a/block.c
+++ b/block.c
@@ -2227,14 +2227,6 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 {
 BdrvChild *c, *next;
 
-if (from->blk) {
-/* FIXME We bypass blk_set_bs(), so we need to make these updates
- * manually. The root problem is not in this change function, but the
- * existence of BlockDriverState.blk. */
-to->blk = from->blk;
-from->blk = NULL;
-}
-
 QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
 assert(c->role != _backing);
 c->bs = to;
diff --git a/block/block-backend.c b/block/block-backend.c
index b71b822..42f95a6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -190,7 +190,6 @@ BlockBackend *blk_new_with_bs(Error **errp)
 bs = bdrv_new_root();
 blk->root = bdrv_root_attach_child(bs, "root", _root);
 blk->root->opaque = blk;
-bs->blk = blk;
 return blk;
 }
 
@@ -453,12 +452,10 @@ bool bdrv_has_blk(BlockDriverState *bs)
 BdrvChild *child;
 QLIST_FOREACH(child, >parents, next_parent) {
 if (child->role == _root) {
-assert(bs->blk);
 return true;
 }
 }
 
-assert(!bs->blk);
 return false;
 }
 
@@ -518,8 +515,6 @@ BlockBackend *blk_by_public(BlockBackendPublic *public)
  */
 void blk_remove_bs(BlockBackend *blk)
 {
-assert(blk->root->bs->blk == blk);
-
 notifier_list_notify(>remove_bs_notifiers, blk);
 if (blk->public.throttle_state) {
 throttle_timers_detach_aio_context(>public.throttle_timers);
@@ -527,7 +522,6 @@ void blk_remove_bs(BlockBackend *blk)
 
 blk_update_root_state(blk);
 
-blk->root->bs->blk = NULL;
 bdrv_root_unref_child(blk->root);
 blk->root = NULL;
 }
@@ -537,11 +531,9 @@ void blk_remove_bs(BlockBackend *blk)
  */
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 {
-assert(!blk->root && !bs->blk);
 bdrv_ref(bs);
 blk->root = bdrv_root_attach_child(bs, "root", _root);
 blk->root->opaque = blk;
-bs->blk = blk;
 
 notifier_list_notify(>insert_bs_notifiers, blk);
 if (blk->public.throttle_state) {
diff --git a/block/mirror.c b/block/mirror.c
index b9a93fd..8b32271 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -450,7 +450,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 
 /* This was checked in mirror_start_job(), but meanwhile one of the
  * nodes could have been newly attached to a BlockBackend. */
-if (to_replace->blk && s->target->blk) {
+if (bdrv_has_blk(to_replace) && bdrv_has_blk(s->target)) {
 error_report("block job: Can't create node with two 
BlockBackends");
 data->ret = -EINVAL;
 goto out;
@@ -825,7 +825,7 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
 } else {
 replaced_bs = bs;
 }
-if (replaced_bs->blk && target->blk) {
+if (bdrv_has_blk(replaced_bs) && bdrv_has_blk(target)) {
 error_setg(errp, "Can't create node with two BlockBackends");
 return;
 }
diff --git a/blockdev.c b/blockdev.c
index c59cf3e..a658869 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1774,9 +1774,8 @@ static void external_snapshot_prepare(BlkActionState 
*common,
 return;
 }
 
-if (state->new_bs->blk != NULL) {
-error_setg(errp, "The snapshot is already in use by %s",
-   blk_name(state->new_bs->blk));
+if (bdrv_has_blk(state->new_bs)) {
+error_setg(errp, "The snapshot is already in use");
 return;
 }
 
@@ -2492,9 +2491,8 @@ void qmp_x_blockdev_insert_medium(const char *device, 
const char *node_name,
 return;
 }
 
-if (bs->blk) {
-error_setg(errp, "Node '%s' is already in use by '%s'", node_name,
-   blk_name(bs->blk));
+if (bdrv_has_blk(bs)) {
+error_setg(errp, "Node '%s' is already in use", node_name);
 return;
 }
 
@@ -3439,7 +3437,7 @@ static void blockdev_mirror_common(BlockDriverState *bs,
 if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_MIRROR_TARGET, errp)) {
 return;
 }
-if (target->blk) {
+if (bdrv_has_blk(target)) {
 error_setg(errp, "Cannot mirror to an attached block device");
  

[Qemu-block] [PATCH 8/9] block: Don't return throttling info in query-named-block-nodes

2016-03-22 Thread Kevin Wolf
query-named-block-nodes should not return information that is related
to the attached BlockBackend rather than the node itself, so throttling
information needs to be removed from it.

Signed-off-by: Kevin Wolf 
---
 block/qapi.c   | 6 +++---
 tests/qemu-iotests/096 | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 0a222f6..d167e67 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -66,10 +66,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 info->backing_file_depth = bdrv_get_backing_file_depth(bs);
 info->detect_zeroes = bs->detect_zeroes;
 
-if (bs->blk && blk_get_public(bs->blk)->throttle_state) {
+if (blk && blk_get_public(blk)->throttle_state) {
 ThrottleConfig cfg;
 
-throttle_group_get_config(bs->blk, );
+throttle_group_get_config(blk, );
 
 info->bps = cfg.buckets[THROTTLE_BPS_TOTAL].avg;
 info->bps_rd  = cfg.buckets[THROTTLE_BPS_READ].avg;
@@ -117,7 +117,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 info->iops_size = cfg.op_size;
 
 info->has_group = true;
-info->group = g_strdup(throttle_group_get_name(bs->blk));
+info->group = g_strdup(throttle_group_get_name(blk));
 }
 
 info->write_threshold = bdrv_write_threshold_get(bs);
diff --git a/tests/qemu-iotests/096 b/tests/qemu-iotests/096
index e34204b..aeeb375 100644
--- a/tests/qemu-iotests/096
+++ b/tests/qemu-iotests/096
@@ -45,8 +45,9 @@ class TestLiveSnapshot(iotests.QMPTestCase):
 os.remove(self.target_img)
 
 def checkConfig(self, active_layer):
-result = self.vm.qmp('query-named-block-nodes')
+result = self.vm.qmp('query-block')
 for r in result['return']:
+r = r['inserted']
 if r['node-name'] == active_layer:
 self.assertEqual(r['group'], self.group)
 self.assertEqual(r['iops'], self.iops)
-- 
1.8.3.1




[Qemu-block] [PATCH 5/9] block: Avoid BDS.blk in bdrv_next()

2016-03-22 Thread Kevin Wolf
We just want to know whether a BDS has at least one BB attached in order
to avoid enumerating it twice. This doesn't depend on the exact BB that
is attached and is still a valid question when more than one BB can be
attached, so just answer it by checking the parents list.

Signed-off-by: Kevin Wolf 
---
 block.c|  4 ++--
 block/block-backend.c  | 17 +
 include/sysemu/block-backend.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 4cc117d..66f918e 100644
--- a/block.c
+++ b/block.c
@@ -2904,7 +2904,7 @@ BlockDriverState *bdrv_next_node(BlockDriverState *bs)
  * the monitor or attached to a BlockBackend */
 BlockDriverState *bdrv_next(BlockDriverState *bs)
 {
-if (!bs || bs->blk) {
+if (!bs || bdrv_has_blk(bs)) {
 bs = blk_next_root_bs(bs);
 if (bs) {
 return bs;
@@ -2915,7 +2915,7 @@ BlockDriverState *bdrv_next(BlockDriverState *bs)
  * handled by the above block already */
 do {
 bs = bdrv_next_monitor_owned(bs);
-} while (bs && bs->blk);
+} while (bs && bdrv_has_blk(bs));
 return bs;
 }
 
diff --git a/block/block-backend.c b/block/block-backend.c
index dfc11b5..fdd1727 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -434,6 +434,23 @@ BlockDriverState *blk_bs(BlockBackend *blk)
 }
 
 /*
+ * Returns true if @bs has an associated BlockBackend.
+ */
+bool bdrv_has_blk(BlockDriverState *bs)
+{
+BdrvChild *child;
+QLIST_FOREACH(child, >parents, next_parent) {
+if (child->role == _root) {
+assert(bs->blk);
+return true;
+}
+}
+
+assert(!bs->blk);
+return false;
+}
+
+/*
  * Return @blk's DriveInfo if any, else null.
  */
 DriveInfo *blk_legacy_dinfo(BlockBackend *blk)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 1ae1ac9..bd68a17 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -99,6 +99,7 @@ BlockBackend *blk_by_public(BlockBackendPublic *public);
 BlockDriverState *blk_bs(BlockBackend *blk);
 void blk_remove_bs(BlockBackend *blk);
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
+bool bdrv_has_blk(BlockDriverState *bs);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_iostatus_enable(BlockBackend *blk);
-- 
1.8.3.1




[Qemu-block] [PATCH 6/9] block: Remove bdrv_move_feature_fields()

2016-03-22 Thread Kevin Wolf
bdrv_move_feature_fields() and swap_feature_fields() are empty now, they
can be removed.

Signed-off-by: Kevin Wolf 
---
 block.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/block.c b/block.c
index 66f918e..3770fb0 100644
--- a/block.c
+++ b/block.c
@@ -,13 +,6 @@ void bdrv_close_all(void)
 }
 }
 
-/* Fields that need to stay with the top-level BDS */
-static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
- BlockDriverState *bs_src)
-{
-/* move some fields that need to stay attached to the device */
-}
-
 static void change_parent_backing_link(BlockDriverState *from,
BlockDriverState *to)
 {
@@ -2252,16 +2245,6 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 }
 }
 
-static void swap_feature_fields(BlockDriverState *bs_top,
-BlockDriverState *bs_new)
-{
-BlockDriverState tmp;
-
-bdrv_move_feature_fields(, bs_top);
-bdrv_move_feature_fields(bs_top, bs_new);
-bdrv_move_feature_fields(bs_new, );
-}
-
 /*
  * Add new bs contents at the top of an image chain while the chain is
  * live, while keeping required fields on the top layer.
@@ -2285,9 +2268,6 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 
 bdrv_ref(bs_top);
 
-/* Some fields always stay on top of the backing file chain */
-swap_feature_fields(bs_top, bs_new);
-
 change_parent_backing_link(bs_top, bs_new);
 bdrv_set_backing_hd(bs_new, bs_top);
 bdrv_unref(bs_top);
@@ -2304,16 +2284,6 @@ void bdrv_replace_in_backing_chain(BlockDriverState 
*old, BlockDriverState *new)
 
 bdrv_ref(old);
 
-if (old->blk) {
-/* As long as these fields aren't in BlockBackend, but in the top-level
- * BlockDriverState, it's not possible for a BDS to have two BBs.
- *
- * We really want to copy the fields from old to new, but we go for a
- * swap instead so that pointers aren't duplicated and cause trouble.
- * (Also, bdrv_swap() used to do the same.) */
-assert(!new->blk);
-swap_feature_fields(old, new);
-}
 change_parent_backing_link(old, new);
 
 /* Change backing files if a previously independent node is added to the
-- 
1.8.3.1




[Qemu-block] [PATCH 2/9] block: User BdrvChild callback for device name

2016-03-22 Thread Kevin Wolf
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.

Signed-off-by: Kevin Wolf 
---
 block.c   | 22 --
 block/block-backend.c |  6 ++
 include/block/block_int.h |  1 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 1fb5dac..4cc117d 100644
--- a/block.c
+++ b/block.c
@@ -2924,10 +2924,28 @@ const char *bdrv_get_node_name(const BlockDriverState 
*bs)
 return bs->node_name;
 }
 
+static const char *bdrv_get_parent_name(const BlockDriverState *bs)
+{
+BdrvChild *c;
+const char *name;
+
+/* If multiple parents have a name, just pick the first one. */
+QLIST_FOREACH(c, >parents, next_parent) {
+if (c->role->get_name) {
+name = c->role->get_name(c);
+if (name && *name) {
+return name;
+}
+}
+}
+
+return NULL;
+}
+
 /* TODO check what callers really want: bs->node_name or blk_name() */
 const char *bdrv_get_device_name(const BlockDriverState *bs)
 {
-return bs->blk ? blk_name(bs->blk) : "";
+return bdrv_get_parent_name(bs) ?: "";
 }
 
 /* This can be used to identify nodes that might not have a device
@@ -2936,7 +2954,7 @@ const char *bdrv_get_device_name(const BlockDriverState 
*bs)
  * absent, then this returns an empty (non-null) string. */
 const char *bdrv_get_device_or_node_name(const BlockDriverState *bs)
 {
-return bs->blk ? blk_name(bs->blk) : bs->node_name;
+return bdrv_get_parent_name(bs) ?: bs->node_name;
 }
 
 int bdrv_get_flags(BlockDriverState *bs)
diff --git a/block/block-backend.c b/block/block-backend.c
index 9d973ba..c4c0dc0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -95,11 +95,17 @@ static bool blk_drain_throttling_queue(BdrvChild *child);
 static void blk_root_change_media(BdrvChild *child, bool load);
 static void blk_root_resize(BdrvChild *child);
 
+static const char *blk_root_get_name(BdrvChild *child)
+{
+return blk_name(child->opaque);
+}
+
 static const BdrvChildRole child_root = {
 .inherit_options= blk_root_inherit_options,
 
 .change_media   = blk_root_change_media,
 .resize = blk_root_resize,
+.get_name   = blk_root_get_name,
 
 .drain_queue= blk_drain_throttling_queue,
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f60cb7c..195abe8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -358,6 +358,7 @@ struct BdrvChildRole {
 
 void (*change_media)(BdrvChild *child, bool load);
 void (*resize)(BdrvChild *child);
+const char* (*get_name)(BdrvChild *child);
 
 bool (*drain_queue)(BdrvChild *child);
 };
-- 
1.8.3.1




[Qemu-block] [PATCH 3/9] block jobs: Use BdrvChild callbacks for iostatus operations

2016-03-22 Thread Kevin Wolf
The block jobs currently modify the target BB's error handling options
and require that the source BB's iostatus is enabled in order to
implement the per-job error options. It's obvious that this is something
between ugly, adventurous and plain wrong, so we should ideally fix this
instead of thinking about how to handle multiple BBs in this case.

Unfortunately we have a chicken-and-egg problem here: In order to fix
the problem, the block jobs need their own BBs. This requires multiple
BBs per BDS, which in turn means that BDS.blk must be removed. Removing
BDS.blk means that the jobs already need their own BB. The only other
options is that we lift the iostatus operations to the BdrvChild level
and deal with multiple BBs now.

So even though we don't want to have these callbacks in the end (because
they indicate broken logic), this patch converts the iostatus operations
for block jobs targets to BdrvChild callbacks; if more than one parent
implements iostatus operations, they are called for each of them.

Once the conversion is completed, block jobs will get their own internal
BB and the callbacks can be removed again.

Signed-off-by: Kevin Wolf 
---
 block/backup.c| 20 +--
 block/block-backend.c | 42 +++
 block/commit.c|  2 +-
 block/mirror.c| 21 ++--
 block/stream.c|  2 +-
 blockjob.c| 85 +--
 include/block/block_int.h | 10 ++
 include/block/blockjob.h  |  9 +
 8 files changed, 165 insertions(+), 26 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 10397e2..016741e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -220,9 +220,8 @@ static void backup_iostatus_reset(BlockJob *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common);
 
-if (s->target->blk) {
-blk_iostatus_reset(s->target->blk);
-}
+/* FIXME Only touch iostatus of a job-owned BB */
+bdrv_iostatus_reset(s->target);
 }
 
 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -402,10 +401,9 @@ static void coroutine_fn backup_run(void *opaque)
 
 job->done_bitmap = bitmap_new(end);
 
-if (target->blk) {
-blk_set_on_error(target->blk, on_target_error, on_target_error);
-blk_iostatus_enable(target->blk);
-}
+/* FIXME Only touch on_error and iostatus of a job-owned BB */
+bdrv_set_on_error(target, on_target_error, on_target_error);
+bdrv_iostatus_enable(target);
 
 bdrv_add_before_write_notifier(bs, _write);
 
@@ -482,9 +480,9 @@ static void coroutine_fn backup_run(void *opaque)
 qemu_co_rwlock_unlock(>flush_rwlock);
 g_free(job->done_bitmap);
 
-if (target->blk) {
-blk_iostatus_disable(target->blk);
-}
+/* FIXME Only touch iostatus of a job-owned BB */
+bdrv_iostatus_disable(target);
+
 bdrv_op_unblock_all(target, job->common.blocker);
 
 data = g_malloc(sizeof(*data));
@@ -515,7 +513,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 
 if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
  on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
-(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
+!bdrv_iostatus_is_enabled(bs)) {
 error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
 return;
 }
diff --git a/block/block-backend.c b/block/block-backend.c
index c4c0dc0..03e68a7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -100,6 +100,39 @@ static const char *blk_root_get_name(BdrvChild *child)
 return blk_name(child->opaque);
 }
 
+static void blk_root_iostatus_reset(BdrvChild *child)
+{
+return blk_iostatus_reset(child->opaque);
+}
+
+static void blk_root_iostatus_enable(BdrvChild *child)
+{
+return blk_iostatus_enable(child->opaque);
+}
+
+static void blk_root_iostatus_disable(BdrvChild *child)
+{
+return blk_iostatus_disable(child->opaque);
+}
+
+static bool blk_root_iostatus_is_enabled(BdrvChild *child)
+{
+return blk_iostatus_is_enabled(child->opaque);
+}
+
+static void blk_root_iostatus_set_err(BdrvChild *child, int error)
+{
+return blk_iostatus_set_err(child->opaque, error);
+}
+
+static void blk_root_set_on_error(BdrvChild *child,
+  BlockdevOnError on_read_error,
+  BlockdevOnError on_write_error)
+{
+return blk_set_on_error(child->opaque, on_read_error, on_write_error);
+}
+
+
 static const BdrvChildRole child_root = {
 .inherit_options= blk_root_inherit_options,
 
@@ -108,6 +141,15 @@ static const BdrvChildRole child_root = {
 .get_name   = blk_root_get_name,
 
 .drain_queue= blk_drain_throttling_queue,
+
+/* FIXME Remove these callbacks. Children setting the iostatus are wrong,
+ * it should be controlled by the parents. */
+.iostatus_reset = blk_root_iostatus_reset,
+

[Qemu-block] [PATCH 7/9] block: Avoid bs->blk in bdrv_next()

2016-03-22 Thread Kevin Wolf
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.

Signed-off-by: Kevin Wolf 
---
 block.c| 34 +---
 block/block-backend.c  | 44 +++---
 block/io.c | 13 +++--
 block/snapshot.c   | 30 
 blockdev.c |  3 ++-
 include/block/block.h  |  3 ++-
 include/sysemu/block-backend.h |  1 -
 migration/block.c  |  4 +++-
 monitor.c  |  6 --
 qmp.c  |  5 -
 10 files changed, 77 insertions(+), 66 deletions(-)

diff --git a/block.c b/block.c
index 3770fb0..eefbcf3 100644
--- a/block.c
+++ b/block.c
@@ -2870,25 +2870,6 @@ BlockDriverState *bdrv_next_node(BlockDriverState *bs)
 return QTAILQ_NEXT(bs, node_list);
 }
 
-/* Iterates over all top-level BlockDriverStates, i.e. BDSs that are owned by
- * the monitor or attached to a BlockBackend */
-BlockDriverState *bdrv_next(BlockDriverState *bs)
-{
-if (!bs || bdrv_has_blk(bs)) {
-bs = blk_next_root_bs(bs);
-if (bs) {
-return bs;
-}
-}
-
-/* Ignore all BDSs that are attached to a BlockBackend here; they have been
- * handled by the above block already */
-do {
-bs = bdrv_next_monitor_owned(bs);
-} while (bs && bdrv_has_blk(bs));
-return bs;
-}
-
 const char *bdrv_get_node_name(const BlockDriverState *bs)
 {
 return bs->node_name;
@@ -3212,10 +3193,11 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error 
**errp)
 
 void bdrv_invalidate_cache_all(Error **errp)
 {
-BlockDriverState *bs = NULL;
+BlockDriverState *bs;
 Error *local_err = NULL;
+BdrvNextIterator *it = NULL;
 
-while ((bs = bdrv_next(bs)) != NULL) {
+while ((it = bdrv_next(it, )) != NULL) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
@@ -3245,10 +3227,11 @@ static int bdrv_inactivate(BlockDriverState *bs)
 
 int bdrv_inactivate_all(void)
 {
-BlockDriverState *bs = NULL;
+BlockDriverState *bs;
+BdrvNextIterator *it = NULL;
 int ret;
 
-while ((bs = bdrv_next(bs)) != NULL) {
+while ((it = bdrv_next(it, )) != NULL) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
@@ -3748,10 +3731,11 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState 
*bs,
  */
 bool bdrv_is_first_non_filter(BlockDriverState *candidate)
 {
-BlockDriverState *bs = NULL;
+BlockDriverState *bs;
+BdrvNextIterator *it = NULL;
 
 /* walk down the bs forest recursively */
-while ((bs = bdrv_next(bs)) != NULL) {
+while ((it = bdrv_next(it, )) != NULL) {
 bool perm;
 
 /* try to recurse in this top level bs */
diff --git a/block/block-backend.c b/block/block-backend.c
index fdd1727..b71b822 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -325,28 +325,40 @@ BlockBackend *blk_next(BlockBackend *blk)
: QTAILQ_FIRST(_block_backends);
 }
 
-/*
- * Iterates over all BlockDriverStates which are attached to a BlockBackend.
- * This function is for use by bdrv_next().
- *
- * @bs must be NULL or a BDS that is attached to a BB.
- */
-BlockDriverState *blk_next_root_bs(BlockDriverState *bs)
-{
+struct BdrvNextIterator {
+int phase;
 BlockBackend *blk;
+BlockDriverState *bs;
+};
 
-if (bs) {
-assert(bs->blk);
-blk = bs->blk;
-} else {
-blk = NULL;
+/* Iterates over all top-level BlockDriverStates, i.e. BDSs that are owned by
+ * the monitor or attached to a BlockBackend */
+BdrvNextIterator *bdrv_next(BdrvNextIterator *it, BlockDriverState **bs)
+{
+if (!it) {
+it = g_new0(BdrvNextIterator, 1);
+}
+
+if (it->phase == 0) {
+do {
+it->blk = blk_all_next(it->blk);
+*bs = it->blk ? blk_bs(it->blk) : NULL;
+} while (it->blk && *bs == NULL);
+
+if (*bs) {
+return it;
+}
+it->phase = 1;
 }
 
+/* Ignore all BDSs that are attached to a BlockBackend here; they have been
+ * handled by the above block already */
 do {
-blk = blk_all_next(blk);
-} while (blk && !blk->root);
+it->bs = bdrv_next_monitor_owned(it->bs);
+*bs = it->bs;
+} while (*bs && bdrv_has_blk(*bs));
 
-return blk ? blk->root->bs : NULL;
+return *bs ? it : NULL;
 }
 
 /*
diff --git a/block/io.c b/block/io.c
index 6ec133f..ead65cd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -213,10 +213,11 @@ void bdrv_drain_all(void)
 {
 /* Always run first iteration so any pending completion BHs run */
 bool busy = true;
-BlockDriverState *bs = NULL;
+BlockDriverState *bs;
+BdrvNextIterator *it = NULL;
 GSList *aio_ctxs = 

[Qemu-block] [PATCH 0/9] block: Remove BlockDriverState.blk

2016-03-22 Thread Kevin Wolf
This is the final patch series that is required before we can start allowing
setups with more than one BlockBackend per BlockDriverState.

My current plan is to put the patches up to (and including) this series into
2.6 so that we have a relatively consistent block layer state in the release
that isn't in the middle of an overhaul, but not to make use of the new setups
that we could allow now before 2.7.

Depends on 'block: Move I/O throttling to BlockBackend'.

Kevin Wolf (9):
  block: Use BdrvChild callbacks for change_media/resize
  block: User BdrvChild callback for device name
  block jobs: Use BdrvChild callbacks for iostatus operations
  block: Remove bdrv_aio_multiwrite()
  block: Avoid BDS.blk in bdrv_next()
  block: Remove bdrv_move_feature_fields()
  block: Avoid bs->blk in bdrv_next()
  block: Don't return throttling info in query-named-block-nodes
  block: Remove BlockDriverState.blk

 block.c| 130 +++---
 block/backup.c |  20 ++--
 block/block-backend.c  | 142 
 block/commit.c |   2 +-
 block/io.c | 207 ++---
 block/mirror.c |  25 +++--
 block/qapi.c   |   6 +-
 block/snapshot.c   |  30 +++---
 block/stream.c |   2 +-
 blockdev.c |  22 ++---
 blockjob.c |  85 -
 include/block/block.h  |  10 +-
 include/block/block_int.h  |  17 +++-
 include/block/blockjob.h   |   9 ++
 include/sysemu/block-backend.h |   3 +-
 migration/block.c  |   4 +-
 monitor.c  |   6 +-
 qemu-io-cmds.c | 203 
 qmp.c  |   5 +-
 tests/qemu-iotests/085.out |   6 +-
 tests/qemu-iotests/096 |   3 +-
 tests/qemu-iotests/100 | 146 -
 tests/qemu-iotests/100.out |  89 --
 tests/qemu-iotests/136 |  20 +---
 tests/qemu-iotests/136.out |   4 +-
 tests/qemu-iotests/group   |   2 +-
 trace-events   |   2 -
 27 files changed, 353 insertions(+), 847 deletions(-)
 delete mode 100755 tests/qemu-iotests/100
 delete mode 100644 tests/qemu-iotests/100.out

-- 
1.8.3.1




[Qemu-block] [PATCH 1/9] block: Use BdrvChild callbacks for change_media/resize

2016-03-22 Thread Kevin Wolf
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.

Signed-off-by: Kevin Wolf 
---
 block.c   | 38 +-
 block/block-backend.c | 15 ++-
 include/block/block_int.h |  4 +++-
 3 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index fb37b91..1fb5dac 100644
--- a/block.c
+++ b/block.c
@@ -1216,6 +1216,27 @@ void bdrv_unref_child(BlockDriverState *parent, 
BdrvChild *child)
 bdrv_root_unref_child(child);
 }
 
+
+static void bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
+{
+BdrvChild *c;
+QLIST_FOREACH(c, >parents, next_parent) {
+if (c->role->change_media) {
+c->role->change_media(c, load);
+}
+}
+}
+
+static void bdrv_parent_cb_resize(BlockDriverState *bs)
+{
+BdrvChild *c;
+QLIST_FOREACH(c, >parents, next_parent) {
+if (c->role->resize) {
+c->role->resize(c);
+}
+}
+}
+
 /*
  * Sets the backing file link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
@@ -1674,9 +1695,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 }
 
 if (!bdrv_key_required(bs)) {
-if (bs->blk) {
-blk_dev_change_media_cb(bs->blk, true);
-}
+bdrv_parent_cb_change_media(bs, true);
 } else if (!runstate_check(RUN_STATE_PRELAUNCH)
&& !runstate_check(RUN_STATE_INMIGRATE)
&& !runstate_check(RUN_STATE_PAUSED)) { /* HACK */
@@ -2122,9 +2141,7 @@ static void bdrv_close(BlockDriverState *bs)
 bdrv_release_named_dirty_bitmaps(bs);
 assert(QLIST_EMPTY(>dirty_bitmaps));
 
-if (bs->blk) {
-blk_dev_change_media_cb(bs->blk, false);
-}
+bdrv_parent_cb_change_media(bs, false);
 
 if (bs->drv) {
 BdrvChild *child, *next;
@@ -2605,9 +2622,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 bdrv_dirty_bitmap_truncate(bs);
-if (bs->blk) {
-blk_dev_resize_cb(bs->blk);
-}
+bdrv_parent_cb_resize(bs);
 }
 return ret;
 }
@@ -2718,10 +2733,7 @@ int bdrv_set_key(BlockDriverState *bs, const char *key)
 bs->valid_key = 0;
 } else if (!bs->valid_key) {
 bs->valid_key = 1;
-if (bs->blk) {
-/* call the change callback now, we skipped it on open */
-blk_dev_change_media_cb(bs->blk, true);
-}
+bdrv_parent_cb_change_media(bs, true);
 }
 return ret;
 }
diff --git a/block/block-backend.c b/block/block-backend.c
index 8b4eb1a..9d973ba 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -92,9 +92,15 @@ static void blk_root_inherit_options(int *child_flags, QDict 
*child_options,
 }
 static bool blk_drain_throttling_queue(BdrvChild *child);
 
+static void blk_root_change_media(BdrvChild *child, bool load);
+static void blk_root_resize(BdrvChild *child);
+
 static const BdrvChildRole child_root = {
 .inherit_options= blk_root_inherit_options,
 
+.change_media   = blk_root_change_media,
+.resize = blk_root_resize,
+
 .drain_queue= blk_drain_throttling_queue,
 };
 
@@ -553,6 +559,11 @@ void blk_dev_change_media_cb(BlockBackend *blk, bool load)
 }
 }
 
+static void blk_root_change_media(BdrvChild *child, bool load)
+{
+blk_dev_change_media_cb(child->opaque, load);
+}
+
 /*
  * Does @blk's attached device model have removable media?
  * %true if no device model is attached.
@@ -607,8 +618,10 @@ bool blk_dev_is_medium_locked(BlockBackend *blk)
 /*
  * Notify @blk's attached device model of a backend size change.
  */
-void blk_dev_resize_cb(BlockBackend *blk)
+static void blk_root_resize(BdrvChild *child)
 {
+BlockBackend *blk = child->opaque;
+
 if (blk->dev_ops && blk->dev_ops->resize_cb) {
 blk->dev_ops->resize_cb(blk->dev_opaque);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 000d2c0..f60cb7c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -356,6 +356,9 @@ struct BdrvChildRole {
 void (*inherit_options)(int *child_flags, QDict *child_options,
 int parent_flags, QDict *parent_options);
 
+void (*change_media)(BdrvChild *child, bool load);
+void (*resize)(BdrvChild *child);
+
 bool (*drain_queue)(BdrvChild *child);
 };
 
@@ -695,7 +698,6 @@ bool blk_dev_has_tray(BlockBackend *blk);
 void blk_dev_eject_request(BlockBackend *blk, bool force);
 bool blk_dev_is_tray_open(BlockBackend *blk);
 bool blk_dev_is_medium_locked(BlockBackend *blk);
-void 

Re: [Qemu-block] [PATCH 22/22] qcow2-dirty-bitmap: add EXTRA_DATA_COMPATIBLE flag

2016-03-22 Thread Eric Blake
On 03/15/2016 02:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> If this flag is unset and exta data present the bitmap should be

s/exta/extra/
s/present/is present/

> read-only. For now just return error for this case.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2-dirty-bitmap.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> index 159e935..95c166c 100644
> --- a/block/qcow2-dirty-bitmap.c
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -45,6 +45,7 @@
>  #define BME_RESERVED_FLAGS 0xfffc
>  #define BME_FLAG_IN_USE 1
>  #define BME_FLAG_AUTO   (1U << 1)
> +#define BME_FLAG_EXTRA_DATA_COMPATIBLE   (1U << 1)
>  
>  /* bits [1, 8] U [56, 63] are reserved */
>  #define BME_TABLE_ENTRY_RESERVED_MASK 0xff0001fe
> @@ -333,6 +334,13 @@ static BdrvDirtyBitmap *load_bitmap(BlockDriverState 
> *bs, QCow2Bitmap *bm,
>  return NULL;
>  }
>  
> +if (!(bmh->flags & BME_FLAG_EXTRA_DATA_COMPATIBLE) &&
> +bmh->extra_data_size != 0) {
> +error_setg(errp, "Uncompatible extra data found for bitmap '%s'",

s/Uncompatible/Incompatible/

> +   bm->name);
> +return NULL;
> +}
> +
>  bitmap_table = g_try_malloc(bmh->bitmap_table_size * sizeof(uint64_t));
>  if (bitmap_table == NULL) {
>  error_setg_errno(errp, -ENOMEM,
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 18/22] qcow2-dirty-bitmaps: disallow stroing bitmap to other bs

2016-03-22 Thread Eric Blake
On 03/15/2016 02:04 PM, Vladimir Sementsov-Ogievskiy wrote:

In the subject: s/stroing/storing/

> Check, that bitmap is stored to the owning bs.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 12/22] qcow2-dirty-bitmap: add qcow2_bitmap_load_check()

2016-03-22 Thread Eric Blake
On 03/15/2016 02:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> The function checks existing of the bitmap without loading it.

s/existing/the existence/

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 10/22] qcow2-dirty-bitmap: add qcow2_bitmap_store()

2016-03-22 Thread Eric Blake
On 03/15/2016 02:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> This function stores block dirty bitmap to qcow2. If the bitmap with
> the same name, size and granularity already exists, it will be
> rewritten, if the bitmap with the same name exists but granularity or
> size does not match, an error will be genrated.

s/genrated/generated/

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

> +
> +/* if no id is provided, a new one is constructed */
> +static int qcow2_bitmap_create(BlockDriverState *bs, const char *name,
> +   uint64_t size, int granularity)
> +{
> +int ret;
> +BDRVQcow2State *s = bs->opaque;
> +
> +if (s->nb_bitmaps >= QCOW_MAX_DIRTY_BITMAPS) {
> +return -EFBIG;
> +}
> +
> +/* Check that the name is unique */
> +if (find_bitmap_by_name(bs, name) != NULL) {
> +return -EEXIST;
> +}
> +

Is the comment about constructing a name stale or misplaced?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 20/22] iotests: test internal persistent dirty bitmap

2016-03-22 Thread Eric Blake
On 03/15/2016 02:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Add simple test cases for testing persistent dirty bitmaps.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/160| 112 
> ++
>  tests/qemu-iotests/160.out|  21 
>  tests/qemu-iotests/group  |   1 +
>  tests/qemu-iotests/iotests.py |   6 +++
>  4 files changed, 140 insertions(+)
>  create mode 100755 tests/qemu-iotests/160
>  create mode 100644 tests/qemu-iotests/160.out
> 
> diff --git a/tests/qemu-iotests/160 b/tests/qemu-iotests/160
> new file mode 100755
> index 000..f9843da
> --- /dev/null
> +++ b/tests/qemu-iotests/160
> @@ -0,0 +1,112 @@
> +#!/usr/bin/env python
> +#
> +# Tests for persistent dirty bitmaps.
> +#
> +# Copyright: Vladimir Sementsov-Ogievskiy 2015

Do you want to also claim 2016?

> +++ b/tests/qemu-iotests/group
> @@ -150,3 +150,4 @@
>  145 auto quick
>  146 auto quick
>  148 rw auto quick
> +160 rw auto quick

Do we really have that many pending patch series with reserved test ids?


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 19/22] iotests: add VM.test_launcn()

2016-03-22 Thread Eric Blake
On 03/15/2016 02:04 PM, Vladimir Sementsov-Ogievskiy wrote:

in subject: s/launcn/launch/

> Test vm can launch and print output in case of fail. This function is
> needed for testing erroneous cases
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/iotests.py | 20 
>  1 file changed, 20 insertions(+)
> 

> +def test_launch(self):
> +'''Launch the VM, an error is expected'''
> +try:
> +self.launch()
> +except:
> +self._popen.wait()
> +regex = re.compile(r"qemu-system-\w+")
> +print "Test launch failed: %d" % self._popen.returncode
> +print "--- qemu output ---"
> +for line in open(self._qemu_log_path):
> +#filter qtest comments
> +if not "] OPENED" in line:
> +print regex.sub("qemu-system-*", line)
> +print "--- end qemu output ---"
> +return False
> +
> +print "Tast launch successed!"

What? Did you mean "Test launch succeeded" ?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 08/10] nbd: allow an ACL to be set with nbd-server-start QMP command

2016-03-22 Thread Eric Blake
On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> As with the previous patch to qemu-nbd, the nbd-server-start
> QMP command also needs to be able to specify an ACL when
> enabling TLS encryption.
> 
> First the client must create a QAuthZ object instance using
> the 'object-add' command:
> 

> Signed-off-by: Daniel P. Berrange 
> ---

> +++ b/qapi/block.json
> @@ -147,6 +147,7 @@
>  #
>  # @addr: Address on which to listen.
>  # @tls-creds: (optional) ID of the TLS credentials object. Since 2.6
> +# @tls-acl: (optional) ID of the QAuthZ authorization object. Since 2.6
>  #
>  # Returns: error if the server is already running.
>  #
> @@ -154,7 +155,8 @@
>  ##
>  { 'command': 'nbd-server-start',
>'data': { 'addr': 'SocketAddress',
> -'*tls-creds': 'str'} }
> +'*tls-creds': 'str',
> +'*tls-acl': 'str'} }
>  

Interface change is deceptively simple :)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 07/10] qemu-nbd: add support for ACLs for TLS clients

2016-03-22 Thread Eric Blake
On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> Currently any client which can complete the TLS handshake
> is able to use the NBD server. The server admin can turn
> on the 'verify-peer' option for the x509 creds to require
> the client to provide a x509 certificate. This means the
> client will have to acquire a certificate from the CA before
> they are permitted to use the NBD server. This is still a
> fairly weak bar.
> 
> This adds a '--tls-acl ACL-ID' option to the qemu-nbd command
> which takes the ID of a previously added 'QAuthZ' object
> instance. This ACL will be used to validate the client's
> x509 distinguished name. Clients failing the ACL will not be
> permitted to use the NBD server.
> 
> For example to setup an ACL that only allows connection from
> a client whose x509 certificate distinguished name contains
> 'CN=fred', you would use:
> 
>   qemu-nbd -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
>endpoint=server,verify-peer=yes \
>-object authz-simple,id=acl0,policy=deny,\
>  rules.0.match=*CN=fred,rules.0.policy=allow \
>-tls-creds tls0 \
>-tls-acl acl0
>  other qemu-nbd args...

Ah, so you are arguing that this is feature-completion of work started
in 2.6, continuing work started before soft-freeze, and not a new
feature to be delayed to 2.7.

> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  qemu-nbd.c| 13 -
>  qemu-nbd.texi |  4 
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 

> +++ b/qemu-nbd.texi
> @@ -86,6 +86,10 @@ the new style NBD protocol negotiation
>  Enable mandatory TLS encryption for the server by setting the ID
>  of the TLS credentials object previously created with the --object
>  option.
> +@item --tls-acl=ID
> +Specify the ID of a qauthz object previously created with the
> +--object option. This will be used to authorize users who
> +connect against their x509 distinguish name.

s/distinguish/distinguished/

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 04/22] iotests: add default node-name

2016-03-22 Thread John Snow


On 03/15/2016 04:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> When testing migration, auto-generated by qemu node-names differs in
> source and destination qemu and migration fails. After this patch,
> auto-generated by iotest nodenames will be the same.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/iotests.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 2445cf2..6807b07 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -156,6 +156,7 @@ class VM(object):
>  options.append('file=%s' % path)
>  options.append('format=%s' % imgfmt)
>  options.append('cache=%s' % cachemode)
> +options.append('node-name=drivenode%d' % self._num_drives)
>  
>  if opts:
>  options.append(opts)
> 

Tested-by: John Snow 
Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH 01/22] block: Add two dirty bitmap getters

2016-03-22 Thread John Snow


On 03/15/2016 04:08 PM, Vladimir Sementsov-Ogievskiy wrote:
> On 15.03.2016 23:04, Vladimir Sementsov-Ogievskiy wrote:
>> From: Fam Zheng 
>>
>> For dirty bitmap users to get the size and the name of a
>> BdrvDirtyBitmap.
>>
>> Signed-off-by: Fam Zheng 
>> Reviewed-by: John Snow 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> 
> it's an accidental s.o.b., actually there are no changes by me.
> 

You can add an S-o-B even when you don't make changes to indicate a
"chain of custody" for the patch. You are certifying that the patch is,
to the best of your knowledge, properly credited and licensed.

It's not required for you to add your /own/ S-o-B to every patch you
send, but it's not hurting anything here either.

At least, that was my understanding of it. I know as a maintainer I add
my S-o-B to everything that goes through my tree even if I don't modify
it just to signify where the patch has been and who has handled it.

--js

>> ---
>>   block/dirty-bitmap.c | 10 ++
>>   include/block/dirty-bitmap.h |  2 ++
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>> index 556e1d1..45cfa3b 100644
>> --- a/block/dirty-bitmap.c
>> +++ b/block/dirty-bitmap.c
>> @@ -97,6 +97,16 @@ BdrvDirtyBitmap
>> *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>>   return bitmap;
>>   }
>>   +int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
>> +{
>> +return bitmap->size;
>> +}
>> +
>> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
>> +{
>> +return bitmap->name;
>> +}
>> +
>>   bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
>>   {
>>   return bitmap->successor;
>> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
>> index 80afe60..4dc8750 100644
>> --- a/include/block/dirty-bitmap.h
>> +++ b/include/block/dirty-bitmap.h
>> @@ -29,6 +29,8 @@ uint32_t
>> bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
>>   uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
>>   bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
>>   bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
>> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
>> +int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
>>   DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
>>   int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>>  int64_t sector);
> 
> 



Re: [Qemu-block] [PATCH 4/4] virtio-blk: Clean up start/stop with mutex and BH

2016-03-22 Thread Paolo Bonzini


On 22/03/2016 13:52, Fam Zheng wrote:
>> You're right.  After unrealizing virtio_blk_data_plane_stop has set of
>> vblk->dataplane_started = false, so that's covered.  However, you still
>> need an object_ref/object_object_unref pair.
> 
> Is it safe to call object_unref outside BQL?

Hmm, no.

However, perhaps we can fix the code without a bottom half, using the 
assertion in virtio_blk_data_plane_start to ensure that there is no 
unwanted reentrancy.

Conny's patches are also enough to mask the bug for me, so my tests
do not say much.  But in any case the following patch works here too
instead of Fam's 4/4; it is a mess including some other experiments,
but I'm including it as is because that's what I tested and it's
dinner time now.

Even if it fails for you or Tu Bo, perhaps the backtraces say
something.

Thanks,

Paolo

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1b2d5fa..5f72671 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -26,8 +26,7 @@
 #include "qom/object_interfaces.h"
 
 struct VirtIOBlockDataPlane {
-bool starting;
-bool stopping;
+int starting;
 bool disabled;
 
 VirtIOBlkConf *conf;
@@ -192,11 +191,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 int r;
 
-if (vblk->dataplane_started || s->starting) {
-return;
-}
-
-s->starting = true;
+assert(atomic_fetch_inc(>starting) == 0);
 s->vq = virtio_get_queue(s->vdev, 0);
 
 /* Set up guest notifier (irq) */
@@ -215,27 +210,28 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 goto fail_host_notifier;
 }
 
-s->starting = false;
-vblk->dataplane_started = true;
 trace_virtio_blk_data_plane_start(s);
 
 blk_set_aio_context(s->conf->conf.blk, s->ctx);
 
-/* Kick right away to begin processing requests already in vring */
-event_notifier_set(virtio_queue_get_host_notifier(s->vq));
+vblk->dataplane_started = true;
 
-/* Get this show started by hooking up our callbacks */
+/* Get this show started by hooking up our callbacks.  */
 aio_context_acquire(s->ctx);
 virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, true);
 aio_context_release(s->ctx);
+atomic_dec(>starting);
+
+/* Kick right away to begin processing requests already in vring */
+event_notifier_set(virtio_queue_get_host_notifier(s->vq));
 return;
 
   fail_host_notifier:
 k->set_guest_notifiers(qbus->parent, 1, false);
   fail_guest_notifiers:
 s->disabled = true;
-s->starting = false;
 vblk->dataplane_started = true;
+atomic_dec(>starting);
 }
 
 /* Context: QEMU global mutex held */
@@ -245,7 +241,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
-if (!vblk->dataplane_started || s->stopping) {
+if (!vblk->dataplane_started) {
 return;
 }
 
@@ -255,7 +251,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 vblk->dataplane_started = false;
 return;
 }
-s->stopping = true;
+
 trace_virtio_blk_data_plane_stop(s);
 
 aio_context_acquire(s->ctx);
@@ -274,5 +270,4 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 k->set_guest_notifiers(qbus->parent, 1, false);
 
 vblk->dataplane_started = false;
-s->stopping = false;
 }



Re: [Qemu-block] [PATCH v3 06/10] acl: delete existing ACL implementation

2016-03-22 Thread Eric Blake
On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> The 'qemu_acl' type was a previous non-QOM based attempt to
> provide an authorization facility in QEMU. Because it is
> non-QOM based it cannot be created via the command line and
> requires special monitor commands to manipulate it.
> 
> The new QAuthZ and QAuthZSimple QOM classes provide a superset
> of the functionality in qemu_acl, so the latter can now be
> deleted. The HMP 'acl_*' monitor commands are converted to
> use the new QAuthZSimple data type instead in order to provide
> backwards compatibility, but their use is discouraged.
> 
> Signed-off-by: Daniel P. Berrange 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 05/10] util: add QAuthZSimple object type for a simple access control list

2016-03-22 Thread Eric Blake
On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> Add a QAuthZSimple object type that implements the QAuthZ
> interface. This simple built-in implementation maintains
> a trivial access control list with a sequence of match
> rules and a final default policy. This replicates the
> functionality currently provided by the qemu_acl module.
> 
> To create an instance of this object via the QMP monitor,
> the syntax used would be
> 
>   {
> "execute": "object-add",
> "arguments": {
>   "qom-type": "authz-simple",
>   "id": "auth0",
>   "parameters": {
> "rules": [
>{ "match": "fred", "policy": "allow" },
>{ "match": "bob", "policy": "allow" },
>{ "match": "danb", "policy": "deny" },
>{ "match": "dan*", "policy": "allow" }
> ],
> "policy": "deny"
>   }
> }
>   }

So "match" appears to be a glob (as opposed to a regex).  And order is
important (the first rule matched ends the lookup), so you correctly
used an array for the list of rules (a dict does not have to maintain
order).

> 
> Or via the -object command line
> 
>   $QEMU \
>  -object authz-simple,id=acl0,policy=deny,\
>  match.0.name=fred,match.0.policy=allow, \
>  match.1.name=bob,match.1.policy=allow, \
>  match.2.name=danb,match.2.policy=deny, \
>  match.3.name=dan*,match.3.policy=allow

The '*' in the command line will require shell quoting.

> 
> This sets up an authorization rule that allows 'fred',
> 'bob' and anyone whose name starts with 'dan', except
> for 'danb'. Everyone unmatched is denied.
> 
> Signed-off-by: Daniel P. Berrange 
> ---

> +/**
> + * QAuthZSimple:
> + *
> + * This authorization driver provides a simple mechanism
> + * for granting access by matching user names against a
> + * list of globs. Each match rule has an associated policy
> + * and a catch all policy applies if no rule matches
> + *
> + * To create an instace of this class via QMP:

s/instace/instance/

> + *
> + * Or via the CLI:
> + *
> + *   $QEMU  \
> + *-object authz-simple,id=acl0,policy=deny, \
> + *match.0.name=fred,match.0.policy=allow,   \
> + *match.1.name=bob,match.1.policy=allow,\
> + *match.2.name=danb,match.2.policy=deny,\
> + *match.3.name=dan*,match.3.policy=allow

Again, quoting the "*" is important, and maybe a comment that the
whitespace is for display purposes but must be omitted on the command line.

> +++ b/qapi-schema.json
> @@ -5,6 +5,9 @@
>  # QAPI common definitions
>  { 'include': 'qapi/common.json' }
>  
> +# QAPI util definitions
> +{ 'include': 'qapi/util.json' }
> +
>  # QAPI crypto definitions
>  { 'include': 'qapi/crypto.json' }
>  
> @@ -3652,7 +3655,8 @@
>  # Since 2.5
>  ##
>  { 'struct': 'DummyForceArrays',
> -  'data': { 'unused': ['X86CPUFeatureWordInfo'] } }
> +  'data': { 'unused': ['X86CPUFeatureWordInfo'],
> +'iamalsounused': ['QAuthZSimpleRule'] } }

Cute.  I might have renamed things and gone 'unused1' and 'unused2'.

> +++ b/qapi/util.json
> @@ -0,0 +1,31 @@
> +# -*- Mode: Python -*-
> +#
> +# QAPI util definitions
> +
> +##
> +# QAuthZSimplePolicy:
> +#
> +# The authorization policy result
> +#
> +# @deny: deny access
> +# @allow: allow access
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'QAuthZSimplePolicy',
> +  'prefix': 'QAUTHZ_SIMPLE_POLICY',
> +  'data': ['deny', 'allow']}

I know Markus isn't a big fan of 'prefix', but I don't mind it.

We're awfully late in the 2.6 cycle; this feels like a feature addition
rather than a bug fix, so should it be 2.7?

> +
> +##
> +# QAuthZSimpleRule:
> +#
> +# A single authorization rule.
> +#
> +# @match: a glob to match against a user identity
> +# @policy: the result to return if @match evaluates to true
> +#
> +# Since: 2.6
> +##
> +{ 'struct': 'QAuthZSimpleRule',
> +  'data': {'match': 'str',
> +   'policy': 'QAuthZSimplePolicy'}}

Hmm. I was expecting something like:

{ 'struct': 'QAuthZSimple',
  'data': { 'rules': [ 'QAuthZSimpleRule' ],
'policy': 'QAuthZSimplePolicy' } }

but I guess that's one of our short-comings of QOM: the top-level
structure does not have to be specified anywhere in QAPI.

> +++ b/tests/test-authz-simple.c
> @@ -0,0 +1,156 @@
> +/*
> + * QEMU simple authorization object
> + *
> + * Copyright (c) 2016 Red Hat, Inc.
> + *

> +static void test_authz_default_deny(void)
> +{
> +QAuthZSimple *auth = qauthz_simple_new("auth0",
> +   QAUTHZ_SIMPLE_POLICY_DENY,
> +   _abort);
> +
> +g_assert(!qauthz_is_allowed(QAUTHZ(auth), "fred", _abort));
> +

Okay, so you definitely intend to return false without setting errp, so
it is a ternary result.

> +
> +#ifdef CONFIG_FNMATCH
> +static void test_authz_complex(void)
> +{

Wait - the behavior depends on 

Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] block/qapi: trivial fixes

2016-03-22 Thread Kevin Wolf
Am 22.03.2016 um 16:50 hat Markus Armbruster geschrieben:
> Peter Xu  writes:
> 
> > One is to use literal printf format when possible.
> >
> > One is to fix an unbounded usage of stack.
> 
> I lack the time to take this through my tree before my Easter vacation.
> Kevin, can you stick it into your next pull request for me?  Note
> trivial fixup on PATCH 2.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v3 04/10] util: add QAuthZ object as an authorization base class

2016-03-22 Thread Daniel P. Berrange
On Tue, Mar 22, 2016 at 10:33:42AM -0600, Eric Blake wrote:
> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> > The current qemu_acl module provides a simple access control
> > list facility inside QEMU, which is used via a set of monitor
> > commands acl_show, acl_policy, acl_add, acl_remove & acl_reset.
> > 
> > Note there is no ability to create ACLs - the network services
> > (eg VNC server) were expected to create ACLs that they want to
> > check.
> > 
> > There is also no way to define ACLs on the command line, nor
> > potentially integrate with external authorization systems like
> > polkit, pam, ldap lookup, etc.
> > 
> > The QAuthZ object defines a minimal abstract QOM class that can
> > be subclassed for creating different authorization providers.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> 
> > +++ b/include/qemu/authz.h
> > +
> > +/**
> > + * QAuthZ:
> > + *
> > + * The QAuthZ class defines an API contract to be used
> > + * for providing an authorization driver for network
> > + * services.
> 
> Just network services? Or is it broader than that?
> 
> > +/**
> > + * qauthz_is_allowed:
> > + * @authz: the authorization object
> > + * @identity: the user identity to authorize
> > + * @errp: pointer to a NULL initialized error object
> > + *
> > + * Check if a user @identity is authorized
> > + *
> > + * Returns: true if @identity is authorizd, false otherwise
> 
> s/authorizd/authorized/
> 
> I think you need more documentation on return semantics.  Do we have
> strict binary return (either we returned true and errp is unset, or we
> returned false and errp is set), or is it a ternary (we return true and
> errp is unset: permission is explicitly granted; we return false and
> errp is unset: permission is explicitly denied; or we set errp: we could
> not determine permission).  And if a ternary, do we also want to require
> that setting 'errp' also requires a return of false, or is the return
> undefined in that case?

It is intended to be ternary, and if errp is set, the return value
should be false.

ie you should be able todo


  if (qauthz_is_allowed(authz, identity, NULL))
 

safe in the knowledge that any error that you're ignoring will
result in denial of permission

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [PATCH 01/22] block: Add two dirty bitmap getters

2016-03-22 Thread Eric Blake
On 03/15/2016 02:08 PM, Vladimir Sementsov-Ogievskiy wrote:
> On 15.03.2016 23:04, Vladimir Sementsov-Ogievskiy wrote:
>> From: Fam Zheng 
>>
>> For dirty bitmap users to get the size and the name of a
>> BdrvDirtyBitmap.
>>
>> Signed-off-by: Fam Zheng 
>> Reviewed-by: John Snow 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> 
> it's an accidental s.o.b., actually there are no changes by me.

But if you are emailing the current version of the series, you should at
least have your name associated with the commit message - if not s.o.b,
then Reviewed-by.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 04/10] util: add QAuthZ object as an authorization base class

2016-03-22 Thread Daniel P. Berrange
On Tue, Mar 22, 2016 at 10:33:42AM -0600, Eric Blake wrote:
> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> > The current qemu_acl module provides a simple access control
> > list facility inside QEMU, which is used via a set of monitor
> > commands acl_show, acl_policy, acl_add, acl_remove & acl_reset.
> > 
> > Note there is no ability to create ACLs - the network services
> > (eg VNC server) were expected to create ACLs that they want to
> > check.
> > 
> > There is also no way to define ACLs on the command line, nor
> > potentially integrate with external authorization systems like
> > polkit, pam, ldap lookup, etc.
> > 
> > The QAuthZ object defines a minimal abstract QOM class that can
> > be subclassed for creating different authorization providers.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> 
> > +++ b/include/qemu/authz.h
> > +
> > +/**
> > + * QAuthZ:
> > + *
> > + * The QAuthZ class defines an API contract to be used
> > + * for providing an authorization driver for network
> > + * services.
> 
> Just network services? Or is it broader than that?

Any service that requires authentication. It is actually nothing
specific to networking

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [PATCH v3 04/10] util: add QAuthZ object as an authorization base class

2016-03-22 Thread Eric Blake
On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> The current qemu_acl module provides a simple access control
> list facility inside QEMU, which is used via a set of monitor
> commands acl_show, acl_policy, acl_add, acl_remove & acl_reset.
> 
> Note there is no ability to create ACLs - the network services
> (eg VNC server) were expected to create ACLs that they want to
> check.
> 
> There is also no way to define ACLs on the command line, nor
> potentially integrate with external authorization systems like
> polkit, pam, ldap lookup, etc.
> 
> The QAuthZ object defines a minimal abstract QOM class that can
> be subclassed for creating different authorization providers.
> 
> Signed-off-by: Daniel P. Berrange 
> ---

> +++ b/include/qemu/authz.h
> +
> +/**
> + * QAuthZ:
> + *
> + * The QAuthZ class defines an API contract to be used
> + * for providing an authorization driver for network
> + * services.

Just network services? Or is it broader than that?

> +/**
> + * qauthz_is_allowed:
> + * @authz: the authorization object
> + * @identity: the user identity to authorize
> + * @errp: pointer to a NULL initialized error object
> + *
> + * Check if a user @identity is authorized
> + *
> + * Returns: true if @identity is authorizd, false otherwise

s/authorizd/authorized/

I think you need more documentation on return semantics.  Do we have
strict binary return (either we returned true and errp is unset, or we
returned false and errp is set), or is it a ternary (we return true and
errp is unset: permission is explicitly granted; we return false and
errp is unset: permission is explicitly denied; or we set errp: we could
not determine permission).  And if a ternary, do we also want to require
that setting 'errp' also requires a return of false, or is the return
undefined in that case?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 02/10] qapi: allow QmpInputVisitor to auto-cast types

2016-03-22 Thread Eric Blake
On 03/22/2016 09:49 AM, Daniel P. Berrange wrote:
> On Mon, Mar 21, 2016 at 05:18:01PM -0600, Eric Blake wrote:
>> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
>>> Currently the QmpInputVisitor assumes that all scalar
>>> values are directly represented as their final types.
>>> ie it assumes an 'int' is using QInt, and a 'bool' is
>>> using QBool.
>>>
>>> This extends it so that QString is optionally permitted
>>> for any of the non-string scalar types. This behaviour
>>> is turned on by requesting the 'autocast' flag in the
>>> constructor.
>>>
>> Hmm.  Do we need to worry about partial asymmetry?  That is,
>> qint_get_int() returns a signed number, but qemu_strtoull() parses
>> unsigned; if the original conversion from JSON to qint uses a different
>> parser, then we could have funny results where we get different results
>> for things like:
>>  "key1":9223372036854775807, "key2":"9223372036854775807",
>> even though the same string of digits is being parsed, based on whether
>> the different parsers handle numbers larger than INT64_MAX differently.
> 
> Is this something you want me to change for re-post, or just a general
> point for future ?  ie should I be using qemu_strtoll instead of
> qemu_strtoull or something else ?   qint itself doesn't seem
> to concern itself with parsing ints from strintgs, so presumably
> this is from json code ?

General comment for now. We already know we need a bigger audit of
handling of values larger than INT64_MAX, so any cleanups related to
that can be deferred to that later audit.  But maybe a FIXME or TODO
comment in the code in your submission, to remind us to think about it
during the later audit, would help.


>>> +qstr = qobject_to_qstring(qobj);
>>> +if (qstr && qstr->string && qiv->autocast) {
>>> +if (!strcasecmp(qstr->string, "on") ||
>>> +!strcasecmp(qstr->string, "yes") ||
>>> +!strcasecmp(qstr->string, "true")) {
>>> +*obj = true;
>>> +return;
>>> +}
>>
>> Do we also want to allow "0"/"1" for true/false?
> 
> These 3 strings I took from opts-visitor.c, so to maintain compat
> we probably should not allow 0/1, unless we decide to extend
> opts-visitor too

Good point.  Maybe a comment pointing in both places pointing out that
we should keep them in sync is a worthwhile addition.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] block: Remove blk_set_bs()

2016-03-22 Thread Eric Blake
On 03/22/2016 06:55 AM, Kevin Wolf wrote:
> The function is unused since commit f21d96d0 ('block: Use BdrvChild in
> BlockBackend').
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/block-backend.c | 17 -
>  include/block/block_int.h |  2 --
>  2 files changed, 19 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 03/10] qom: support arbitrary non-scalar properties with -object

2016-03-22 Thread Daniel P. Berrange
On Mon, Mar 21, 2016 at 05:27:24PM -0600, Eric Blake wrote:
> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> > The current -object command line syntax only allows for
> > creation of objects with scalar properties, or a list
> > with a fixed scalar element type. Objects which have
> > properties that are represented as structs in the QAPI
> > schema cannot be created using -object.
> > 
> > This is a design limitation of the way the OptsVisitor
> > is written. It simply iterates over the QemuOpts values
> > as a flat list. The support for lists is enabled by
> > allowing the same key to be repeated in the opts string.
> > 
> > It is not practical to extend the OptsVisitor to support
> > more complex data structures while also maintaining
> > the existing list handling behaviour that is relied upon
> > by other areas of QEMU.
> 
> Zoltán Kővágó tried earlier with his GSoC patches for the audio
> subsystem last year, but those got stalled waiting for qapi enhancements
> to go in.  But I think your approach is indeed a bit nicer (rather than
> making the warty OptsVisitor even wartier, just avoid it).

My first attempt did indeed modify OptsVisitor, but I quickly
abandoned it since it ended up being quite complex code to
make it fit in with the pre-existing hack to supports lists
of scalars in OptsVisitor.  The QmpInputVisitor approach is
cleaner and simpler overall


> > Would be creatable via the CLI now using
> > 
> > $QEMU \
> >   -object demo,id=demo0,\
> >   foo.0.bar=one,foo.0.wizz=1,\
> >   foo.1.bar=two,foo.1.wizz=2
> > 
> > This is also wired up to work for the 'object_add' command
> > in the HMP monitor with the same syntax.
> > 
> >   (hmp) object_add demo,id=demo0,\
> >foo.0.bar=one,foo.0.wizz=1,\
> >foo.1.bar=two,foo.1.wizz=2
> 
> Maybe mention that the indentation is not actually present in the real
> command lines typed.

Heh, yeah

> > @@ -120,6 +120,7 @@ Object *user_creatable_add_type(const char *type, const 
> > char *id,
> >  obj = object_new(type);
> >  if (qdict) {
> >  for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
> > +
> >  object_property_set(obj, v, e->key, _err);
> >  if (local_err) {
> >  goto out;
> 
> Spurious hunk?

Indeed

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] block/qapi: trivial fixes

2016-03-22 Thread Markus Armbruster
Peter Xu  writes:

> One is to use literal printf format when possible.
>
> One is to fix an unbounded usage of stack.

I lack the time to take this through my tree before my Easter vacation.
Kevin, can you stick it into your next pull request for me?  Note
trivial fixup on PATCH 2.



Re: [Qemu-block] [PATCH v3 02/10] qapi: allow QmpInputVisitor to auto-cast types

2016-03-22 Thread Daniel P. Berrange
On Mon, Mar 21, 2016 at 05:18:01PM -0600, Eric Blake wrote:
> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> > Currently the QmpInputVisitor assumes that all scalar
> > values are directly represented as their final types.
> > ie it assumes an 'int' is using QInt, and a 'bool' is
> > using QBool.
> > 
> > This extends it so that QString is optionally permitted
> > for any of the non-string scalar types. This behaviour
> > is turned on by requesting the 'autocast' flag in the
> > constructor.
> > 
> > This makes it possible to use QmpInputVisitor with a
> > QDict produced from QemuOpts, where everything is in
> > string format.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  include/qapi/qmp-input-visitor.h |   3 +
> >  qapi/qmp-input-visitor.c |  96 +++-
> >  tests/test-qmp-input-visitor.c   | 115 
> > ++-
> >  3 files changed, 196 insertions(+), 18 deletions(-)


> > -*obj = qint_get_int(qint);
> > +qstr = qobject_to_qstring(qobj);
> > +if (qstr && qstr->string && qiv->autocast) {
> > +errno = 0;
> > +if (qemu_strtoull(qstr->string, NULL, 10, obj) == 0) {
> 
> And again.
> 
> Hmm.  Do we need to worry about partial asymmetry?  That is,
> qint_get_int() returns a signed number, but qemu_strtoull() parses
> unsigned; if the original conversion from JSON to qint uses a different
> parser, then we could have funny results where we get different results
> for things like:
>  "key1":9223372036854775807, "key2":"9223372036854775807",
> even though the same string of digits is being parsed, based on whether
> the different parsers handle numbers larger than INT64_MAX differently.

Is this something you want me to change for re-post, or just a general
point for future ?  ie should I be using qemu_strtoll instead of
qemu_strtoull or something else ?   qint itself doesn't seem
to concern itself with parsing ints from strintgs, so presumably
this is from json code ?

> [Ultimately, I'd like QInt to be enhanced to track whether the input was
> signed or unsigned, and automatically make the output match the input
> when converting back to string; that is, track 65 bits of information
> instead of 64; but that's no sooner than 2.7 material]
> 
> 
> >  static void qmp_input_type_bool(Visitor *v, const char *name, bool *obj,
> >  Error **errp)
> >  {
> >  QmpInputVisitor *qiv = to_qiv(v);
> > -QBool *qbool = qobject_to_qbool(qmp_input_get_object(qiv, name, true));
> > +QObject *qobj = qmp_input_get_object(qiv, name, true);
> > +QBool *qbool;
> > +QString *qstr;
> >  
> > -if (!qbool) {
> > -error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
> > -   "boolean");
> > +qbool = qobject_to_qbool(qobj);
> > +if (qbool) {
> > +*obj = qbool_get_bool(qbool);
> >  return;
> >  }
> >  
> > -*obj = qbool_get_bool(qbool);
> > +
> > +qstr = qobject_to_qstring(qobj);
> > +if (qstr && qstr->string && qiv->autocast) {
> > +if (!strcasecmp(qstr->string, "on") ||
> > +!strcasecmp(qstr->string, "yes") ||
> > +!strcasecmp(qstr->string, "true")) {
> > +*obj = true;
> > +return;
> > +}
> 
> Do we also want to allow "0"/"1" for true/false?

These 3 strings I took from opts-visitor.c, so to maintain compat
we probably should not allow 0/1, unless we decide to extend
opts-visitor too

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/qapi: fix unbounded stack for dump_qdict

2016-03-22 Thread Markus Armbruster
Peter Xu  writes:

> Using heap instead of stack for better safety.
>
> Signed-off-by: Peter Xu 
> ---
>  block/qapi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/qapi.c b/block/qapi.c
> index c4c2115..b798e35 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -636,9 +636,8 @@ static void dump_qdict(fprintf_function func_fprintf, 
> void *f, int indentation,
>  for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
>  QType type = qobject_type(entry->value);
>  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
> -char key[strlen(entry->key) + 1];
> +char *key = g_malloc(strlen(entry->key) + 1);
>  int i;
> -

Unwanted whitespace change.

>  /* replace dashes with spaces in key (variable) names */
>  for (i = 0; entry->key[i]; i++) {
>  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
> @@ -650,6 +649,7 @@ static void dump_qdict(fprintf_function func_fprintf, 
> void *f, int indentation,
>  if (!composite) {
>  func_fprintf(f, "\n");
>  }
> +g_free(key);
>  }
>  }

With the unwanted whitespace change backed out:
Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] block/qapi: make two printf() formats literal

2016-03-22 Thread Markus Armbruster
Peter Xu  writes:

> Fix two places to use literal printf format when possible.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [PATCH v3 01/10] qdict: implement a qdict_crumple method for un-flattening a dict

2016-03-22 Thread Daniel P. Berrange
On Mon, Mar 21, 2016 at 04:45:39PM -0600, Eric Blake wrote:
> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:

> > +/* Unescape the '..' sequence into '.' */
> > +for (i = 0, j = 0; (*prefix)[i] != '\0'; i++, j++) {
> > +if ((*prefix)[i] == '.' &&
> > +(*prefix)[i + 1] == '.') {
> 
> Technically, if (*prefix)[i] == '.', we could assert((*prefix)[i + 1] ==
> '.'), since the only way to get a '.' in prefix is via escaping.  For
> that matter, you could short-circuit (part of) the loop by doing a
> strchr for '.' (if not found, the loop is not needed; if found, start
> the reduction at that point rather on the bytes leading up to that point).

I'm not seeing obvious benefit in trying to short-circuit the loop
using a strchr, as both ways you still end up iterating over all
chars in the string - its just that you're hiding the iteration
in strchr instead.

> > +static ssize_t qdict_list_size(QDict *maybe_list, Error **errp)
> > +{
> > +const QDictEntry *entry, *next;
> > +ssize_t len = 0;
> > +ssize_t max = -1;
> > +int is_list = -1;
> > +int64_t val;
> > +
> > +entry = qdict_first(maybe_list);
> > +while (entry != NULL) {
> > +next = qdict_next(maybe_list, entry);
> > +
> > +if (qemu_strtoll(entry->key, NULL, 10, ) == 0) {
> > +if (is_list == -1) {
> > +is_list = 1;
> > +} else if (!is_list) {
> > +error_setg(errp,
> > +   "Key '%s' is for a list, but previous key is "
> > +   "for a dict", entry->key);
> 
> Keys are unsorted, so it's a bit hard to call it "previous key".  Maybe
> a better error message would be along the lines of "cannot crumple
> dictionary because of a mix of list and non-list keys"?  I dunno...

Yeah, I'll use

  "Cannot crumple a dictionary with a mix of list and non-list keys"


> 
> > +return -1;
> > +}
> > +len++;
> > +if (val > max) {
> > +max = val;
> > +}
> > +} else {
> > +if (is_list == -1) {
> > +is_list = 0;
> > +} else if (is_list) {
> > +error_setg(errp,
> > +   "Key '%s' is for a dict, but previous key is "
> > +   "for a list", entry->key);
> 
> ...same argument. If we can wordsmith something that makes sense, it
> might work for both places.  Otherwise, I can live with your messages.


> > +++ b/tests/check-qdict.c
> > @@ -596,6 +596,140 @@ static void qdict_join_test(void)
> >  QDECREF(dict2);
> >  }
> >  
> > +
> > +static void qdict_crumple_test_nonrecursive(void)
> > +{
> 
> This only covers a single layer of collapse, but not turning a dict into
> a list.  Is it also worth covering a case where no list indices are
> involved, such as the four keys "a.b.d", "a.b.e", "a.c.d", "a.d.e" being
> crumpled non-recursively into a single dict "a" with keys "b.d", "b.e",
> "c.d", and "d.e"?

I'll add an explicit rule to test dict -> list conversion, and some
extra dict items here to cover proper nested dicts.

> 
> > +
> > +static void qdict_crumple_test_recursive(void)
> > +{
> > +
> 
> This only covers a list of dict collapse, not a true multi-layer dict
> collapse.  Is it also worth covering the same four keys as above, but
> this time that dict "a" has keys "b" and "c", each of which is a dict in
> turn with keys "d" and "e"?

I'll add some more dict items to properly cover nested dicts

> > +static void qdict_crumple_test_bad_inputs(void)
> > +{
> > +QDict *src;
> > +Error *error = NULL;
> > +
> 
> > +
> > +src = qdict_new();
> > +/* The input should be flat, ie no dicts or lists */
> > +qdict_put(src, "rule.0", qdict_new());
> > +qdict_put(src, "rule.a", qstring_from_str("allow"));
> 
> I'd use "rule.a" and "rule.b" here, so that you aren't confusing this
> with the earlier test that you can't mix list and dict.

Good point.

> I'd also add a negative test for "rule.1" without "rule.0" being invalid
> (missing a list index).

Yep, I'll add that.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[Qemu-block] [PATCH 03/12] block: Introduce BlockBackendPublic

2016-03-22 Thread Kevin Wolf
Some features, like I/O throttling, are implemented outside
block-backend.c, but still want to keep BlockBackends in a list. In
order to avoid exposing the whole struct layout in the public header
file, this patch introduces an embedded public struct where list entry
structs can be added and a pair of functions to convert between
BlockBackend and BlockBackendPublic.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 17 +
 include/sysemu/block-backend.h |  9 +
 2 files changed, 26 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index df8f717..4394950 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -33,6 +33,7 @@ struct BlockBackend {
 DriveInfo *legacy_dinfo;/* null unless created by drive_new() */
 QTAILQ_ENTRY(BlockBackend) link; /* for block_backends */
 QTAILQ_ENTRY(BlockBackend) monitor_link; /* for monitor_block_backends */
+BlockBackendPublic public;
 
 void *dev;  /* attached device model, if any */
 /* TODO change to DeviceState when all users are qdevified */
@@ -410,6 +411,22 @@ BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo)
 }
 
 /*
+ * Returns a pointer to the publicly accessible fields of @blk.
+ */
+BlockBackendPublic *blk_get_public(BlockBackend *blk)
+{
+return >public;
+}
+
+/*
+ * Returns a BlockBackend given the associated @public fields.
+ */
+BlockBackend *blk_by_public(BlockBackendPublic *public)
+{
+return container_of(public, BlockBackend, public);
+}
+
+/*
  * Disassociates the currently associated BlockDriverState from @blk.
  */
 void blk_remove_bs(BlockBackend *blk)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d839bff..bb4ff6f 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -59,6 +59,12 @@ typedef struct BlockDevOps {
 void (*resize_cb)(void *opaque);
 } BlockDevOps;
 
+/* This struct is embedded in (the private) BlockBackend struct and contains
+ * fields that must be public. This is in particular for QLIST_ENTRY() and
+ * friends so that BlockBackends can be kept in lists outside block-backend.c 
*/
+typedef struct BlockBackendPublic {
+} BlockBackendPublic;
+
 BlockBackend *blk_new(Error **errp);
 BlockBackend *blk_new_with_bs(Error **errp);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
@@ -74,6 +80,9 @@ BlockDriverState *blk_next_root_bs(BlockDriverState *bs);
 bool monitor_add_blk(BlockBackend *blk, const char *name, Error **errp);
 void monitor_remove_blk(BlockBackend *blk);
 
+BlockBackendPublic *blk_get_public(BlockBackend *blk);
+BlockBackend *blk_by_public(BlockBackendPublic *public);
+
 BlockDriverState *blk_bs(BlockBackend *blk);
 void blk_remove_bs(BlockBackend *blk);
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
-- 
1.8.3.1




[Qemu-block] [PATCH 12/12] block: Don't check throttled reqs in bdrv_requests_pending()

2016-03-22 Thread Kevin Wolf
Checking whether there are throttled requests requires going to the
associated BlockBackend, which we want to avoid. All users of
bdrv_requests_pending() already call bdrv_flush_io_queue() first, which
restarts throttled requests. We just have to use the return value of
that callback (which tells us whether any requests have been restarted)
instead of ignoring it.

Signed-off-by: Kevin Wolf 
---
 block/io.c| 22 ++
 include/block/block.h |  2 +-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index f6edab8..24bdd6c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -153,17 +153,10 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
 bool bdrv_requests_pending(BlockDriverState *bs)
 {
 BdrvChild *child;
-BlockBackendPublic *blkp = bs->blk ? blk_get_public(bs->blk) : NULL;
 
 if (!QLIST_EMPTY(>tracked_requests)) {
 return true;
 }
-if (blkp && !qemu_co_queue_empty(>throttled_reqs[0])) {
-return true;
-}
-if (blkp && !qemu_co_queue_empty(>throttled_reqs[1])) {
-return true;
-}
 
 QLIST_FOREACH(child, >children, next) {
 if (bdrv_requests_pending(child->bs)) {
@@ -204,8 +197,8 @@ void bdrv_drain(BlockDriverState *bs)
 bdrv_drain_recurse(bs);
 while (busy) {
 /* Keep iterating */
- bdrv_flush_io_queue(bs);
- busy = bdrv_requests_pending(bs);
+ busy = bdrv_flush_io_queue(bs);
+ busy |= bdrv_requests_pending(bs);
  busy |= aio_poll(bdrv_get_aio_context(bs), busy);
 }
 }
@@ -254,7 +247,9 @@ void bdrv_drain_all(void)
 aio_context_acquire(aio_context);
 while ((bs = bdrv_next(bs))) {
 if (aio_context == bdrv_get_aio_context(bs)) {
-bdrv_flush_io_queue(bs);
+if (bdrv_flush_io_queue(bs)) {
+busy = true;
+}
 if (bdrv_requests_pending(bs)) {
 busy = true;
 aio_poll(aio_context, busy);
@@ -2639,10 +2634,11 @@ void bdrv_io_unplug(BlockDriverState *bs)
 }
 }
 
-void bdrv_flush_io_queue(BlockDriverState *bs)
+bool bdrv_flush_io_queue(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
 BdrvChild *c;
+bool new_request = false;
 
 if (drv && drv->bdrv_flush_io_queue) {
 drv->bdrv_flush_io_queue(bs);
@@ -2652,9 +2648,11 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
 
 QLIST_FOREACH(c, >parents, next_parent) {
 if (c->role->drain_queue) {
-c->role->drain_queue(c);
+new_request |= c->role->drain_queue(c);
 }
 }
+
+return new_request;
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
diff --git a/include/block/block.h b/include/block/block.h
index 80ece12..46cee8e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -513,7 +513,7 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry 
*geo);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
-void bdrv_flush_io_queue(BlockDriverState *bs);
+bool bdrv_flush_io_queue(BlockDriverState *bs);
 
 /**
  * bdrv_drained_begin:
-- 
1.8.3.1




[Qemu-block] [PATCH 10/12] block: Drain throttling queue with BdrvChild callback

2016-03-22 Thread Kevin Wolf
This removes the last part of I/O throttling from block/io.c and moves
it to the BlockBackend.

When draining the queue of a BlockDriverState, we must make sure that no
new requests can come in for it. Request sources from outside the block
layer are disabled with aio_disable_external(), but the throttling queue
must be handled separately.

The two obvious options we have are either to implement a similar
mechanism in BlockBackend that queues requests and avoids to pass them
to the BDS, or to flush the whole queue. The first option seems nicer
and could prevent bypassing the I/O limit, but the second is closer to
what we're already doing on the BDS level, so this patch keeps it this
way for now.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 29 +++--
 block/io.c| 35 ---
 include/block/block_int.h |  4 ++--
 3 files changed, 37 insertions(+), 31 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d2bd268..c71ce4d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -90,9 +90,12 @@ static void blk_root_inherit_options(int *child_flags, QDict 
*child_options,
 /* We're not supposed to call this function for root nodes */
 abort();
 }
+static bool blk_drain_throttling_queue(BdrvChild *child);
 
 static const BdrvChildRole child_root = {
-.inherit_options = blk_root_inherit_options,
+.inherit_options= blk_root_inherit_options,
+
+.drain_queue= blk_drain_throttling_queue,
 };
 
 /*
@@ -1677,7 +1680,7 @@ void blk_set_io_limits(BlockBackend *blk, ThrottleConfig 
*cfg)
 void blk_io_limits_disable(BlockBackend *blk)
 {
 blk->public.io_limits_enabled = false;
-bdrv_start_throttled_reqs(blk_bs(blk));
+blk_drain_throttling_queue(blk->root);
 throttle_group_unregister_blk(blk);
 }
 
@@ -1705,3 +1708,25 @@ void blk_io_limits_update_group(BlockBackend *blk, const 
char *group)
 blk_io_limits_disable(blk);
 blk_io_limits_enable(blk, group);
 }
+
+/* this function drain all the throttled IOs */
+static bool blk_drain_throttling_queue(BdrvChild *child)
+{
+BlockBackend *blk = child->opaque;
+BlockBackendPublic *blkp = >public;
+bool drained = false;
+bool enabled = blkp->io_limits_enabled;
+int i;
+
+blkp->io_limits_enabled = false;
+
+for (i = 0; i < 2; i++) {
+while (qemu_co_enter_next(>throttled_reqs[i])) {
+drained = true;
+}
+}
+
+blkp->io_limits_enabled = enabled;
+
+return drained;
+}
diff --git a/block/io.c b/block/io.c
index 22dbb43..f6edab8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -27,7 +27,6 @@
 #include "sysemu/block-backend.h"
 #include "block/blockjob.h"
 #include "block/block_int.h"
-#include "block/throttle-groups.h"
 #include "qemu/error-report.h"
 
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
@@ -56,31 +55,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
 
-/* this function drain all the throttled IOs */
-bool bdrv_start_throttled_reqs(BlockDriverState *bs)
-{
-if (!bs->blk) {
-return false;
-}
-
-BlockBackendPublic *blkp = blk_get_public(bs->blk);
-bool drained = false;
-bool enabled = blk_get_public(bs->blk)->io_limits_enabled;
-int i;
-
-blkp->io_limits_enabled = false;
-
-for (i = 0; i < 2; i++) {
-while (qemu_co_enter_next(>throttled_reqs[i])) {
-drained = true;
-}
-}
-
-blkp->io_limits_enabled = enabled;
-
-return drained;
-}
-
 void bdrv_setup_io_funcs(BlockDriver *bdrv)
 {
 /* Block drivers without coroutine functions need emulation */
@@ -2668,12 +2642,19 @@ void bdrv_io_unplug(BlockDriverState *bs)
 void bdrv_flush_io_queue(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+BdrvChild *c;
+
 if (drv && drv->bdrv_flush_io_queue) {
 drv->bdrv_flush_io_queue(bs);
 } else if (bs->file) {
 bdrv_flush_io_queue(bs->file->bs);
 }
-bdrv_start_throttled_reqs(bs);
+
+QLIST_FOREACH(c, >parents, next_parent) {
+if (c->role->drain_queue) {
+c->role->drain_queue(c);
+}
+}
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1b3c310..e064d9d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -355,6 +355,8 @@ typedef struct BdrvAioNotifier {
 struct BdrvChildRole {
 void (*inherit_options)(int *child_flags, QDict *child_options,
 int parent_flags, QDict *parent_options);
+
+bool (*drain_queue)(BdrvChild *child);
 };
 
 extern const BdrvChildRole child_file;
@@ -508,8 +510,6 @@ int get_tmp_filename(char *filename, int size);
 BlockDriver *bdrv_probe_all(const uint8_t *buf, int 

[Qemu-block] [PATCH 09/12] block: Introduce BdrvChild.opaque

2016-03-22 Thread Kevin Wolf
BlockBackends use it to get a back pointer from BdrvChild to
BlockBackend in any BdrvChildRole callbacks.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 2 ++
 include/block/block_int.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 2309672..d2bd268 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -132,6 +132,7 @@ BlockBackend *blk_new_with_bs(Error **errp)
 
 bs = bdrv_new_root();
 blk->root = bdrv_root_attach_child(bs, "root", _root);
+blk->root->opaque = blk;
 bs->blk = blk;
 return blk;
 }
@@ -457,6 +458,7 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 assert(!blk->root && !bs->blk);
 bdrv_ref(bs);
 blk->root = bdrv_root_attach_child(bs, "root", _root);
+blk->root->opaque = blk;
 bs->blk = blk;
 
 notifier_list_notify(>insert_bs_notifiers, blk);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0cc38d5..1b3c310 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -364,6 +364,7 @@ struct BdrvChild {
 BlockDriverState *bs;
 char *name;
 const BdrvChildRole *role;
+void *opaque;
 QLIST_ENTRY(BdrvChild) next;
 QLIST_ENTRY(BdrvChild) next_parent;
 };
-- 
1.8.3.1




[Qemu-block] [PATCH 08/12] block: Move I/O throttling configuration functions to BlockBackend

2016-03-22 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block.c |  2 +-
 block/block-backend.c   | 49 +++--
 block/io.c  | 47 +--
 block/qapi.c|  2 +-
 block/throttle-groups.c | 12 +-
 blockdev.c  | 16 +++---
 include/block/block.h   |  4 
 include/block/block_int.h   |  3 +--
 include/block/throttle-groups.h |  4 ++--
 include/sysemu/block-backend.h  |  5 +
 tests/test-throttle.c   | 16 +-
 11 files changed, 78 insertions(+), 82 deletions(-)

diff --git a/block.c b/block.c
index be06564..1fe35cd 100644
--- a/block.c
+++ b/block.c
@@ -2118,7 +2118,7 @@ static void bdrv_close(BlockDriverState *bs)
 
 /* Disable I/O limits and drain all pending throttled requests */
 if (bs->blk && blk_get_public(bs->blk)->throttle_state) {
-bdrv_io_limits_disable(bs);
+blk_io_limits_disable(bs->blk);
 }
 
 bdrv_drained_begin(bs); /* complete I/O */
diff --git a/block/block-backend.c b/block/block-backend.c
index 1567d8b..2309672 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -441,7 +441,7 @@ void blk_remove_bs(BlockBackend *blk)
 
 blk_update_root_state(blk);
 if (blk->public.io_limits_enabled) {
-bdrv_io_limits_disable(blk->root->bs);
+blk_io_limits_disable(blk);
 }
 
 blk->root->bs->blk = NULL;
@@ -1594,7 +1594,7 @@ void blk_apply_root_state(BlockBackend *blk, 
BlockDriverState *bs)
 {
 bs->detect_zeroes = blk->root_state.detect_zeroes;
 if (blk->root_state.throttle_group) {
-bdrv_io_limits_enable(bs, blk->root_state.throttle_group);
+blk_io_limits_enable(blk, blk->root_state.throttle_group);
 }
 }
 
@@ -1658,3 +1658,48 @@ int blk_flush_all(void)
 
 return result;
 }
+
+
+/* throttling disk I/O limits */
+void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg)
+{
+int i;
+
+throttle_group_config(blk, cfg);
+
+for (i = 0; i < 2; i++) {
+qemu_co_enter_next(>public.throttled_reqs[i]);
+}
+}
+
+void blk_io_limits_disable(BlockBackend *blk)
+{
+blk->public.io_limits_enabled = false;
+bdrv_start_throttled_reqs(blk_bs(blk));
+throttle_group_unregister_blk(blk);
+}
+
+/* should be called before bdrv_set_io_limits if a limit is set */
+void blk_io_limits_enable(BlockBackend *blk, const char *group)
+{
+assert(!blk->public.io_limits_enabled);
+throttle_group_register_blk(blk, group);
+blk->public.io_limits_enabled = true;
+}
+
+void blk_io_limits_update_group(BlockBackend *blk, const char *group)
+{
+/* this BB is not part of any group */
+if (!blk->public.throttle_state) {
+return;
+}
+
+/* this BB is a part of the same group than the one we want */
+if (!g_strcmp0(throttle_group_get_name(blk), group)) {
+return;
+}
+
+/* need to change the group this BB belong to */
+blk_io_limits_disable(blk);
+blk_io_limits_enable(blk, group);
+}
diff --git a/block/io.c b/block/io.c
index 84370ae..22dbb43 100644
--- a/block/io.c
+++ b/block/io.c
@@ -56,21 +56,8 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
 
-/* throttling disk I/O limits */
-void bdrv_set_io_limits(BlockDriverState *bs,
-ThrottleConfig *cfg)
-{
-int i;
-
-throttle_group_config(bs, cfg);
-
-for (i = 0; i < 2; i++) {
-qemu_co_enter_next(_get_public(bs->blk)->throttled_reqs[i]);
-}
-}
-
 /* this function drain all the throttled IOs */
-static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
+bool bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
 if (!bs->blk) {
 return false;
@@ -94,38 +81,6 @@ static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
 return drained;
 }
 
-void bdrv_io_limits_disable(BlockDriverState *bs)
-{
-blk_get_public(bs->blk)->io_limits_enabled = false;
-bdrv_start_throttled_reqs(bs);
-throttle_group_unregister_blk(bs->blk);
-}
-
-/* should be called before bdrv_set_io_limits if a limit is set */
-void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
-{
-assert(!blk_get_public(bs->blk)->io_limits_enabled);
-throttle_group_register_blk(bs->blk, group);
-blk_get_public(bs->blk)->io_limits_enabled = true;
-}
-
-void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
-{
-/* this bs is not part of any group */
-if (!blk_get_public(bs->blk)->throttle_state) {
-return;
-}
-
-/* this bs is a part of the same group than the one we want */
-if (!g_strcmp0(throttle_group_get_name(bs->blk), group)) {
-return;
-}
-
-/* need to change the group this bs belong to */
-bdrv_io_limits_disable(bs);
-

[Qemu-block] [PATCH 07/12] block: Move actual I/O throttling to BlockBackend

2016-03-22 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c   | 10 ++
 block/io.c  | 10 --
 block/throttle-groups.c |  5 ++---
 include/block/throttle-groups.h |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 8e8dbb2..1567d8b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -715,6 +715,11 @@ static int coroutine_fn blk_co_preadv(BlockBackend *blk, 
int64_t offset,
 return ret;
 }
 
+/* throttling disk I/O */
+if (blk->public.io_limits_enabled) {
+throttle_group_co_io_limits_intercept(blk, bytes, false);
+}
+
 return bdrv_co_do_preadv(blk_bs(blk), offset, bytes, qiov, flags);
 }
 
@@ -729,6 +734,11 @@ static int coroutine_fn blk_co_pwritev(BlockBackend *blk, 
int64_t offset,
 return ret;
 }
 
+/* throttling disk I/O */
+if (blk->public.io_limits_enabled) {
+throttle_group_co_io_limits_intercept(blk, bytes, true);
+}
+
 if (!blk->enable_write_cache) {
 flags |= BDRV_REQ_FUA;
 }
diff --git a/block/io.c b/block/io.c
index bfafdfa..84370ae 100644
--- a/block/io.c
+++ b/block/io.c
@@ -943,11 +943,6 @@ int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
 flags |= BDRV_REQ_COPY_ON_READ;
 }
 
-/* throttling disk I/O */
-if (bs->blk && blk_get_public(bs->blk)->io_limits_enabled) {
-throttle_group_co_io_limits_intercept(bs, bytes, false);
-}
-
 /* Align read if necessary by padding qiov */
 if (offset & (align - 1)) {
 head_buf = qemu_blockalign(bs, align);
@@ -1292,11 +1287,6 @@ int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
 return ret;
 }
 
-/* throttling disk I/O */
-if (bs->blk && blk_get_public(bs->blk)->io_limits_enabled) {
-throttle_group_co_io_limits_intercept(bs, bytes, true);
-}
-
 /*
  * Align write if necessary by performing a read-modify-write cycle.
  * Pad qiov with the read parts and be sure to have a tracked request not
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index af74f76..46e888c 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -280,18 +280,17 @@ static void schedule_next_request(BlockBackend *blk, bool 
is_write)
  * if necessary, and schedule the next request using a round robin
  * algorithm.
  *
- * @bs:the current BlockDriverState
+ * @blk:   the current BlockBackend
  * @bytes: the number of bytes for this I/O
  * @is_write:  the type of operation (read/write)
  */
-void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
+void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
 unsigned int bytes,
 bool is_write)
 {
 bool must_wait;
 BlockBackend *token;
 
-BlockBackend *blk = bs->blk;
 BlockBackendPublic *blkp = blk_get_public(blk);
 ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
 qemu_mutex_lock(>lock);
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index 61e84f3..7019d04 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -39,7 +39,7 @@ void throttle_group_get_config(BlockDriverState *bs, 
ThrottleConfig *cfg);
 void throttle_group_register_blk(BlockBackend *blk, const char *groupname);
 void throttle_group_unregister_blk(BlockBackend *blk);
 
-void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
+void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
 unsigned int bytes,
 bool is_write);
 
-- 
1.8.3.1




[Qemu-block] [PATCH 05/12] block: Convert throttle_group_get_name() to BlockBackend

2016-03-22 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c   |  2 +-
 block/io.c  |  2 +-
 block/qapi.c|  2 +-
 block/throttle-groups.c | 12 ++--
 include/block/throttle-groups.h |  2 +-
 tests/test-throttle.c   |  4 ++--
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4394950..ed8550f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1563,7 +1563,7 @@ void blk_update_root_state(BlockBackend *blk)
 throttle_group_unref(blk->root_state.throttle_state);
 }
 if (blk->root->bs->throttle_state) {
-const char *name = throttle_group_get_name(blk->root->bs);
+const char *name = throttle_group_get_name(blk);
 blk->root_state.throttle_group = g_strdup(name);
 blk->root_state.throttle_state = throttle_group_incref(name);
 } else {
diff --git a/block/io.c b/block/io.c
index fbfbbb2..b589857 100644
--- a/block/io.c
+++ b/block/io.c
@@ -112,7 +112,7 @@ void bdrv_io_limits_update_group(BlockDriverState *bs, 
const char *group)
 }
 
 /* this bs is a part of the same group than the one we want */
-if (!g_strcmp0(throttle_group_get_name(bs), group)) {
+if (!g_strcmp0(throttle_group_get_name(bs->blk), group)) {
 return;
 }
 
diff --git a/block/qapi.c b/block/qapi.c
index f01894a..e908757 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -117,7 +117,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 info->iops_size = cfg.op_size;
 
 info->has_group = true;
-info->group = g_strdup(throttle_group_get_name(bs));
+info->group = g_strdup(throttle_group_get_name(bs->blk));
 }
 
 info->write_threshold = bdrv_write_threshold_get(bs);
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 4ba0fa3..295bed0 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -133,16 +133,16 @@ void throttle_group_unref(ThrottleState *ts)
 qemu_mutex_unlock(_groups_lock);
 }
 
-/* Get the name from a BlockDriverState's ThrottleGroup. The name (and
- * the pointer) is guaranteed to remain constant during the lifetime
- * of the group.
+/* Get the name from a BlockBackend's ThrottleGroup. The name (and the pointer)
+ * is guaranteed to remain constant during the lifetime of the group.
  *
- * @bs:   a BlockDriverState that is member of a throttling group
+ * @blk:  a BlockBackend that is member of a throttling group
  * @ret:  the name of the group.
  */
-const char *throttle_group_get_name(BlockDriverState *bs)
+const char *throttle_group_get_name(BlockBackend *blk)
 {
-ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
+ThrottleGroup *tg = container_of(blk_bs(blk)->throttle_state,
+ ThrottleGroup, ts);
 return tg->name;
 }
 
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index 46b843e..61e84f3 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -28,7 +28,7 @@
 #include "qemu/throttle.h"
 #include "block/block_int.h"
 
-const char *throttle_group_get_name(BlockDriverState *bs);
+const char *throttle_group_get_name(BlockBackend *blk);
 
 ThrottleState *throttle_group_incref(const char *name);
 void throttle_group_unref(ThrottleState *ts);
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index f03f71a..85d3de2 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -597,8 +597,8 @@ static void test_groups(void)
 g_assert(bdrv2->throttle_state != NULL);
 g_assert(bdrv3->throttle_state != NULL);
 
-g_assert(!strcmp(throttle_group_get_name(bdrv1), "bar"));
-g_assert(!strcmp(throttle_group_get_name(bdrv2), "foo"));
+g_assert(!strcmp(throttle_group_get_name(blk1), "bar"));
+g_assert(!strcmp(throttle_group_get_name(blk2), "foo"));
 g_assert(bdrv1->throttle_state == bdrv3->throttle_state);
 
 /* Setting the config of a group member affects the whole group */
-- 
1.8.3.1




[Qemu-block] [PATCH 11/12] block: Decouple throttling from BlockDriverState

2016-03-22 Thread Kevin Wolf
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.

With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.

Signed-off-by: Kevin Wolf 
---
 block.c   | 35 ---
 block/block-backend.c | 37 ++---
 blockdev.c| 27 +--
 include/block/block_int.h |  3 ---
 4 files changed, 23 insertions(+), 79 deletions(-)

diff --git a/block.c b/block.c
index 1fe35cd..fb37b91 100644
--- a/block.c
+++ b/block.c
@@ -39,7 +39,6 @@
 #include "qmp-commands.h"
 #include "qemu/timer.h"
 #include "qapi-event.h"
-#include "block/throttle-groups.h"
 
 #ifdef CONFIG_BSD
 #include 
@@ -2116,11 +2115,6 @@ static void bdrv_close(BlockDriverState *bs)
 
 assert(!bs->job);
 
-/* Disable I/O limits and drain all pending throttled requests */
-if (bs->blk && blk_get_public(bs->blk)->throttle_state) {
-blk_io_limits_disable(bs->blk);
-}
-
 bdrv_drained_begin(bs); /* complete I/O */
 bdrv_flush(bs);
 bdrv_drain(bs); /* in case flush left pending I/O */
@@ -2249,27 +2243,6 @@ static void swap_feature_fields(BlockDriverState *bs_top,
 bdrv_move_feature_fields(, bs_top);
 bdrv_move_feature_fields(bs_top, bs_new);
 bdrv_move_feature_fields(bs_new, );
-
-assert(!bs_new->blk);
-if (bs_top->blk && blk_get_public(bs_top->blk)->throttle_state) {
-assert(blk_get_public(bs_top->blk)->io_limits_enabled);
-/*
- * FIXME Need to break I/O throttling with graph manipulations
- * temporarily because of conflicting invariants (3. will go away when
- * throttling is fully converted to work on BlockBackends):
- *
- * 1. Every BlockBackend has a single root BDS
- * 2. I/O throttling functions require an attached BlockBackend
- * 3. We need to first enable throttling on the new BDS and then
- *disable it on the old one (because of throttle group refcounts)
- */
-#if 0
-bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
-bdrv_io_limits_disable(bs_top);
-#else
-abort();
-#endif
-}
 }
 
 /*
@@ -3637,10 +3610,6 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
 baf->detach_aio_context(baf->opaque);
 }
 
-if (bs->blk && blk_get_public(bs->blk)->throttle_state) {
-throttle_timers_detach_aio_context(
-_get_public(bs->blk)->throttle_timers);
-}
 if (bs->drv->bdrv_detach_aio_context) {
 bs->drv->bdrv_detach_aio_context(bs);
 }
@@ -3674,10 +3643,6 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 if (bs->drv->bdrv_attach_aio_context) {
 bs->drv->bdrv_attach_aio_context(bs, new_context);
 }
-if (bs->blk && blk_get_public(bs->blk)->throttle_state) {
-throttle_timers_attach_aio_context(
-_get_public(bs->blk)->throttle_timers, new_context);
-}
 
 QLIST_FOREACH(ban, >aio_notifiers, list) {
 ban->attached_aio_context(new_context, ban->opaque);
diff --git a/block/block-backend.c b/block/block-backend.c
index c71ce4d..8b4eb1a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -185,10 +185,6 @@ static void blk_delete(BlockBackend *blk)
 }
 assert(QLIST_EMPTY(>remove_bs_notifiers.notifiers));
 assert(QLIST_EMPTY(>insert_bs_notifiers.notifiers));
-if (blk->root_state.throttle_state) {
-g_free(blk->root_state.throttle_group);
-throttle_group_unref(blk->root_state.throttle_state);
-}
 QTAILQ_REMOVE(_backends, blk, link);
 drive_info_del(blk->legacy_dinfo);
 block_acct_cleanup(>stats);
@@ -442,11 +438,11 @@ void blk_remove_bs(BlockBackend *blk)
 assert(blk->root->bs->blk == blk);
 
 notifier_list_notify(>remove_bs_notifiers, blk);
+if (blk->public.throttle_state) {
+throttle_timers_detach_aio_context(>public.throttle_timers);
+}
 
 blk_update_root_state(blk);
-if (blk->public.io_limits_enabled) {
-blk_io_limits_disable(blk);
-}
 
 blk->root->bs->blk = NULL;
 bdrv_root_unref_child(blk->root);
@@ -465,6 +461,10 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 bs->blk = blk;
 
 notifier_list_notify(>insert_bs_notifiers, blk);
+if (blk->public.throttle_state) {
+throttle_timers_attach_aio_context(
+>public.throttle_timers, bdrv_get_aio_context(bs));
+}
 }
 
 /*
@@ -1405,7 +1405,14 @@ void blk_set_aio_context(BlockBackend *blk, AioContext 
*new_context)
 BlockDriverState *bs = blk_bs(blk);
 
 if (bs) {
+if (blk->public.throttle_state) {
+

[Qemu-block] [PATCH 01/12] block: Don't disable I/O throttling on sync requests

2016-03-22 Thread Kevin Wolf
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf 
---
 block/io.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index cce508a..e4438da 100644
--- a/block/io.c
+++ b/block/io.c
@@ -561,17 +561,6 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
offset,
 .flags = flags,
 };
 
-/**
- * In sync call context, when the vcpu is blocked, this throttling timer
- * will not fire; so the I/O throttling function has to be disabled here
- * if it has been enabled.
- */
-if (bs->io_limits_enabled) {
-fprintf(stderr, "Disabling I/O throttling on '%s' due "
-"to synchronous I/O.\n", bdrv_get_device_name(bs));
-bdrv_io_limits_disable(bs);
-}
-
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
 bdrv_rw_co_entry();
-- 
1.8.3.1




[Qemu-block] [PATCH 00/12] block: Move I/O throttling to BlockBackend

2016-03-22 Thread Kevin Wolf
This is another feature that was "logically" part of the BlockBackend, but
implemented as a BlockDriverState feature. It was always kept on top using
swap_feature_fields().

This series moves it to be actually implemented in the BlockBackend, removing
another obstacle for removing bs->blk and allowing multiple BBs per BDS.

Depends on 'block: Implement writethrough in BlockBackend'.

Kevin Wolf (12):
  block: Don't disable I/O throttling on sync requests
  block: Make sure throttled BDSes always have a BB
  block: Introduce BlockBackendPublic
  block: throttle-groups: Use BlockBackend pointers internally
  block: Convert throttle_group_get_name() to BlockBackend
  block: Move throttling fields from BDS to BB
  block: Move actual I/O throttling to BlockBackend
  block: Move I/O throttling configuration functions to BlockBackend
  block: Introduce BdrvChild.opaque
  block: Drain throttling queue with BdrvChild callback
  block: Decouple throttling from BlockDriverState
  block: Don't check throttled reqs in bdrv_requests_pending()

 block.c |  23 +
 block/block-backend.c   | 146 +-
 block/io.c  | 115 +++--
 block/qapi.c|   6 +-
 block/throttle-groups.c | 223 +---
 blockdev.c  |  43 +++-
 include/block/block.h   |   6 +-
 include/block/block_int.h   |  23 +
 include/block/throttle-groups.h |  12 +--
 include/sysemu/block-backend.h  |  27 +
 tests/test-throttle.c   |  62 ++-
 11 files changed, 345 insertions(+), 341 deletions(-)

-- 
1.8.3.1




[Qemu-block] [PATCH 04/12] block: throttle-groups: Use BlockBackend pointers internally

2016-03-22 Thread Kevin Wolf
As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.

This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.

Signed-off-by: Kevin Wolf 
---
 block/io.c  |   4 +-
 block/throttle-groups.c | 132 
 include/block/block_int.h   |   1 -
 include/block/throttle-groups.h |   4 +-
 include/sysemu/block-backend.h  |   4 ++
 tests/test-throttle.c   |  13 ++--
 6 files changed, 82 insertions(+), 76 deletions(-)

diff --git a/block/io.c b/block/io.c
index e4438da..fbfbbb2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -93,14 +93,14 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 {
 bs->io_limits_enabled = false;
 bdrv_start_throttled_reqs(bs);
-throttle_group_unregister_bs(bs);
+throttle_group_unregister_blk(bs->blk);
 }
 
 /* should be called before bdrv_set_io_limits if a limit is set */
 void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
 {
 assert(!bs->io_limits_enabled);
-throttle_group_register_bs(bs, group);
+throttle_group_register_blk(bs->blk, group);
 bs->io_limits_enabled = true;
 }
 
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 4920e09..4ba0fa3 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
 #include "block/throttle-groups.h"
 #include "qemu/queue.h"
 #include "qemu/thread.h"
@@ -57,8 +58,8 @@ typedef struct ThrottleGroup {
 
 QemuMutex lock; /* This lock protects the following four fields */
 ThrottleState ts;
-QLIST_HEAD(, BlockDriverState) head;
-BlockDriverState *tokens[2];
+QLIST_HEAD(, BlockBackendPublic) head;
+BlockBackend *tokens[2];
 bool any_timer_armed[2];
 
 /* These two are protected by the global throttle_groups_lock */
@@ -145,77 +146,77 @@ const char *throttle_group_get_name(BlockDriverState *bs)
 return tg->name;
 }
 
-/* Return the next BlockDriverState in the round-robin sequence,
- * simulating a circular list.
+/* Return the next BlockBackend in the round-robin sequence, simulating a
+ * circular list.
  *
  * This assumes that tg->lock is held.
  *
- * @bs:  the current BlockDriverState
- * @ret: the next BlockDriverState in the sequence
+ * @blk: the current BlockBackend
+ * @ret: the next BlockBackend in the sequence
  */
-static BlockDriverState *throttle_group_next_bs(BlockDriverState *bs)
+static BlockBackend *throttle_group_next_blk(BlockBackend *blk)
 {
+BlockDriverState *bs = blk_bs(blk);
 ThrottleState *ts = bs->throttle_state;
 ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
-BlockDriverState *next = QLIST_NEXT(bs, round_robin);
+BlockBackendPublic *next = QLIST_NEXT(blk_get_public(blk), round_robin);
 
 if (!next) {
-return QLIST_FIRST(>head);
+next = QLIST_FIRST(>head);
 }
 
-return next;
+return blk_by_public(next);
 }
 
-/* Return the next BlockDriverState in the round-robin sequence with
- * pending I/O requests.
+/* Return the next BlockBackend in the round-robin sequence with pending I/O
+ * requests.
  *
  * This assumes that tg->lock is held.
  *
- * @bs:the current BlockDriverState
+ * @blk:   the current BlockBackend
  * @is_write:  the type of operation (read/write)
- * @ret:   the next BlockDriverState with pending requests, or bs
- * if there is none.
+ * @ret:   the next BlockBackend with pending requests, or blk if there is
+ * none.
  */
-static BlockDriverState *next_throttle_token(BlockDriverState *bs,
- bool is_write)
+static BlockBackend *next_throttle_token(BlockBackend *blk, bool is_write)
 {
-ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
-BlockDriverState *token, *start;
+ThrottleGroup *tg = container_of(blk_bs(blk)->throttle_state,
+ ThrottleGroup, ts);
+BlockBackend *token, *start;
 
 start = token = tg->tokens[is_write];
 
 /* get next bs round in round robin style */
-token = throttle_group_next_bs(token);
-while (token != start && !token->pending_reqs[is_write]) {
-token = throttle_group_next_bs(token);
+token = throttle_group_next_blk(token);
+while (token != start && !blk_bs(token)->pending_reqs[is_write]) {
+token = throttle_group_next_blk(token);
 }
 
 /* If no IO are queued for scheduling on the next round robin token
  * then decide the token is the current bs because chances are
  * the current bs get the current request queued.
  */
-if (token == start && !token->pending_reqs[is_write]) {
-token = bs;
+if (token == start && 

[Qemu-block] [PATCH 02/12] block: Make sure throttled BDSes always have a BB

2016-03-22 Thread Kevin Wolf
It was already true in principle that a throttled BDS always has a BB
attached, except that the order of operations while attaching or
detaching a BDS to/from a BB wasn't careful enough.

This commit breaks graph manipulations while I/O throttling is enabled.
It would have been possible, but quite cumbersome, to keep things
working with some temporary hacks, so it's not worth the hassle. We'll
fix things again in a minute.

Signed-off-by: Kevin Wolf 
---
 block.c   | 14 ++
 block/block-backend.c |  3 +++
 blockdev.c|  4 ++--
 tests/test-throttle.c | 11 ---
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index c22cb3f..4f05586 100644
--- a/block.c
+++ b/block.c
@@ -2255,8 +2255,22 @@ static void swap_feature_fields(BlockDriverState *bs_top,
 assert(!bs_new->throttle_state);
 if (bs_top->throttle_state) {
 assert(bs_top->io_limits_enabled);
+/*
+ * FIXME Need to break I/O throttling with graph manipulations
+ * temporarily because of conflicting invariants (3. will go away when
+ * throttling is fully converted to work on BlockBackends):
+ *
+ * 1. Every BlockBackend has a single root BDS
+ * 2. I/O throttling functions require an attached BlockBackend
+ * 3. We need to first enable throttling on the new BDS and then
+ *disable it on the old one (because of throttle group refcounts)
+ */
+#if 0
 bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
 bdrv_io_limits_disable(bs_top);
+#else
+abort();
+#endif
 }
 }
 
diff --git a/block/block-backend.c b/block/block-backend.c
index a528674..df8f717 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -419,6 +419,9 @@ void blk_remove_bs(BlockBackend *blk)
 notifier_list_notify(>remove_bs_notifiers, blk);
 
 blk_update_root_state(blk);
+if (blk->root->bs->io_limits_enabled) {
+bdrv_io_limits_disable(blk->root->bs);
+}
 
 blk->root->bs->blk = NULL;
 bdrv_root_unref_child(blk->root);
diff --git a/blockdev.c b/blockdev.c
index 5a53f59..cac9afd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2568,8 +2568,6 @@ void qmp_blockdev_change_medium(const char *device, const 
char *filename,
 goto fail;
 }
 
-blk_apply_root_state(blk, medium_bs);
-
 bdrv_add_key(medium_bs, NULL, );
 if (err) {
 error_propagate(errp, err);
@@ -2594,6 +2592,8 @@ void qmp_blockdev_change_medium(const char *device, const 
char *filename,
 goto fail;
 }
 
+blk_apply_root_state(blk, medium_bs);
+
 qmp_blockdev_close_tray(device, errp);
 
 fail:
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index 59675fa..d0b3c86 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -573,11 +573,16 @@ static void test_accounting(void)
 static void test_groups(void)
 {
 ThrottleConfig cfg1, cfg2;
+BlockBackend *blk1, *blk2, *blk3;
 BlockDriverState *bdrv1, *bdrv2, *bdrv3;
 
-bdrv1 = bdrv_new();
-bdrv2 = bdrv_new();
-bdrv3 = bdrv_new();
+blk1 = blk_new_with_bs(_abort);
+blk2 = blk_new_with_bs(_abort);
+blk3 = blk_new_with_bs(_abort);
+
+bdrv1 = blk_bs(blk1);
+bdrv2 = blk_bs(blk2);
+bdrv3 = blk_bs(blk3);
 
 g_assert(bdrv1->throttle_state == NULL);
 g_assert(bdrv2->throttle_state == NULL);
-- 
1.8.3.1




Re: [Qemu-block] [PATCH] vpc: use current_size field for XenServer VHD images

2016-03-22 Thread Grant Wu
root@storage-4:~# hexdump -C -n 512 -s 512 /dev/vgstorage/tartanfsbackup
0200  63 78 73 70 61 72 73 65  ff ff ff ff ff ff ff ff
 |cxsparse|
0210  00 00 00 00 00 00 06 00  00 01 00 00 00 10 00 00
 ||
0220  00 20 00 00 ff ff f4 67  00 00 00 00 00 00 00 00  |.
.g|
0230  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 ||
*
0400


Thanks,

Grant Wu
grant...@gmail.com


Re: [Qemu-block] [Qemu-devel] [PATCH 06/20] xen_disk: Call blk_set_enable_write_cache() explicitly

2016-03-22 Thread Stefano Stabellini
On Fri, 18 Mar 2016, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  hw/block/xen_disk.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 635328f..c358709 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -888,12 +888,14 @@ static int blk_connect(struct XenDevice *xendev)
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  int pers, index, qflags;
>  bool readonly = true;
> +bool writethrough = true;
>  
>  /* read-only ? */
>  if (blkdev->directiosafe) {
>  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
>  } else {
> -qflags = BDRV_O_CACHE_WB;
> +qflags = 0;

You might as well initialize qflags to 0 above and leave it unchanged
here. In any case:

Acked-by: Stefano Stabellini 


> +writethrough = false;
>  }
>  if (strcmp(blkdev->mode, "w") == 0) {
>  qflags |= BDRV_O_RDWR;
> @@ -925,6 +927,7 @@ static int blk_connect(struct XenDevice *xendev)
>  error_free(local_err);
>  return -1;
>  }
> +blk_set_enable_write_cache(blkdev->blk, !writethrough);
>  } else {
>  /* setup via qemu cmdline -> already setup for us */
>  xen_be_printf(>xendev, 2, "get configured bdrv (cmdline 
> setup)\n");
> -- 
> 1.8.3.1
> 
> 



[Qemu-block] [PATCH] block: Remove blk_set_bs()

2016-03-22 Thread Kevin Wolf
The function is unused since commit f21d96d0 ('block: Use BdrvChild in
BlockBackend').

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 17 -
 include/block/block_int.h |  2 --
 2 files changed, 19 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index dca21d1..4b44d46 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -371,23 +371,6 @@ BlockDriverState *blk_bs(BlockBackend *blk)
 }
 
 /*
- * Changes the BlockDriverState attached to @blk
- */
-void blk_set_bs(BlockBackend *blk, BlockDriverState *bs)
-{
-bdrv_ref(bs);
-
-if (blk->root) {
-blk->root->bs->blk = NULL;
-bdrv_root_unref_child(blk->root);
-}
-assert(bs->blk == NULL);
-
-blk->root = bdrv_root_attach_child(bs, "root", _root);
-bs->blk = blk;
-}
-
-/*
  * Return @blk's DriveInfo if any, else null.
  */
 DriveInfo *blk_legacy_dinfo(BlockBackend *blk)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ba6e9ac..a33b0de 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -704,8 +704,6 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
   const BdrvChildRole *child_role);
 void bdrv_root_unref_child(BdrvChild *child);
 
-void blk_set_bs(BlockBackend *blk, BlockDriverState *bs);
-
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
 bool blk_dev_has_removable_media(BlockBackend *blk);
 bool blk_dev_has_tray(BlockBackend *blk);
-- 
1.8.3.1




Re: [Qemu-block] [PATCH 4/4] virtio-blk: Clean up start/stop with mutex and BH

2016-03-22 Thread Fam Zheng
On Thu, 03/17 16:07, Paolo Bonzini wrote:
> 
> 
> On 17/03/2016 16:00, Stefan Hajnoczi wrote:
> >> > +data = g_new(VirtIOBlockStartData, 1);
> >> > +data->vblk = vblk;
> >> > +data->bh = aio_bh_new(s->ctx, virtio_blk_data_plane_start_bh_cb, 
> >> > data);
> >> > +qemu_bh_schedule(data->bh);
> >> > +qemu_mutex_unlock(>start_stop_lock);
> >> >  return;
> > This BH usage pattern is dangerous:
> > 
> > 1. The BH is created and scheduled.
> > 2. Before the BH executes the device is unrealized.
> > 3. The data->bh pointer is inaccessible so we have a dangling BH that
> >will access vblk after vblk has been freed.
> > 
> > In some cases it can be safe but I don't see why the pattern is safe in
> > this case.  Either the BH needs to hold some sort of reference to keep
> > vblk alive, or vblk needs to know about pending BHs so they can be
> > deleted.
> 
> You're right.  After unrealizing virtio_blk_data_plane_stop has set of
> vblk->dataplane_started = false, so that's covered.  However, you still
> need an object_ref/object_object_unref pair.

Is it safe to call object_unref outside BQL?

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 0/4] Tweaks around virtio-blk start/stop

2016-03-22 Thread Paolo Bonzini


On 22/03/2016 12:59, Cornelia Huck wrote:
>> > They can be fixed with just an extra object_ref/object_unref.
>> > 
>> > I didn't understand that Tu Bo also needed the BH fix, and with that
>> > information it makes sense.  Passing the assign value ensures that
>> > ioeventfd remains always assigned.  With the CPU threads out of the
>> > picture, the BH becomes enough to make everything thread-safe.
> Yes, this makes sense.
> 
> Might we still have a hole somewhere in dataplane teardown? Probably
> not, from reading the code, even if it runs in cpu thread context.

The bug arises when the main thread sets started = true, a CPU thread
comes along while the ioeventfd is reset, and as soon as the BQL is
released by the main thread the CPU thread thinks it is a dataplane
thread.  This does horrible things to non-reentrant code.  For stop we
should be safe because the CPU thread is the one that sets started = false.

IOW, we should be safe as long as the ioeventfd is never unassigned
(your fix) _and_ we ensure serialization between threads that touch
dataplane_started (Fam's fix).

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 0/4] Tweaks around virtio-blk start/stop

2016-03-22 Thread Cornelia Huck
On Tue, 22 Mar 2016 10:46:58 +0100
Paolo Bonzini  wrote:

> On 22/03/2016 10:07, Cornelia Huck wrote:
> > So far, we had the best results with my refactoring + the mutex/bh
> > change. Two problems:
> > 
> > - We don't really understand yet why my refactoring helps, but passing
> > the assign value is a good canditate (and it's agreed that this fixes a
> > bug, I think.)
> > - There's some problem with the bh, if I understood Stefan correctly.
> 
> They can be fixed with just an extra object_ref/object_unref.
> 
> I didn't understand that Tu Bo also needed the BH fix, and with that
> information it makes sense.  Passing the assign value ensures that
> ioeventfd remains always assigned.  With the CPU threads out of the
> picture, the BH becomes enough to make everything thread-safe.

Yes, this makes sense.

Might we still have a hole somewhere in dataplane teardown? Probably
not, from reading the code, even if it runs in cpu thread context.




Re: [Qemu-block] [Qemu-devel] [PATCH 06/22] hbitmap: load/store

2016-03-22 Thread Vladimir Sementsov-Ogievskiy

On 22.03.2016 01:42, John Snow wrote:


On 03/15/2016 04:04 PM, Vladimir Sementsov-Ogievskiy wrote:

Add functions for load/store HBitmap to BDS, using clusters table:
Last level of the bitmap is splitted into chunks of 'cluster_size'
size. Each cell of the table contains offset in bds, to load/store
corresponding chunk.

Also,
 0 in cell means all-zeroes-chunk (should not be saved)
 1 in cell means all-ones-chunk (should not be saved)
 hbitmap_prepare_store() fills table with
   0 for all-zeroes chunks
   1 for all-ones chunks
   2 for others

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/dirty-bitmap.c |  23 +
  include/block/dirty-bitmap.h |  11 +++
  include/qemu/hbitmap.h   |  12 +++
  util/hbitmap.c   | 209 +++
  4 files changed, 255 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index e68c177..816c6ee 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -396,3 +396,26 @@ int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
  {
  return hbitmap_count(bitmap->bitmap);
  }
+
+int bdrv_dirty_bitmap_load(BdrvDirtyBitmap *bitmap, BlockDriverState *bs,
+   const uint64_t *table, uint32_t table_size,
+   uint32_t cluster_size)
+{
+return hbitmap_load(bitmap->bitmap, bs, table, table_size, cluster_size);
+}
+
+int bdrv_dirty_bitmap_prepare_store(const BdrvDirtyBitmap *bitmap,
+uint32_t cluster_size,
+uint64_t *table,
+uint32_t *table_size)
+{
+return hbitmap_prepare_store(bitmap->bitmap, cluster_size,
+ table, table_size);
+}
+
+int bdrv_dirty_bitmap_store(const BdrvDirtyBitmap *bitmap, BlockDriverState 
*bs,
+const uint64_t *table, uint32_t table_size,
+uint32_t cluster_size)
+{
+return hbitmap_store(bitmap->bitmap, bs, table, table_size, cluster_size);
+}
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 27515af..20cb540 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -43,4 +43,15 @@ void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t 
offset);
  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
  void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
  
+int bdrv_dirty_bitmap_load(BdrvDirtyBitmap *bitmap, BlockDriverState *bs,

+   const uint64_t *table, uint32_t table_size,
+   uint32_t cluster_size);
+int bdrv_dirty_bitmap_prepare_store(const BdrvDirtyBitmap *bitmap,
+uint32_t cluster_size,
+uint64_t *table,
+uint32_t *table_size);
+int bdrv_dirty_bitmap_store(const BdrvDirtyBitmap *bitmap, BlockDriverState 
*bs,
+const uint64_t *table, uint32_t table_size,
+uint32_t cluster_size);
+
  #endif
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 6d1da4d..d83bb79 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -241,5 +241,17 @@ static inline size_t hbitmap_iter_next_word(HBitmapIter 
*hbi, unsigned long *p_c
  return hbi->pos;
  }
  
+typedef struct BlockDriverState BlockDriverState;

+
+int hbitmap_load(HBitmap *bitmap, BlockDriverState *bs,
+ const uint64_t *table, uint32_t table_size,
+ uint32_t cluster_size);
+
+int hbitmap_prepare_store(const HBitmap *bitmap, uint32_t cluster_size,
+  uint64_t *table, uint32_t *table_size);
+
+int hbitmap_store(HBitmap *bitmap, BlockDriverState *bs,
+  const uint64_t *table, uint32_t table_size,
+  uint32_t cluster_size);
  
  #endif

diff --git a/util/hbitmap.c b/util/hbitmap.c
index 28595fb..1960e4f 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -15,6 +15,8 @@
  #include "qemu/host-utils.h"
  #include "trace.h"
  
+#include "block/block.h"

+

This is a bit of a red flag -- we shouldn't need block layer specifics
in the subcomponent-agnostic HBitmap utility.

Further, by relying on these facilities here in hbitmap.c, "make check"
no longer can compile the relevant hbitmap tests.


Thanks. Locally I've already fixed it (add entity into tests Makefile). 
But in fact, it may be better to move these functions into 
block/dirty_bitmap.c or to new file like block/hbitmap.c.




Make sure that each intermediate commit here passes these necessary
tests, test-hbitmap in particular for each, and a "make check" overall
at the end of your series.

--js


  /* HBitmaps provides an array of bits.  The bits are stored as usual in an
   * array of unsigned longs, but HBitmap is also optimized to provide fast
   * iteration over set bits; going from one 

Re: [Qemu-block] [PATCH] vpc: use current_size field for XenServer VHD images

2016-03-22 Thread Stefan Hajnoczi
On Mon, Mar 21, 2016 at 02:37:46PM -0400, Spencer Baugh wrote:
> Stefan Hajnoczi  writes:
> > What output did you get from "qemu-img info /dev/dm-1"?
> 
> After the patching:
> 
> root@storage-4:~# hexdump -C -n 512 /dev/vgstorage/tartanfsbackup
>   63 6f 6e 65 63 74 69 78  00 00 00 02 00 01 00 00 |conectix|
> 0010  00 00 00 00 00 00 02 00  14 10 4b 1f 71 65 6d 32 |..K.qem2|
> 0020  00 01 00 03 00 00 00 00  00 00 01 c4 4f 40 00 00 |O@..|
> 0030  00 00 01 f4 00 00 00 00  ff ff 10 ff 00 00 00 03 ||
> 0040  ff ff ed 81 6d 0d 68 bd  9c fd 4d 65 a3 17 c7 6c |m.h...Me...l|
> 0050  06 a6 aa bf 00 00 00 00  00 00 00 00 00 00 00 00 ||
> 0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ||
> *
> 0200
> root@storage-4:~# qemu-img info /dev/vgstorage/tartanfsbackup
> block-vpc: The header checksum of '/dev/vgstorage/tartanfsbackup' is 
> incorrect.

This is a warning message that does not cause opening the file to fail.
It makes sense since the header was modified without setting the new
checksum value.

> qemu-img: Could not open '/dev/vgstorage/tartanfsbackup': Could not open 
> '/dev/vgstorage/tartanfsbackup': Invalid argument

Please also post the output of "hexdump -C -n 512 -s 512
/dev/vgstorage/tartanfsbackup".  This is another header structure and it
gets parsed while opening the file.

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v6 08/11] block: add support for --image-opts in block I/O tests

2016-03-22 Thread Daniel P. Berrange
On Mon, Mar 21, 2016 at 02:08:16PM -0600, Eric Blake wrote:
> On 03/21/2016 08:11 AM, Daniel P. Berrange wrote:
> > Currently all block tests use the traditional syntax for images
> > just specifying a filename. To support the LUKS driver without
> > resorting to JSON, the tests need to be able to use the new
> > --image-opts argument to qemu-img and qemu-io.
> > 
> > This introduces a new env variable IMGOPTSSYNTAX. If this is
> 
> Would IMG_OPTS_SYNTAX be any more legible?

[snip]

> > @@ -53,6 +53,7 @@ export QEMU_IO_OPTIONS=""
> >  export CACHEMODE_IS_DEFAULT=true
> >  export QEMU_OPTIONS="-nodefaults"
> >  export VALGRIND_QEMU=
> > +export IMGOPTSSYNTAX=false
> 
> Particularly since we use _ between words in other variables above.

It isn't visible from the diff context but just above these
quoted lines we have IMGFMT, IMGPROTO and IMGOPTS, though we
then also have the inconssitent IMGFMT_GENERIC :-)


> >  _check_test_img()
> >  {
> > -$QEMU_IMG check "$@" -f $IMGFMT "$TEST_IMG" 2>&1 | _filter_testdir | \
> > +(
> > +if [ "$IMGOPTSSYNTAX" = "true" ]; then
> > +$QEMU_IMG check --image-opts "$@" "$TEST_IMG" 2>&1
> > +else
> > +$QEMU_IMG check "$@" -f $IMGFMT "$TEST_IMG" 2>&1
> > +fi
> > +) | _filter_testdir | \
> 
> Would '{ if ... fi; } |' be any better than a subshell?

FWIW this is the style used elsewhere in the I/O tests


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [Qemu-devel] [PATCH v3 03/10] qom: support arbitrary non-scalar properties with -object

2016-03-22 Thread Daniel P. Berrange
On Tue, Mar 22, 2016 at 10:07:42AM +0100, Markus Armbruster wrote:
> Eric Blake  writes:
> 
> > On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
> >> The current -object command line syntax only allows for
> >> creation of objects with scalar properties, or a list
> >> with a fixed scalar element type. Objects which have
> >> properties that are represented as structs in the QAPI
> >> schema cannot be created using -object.
> >> 
> >> This is a design limitation of the way the OptsVisitor
> >> is written. It simply iterates over the QemuOpts values
> >> as a flat list. The support for lists is enabled by
> >> allowing the same key to be repeated in the opts string.
> >> 
> >> It is not practical to extend the OptsVisitor to support
> >> more complex data structures while also maintaining
> >> the existing list handling behaviour that is relied upon
> >> by other areas of QEMU.
> >
> > Zoltán Kővágó tried earlier with his GSoC patches for the audio
> > subsystem last year, but those got stalled waiting for qapi enhancements
> > to go in.
> 
> Yet another series stalled on the big QAPI rework.  Hitting a GSoC
> student that way is really unfortunate.
> 
> >But I think your approach is indeed a bit nicer (rather than
> > making the warty OptsVisitor even wartier, just avoid it).
> 
> QemuOpts defines an important part of the command line language, namely
> (most of) the syntax of many option arguments.  It parses them into a
> set of (name, value) pairs.
> 
> "Most of", because additional syntax may hide in the parameter value.
> 
> Parameter values are typed, except when they aren't.  Types are limited
> to string, bool, uint64_t number (accepts negative numbers and casts
> them) and uint64_t size (rejects negative numbers, accepts suffixes).
> 
> OptsVisitor adds special list syntax.  It's used with untyped values.
> 
> Bypassing OptsVisitor risks adding different special syntax.  Doesn't
> mean it's a bad idea, only that we need to keep close watch on what it
> does to the language.  See below.

FWIW, when I first started attacking this problem I actually went down
the path of extending the OptsVisitor to cope with arbitrarily nested
structs + lists. I quickly discovered that special list syntax supported
by the OptsVisitor. I tried to hack support for nested structs + lists
on top of that, but ultimately the way the special list syntax is designed
makes that an impossible problem without breaking back compat, or having
OptsVisitor support two completely different lists syntaxes at the same
time with ambiguous parsing results. None the less I did implement that
all and it was a huge amount of work. I then took a look at QMPInputVisitor
and realized that if we could converts a QemuOpts into a QDict, we could
just reuse the QMPInputVisitor.

In the specific case of -object the fact that QMPInputVisitor does not
support the special list syntax from OptsVisitor is not a problem because
we don't have any existing user defined object type that uses list props
yet. So if we want to change -object from OptsVisitor to QMPInputVisitor
the sooner we change it the better - it is only a matter of time before
something comes along that depends on the existing special list syntax
and then we'll be locked into OptsVisitor's non-extensible approach.

> >> Would be creatable via the CLI now using
> >> 
> >> $QEMU \
> >>   -object demo,id=demo0,\
> >>   foo.0.bar=one,foo.0.wizz=1,\
> >>   foo.1.bar=two,foo.1.wizz=2
> 
> Okay, this adds a bare minimum of syntax, and it's not even new: we use
> similar syntax for block stuff already.

Yes, indeed that usage by the block layer is what motivated me to ditch
OptsVisitor and take the approach of converting QemuOpts to QDict and
then using QMPInputVisitor.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [Qemu-devel] [PATCH 0/4] Tweaks around virtio-blk start/stop

2016-03-22 Thread Paolo Bonzini


On 22/03/2016 10:07, Cornelia Huck wrote:
> So far, we had the best results with my refactoring + the mutex/bh
> change. Two problems:
> 
> - We don't really understand yet why my refactoring helps, but passing
> the assign value is a good canditate (and it's agreed that this fixes a
> bug, I think.)
> - There's some problem with the bh, if I understood Stefan correctly.

They can be fixed with just an extra object_ref/object_unref.

I didn't understand that Tu Bo also needed the BH fix, and with that
information it makes sense.  Passing the assign value ensures that
ioeventfd remains always assigned.  With the CPU threads out of the
picture, the BH becomes enough to make everything thread-safe.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v3 03/10] qom: support arbitrary non-scalar properties with -object

2016-03-22 Thread Markus Armbruster
Eric Blake  writes:

> On 03/10/2016 11:59 AM, Daniel P. Berrange wrote:
>> The current -object command line syntax only allows for
>> creation of objects with scalar properties, or a list
>> with a fixed scalar element type. Objects which have
>> properties that are represented as structs in the QAPI
>> schema cannot be created using -object.
>> 
>> This is a design limitation of the way the OptsVisitor
>> is written. It simply iterates over the QemuOpts values
>> as a flat list. The support for lists is enabled by
>> allowing the same key to be repeated in the opts string.
>> 
>> It is not practical to extend the OptsVisitor to support
>> more complex data structures while also maintaining
>> the existing list handling behaviour that is relied upon
>> by other areas of QEMU.
>
> Zoltán Kővágó tried earlier with his GSoC patches for the audio
> subsystem last year, but those got stalled waiting for qapi enhancements
> to go in.

Yet another series stalled on the big QAPI rework.  Hitting a GSoC
student that way is really unfortunate.

>But I think your approach is indeed a bit nicer (rather than
> making the warty OptsVisitor even wartier, just avoid it).

QemuOpts defines an important part of the command line language, namely
(most of) the syntax of many option arguments.  It parses them into a
set of (name, value) pairs.

"Most of", because additional syntax may hide in the parameter value.

Parameter values are typed, except when they aren't.  Types are limited
to string, bool, uint64_t number (accepts negative numbers and casts
them) and uint64_t size (rejects negative numbers, accepts suffixes).

OptsVisitor adds special list syntax.  It's used with untyped values.

Bypassing OptsVisitor risks adding different special syntax.  Doesn't
mean it's a bad idea, only that we need to keep close watch on what it
does to the language.  See below.

I merely scratched the surprising amount of complexity that has accrued
over time.  If you doubt me, study how the merge_lists flag works, and
how it interacts with the several ways we do defaults.

The gap between QemuOpts' and QAPI's type system is obvious.  But it's
not just a gap, it's also contraditions: QAPI does only *signed*
integers.

Nevertheless, the path from command line to QAPI is through QemuOpts for
now.

Aside: for historical reasons, we have a few QMP commands that detour
through QemuOpts.  Mistakes, do not add more.

Ceterum censeo QemuOpts esse delendam.  And I say that as author of 35
out of 117 commits touching qemu-option.c.

>> Fortunately there is no existing object that implements
>> the UserCreatable interface that relies on the list
>> handling behaviour, so it is possible to swap out the
>> OptsVisitor for a different visitor implementation, so
>> -object supports non-scalar properties, thus leaving
>> other users of OptsVisitor unaffected.
>> 
>> The previously added qdict_crumple() method is able to
>> take a qdict containing a flat set of properties and
>> turn that into a arbitrarily nested set of dicts and
>> lists. By combining qemu_opts_to_qdict and qdict_crumple()
>> together, we can turn the opt string into a data structure
>> that is practically identical to that passed over QMP
>> when defining an object. The only difference is that all
>> the scalar values are represented as strings, rather than
>> strings, ints and bools. This is sufficient to let us
>> replace the OptsVisitor with the QMPInputVisitor for
>> use with -object.
>
> Indeed, nice replacement.
>
>> 
>> Thus -object can now support non-scalar properties,
>> for example the QMP object
>> 
>>   {
>> "execute": "object-add",
>> "arguments": {
>>   "qom-type": "demo",
>>   "id": "demo0",
>>   "parameters": {
>> "foo": [
>>{ "bar": "one", "wizz": "1" },
>>{ "bar": "two", "wizz": "2" }
>> ]
>>   }
>> }
>>   }
>> 
>> Would be creatable via the CLI now using
>> 
>> $QEMU \
>>   -object demo,id=demo0,\
>>   foo.0.bar=one,foo.0.wizz=1,\
>>   foo.1.bar=two,foo.1.wizz=2

Okay, this adds a bare minimum of syntax, and it's not even new: we use
similar syntax for block stuff already.

>> This is also wired up to work for the 'object_add' command
>> in the HMP monitor with the same syntax.
>> 
>>   (hmp) object_add demo,id=demo0,\
>>foo.0.bar=one,foo.0.wizz=1,\
>> foo.1.bar=two,foo.1.wizz=2
>
> Maybe mention that the indentation is not actually present in the real
> command lines typed.
>
>> 
>> Signed-off-by: Daniel P. Berrange 
>> ---
>>  hmp.c  |  18 +--
>>  qom/object_interfaces.c|  20 ++-
>>  tests/check-qom-proplist.c | 295 
>> -
>>  3 files changed, 313 insertions(+), 20 deletions(-)
>> 
>
>> @@ -120,6 +120,7 @@ Object *user_creatable_add_type(const char *type, const 
>> char *id,
>>  obj = object_new(type);
>>  if 

Re: [Qemu-block] [Qemu-devel] [PATCH 0/4] Tweaks around virtio-blk start/stop

2016-03-22 Thread Cornelia Huck
(re-adding cc:s)

On Tue, 22 Mar 2016 15:18:05 +0800
Fam Zheng  wrote:

> On Tue, 03/22 15:10, tu bo wrote:
> > Hi Fam:
> > 
> > On 03/21/2016 06:57 PM, Fam Zheng wrote:
> > >On Thu, 03/17 19:03, tu bo wrote:
> > >>
> > >>On 03/17/2016 08:39 AM, Fam Zheng wrote:
> > >>>On Wed, 03/16 14:45, Paolo Bonzini wrote:
> > 
> > 
> > On 16/03/2016 14:38, Christian Borntraeger wrote:
> > >>If you just remove the calls to virtio_queue_host_notifier_read, here
> > >>and in virtio_queue_aio_set_host_notifier_fd_handler, does it work
> > >>(keeping patches 2-4 in)?
> > >
> > >With these changes and patch 2-4 it does no longer locks up.
> > >I keep it running some hour to check if a crash happens.
> > >
> > >Tu Bo, your setup is currently better suited for reproducing. Can you 
> > >also check?
> > 
> > Great, I'll prepare a patch to virtio then sketching the solution that
> > Conny agreed with.
> > 
> > While Fam and I agreed that patch 1 is not required, I'm not sure if the
> > mutex is necessary in the end.
> > >>>
> > >>>If we can fix this from the virtio_queue_host_notifier_read side, the 
> > >>>mutex/BH
> > >>>are not necessary; but OTOH the mutex does catch such bugs, so maybe 
> > >>>it's good
> > >>>to have it. I'm not sure about the BH.
> > >>>
> > >>>And on a hindsight I realize we don't want patches 2-3 too. Actually the
> > >>>begin/end pair won't work as expected because of the blk_set_aio_context.
> > >>>
> > >>>Let's hold on this series.
> > >>>
> > 
> > So if Tu Bo can check without the virtio_queue_host_notifier_read calls,
> > and both with/without Fam's patches, it would be great.
> > >>>
> > >>>Tu Bo, only with/withoug patch 4, if you want to check. Sorry for the 
> > >>>noise.
> > >>>
> > >>1. without the virtio_queue_host_notifier_read calls,  without patch 4
> > >>
> > >>crash happens very often,
> > >>
> > >>(gdb) bt
> > >>#0  bdrv_co_do_rw (opaque=0x0) at block/io.c:2172
> > >>#1  0x02aa165da37e in coroutine_trampoline (i0=,
> > >>i1=1812051552) at util/coroutine-ucontext.c:79
> > >>#2  0x03ff7dd5150a in __makecontext_ret () from /lib64/libc.so.6
> > >>
> > >>
> > >>2. without the virtio_queue_host_notifier_read calls, with patch 4
> > >>
> > >>crash happens very often,
> > >>
> > >>(gdb) bt
> > >>#0  bdrv_co_do_rw (opaque=0x0) at block/io.c:2172
> > >>#1  0x02aa39dda43e in coroutine_trampoline (i0=,
> > >>i1=-1677715600) at util/coroutine-ucontext.c:79
> > >>#2  0x03ffab6d150a in __makecontext_ret () from /lib64/libc.so.6
> > >>
> > >>
> > >
> > >Tu Bo,
> > >
> > >Could you help test this patch (on top of master, without patch 4)?
> > >
> > >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > >index 08275a9..47f8043 100644
> > >--- a/hw/virtio/virtio.c
> > >+++ b/hw/virtio/virtio.c
> > >@@ -1098,7 +1098,14 @@ void virtio_queue_notify_vq(VirtQueue *vq)
> > >
> > >  void virtio_queue_notify(VirtIODevice *vdev, int n)
> > >  {
> > >-virtio_queue_notify_vq(>vq[n]);
> > >+VirtQueue *vq = >vq[n];
> > >+EventNotifier *n;
> > >+n = virtio_queue_get_host_notifier(vq);
> > >+if (n) {
> > >+event_notifier_set(n);
> > >+} else {
> > >+virtio_queue_notify_vq(vq);
> > >+}
> > >  }
> > >
> > >  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)
> > >
> > >
> > 
> > I got a build error as below,
> > 
> > /BUILD/qemu-2.5.50/hw/virtio/virtio.c: In function 'virtio_queue_notify':
> > /BUILD/qemu-2.5.50/hw/virtio/virtio.c:1102:20: error: 'n' redeclared
> > as different kind of symbol
> >  EventNotifier *n;
> > ^
> > /BUILD/qemu-2.5.50/hw/virtio/virtio.c:1099:50: note: previous
> > definition of 'n' was here
> >  void virtio_queue_notify(VirtIODevice *vdev, int n)
> > 
> > 
> > Then I did some change for your patch as below,
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 08275a9..a10da39 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -1098,7 +1098,14 @@ void virtio_queue_notify_vq(VirtQueue *vq)
> > 
> >  void virtio_queue_notify(VirtIODevice *vdev, int n)
> >  {
> > -virtio_queue_notify_vq(>vq[n]);
> > +VirtQueue *vq = >vq[n];
> > +EventNotifier *en;
> > +en = virtio_queue_get_host_notifier(vq);
> > +if (en) {
> > +event_notifier_set(en);
> > +} else {
> > +virtio_queue_notify_vq(vq);
> > +}
> >  }
> > 
> >  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)
> > 
> > With qemu master + modified patch above(without patch 4, without
> > Conny's patches), I did NOT get crash so far.  thanks
> 
> Yes, it was a mistake.  Thanks for the testing!

I'm wondering what we learn from this. Bypassing notify_vq() removes
the race, but that's not the solution here.

So far, we had the best results with my refactoring + the mutex/bh
change. Two problems:

- We don't really understand yet why my refactoring helps, but passing
the assign value is 

Re: [Qemu-block] [Qemu-devel] [PATCH 0/4] Tweaks around virtio-blk start/stop

2016-03-22 Thread Cornelia Huck
On Tue, 22 Mar 2016 07:45:19 +0800
Fam Zheng  wrote:

> On Mon, 03/21 14:02, Cornelia Huck wrote:
> > On Mon, 21 Mar 2016 20:45:27 +0800
> > Fam Zheng  wrote:
> > 
> > > On Mon, 03/21 12:15, Cornelia Huck wrote:
> > > > On Mon, 21 Mar 2016 18:57:18 +0800
> > > > Fam Zheng  wrote:
> > > > 
> > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > > index 08275a9..47f8043 100644
> > > > > --- a/hw/virtio/virtio.c
> > > > > +++ b/hw/virtio/virtio.c
> > > > > @@ -1098,7 +1098,14 @@ void virtio_queue_notify_vq(VirtQueue *vq)
> > > > > 
> > > > >  void virtio_queue_notify(VirtIODevice *vdev, int n)
> > > > >  {
> > > > > -virtio_queue_notify_vq(>vq[n]);
> > > > > +VirtQueue *vq = >vq[n];
> > > > > +EventNotifier *n;
> > > > > +n = virtio_queue_get_host_notifier(vq);
> > > > > +if (n) {
> > > > 
> > > > Isn't that always true, even if the notifier has not been setup?
> > > 
> > > You are right, this doesn't make a correct fix. But we can still do a 
> > > quick
> > > test with this as the else branch should never be used with ioeventfd=on. 
> > > Am I
> > > right?
> > > 
> > > Fam
> > 
> > Won't we come through here for the very first kick, when we haven't
> > registered the ioeventfd with the kernel yet?
> > 
> 
> The ioeventfd in virtio-ccw is registered in the main loop when
> VIRTIO_CONFIG_S_DRIVER_OK is set, so I think the first kick is okay.

You're right, for well-behaved drivers this will be done from
set_status, so for testing that's fine.