[Qemu-block] [PATCH] virtio: introduce `info virtio' hmp command

2017-09-26 Thread Jan Dakinevich
The command is intended for exposing device specific virtio feature bits
and their negotiation status. It is convenient and useful for debug
purpose.

Names of features are taken from a devices via get_feature_name() within
VirtioDeviceClass. If certain device doesn't implement it, the command
will print only hexadecimal value of feature mask.

Cc: Denis V. Lunev 
Signed-off-by: Jan Dakinevich 
---
The patch suggests an another approach for this:

"virtio: show guest features in 'info qtree'"
http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg01609.html
---
 hmp-commands-info.hx   | 14 ++
 hw/block/virtio-blk.c  | 20 ++
 hw/char/virtio-serial-bus.c| 15 +++
 hw/display/virtio-gpu.c| 12 +
 hw/net/virtio-net.c| 34 
 hw/scsi/virtio-scsi.c  | 15 +++
 hw/virtio/virtio-balloon.c | 15 +++
 hw/virtio/virtio-bus.c | 59 ++
 include/hw/virtio/virtio-bus.h |  2 ++
 include/hw/virtio/virtio.h |  2 ++
 monitor.c  |  1 +
 11 files changed, 189 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 4f1ece9..2550027 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -868,6 +868,20 @@ ETEXI
 },
 
 STEXI
+@item info virtio
+@findex virtio
+Display guest and host fetures for all virtio devices.
+ETEXI
+
+{
+.name   = "virtio",
+.args_type  = "",
+.params = "",
+.help   = "show virtio info",
+.cmd= hmp_info_virtio,
+},
+
+STEXI
 @end table
 ETEXI
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 05d1440..5856640 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1017,6 +1017,25 @@ static Property virtio_blk_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const char *virtio_blk_get_feature_name(VirtIODevice *vdev, int feature)
+{
+static const char *names[] = {
+[VIRTIO_BLK_F_BARRIER] = "barrier",
+[VIRTIO_BLK_F_SIZE_MAX] = "size_max",
+[VIRTIO_BLK_F_SEG_MAX] = "seg_max",
+[VIRTIO_BLK_F_RO] = "ro",
+[VIRTIO_BLK_F_BLK_SIZE] = "blk_size",
+[VIRTIO_BLK_F_SCSI] = "scsi",
+[VIRTIO_BLK_F_TOPOLOGY] = "topology",
+[VIRTIO_BLK_F_FLUSH] = "flush",
+[VIRTIO_BLK_F_MQ] = "mq",
+};
+if (feature >= ARRAY_SIZE(names)) {
+return NULL;
+}
+return names[feature];
+}
+
 static void virtio_blk_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1030,6 +1049,7 @@ static void virtio_blk_class_init(ObjectClass *klass, 
void *data)
 vdc->get_config = virtio_blk_update_config;
 vdc->set_config = virtio_blk_set_config;
 vdc->get_features = virtio_blk_get_features;
+vdc->get_feature_name = virtio_blk_get_feature_name;
 vdc->set_status = virtio_blk_set_status;
 vdc->reset = virtio_blk_reset;
 vdc->save = virtio_blk_save_device;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 9470bd7..a6ccb6a 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -1156,6 +1156,20 @@ static Property virtio_serial_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const char *virtio_serial_get_feature_name(VirtIODevice *vdev,
+  int feature)
+{
+static const char *names[] = {
+[VIRTIO_CONSOLE_F_SIZE] = "size",
+[VIRTIO_CONSOLE_F_MULTIPORT] = "multiport",
+[VIRTIO_CONSOLE_F_EMERG_WRITE] = "emerg_write",
+};
+if (feature >= ARRAY_SIZE(names)) {
+return NULL;
+}
+return names[feature];
+}
+
 static void virtio_serial_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1170,6 +1184,7 @@ static void virtio_serial_class_init(ObjectClass *klass, 
void *data)
 vdc->realize = virtio_serial_device_realize;
 vdc->unrealize = virtio_serial_device_unrealize;
 vdc->get_features = get_features;
+vdc->get_feature_name = virtio_serial_get_feature_name;
 vdc->get_config = get_config;
 vdc->set_config = set_config;
 vdc->set_status = set_status;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3a8f1e1..860d2b3 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1307,6 +1307,17 @@ static Property virtio_gpu_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const char *virtio_gpu_get_feature_name(VirtIODevice *vdev, int feature)
+{
+static const char *names[] = {
+[VIRTIO_GPU_F_VIRGL] = "virgl",
+};
+if (feature >= ARRAY_SIZE(names)) {
+return NULL;
+}
+return names[feature];
+}
+
 static void virtio_gpu_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1317,6 

Re: [Qemu-block] [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change); and as with its public counterpart,
> rename to bdrv_co_block_status() to make the compiler enforce that
> we catch all uses.  For now, we assert that callers still pass
> aligned data, but ultimately, this will be the function where we
> hand off to a byte-based driver callback, and will eventually need
> to add logic to ensure we round calls according to the driver's
> request_alignment then touch up the result handed back to the
> caller, to start permitting a caller to pass unaligned offsets.
> 
> Note that we are now prepared to accepts 'bytes' larger than INT_MAX;
> this is okay as long as we clamp things internally before violating
> any 32-bit limits, and makes no difference to how a client will
> use the information (clients looping over the entire file must
> already be prepared for consecutive calls to return the same status,
> as drivers are already free to return shorter-than-maximal status
> due to any other convenient split points, such as when the L2 table
> crosses cluster boundaries in qcow2).
> 
> Signed-off-by: Eric Blake 


Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData to byte-based

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> type (no semantic change), and rename it to match the corresponding
> public function rename.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 

Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes

2017-09-26 Thread Eric Blake
On 09/26/2017 02:39 PM, John Snow wrote:
>> -int64_t bdrv_get_block_status(BlockDriverState *bs,
>> -  int64_t sector_num,
>> -  int nb_sectors, int *pnum,
>> -  BlockDriverState **file)
>> +int64_t bdrv_block_status(BlockDriverState *bs,
>> +  int64_t offset, int64_t bytes, int64_t *pnum,
>> +  BlockDriverState **file)
>>  {
>> -return bdrv_get_block_status_above(bs, backing_bs(bs),
>> -   sector_num, nb_sectors, pnum, file);
>> +int64_t ret;
>> +int n;
>> +
>> +assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
>> +bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
>> +ret = bdrv_get_block_status_above(bs, backing_bs(bs),
>> +  offset >> BDRV_SECTOR_BITS,
>> +  bytes >> BDRV_SECTOR_BITS, , file);
>> +if (pnum) {
>> +*pnum = n * BDRV_SECTOR_SIZE;
>> +}
> 
> Is it safe to truncate the request in the event that the caller did not
> provide a pnum target? that is, how will they know for what range we are
> answering?

Hmm. I think I have some rebase cruft here. At one point, I was playing
with the idea of allowing pnum == NULL for ALL get_status() callers,
similar to the existing block/vvfat.c:cluster_was_modified():

block/vvfat.c:res = bdrv_is_allocated(s->qcow->bs,
block/vvfat.c-(offset + i) *
BDRV_SECTOR_SIZE,
block/vvfat.c-
BDRV_SECTOR_SIZE, NULL);

but looking further, only bdrv_is_allocated() (and NOT
bdrv_[get_]block_status) is ever used in that manner.  Or, in terms of
the 'mapping' variable, a NULL pnum only makes sense when mapping ==
false.  So the conditional on 'if (pnum)' should be dropped here.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the name of the function from bdrv_get_block_status() to
> bdrv_block_status() ensures that the compiler enforces that all
> callers are updated.  For now, the io.c layer still assert()s that
> all callers are sector-aligned, but that can be relaxed when a later
> patch implements byte-based block status in the drivers.
> 
> Note that we have an inherent limitation in the BDRV_BLOCK_* return
> values: BDRV_BLOCK_OFFSET_VALID can only return the start of a
> sector, even if we later relax the interface to query for the status
> starting at an intermediate byte; document the obvious interpretation
> that valid offsets are always sector-relative.
> 
> Therefore, for the most part this patch is just the addition of scaling
> at the callers followed by inverse scaling at bdrv_block_status().  But
> some code, particularly bdrv_is_allocated(), gets a lot simpler because
> it no longer has to mess with sectors.
> 
> For ease of review, bdrv_get_block_status_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v3: clamp bytes to 32-bits, rather than asserting
> v2: rebase to earlier changes
> ---
>  include/block/block.h | 12 +++-
>  block/io.c| 31 +++
>  block/qcow2-cluster.c |  2 +-
>  qemu-img.c| 20 +++-
>  4 files changed, 38 insertions(+), 27 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index bb3b95d491..7a9a8db588 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -138,8 +138,10 @@ typedef struct HDGeometry {
>   *
>   * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK)
>   * represent the offset in the returned BDS that is allocated for the
> - * corresponding raw data; however, whether that offset actually contains
> - * data also depends on BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, as follows:
> + * corresponding raw data.  Individual bytes are at the same sector-relative
> + * locations (and thus, this bit cannot be set for mappings which are
> + * not equivalent modulo 512).  However, whether that offset actually
> + * contains data also depends on BDRV_BLOCK_DATA, as follows:
>   *
>   * DATA ZERO OFFSET_VALID
>   *  ttt   sectors read as zero, returned file is zero at 
> offset
> @@ -421,9 +423,9 @@ int bdrv_has_zero_init_1(BlockDriverState *bs);
>  int bdrv_has_zero_init(BlockDriverState *bs);
>  bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
> -int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
> -  int nb_sectors, int *pnum,
> -  BlockDriverState **file);
> +int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
> +  int64_t bytes, int64_t *pnum,
> +  BlockDriverState **file);
>  int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>  BlockDriverState *base,
>  int64_t sector_num,
> diff --git a/block/io.c b/block/io.c
> index 638b3890b7..1ed46bcece 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -695,7 +695,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags 
> flags)
>  {
>  int64_t target_size, ret, bytes, offset = 0;
>  BlockDriverState *bs = child->bs;
> -int n; /* sectors */
> 
>  target_size = bdrv_getlength(bs);
>  if (target_size < 0) {
> @@ -707,24 +706,23 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags 
> flags)
>  if (bytes <= 0) {
>  return 0;
>  }
> -ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
> -bytes >> BDRV_SECTOR_BITS, , NULL);
> +ret = bdrv_block_status(bs, offset, bytes, , NULL);
>  if (ret < 0) {
>  error_report("error getting block status at offset %" PRId64 ": 
> %s",
>   offset, strerror(-ret));
>  return ret;
>  }
>  if (ret & BDRV_BLOCK_ZERO) {
> -offset += n * BDRV_SECTOR_BITS;
> +offset += bytes;
>  continue;
>  }
> -ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
> +ret = bdrv_pwrite_zeroes(child, offset, bytes, flags);
>  if (ret < 0) {
>  error_report("error writing zeroes at offset %" PRId64 ": %s",
>   offset, strerror(-ret));
>  return ret;
>  

Re: [Qemu-block] [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful

2017-09-26 Thread John Snow


On 09/26/2017 03:18 PM, Eric Blake wrote:
> On 09/26/2017 01:51 PM, John Snow wrote:
>>
>>
>> On 09/13/2017 12:03 PM, Eric Blake wrote:
>>> In the process of converting sector-based interfaces to bytes,
>>> I'm finding it easier to represent a byte count as a 64-bit
>>> integer at the block layer (even if we are internally capped
>>> by SIZE_MAX or even INT_MAX for individual transactions, it's
>>> still nicer to not have to worry about truncation/overflow
>>> issues on as many variables).  Update the signature of
>>> bdrv_round_to_clusters() to uniformly use int64_t, matching
>>> the signature already chosen for bdrv_is_allocated and the
>>> fact that off_t is also a signed type, then adjust clients
>>> according to the required fallout.
>>>
>>> Signed-off-by: Eric Blake 
>>> Reviewed-by: Fam Zheng 
>>>
> 
>>> @@ -946,7 +946,7 @@ static int coroutine_fn 
>>> bdrv_co_do_copy_on_readv(BdrvChild *child,
>>>  struct iovec iov;
>>>  QEMUIOVector bounce_qiov;
>>>  int64_t cluster_offset;
>>> -unsigned int cluster_bytes;
>>> +int64_t cluster_bytes;
>>>  size_t skip_bytes;
>>>  int ret;
>>>
>>> @@ -967,6 +967,7 @@ static int coroutine_fn 
>>> bdrv_co_do_copy_on_readv(BdrvChild *child,
>>>  trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
>>> cluster_offset, cluster_bytes);
>>>
>>> +assert(cluster_bytes < SIZE_MAX);
>>
>> later in this function, is there any real or imagined risk of
>> cluster_bytes exceeding INT_MAX when it's passed to
>> bdrv_co_do_pwrite_zeroes?
>>
>>>  iov.iov_len = cluster_bytes;
> 
> cluster_bytes is the input 'unsigned int bytes' rounded out to cluster

Ah, yes, we're probably not going to exceed that, you're right.

> boundaries, but where we know 'bytes <= BDRV_REQUEST_MAX_BYTES' (which
> is 2^31 - 511).  Still, I guess you are right that rounding to a cluster
> size could produce a larger value of exactly 2^31 (bigger than INT_MAX,
> but still fits in 32-bit unsigned int, so my assert was to make sure
> that truncating 64 bits to size_t iov.iov_len still works on 32-bit
> platforms).
> 
> In theory, I don't think we ever attempt an unaligned operation near
> 2^31 that would round up to INT_MAX overflow (if we can, that's a
> pre-existing bug that should be fixed separately).
> 
> Should I tighten the assertion to assert(cluster_bytes <=
> BDRV_REQUEST_MAX_BYTES), then see if I can come up with a case where we
> can violate that?
> 

*Only* if you think it's worth your time. You'd know better than me at
this point if this is remotely possible or not. Just a simple width
check that caught my eye.

(Gotta prove to everyone I'm reading these, right? :p)

>> Everything else looks obviously correct to me.
>>
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful

2017-09-26 Thread Eric Blake
On 09/26/2017 01:51 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> In the process of converting sector-based interfaces to bytes,
>> I'm finding it easier to represent a byte count as a 64-bit
>> integer at the block layer (even if we are internally capped
>> by SIZE_MAX or even INT_MAX for individual transactions, it's
>> still nicer to not have to worry about truncation/overflow
>> issues on as many variables).  Update the signature of
>> bdrv_round_to_clusters() to uniformly use int64_t, matching
>> the signature already chosen for bdrv_is_allocated and the
>> fact that off_t is also a signed type, then adjust clients
>> according to the required fallout.
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: Fam Zheng 
>>

>> @@ -946,7 +946,7 @@ static int coroutine_fn 
>> bdrv_co_do_copy_on_readv(BdrvChild *child,
>>  struct iovec iov;
>>  QEMUIOVector bounce_qiov;
>>  int64_t cluster_offset;
>> -unsigned int cluster_bytes;
>> +int64_t cluster_bytes;
>>  size_t skip_bytes;
>>  int ret;
>>
>> @@ -967,6 +967,7 @@ static int coroutine_fn 
>> bdrv_co_do_copy_on_readv(BdrvChild *child,
>>  trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
>> cluster_offset, cluster_bytes);
>>
>> +assert(cluster_bytes < SIZE_MAX);
> 
> later in this function, is there any real or imagined risk of
> cluster_bytes exceeding INT_MAX when it's passed to
> bdrv_co_do_pwrite_zeroes?
> 
>>  iov.iov_len = cluster_bytes;

cluster_bytes is the input 'unsigned int bytes' rounded out to cluster
boundaries, but where we know 'bytes <= BDRV_REQUEST_MAX_BYTES' (which
is 2^31 - 511).  Still, I guess you are right that rounding to a cluster
size could produce a larger value of exactly 2^31 (bigger than INT_MAX,
but still fits in 32-bit unsigned int, so my assert was to make sure
that truncating 64 bits to size_t iov.iov_len still works on 32-bit
platforms).

In theory, I don't think we ever attempt an unaligned operation near
2^31 that would round up to INT_MAX overflow (if we can, that's a
pre-existing bug that should be fixed separately).

Should I tighten the assertion to assert(cluster_bytes <=
BDRV_REQUEST_MAX_BYTES), then see if I can come up with a case where we
can violate that?

> Everything else looks obviously correct to me.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() to byte-based

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> an internal function (no semantic change), and simplifying its
> caller accordingly.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 
Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() to byte-based

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of zeroing a device to track by bytes instead of
> sectors (although we are still guaranteed that we iterate by steps
> that are sector-aligned).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 

Reviewed-by: John Snow 



Re: [Qemu-block] blockdev-commit design

2017-09-26 Thread Kevin Wolf
Am 26.09.2017 um 20:29 hat Eric Blake geschrieben:
> On 09/26/2017 12:59 PM, Kevin Wolf wrote:
> > This was the easy part. Then I started looking at the code and found a
> > few a bit more interesting questions:
> > 
> >   * The old block-commit command decides between an "actual" commit job
> > and the mirror-based active commit based on whether top is the
> > active layer.
> 
> And libvirt HAS to know whether it is requesting an active vs.
> intermediate commit job up front, because the two code paths have
> different expected signals for handling job completion (it is only
> active commit that reaches a ready point between phases, requiring
> further QMP commands to end the job).

This is a good point. If this isn't transparent and libvirt has to know
anyway which kind of commit job it is using, there is no point in hiding
the difference in the QMP command that starts the job.

> > We don't get an option for the active layer any more now, so this
> > isn't how things can work for blockdev-commit. We could probably
> > check whether top has a BlockBackend parent, but that's not really
> > what we're interested in anyway. Maybe the best we could do to
> > decide this automatically is to check whether there is any parent of
> > top that requires write permissions. If there is, we need active
> > commit, otherwise the "normal" one is good enough.
> > 
> > However, who says that the intentions of the user stay as we deduce
> > them at the start of the block job? Who says that the user doesn't
> > want to add a writable disk as a user of the node while the block
> > job is running?
> > 
> > The optimal solution to this would be that the commit filter node
> > responds to permission requests and switches between active and
> > "normal" commit mode. I'm not sure how hard this would be to
> > implement.
> > 
> > As long as we don't have the automatic switch, do we have to allow
> > the user to specify explicitly which mode they want instead of
> > automatically choosing one?
> 
> When committing one read-only image into another, you don't need the
> active mode.  On the other hand, committing a writeable image generally
> means you don't want to lose any data, even as further writes happen
> while the job is ongoing.  Does a "normal" mode commit make sense on a
> writeable image

No, it doesn't. If you start a "normal" mode commit, it wouldn't have
BLK_PERM_WRITE in its shared permissions, so it would fail to start if
there is a writer. And once it is started, adding a writer would fail.

The explicit option would be for the case where a "normal" mode commit
is possible to start, but you intend to add a writer later, so you want
to start an active commit even though the writer doesn't exist yet.

> (perhaps as a point-in-time operation: all data that was present when
> the job started gets written, but if we do a NEW write, we make sure
> to FIRST commit the old data into the backing file then do the write
> into the active layer, and mark that cluster as no longer needing
> commit), differently from an "active" mode commit (a write to the
> active layer dirties the cluster, and we make as many passes as
> necessary, possibly writing some clusters to the backing file multiple
> times, so that the backing file contains the contents at the point the
> job ends rather than starts).

This sounds a lot like what the backup job is to mirror, just applied to
commit. It's an interesting idea, but I'm not sure if it wouldn't be a
separate block job then, like backup is separate from mirror.

Or maybe backup actually becomes very similar to the mirror job once we
implement the active mirror which intercepts requests...

> With our existing active commit code, is there a way to do an
> intermediate style commit instead of an active commit (by passing the
> node name instead of the device name, even though it resolves to the
> same 'top' node)?

No, passing the node name and passing the device name both starts an
active commit job.

> Maybe an optional boolean is worth having, where we default to active
> if 'top' is writable and 'normal' otherwise; but can set the boolean
> to force 'normal' even on a writable, and where setting the boolean on
> something that is not writable is either a no-op or an error?

I think if we add an option, I would actually make it mandatory.

Another thought I had is that we could make blockdev-commit support only
'normal' and extend the blockdev-mirror command so that you can use that
to start an active commit. I don't think there is a lot of code in the
mirror job implementation that has special cases for active commit.

> > 
> >   * The 'backing-file' option (which specifies the new backing file
> > string for parents after the commit job completes) is currently not
> > allowed if top is the active layer. If we allow graph changes, this
> > doesn't seem to make sense to me because even if top 

Re: [Qemu-block] [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change), and rename it to is_zero() in the
> process.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 

Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> In the process of converting sector-based interfaces to bytes,
> I'm finding it easier to represent a byte count as a 64-bit
> integer at the block layer (even if we are internally capped
> by SIZE_MAX or even INT_MAX for individual transactions, it's
> still nicer to not have to worry about truncation/overflow
> issues on as many variables).  Update the signature of
> bdrv_round_to_clusters() to uniformly use int64_t, matching
> the signature already chosen for bdrv_is_allocated and the
> fact that off_t is also a signed type, then adjust clients
> according to the required fallout.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 
> 
> ---
> v4: only context changes
> v3: no change
> v2: fix commit message [John], rebase to earlier changes, including
> mirror_clip_bytes() signature update
> ---
>  include/block/block.h | 4 ++--
>  block/io.c| 7 ---
>  block/mirror.c| 7 +++
>  block/trace-events| 2 +-
>  4 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 2ad18775af..bb3b95d491 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -475,9 +475,9 @@ int bdrv_get_flags(BlockDriverState *bs);
>  int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
>  ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
>  void bdrv_round_to_clusters(BlockDriverState *bs,
> -int64_t offset, unsigned int bytes,
> +int64_t offset, int64_t bytes,
>  int64_t *cluster_offset,
> -unsigned int *cluster_bytes);
> +int64_t *cluster_bytes);
> 
>  const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
>  void bdrv_get_backing_filename(BlockDriverState *bs,
> diff --git a/block/io.c b/block/io.c
> index 6509c804d4..b362b46e3d 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -446,9 +446,9 @@ static void mark_request_serialising(BdrvTrackedRequest 
> *req, uint64_t align)
>   * Round a region to cluster boundaries
>   */
>  void bdrv_round_to_clusters(BlockDriverState *bs,
> -int64_t offset, unsigned int bytes,
> +int64_t offset, int64_t bytes,
>  int64_t *cluster_offset,
> -unsigned int *cluster_bytes)
> +int64_t *cluster_bytes)
>  {
>  BlockDriverInfo bdi;
> 
> @@ -946,7 +946,7 @@ static int coroutine_fn 
> bdrv_co_do_copy_on_readv(BdrvChild *child,
>  struct iovec iov;
>  QEMUIOVector bounce_qiov;
>  int64_t cluster_offset;
> -unsigned int cluster_bytes;
> +int64_t cluster_bytes;
>  size_t skip_bytes;
>  int ret;
> 
> @@ -967,6 +967,7 @@ static int coroutine_fn 
> bdrv_co_do_copy_on_readv(BdrvChild *child,
>  trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
> cluster_offset, cluster_bytes);
> 
> +assert(cluster_bytes < SIZE_MAX);

later in this function, is there any real or imagined risk of
cluster_bytes exceeding INT_MAX when it's passed to
bdrv_co_do_pwrite_zeroes?

>  iov.iov_len = cluster_bytes;
>  iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
>  if (bounce_buffer == NULL) {
> diff --git a/block/mirror.c b/block/mirror.c
> index 032cfe91fa..67f45cec4e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -190,10 +190,9 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
> *offset,
>  bool need_cow;
>  int ret = 0;
>  int64_t align_offset = *offset;
> -unsigned int align_bytes = *bytes;
> +int64_t align_bytes = *bytes;
>  int max_bytes = s->granularity * s->max_iov;
> 
> -assert(*bytes < INT_MAX);
>  need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
>  need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
>s->cow_bitmap);
> @@ -388,7 +387,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  while (nb_chunks > 0 && offset < s->bdev_length) {
>  int64_t ret;
>  int io_sectors;
> -unsigned int io_bytes;
> +int64_t io_bytes;
>  int64_t io_bytes_acct;
>  enum MirrorMethod {
>  MIRROR_METHOD_COPY,
> @@ -413,7 +412,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  io_bytes = s->granularity;
>  } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
>  int64_t target_offset;
> -unsigned int target_bytes;
> +int64_t target_bytes;
>  bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
> _offset, _bytes);
>  if (target_offset == offset &&
> diff --git a/block/trace-events b/block/trace-events
> 

Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-09-26 Thread John Snow


On 09/13/2017 12:03 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  In particular, bdrv_is_allocated() cares more
> about finding the largest run of allocated data from the guest
> perspective, whether or not that data is consecutive from the
> host perspective.  Therefore, doing subsequent refinements such
> as checking how much of the format-layer allocation also satisfies
> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
> case, it just costs extra CPU cycles during a single
> bdrv_is_allocated(), but in the worst case, it results in a smaller
> *pnum, and forces callers to iterate through more status probes when
> visiting the entire file for even more extra CPU cycles.
> 
> This patch only optimizes the block layer.  But subsequent patches
> will tweak the driver callback to be byte-based, and in the process,
> can also pass this hint through to the driver.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v4: only context changes
> v3: s/allocation/mapping/ and flip sense of bool
> v2: new patch
> ---
>  block/io.c | 52 ++--
>  1 file changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index f250029395..6509c804d4 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1709,6 +1709,7 @@ typedef struct BdrvCoGetBlockStatusData {
>  int nb_sectors;
>  int *pnum;
>  int64_t ret;
> +bool mapping;
>  bool done;
>  } BdrvCoGetBlockStatusData;
> 
> @@ -1743,6 +1744,11 @@ int64_t coroutine_fn 
> bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * Drivers not implementing the functionality are assumed to not support
>   * backing files, hence all their sectors are reported as allocated.
>   *
> + * If 'mapping' is true, the caller is querying for mapping purposes,
> + * and the result should include BDRV_BLOCK_OFFSET_VALID where
> + * possible; otherwise, the result may omit that bit particularly if
> + * it allows for a larger value in 'pnum'.
> + *
>   * If 'sector_num' is beyond the end of the disk image the return value is
>   * BDRV_BLOCK_EOF and 'pnum' is set to 0.
>   *
> @@ -1759,6 +1765,7 @@ int64_t coroutine_fn 
> bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * is allocated in.
>   */
>  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
> + bool mapping,
>   int64_t sector_num,
>   int nb_sectors, int 
> *pnum,
>   BlockDriverState **file)
> @@ -1817,14 +1824,15 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
> 
>  if (ret & BDRV_BLOCK_RAW) {
>  assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
> -ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +ret = bdrv_co_get_block_status(local_file, mapping,
> +   ret >> BDRV_SECTOR_BITS,
> *pnum, pnum, _file);
>  goto out;
>  }
> 
>  if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>  ret |= BDRV_BLOCK_ALLOCATED;
> -} else {
> +} else if (mapping) {
>  if (bdrv_unallocated_blocks_are_zero(bs)) {
>  ret |= BDRV_BLOCK_ZERO;
>  } else if (bs->backing) {
> @@ -1836,12 +1844,13 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
>  }
>  }
> 
> -if (local_file && local_file != bs &&
> +if (mapping && local_file && local_file != bs &&

Tentatively this looks OK to me, but I have to admit I'm a little shaky
on this portion because I've not really investigated this function too
much. I am at the very least convinced that when mapping is true that
the function is equivalent and that existing callers don't have their
behavior changed too much.

Benefit of the doubt:

Reviewed-by: John Snow 

>  (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>  (ret & BDRV_BLOCK_OFFSET_VALID)) {
>  int file_pnum;
> 
> -ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +ret2 = bdrv_co_get_block_status(local_file, mapping,
> +ret >> BDRV_SECTOR_BITS,
>  *pnum, _pnum, NULL);
>  if (ret2 >= 0) {
>  /* Ignore errors.  This is just providing extra information, it
> @@ -1876,6 +1885,7 @@ out:
> 
>  static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState 
> *bs,
>  BlockDriverState *base,
> +bool mapping,
>  int64_t sector_num,
>  int nb_sectors,
>  int *pnum,
> @@ -1887,7 +1897,8 @@ static int64_t coroutine_fn 
> 

Re: [Qemu-block] blockdev-commit design

2017-09-26 Thread Eric Blake
On 09/26/2017 12:59 PM, Kevin Wolf wrote:
> Hi,
> 
> as the next step after my commit block job fixes, I'm trying to
> implement a new, clean version of the QMP command, which I'm calling
> blockdev-commit for consistency with all the other "modern" QMP
> commands.
> 
> I'll start with the schema that I have so far:
> 
> { 'command': 'blockdev-commit',
>   'data': { 'job-id': 'str', 'top': 'str', '*base': 'str'
> '*backing-file': 'str', '*speed': 'int',
> '*filter-node-name': 'str' } }

Seems okay at first glance, modulo your discussion below on active vs.
passive.

> 
> In comparison with the old command, the important changes are:
> 
>   * top/base are node names instead of file names.

I can agree to that.  Do we still allow device names to resolve into the
top-most node attached to the device? That matters for 'top', but not
for 'base'.

> 
>   * You don't need to specify the active layer any more (not the least
> because there could very well be more than one of them), but top
> becomes mandatory instead.

Libvirt should be fine with that.

> 
>   * top/base don't accept device (BlockBackend) names, so for
> consistency with other block jobs, job-id becomes mandatory.
> 
> Possible alternative: Accept device names and make them the default
> for job-id. This is more consistent with existing blockdev-*
> commands, but I consider BlockBackend names deprecated, so I prefer
> not adding them here.

Oh, you're answering my question above.  I'm okay if job-id is
mandatory, even if we allow the shortcut of a device name for 'top'
mapping to its attached active node.

> 
>   * filer-node-name is kept optional for now. Should we make it
> mandatory, too?
> 
> This was the easy part. Then I started looking at the code and found a
> few a bit more interesting questions:
> 
>   * The old block-commit command decides between an "actual" commit job
> and the mirror-based active commit based on whether top is the
> active layer.

And libvirt HAS to know whether it is requesting an active vs.
intermediate commit job up front, because the two code paths have
different expected signals for handling job completion (it is only
active commit that reaches a ready point between phases, requiring
further QMP commands to end the job).

> 
> We don't get an option for the active layer any more now, so this
> isn't how things can work for blockdev-commit. We could probably
> check whether top has a BlockBackend parent, but that's not really
> what we're interested in anyway. Maybe the best we could do to
> decide this automatically is to check whether there is any parent of
> top that requires write permissions. If there is, we need active
> commit, otherwise the "normal" one is good enough.
> 
> However, who says that the intentions of the user stay as we deduce
> them at the start of the block job? Who says that the user doesn't
> want to add a writable disk as a user of the node while the block
> job is running?
> 
> The optimal solution to this would be that the commit filter node
> responds to permission requests and switches between active and
> "normal" commit mode. I'm not sure how hard this would be to
> implement.
> 
> As long as we don't have the automatic switch, do we have to allow
> the user to specify explicitly which mode they want instead of
> automatically choosing one?

When committing one read-only image into another, you don't need the
active mode.  On the other hand, committing a writeable image generally
means you don't want to lose any data, even as further writes happen
while the job is ongoing.  Does a "normal" mode commit make sense on a
writeable image (perhaps as a point-in-time operation: all data that was
present when the job started gets written, but if we do a NEW write, we
make sure to FIRST commit the old data into the backing file then do the
write into the active layer, and mark that cluster as no longer needing
commit), differently from an "active" mode commit (a write to the active
layer dirties the cluster, and we make as many passes as necessary,
possibly writing some clusters to the backing file multiple times, so
that the backing file contains the contents at the point the job ends
rather than starts).

With our existing active commit code, is there a way to do an
intermediate style commit instead of an active commit (by passing the
node name instead of the device name, even though it resolves to the
same 'top' node)?

Maybe an optional boolean is worth having, where we default to active if
'top' is writable and 'normal' otherwise; but can set the boolean to
force 'normal' even on a writable, and where setting the boolean on
something that is not writable is either a no-op or an error?

> 
>   * The 'backing-file' option (which specifies the new backing file
> string for parents after the commit job completes) is currently not
> allowed if top 

Re: [Qemu-block] [PATCH v2 1/3] nbd-client: avoid read_reply_co entry if send failed

2017-09-26 Thread Eric Blake
CC: qemu-sta...@nongnu.org

On 08/29/2017 07:27 AM, Stefan Hajnoczi wrote:
> The following segfault is encountered if the NBD server closes the UNIX
> domain socket immediately after negotiation:
> 
>   Program terminated with signal SIGSEGV, Segmentation fault.
...
> Note this only happens with UNIX domain sockets on Linux.  It doesn't
> seem possible to reproduce this with TCP sockets.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/nbd-client.c | 25 +
>  1 file changed, 9 insertions(+), 16 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] blockdev-commit design

2017-09-26 Thread Kevin Wolf
Hi,

as the next step after my commit block job fixes, I'm trying to
implement a new, clean version of the QMP command, which I'm calling
blockdev-commit for consistency with all the other "modern" QMP
commands.

I'll start with the schema that I have so far:

{ 'command': 'blockdev-commit',
  'data': { 'job-id': 'str', 'top': 'str', '*base': 'str'
'*backing-file': 'str', '*speed': 'int',
'*filter-node-name': 'str' } }

In comparison with the old command, the important changes are:

  * top/base are node names instead of file names.

  * You don't need to specify the active layer any more (not the least
because there could very well be more than one of them), but top
becomes mandatory instead.

  * top/base don't accept device (BlockBackend) names, so for
consistency with other block jobs, job-id becomes mandatory.

Possible alternative: Accept device names and make them the default
for job-id. This is more consistent with existing blockdev-*
commands, but I consider BlockBackend names deprecated, so I prefer
not adding them here.

  * filer-node-name is kept optional for now. Should we make it
mandatory, too?

This was the easy part. Then I started looking at the code and found a
few a bit more interesting questions:

  * The old block-commit command decides between an "actual" commit job
and the mirror-based active commit based on whether top is the
active layer.

We don't get an option for the active layer any more now, so this
isn't how things can work for blockdev-commit. We could probably
check whether top has a BlockBackend parent, but that's not really
what we're interested in anyway. Maybe the best we could do to
decide this automatically is to check whether there is any parent of
top that requires write permissions. If there is, we need active
commit, otherwise the "normal" one is good enough.

However, who says that the intentions of the user stay as we deduce
them at the start of the block job? Who says that the user doesn't
want to add a writable disk as a user of the node while the block
job is running?

The optimal solution to this would be that the commit filter node
responds to permission requests and switches between active and
"normal" commit mode. I'm not sure how hard this would be to
implement.

As long as we don't have the automatic switch, do we have to allow
the user to specify explicitly which mode they want instead of
automatically choosing one?

  * The 'backing-file' option (which specifies the new backing file
string for parents after the commit job completes) is currently not
allowed if top is the active layer. If we allow graph changes, this
doesn't seem to make sense to me because even if top doesn't have a
parent node when the job starts, it could have one when it's
completed.

Any opinions on this, especially the active/normal commit thing?

Kevin



Re: [Qemu-block] [PATCH 2/5] commit: Support multiple roots above top node

2017-09-26 Thread Kevin Wolf
Am 25.09.2017 um 21:38 hat Eric Blake geschrieben:
> On 09/25/2017 07:28 AM, Kevin Wolf wrote:
> > This changes the commit block job to support operation in a graph where
> > there is more than a single active layer that references the top node.
> > 
> > This involves inserting the commit filter node not only on the path
> > between the given active node and the top node, but between the top node
> > and all of its parents.
> > 
> > On completion, bdrv_drop_intermediate() must consider all parents for
> > updating the backing file link. These parents may be backing files
> > themselves and such read-only; reopen them temporarily if necessary.
> 
> s/such/as such/

Yes, thanks.

> > Previously this was achieved by the bdrv_reopen() calls in the commit
> > block job that made overlay_bs read-write for the whole duration of the
> > block job, even though write access is only needed on completion.
> 
> Do we know, at the start of the operation, enough to reject attempts
> that will eventually fail because we cannot obtain write access at
> completion, without having to wait for all the intermediate work before
> the completion phase is reached?  If not, that might be a bit of a
> regression (where a command used to fail up front, we now waste a lot of
> effort, and possibly modify the backing chain irreversably, before
> finding out we still fail because we lack write access).

It is kind of a change in behaviour, but I wouldn't call it a
regression. The reason is that today I'm looking at it with a completely
different perspective than when the command was added.

Originally, we assumed a more or less static block node graph, which in
fact was only a mostly degenerated tree - a backing file chain. Block
jobs were kind of an exceptional thing and we allowed only one per
chain. In this restricted setting, checking at the start of the
operation made sense because we wouldn't allow things to change while
the job was running anyway.

A big part of the recent blockdev work, however, is about getting rid of
arbitrary restrictions like this. We want to allow running two commit
operations in the same backing chain as long as they don't conflict. We
don't want to prevent adding new users to an existing node unless there
is a good reason.

If you consider this, the set of nodes that will get their backing file
rewritten at the end of the block job isn't known yet when the job
starts; and even if we knew it, keeping the nodes writable all the time
while the job runs feels like it could easily conflict with other
callers of bdrv_reopen().

> > Now that we consider all parents, overlay_bs is meaningless. It is left
> > in place in this commit, but we'll remove it soon.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block.c| 68 
> > ++
> >  block/commit.c |  2 +-
> >  2 files changed, 41 insertions(+), 29 deletions(-)
> >  
> >  /* success - we can delete the intermediate states, and link top->base 
> > */
> > -if (new_top_bs->backing->role->update_filename) {
> > -backing_file_str = backing_file_str ? backing_file_str : 
> > base->filename;
> > -ret = 
> > new_top_bs->backing->role->update_filename(new_top_bs->backing,
> > - base, 
> > backing_file_str,
> > - _err);
> > -if (ret < 0) {
> > -bdrv_set_backing_hd(new_top_bs, top, _abort);
> > +backing_file_str = backing_file_str ? backing_file_str : 
> > base->filename;
> > +
> > +QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
> > +/* Check whether we are allowed to switch c from top to base */
> > +GSList *ignore_children = g_slist_prepend(NULL, c);
> > +bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
> > +   ignore_children, _err);
> > +if (local_err) {
> > +ret = -EPERM;
> > +error_report_err(local_err);
> >  goto exit;
> >  }
> > -}
> > +g_slist_free(ignore_children);
> >  
> 
> And it looks like you DO check up front that permissions are sufficient
> for later on.

This is already during completion. The completion code uses the
transactional nature of permission updates to make sure that the
in-memory graph stays consistent with the on-disk backing file links
even in case of errors, but it's not really upfront in the sense of
checking when the block job is started. (You also need to hold the BQL
between starting and completing a permission change transaction, so just
moving this code wouldn't work.)

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] virtio: introduce `info virtio' hmp command

2017-09-26 Thread Kevin Wolf
Am 26.09.2017 um 18:13 hat Jan Dakinevich geschrieben:
> The command is intended for exposing device specific virtio feature bits
> and their negotiation status. It is convenient and useful for debug
> purpose.
> 
> Names of features are taken from a devices via get_feature_name() within
> VirtioDeviceClass. If certain device doesn't implement it, the command
> will print only hexadecimal value of feature mask.
> 
> Cc: Denis V. Lunev 
> Signed-off-by: Jan Dakinevich 
> ---
> The patch suggests an another approach for this:
> 
> "virtio: show guest features in 'info qtree'"
> http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg01609.html

Actually, I think the original approach is much nicer because it uses
generic infrastructure rather than introducing a one-off monitor command
that is specific to a certain class of devices.

But at the same time I can see Cornelia's point that this would create a
whole lot of properties and if we did the same thing consistently for
all devices, the output of 'info qtree' would be completely cluttered
up.

Maybe it would make sense to have properties that can be queried with
QOM, but that don't show up in 'info qtree'? If we then add some HMP
'qom-get' command that allows dumping a whole device as a QOM object
instead of having to query property by property, it should be quite
convenient to use.

Kevin



Re: [Qemu-block] [Qemu-devel] [PULL 08/24] iotests: Print full path of bad output if mismatch

2017-09-26 Thread Kevin Wolf
Am 26.09.2017 um 16:56 hat Eric Blake geschrieben:
> On 09/26/2017 09:21 AM, Kevin Wolf wrote:
> > From: Fam Zheng 
> > 
> > So it is easier to copy paste the path.
> > 
> > Signed-off-by: Fam Zheng 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  tests/qemu-iotests/check | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> > index d504b6e455..4583a0c269 100755
> > --- a/tests/qemu-iotests/check
> > +++ b/tests/qemu-iotests/check
> > @@ -353,7 +353,7 @@ do
> >  else
> >  echo " - output mismatch (see $seq.out.bad)"
> >  mv $tmp.out $seq.out.bad
> > -$diff -w "$reference" $seq.out.bad
> > +$diff -w "$reference" $(realpath $seq.out.bad)
> 
> This missed my review comments about being unsafe if $PWD contains spaces:
> https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg04241.html

Sorry for missing this. Please send a follow-up patch to fix it.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PULL 08/24] iotests: Print full path of bad output if mismatch

2017-09-26 Thread Eric Blake
On 09/26/2017 09:21 AM, Kevin Wolf wrote:
> From: Fam Zheng 
> 
> So it is easier to copy paste the path.
> 
> Signed-off-by: Fam Zheng 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/check | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index d504b6e455..4583a0c269 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -353,7 +353,7 @@ do
>  else
>  echo " - output mismatch (see $seq.out.bad)"
>  mv $tmp.out $seq.out.bad
> -$diff -w "$reference" $seq.out.bad
> +$diff -w "$reference" $(realpath $seq.out.bad)

This missed my review comments about being unsafe if $PWD contains spaces:
https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg04241.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PULL 23/24] qemu-iotests: add shrinking image test

2017-09-26 Thread Kevin Wolf
From: Pavel Butsykin 

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20170918124230.8152-5-pbutsy...@virtuozzo.com
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/163 | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+suff = ['B', 'K', 'M', 'G', 'T']
+return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+image_len = '128M'
+shrink_size = '10M'
+chunk_size = '16M'
+refcount_bits = '16'
+
+def __qcow2_check(self, filename):
+entry_bits = 3
+entry_size = 1 << entry_bits
+l1_mask = 0x00fffe00
+div_roundup = lambda n, d: (n + d - 1) / d
+
+def split_by_n(data, n):
+for x in xrange(0, len(data), n):
+yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+def check_l1_table(h, l1_data):
+l1_list = list(split_by_n(l1_data, entry_size))
+real_l1_size = div_roundup(h.size,
+   1 << (h.cluster_bits*2 - entry_size))
+used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+self.assertTrue(len(used) != 0, "Verifying l1 table content")
+self.assertFalse(any(unused), "Verifying l1 table content")
+
+def check_reftable(fd, h, reftable):
+for offset in split_by_n(reftable, entry_size):
+if offset != 0:
+fd.seek(offset)
+cluster = fd.read(1 << h.cluster_bits)
+self.assertTrue(any(cluster), "Verifying reftable content")
+
+with open(filename, "rb") as fd:
+h = qcow2.QcowHeader(fd)
+
+fd.seek(h.l1_table_offset)
+l1_table = fd.read(h.l1_size << entry_bits)
+
+fd.seek(h.refcount_table_offset)
+reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+check_l1_table(h, l1_table)
+check_reftable(fd, h, reftable)
+
+def __raw_check(self, filename):
+pass
+
+image_check = {
+'qcow2' : __qcow2_check,
+'raw' : __raw_check
+}
+
+def setUp(self):
+if iotests.imgfmt == 'raw':
+qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt, check_img,
+ self.shrink_size)
+else:
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=' + self.cluster_size +
+ ',refcount_bits=' + self.refcount_bits,
+ test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=%s'% self.cluster_size,
+ check_img, self.shrink_size)
+qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+def tearDown(self):
+os.remove(test_img)
+os.remove(check_img)
+
+def image_verify(self):
+self.assertEqual(image_size(test_img), image_size(check_img),
+ "Verifying image size")
+self.image_check[iotests.imgfmt](self, test_img)
+
+if iotests.imgfmt == 'raw':
+return
+self.assertEqual(qemu_img('check', test_img), 0,
+ "Verifying image corruption")
+
+def test_empty_image(self):
+qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
+ 

Re: [Qemu-block] qemu-img: Check failed: No space left on device

2017-09-26 Thread Nicolas Ecarnot

Le 21/09/2017 à 16:31, Stefan Hajnoczi a écrit :

On Tue, Sep 19, 2017 at 12:09:06PM +0200, Nicolas Ecarnot wrote:

Hello,

First post here, so maybe I should introduce myself :
- I'm a sysadmin for decades and currently managing 4 oVirt clusters, made
out of tens of hypervisors, all are CentOS 7.2+ based.
- I'm very happy with this solution we choose especially because it is based
on qemu-kvm (open source, reliable, documented).

On one VM, we experienced the following :
- oVirt/vdsm is detecting an issue on the image
- following this hints https://access.redhat.com/solutions/1173623, I
managed to detect one error and fix it
- the VM is now running perfectly

On two other VMs, we experienced a similar situation, except the check stage
is showing something like 14000+ errors, and the relevant logs are :

Repairing refcount block 14 is outside image
ERROR could not resize image: Invalid argument
ERROR cluster 425984 refcount=0 reference=1
ERROR cluster 425985 refcount=0 reference=1
[... repeating the previous line 7000+ times...]
ERROR cluster 457166 refcount=0 reference=1
Rebuilding refcount structure
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device


Please run strace qemu-img info /the/relevant/logical/volume/path.  It
will print all the syscalls that qemu-img makes.  That way we'll be able
to verify that the ENOSPC error is coming from a pwritev syscall.


I did but I'm not skilled enough to ensure where the ENOSPC error is 
coming from.


Is your question meaning the reads and/or the writes may come from or go 
to places outside the expected boundaries?



You surely know that oVirt/RHEV is storing its qcow2 images in dedicated
logical volumes.

pvs/vgs/lvs are all showing there is plenty of space available, so I
understand that I don't understand what "No space left on device" means.


After you have the strace data you can look at the file offset from the
failing pwritev syscall and check that it's really within the LV.

I think there is no fancy thin provisioning going on at the LVM level
with oVirt, but if there is then perhaps a write within the LV could
still result in an ENOSPC error.  It would be worth confirming that
these are class "thick" LVs.


I think there is no such thin prov. at the LVM level, but I wouldn't swear.
Don't you mind if I forward your question to the oVirt mailing-list?

--
Nicolas ECARNOT



[Qemu-block] [PULL 14/24] block: Add reopen queue to bdrv_check_perm()

2017-09-26 Thread Kevin Wolf
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index c7724c85e3..0b499fda4c 100644
--- a/block.c
+++ b/block.c
@@ -1531,7 +1531,8 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 return 0;
 }
 
-static int bdrv_child_check_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
+static int bdrv_child_check_perm(BdrvChild *c, BlockReopenQueue *q,
+ uint64_t perm, uint64_t shared,
  GSList *ignore_children, Error **errp);
 static void bdrv_child_abort_perm_update(BdrvChild *c);
 static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared);
@@ -1562,7 +1563,8 @@ static void bdrv_child_perm(BlockDriverState *bs, 
BlockDriverState *child_bs,
  * A call to this function must always be followed by a call to bdrv_set_perm()
  * or bdrv_abort_perm_update().
  */
-static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
+static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+   uint64_t cumulative_perms,
uint64_t cumulative_shared_perms,
GSList *ignore_children, Error **errp)
 {
@@ -1597,11 +1599,11 @@ static int bdrv_check_perm(BlockDriverState *bs, 
uint64_t cumulative_perms,
 /* Check all children */
 QLIST_FOREACH(c, >children, next) {
 uint64_t cur_perm, cur_shared;
-bdrv_child_perm(bs, c->bs, c, c->role, NULL,
+bdrv_child_perm(bs, c->bs, c, c->role, q,
 cumulative_perms, cumulative_shared_perms,
 _perm, _shared);
-ret = bdrv_child_check_perm(c, cur_perm, cur_shared, ignore_children,
-errp);
+ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared,
+ignore_children, errp);
 if (ret < 0) {
 return ret;
 }
@@ -1727,7 +1729,8 @@ char *bdrv_perm_names(uint64_t perm)
  *
  * Needs to be followed by a call to either bdrv_set_perm() or
  * bdrv_abort_perm_update(). */
-static int bdrv_check_update_perm(BlockDriverState *bs, uint64_t new_used_perm,
+static int bdrv_check_update_perm(BlockDriverState *bs, BlockReopenQueue *q,
+  uint64_t new_used_perm,
   uint64_t new_shared_perm,
   GSList *ignore_children, Error **errp)
 {
@@ -1769,19 +1772,20 @@ static int bdrv_check_update_perm(BlockDriverState *bs, 
uint64_t new_used_perm,
 cumulative_shared_perms &= c->shared_perm;
 }
 
-return bdrv_check_perm(bs, cumulative_perms, cumulative_shared_perms,
+return bdrv_check_perm(bs, q, cumulative_perms, cumulative_shared_perms,
ignore_children, errp);
 }
 
 /* Needs to be followed by a call to either bdrv_child_set_perm() or
  * bdrv_child_abort_perm_update(). */
-static int bdrv_child_check_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
+static int bdrv_child_check_perm(BdrvChild *c, BlockReopenQueue *q,
+ uint64_t perm, uint64_t shared,
  GSList *ignore_children, Error **errp)
 {
 int ret;
 
 ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
-ret = bdrv_check_update_perm(c->bs, perm, shared, ignore_children, errp);
+ret = bdrv_check_update_perm(c->bs, q, perm, shared, ignore_children, 
errp);
 g_slist_free(ignore_children);
 
 return ret;
@@ -1809,7 +1813,7 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, 
uint64_t shared,
 {
 int ret;
 
-ret = bdrv_child_check_perm(c, perm, shared, NULL, errp);
+ret = bdrv_child_check_perm(c, NULL, perm, shared, NULL, errp);
 if (ret < 0) {
 bdrv_child_abort_perm_update(c);
 return ret;
@@ -1950,7 +1954,7 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs)
  * because we're just taking a parent away, so we're loosening
  * restrictions. */
 bdrv_get_cumulative_perm(old_bs, , _perm);
-bdrv_check_perm(old_bs, perm, shared_perm, NULL, _abort);
+bdrv_check_perm(old_bs, NULL, perm, shared_perm, NULL, _abort);
 bdrv_set_perm(old_bs, perm, shared_perm);
 }
 
@@ -1969,7 +1973,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
 BdrvChild *child;
 int ret;
 
-ret = bdrv_check_update_perm(child_bs, perm, shared_perm, NULL, errp);
+ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, 
errp);
 if (ret < 0) {
 

[Qemu-block] [PULL 16/24] block: reopen: Queue children after their parents

2017-09-26 Thread Kevin Wolf
We will calculate the required new permissions in the prepare stage of a
reopen. Required permissions of children can be influenced by the
changes made to their parents, but parents are independent from their
children. This means that permissions need to be calculated top-down. In
order to achieve this, queue parents before their children rather than
queuing the children first.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index ed8d51dd42..204cbb46c7 100644
--- a/block.c
+++ b/block.c
@@ -2768,6 +2768,19 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
 flags |= BDRV_O_ALLOW_RDWR;
 }
 
+if (!bs_entry) {
+bs_entry = g_new0(BlockReopenQueueEntry, 1);
+QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
+} else {
+QDECREF(bs_entry->state.options);
+QDECREF(bs_entry->state.explicit_options);
+}
+
+bs_entry->state.bs = bs;
+bs_entry->state.options = options;
+bs_entry->state.explicit_options = explicit_options;
+bs_entry->state.flags = flags;
+
 QLIST_FOREACH(child, >children, next) {
 QDict *new_child_options;
 char *child_key_dot;
@@ -2787,19 +2800,6 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
 child->role, options, flags);
 }
 
-if (!bs_entry) {
-bs_entry = g_new0(BlockReopenQueueEntry, 1);
-QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
-} else {
-QDECREF(bs_entry->state.options);
-QDECREF(bs_entry->state.explicit_options);
-}
-
-bs_entry->state.bs = bs;
-bs_entry->state.options = options;
-bs_entry->state.explicit_options = explicit_options;
-bs_entry->state.flags = flags;
-
 return bs_queue;
 }
 
-- 
2.13.5




[Qemu-block] [PULL 22/24] qcow2: add shrink image support

2017-09-26 Thread Kevin Wolf
From: Pavel Butsykin 

This patch add shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and shrink is done by punching holes
in the image file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20170918124230.8152-4-pbutsy...@virtuozzo.com
Signed-off-by: Max Reitz 
---
 qapi/block-core.json   |   8 +++-
 block/qcow2.h  |  14 ++
 block/qcow2-cluster.c  |  50 +
 block/qcow2-refcount.c | 120 +
 block/qcow2.c  |  43 ++
 5 files changed, 225 insertions(+), 10 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c69a395804..750bb0c77c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2533,6 +2533,11 @@
 #
 # Trigger events supported by blkdebug.
 #
+# @l1_shrink_write_table:  write zeros to the l1 table to shrink image.
+#  (since 2.11)
+#
+# @l1_shrink_free_l2_clusters: discard the l2 tables. (since 2.11)
+#
 # Since: 2.9
 ##
 { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
@@ -2549,7 +2554,8 @@
 'cluster_alloc_bytes', 'cluster_free', 'flush_to_os',
 'flush_to_disk', 'pwritev_rmw_head', 'pwritev_rmw_after_head',
 'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
-'pwritev_zero', 'pwritev_done', 'empty_image_prepare' ] }
+'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
+'l1_shrink_write_table', 'l1_shrink_free_l2_clusters' ] }
 
 ##
 # @BlkdebugInjectErrorOptions:
diff --git a/block/qcow2.h b/block/qcow2.h
index 52c374e9ed..5a289a81e2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -521,6 +521,18 @@ static inline uint64_t refcount_diff(uint64_t r1, uint64_t 
r2)
 return r1 > r2 ? r1 - r2 : r2 - r1;
 }
 
+static inline
+uint32_t offset_to_reftable_index(BDRVQcow2State *s, uint64_t offset)
+{
+return offset >> (s->refcount_block_bits + s->cluster_bits);
+}
+
+static inline uint64_t get_refblock_offset(BDRVQcow2State *s, uint64_t offset)
+{
+uint32_t index = offset_to_reftable_index(s, offset);
+return s->refcount_table[index] & REFT_OFFSET_MASK;
+}
+
 /* qcow2.c functions */
 int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
   int64_t sector_num, int nb_sectors);
@@ -584,10 +596,12 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, 
BdrvCheckResult *res,
 int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
 BlockDriverAmendStatusCB *status_cb,
 void *cb_opaque, Error **errp);
+int qcow2_shrink_reftable(BlockDriverState *bs);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size);
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t max_size);
 int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0d4824993c..d2518d1893 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,56 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+BDRVQcow2State *s = bs->opaque;
+int new_l1_size, i, ret;
+
+if (exact_size >= s->l1_size) {
+return 0;
+}
+
+new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, 
new_l1_size);
+#endif
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+   new_l1_size * sizeof(uint64_t),
+ (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+if (ret < 0) {
+goto fail;
+}
+
+ret = bdrv_flush(bs->file->bs);
+if (ret < 0) {
+goto fail;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+continue;
+}
+qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+s->cluster_size, QCOW2_DISCARD_ALWAYS);
+s->l1_table[i] = 0;
+}
+return 0;
+
+fail:
+/*
+ * If the write in the l1_table failed the image may contain a partially
+ * overwritten l1_table. In this case it would be better to clear the
+ * l1_table in memory to avoid possible image corruption.
+ */
+

[Qemu-block] [PULL 19/24] iotests: fix 181: enable postcopy-ram capability on target

2017-09-26 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

Migration capabilities should be enabled on both source and
destination qemu processes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/181 | 2 ++
 tests/qemu-iotests/181.out | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/181 b/tests/qemu-iotests/181
index 0333dda0e3..0c91e8f9de 100755
--- a/tests/qemu-iotests/181
+++ b/tests/qemu-iotests/181
@@ -93,7 +93,9 @@ echo
 
 # Slow down migration so much that it definitely won't finish before we can
 # switch to postcopy
+# Enable postcopy-ram capability both on source and destination
 silent=yes
+_send_qemu_cmd $dest 'migrate_set_capability postcopy-ram on' "(qemu)"
 _send_qemu_cmd $src 'migrate_set_speed 4k' "(qemu)"
 _send_qemu_cmd $src 'migrate_set_capability postcopy-ram on' "(qemu)"
 _send_qemu_cmd $src "migrate -d unix:${MIG_SOCKET}" "(qemu)"
diff --git a/tests/qemu-iotests/181.out b/tests/qemu-iotests/181.out
index 6534ba2a76..d58c6a9dab 100644
--- a/tests/qemu-iotests/181.out
+++ b/tests/qemu-iotests/181.out
@@ -20,7 +20,6 @@ read 65536/65536 bytes at offset 0
 
 === Do some I/O on the destination ===
 
-QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qemu-io disk "read -P 0x55 0 64k"
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.13.5




[Qemu-block] [PULL 17/24] block: Fix permissions after bdrv_reopen()

2017-09-26 Thread Kevin Wolf
If we switch between read-only and read-write, the permissions that
image format drivers need on bs->file change, too. Make sure to update
the permissions during bdrv_reopen().

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 include/block/block.h |  1 +
 block.c   | 64 +++
 2 files changed, 65 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 082eb2cd9c..3c3af462e4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -166,6 +166,7 @@ typedef QSIMPLEQ_HEAD(BlockReopenQueue, 
BlockReopenQueueEntry) BlockReopenQueue;
 typedef struct BDRVReopenState {
 BlockDriverState *bs;
 int flags;
+uint64_t perm, shared_perm;
 QDict *options;
 QDict *explicit_options;
 void *opaque;
diff --git a/block.c b/block.c
index 204cbb46c7..5c65fac672 100644
--- a/block.c
+++ b/block.c
@@ -2781,6 +2781,10 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
 bs_entry->state.explicit_options = explicit_options;
 bs_entry->state.flags = flags;
 
+/* This needs to be overwritten in bdrv_reopen_prepare() */
+bs_entry->state.perm = UINT64_MAX;
+bs_entry->state.shared_perm = 0;
+
 QLIST_FOREACH(child, >children, next) {
 QDict *new_child_options;
 char *child_key_dot;
@@ -2887,6 +2891,52 @@ int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, 
Error **errp)
 return ret;
 }
 
+static BlockReopenQueueEntry *find_parent_in_reopen_queue(BlockReopenQueue *q,
+  BdrvChild *c)
+{
+BlockReopenQueueEntry *entry;
+
+QSIMPLEQ_FOREACH(entry, q, entry) {
+BlockDriverState *bs = entry->state.bs;
+BdrvChild *child;
+
+QLIST_FOREACH(child, >children, next) {
+if (child == c) {
+return entry;
+}
+}
+}
+
+return NULL;
+}
+
+static void bdrv_reopen_perm(BlockReopenQueue *q, BlockDriverState *bs,
+ uint64_t *perm, uint64_t *shared)
+{
+BdrvChild *c;
+BlockReopenQueueEntry *parent;
+uint64_t cumulative_perms = 0;
+uint64_t cumulative_shared_perms = BLK_PERM_ALL;
+
+QLIST_FOREACH(c, >parents, next_parent) {
+parent = find_parent_in_reopen_queue(q, c);
+if (!parent) {
+cumulative_perms |= c->perm;
+cumulative_shared_perms &= c->shared_perm;
+} else {
+uint64_t nperm, nshared;
+
+bdrv_child_perm(parent->state.bs, bs, c, c->role, q,
+parent->state.perm, parent->state.shared_perm,
+, );
+
+cumulative_perms |= nperm;
+cumulative_shared_perms &= nshared;
+}
+}
+*perm = cumulative_perms;
+*shared = cumulative_shared_perms;
+}
 
 /*
  * Prepares a BlockDriverState for reopen. All changes are staged in the
@@ -2952,6 +3002,9 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 goto error;
 }
 
+/* Calculate required permissions after reopening */
+bdrv_reopen_perm(queue, reopen_state->bs,
+ _state->perm, _state->shared_perm);
 
 ret = bdrv_flush(reopen_state->bs);
 if (ret) {
@@ -3007,6 +3060,12 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 } while ((entry = qdict_next(reopen_state->options, entry)));
 }
 
+ret = bdrv_check_perm(reopen_state->bs, queue, reopen_state->perm,
+  reopen_state->shared_perm, NULL, errp);
+if (ret < 0) {
+goto error;
+}
+
 ret = 0;
 
 error:
@@ -3047,6 +3106,9 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
 
 bdrv_refresh_limits(bs, NULL);
 
+bdrv_set_perm(reopen_state->bs, reopen_state->perm,
+  reopen_state->shared_perm);
+
 new_can_write =
 !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
 if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
@@ -3080,6 +3142,8 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
 }
 
 QDECREF(reopen_state->explicit_options);
+
+bdrv_abort_perm_update(reopen_state->bs);
 }
 
 
-- 
2.13.5




[Qemu-block] [PULL 21/24] qcow2: add qcow2_cache_discard

2017-09-26 Thread Kevin Wolf
From: Pavel Butsykin 

Whenever l2/refcount table clusters are discarded from the file we can
automatically drop unnecessary content of the cache tables. This reduces
the chance of eviction useful cache data and eliminates inconsistent data
in the cache with the data in the file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20170918124230.8152-3-pbutsy...@virtuozzo.com
Signed-off-by: Max Reitz 
---
 block/qcow2.h  |  3 +++
 block/qcow2-cache.c| 26 ++
 block/qcow2-refcount.c | 20 ++--
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 96a8d43c17..52c374e9ed 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -649,6 +649,9 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, 
uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 /* qcow2-bitmap.c functions */
 int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, 
Qcow2Cache *c,
 assert(c->entries[i].offset != 0);
 c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].offset == offset) {
+return qcow2_cache_get_table_addr(bs, c, i);
+}
+}
+return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+int i = qcow2_cache_get_table_idx(bs, c, table);
+
+assert(c->entries[i].ref == 0);
+
+c->entries[i].offset = 0;
+c->entries[i].lru_counter = 0;
+c->entries[i].dirty = false;
+
+qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 168fc32e7b..8c17c0e3aa 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -861,8 +861,24 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 }
 s->set_refcount(refcount_block, block_index, refcount);
 
-if (refcount == 0 && s->discard_passthrough[type]) {
-update_refcount_discard(bs, cluster_offset, s->cluster_size);
+if (refcount == 0) {
+void *table;
+
+table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+offset);
+if (table != NULL) {
+qcow2_cache_put(bs, s->refcount_block_cache, _block);
+qcow2_cache_discard(bs, s->refcount_block_cache, table);
+}
+
+table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+if (table != NULL) {
+qcow2_cache_discard(bs, s->l2_table_cache, table);
+}
+
+if (s->discard_passthrough[type]) {
+update_refcount_discard(bs, cluster_offset, s->cluster_size);
+}
 }
 }
 
-- 
2.13.5




[Qemu-block] [PULL 24/24] block/qcow2-bitmap: fix use of uninitialized pointer

2017-09-26 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

Without initialization to zero dirty_bitmap field may be not zero
for a bitmap which should not be stored and
qcow2_store_persistent_dirty_bitmaps will erroneously call
store_bitmap for it which leads to SIGSEGV on bdrv_dirty_bitmap_name.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20170922144353.4220-1-vsement...@virtuozzo.com
Cc: qemu-sta...@nongnu.org
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block/qcow2-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index e8d3bdbd6e..14f41d0427 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -602,7 +602,7 @@ static Qcow2BitmapList *bitmap_list_load(BlockDriverState 
*bs, uint64_t offset,
 goto fail;
 }
 
-bm = g_new(Qcow2Bitmap, 1);
+bm = g_new0(Qcow2Bitmap, 1);
 bm->table.offset = e->bitmap_table_offset;
 bm->table.size = e->bitmap_table_size;
 bm->flags = e->flags;
-- 
2.13.5




[Qemu-block] [PULL 20/24] qemu-img: add --shrink flag for resize

2017-09-26 Thread Kevin Wolf
From: Pavel Butsykin 

The flag is additional precaution against data loss. Perhaps in the future the
operation shrink without this flag will be blocked for all formats, but for now
we need to maintain compatibility with raw.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20170918124230.8152-2-pbutsy...@virtuozzo.com
[mreitz: Added a missing space to a warning]
Signed-off-by: Max Reitz 
---
 qemu-img.c | 23 +++
 qemu-img-cmds.hx   |  4 ++--
 qemu-img.texi  |  6 +-
 tests/qemu-iotests/102 |  4 ++--
 tests/qemu-iotests/106 |  2 +-
 5 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index df984b11b9..d6007b2a6d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -64,6 +64,7 @@ enum {
 OPTION_TARGET_IMAGE_OPTS = 263,
 OPTION_SIZE = 264,
 OPTION_PREALLOCATION = 265,
+OPTION_SHRINK = 266,
 };
 
 typedef enum OutputFormat {
@@ -3436,6 +3437,7 @@ static int img_resize(int argc, char **argv)
 },
 };
 bool image_opts = false;
+bool shrink = false;
 
 /* Remove size from argv manually so that negative numbers are not treated
  * as options by getopt. */
@@ -3454,6 +3456,7 @@ static int img_resize(int argc, char **argv)
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 {"preallocation", required_argument, 0, OPTION_PREALLOCATION},
+{"shrink", no_argument, 0, OPTION_SHRINK},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":f:hq",
@@ -3496,6 +3499,9 @@ static int img_resize(int argc, char **argv)
 return 1;
 }
 break;
+case OPTION_SHRINK:
+shrink = true;
+break;
 }
 }
 if (optind != argc - 1) {
@@ -3569,6 +3575,23 @@ static int img_resize(int argc, char **argv)
 goto out;
 }
 
+if (total_size < current_size && !shrink) {
+warn_report("Shrinking an image will delete all data beyond the "
+"shrunken image's end. Before performing such an "
+"operation, make sure there is no important data there.");
+
+if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+error_report(
+  "Use the --shrink option to perform a shrink operation.");
+ret = -1;
+goto out;
+} else {
+warn_report("Using the --shrink option will suppress this message. 
"
+"Note that future versions of qemu-img may refuse to "
+"shrink images without this option.");
+}
+}
+
 ret = blk_truncate(blk, total_size, prealloc, );
 if (!ret) {
 qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index b47d409665..2fe31893cf 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -89,9 +89,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-"resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+"resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | 
-]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ 
| -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] 
@var{filename} [+ | -]@var{size}
 ETEXI
 
 STEXI
diff --git a/qemu-img.texi b/qemu-img.texi
index 90c7eab4a8..ee5c5940d3 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -545,7 +545,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
+@item resize [--shrink] [--preallocation=@var{prealloc}] @var{filename} [+ | 
-]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -553,6 +553,10 @@ Before using this command to shrink a disk image, you MUST 
use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+When shrinking images, the @code{--shrink} option must be given. This informs
+qemu-img that the user acknowledges all loss of data beyond the truncated
+image's end.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, 

[Qemu-block] [PULL 15/24] block: Base permissions on rw state after reopen

2017-09-26 Thread Kevin Wolf
When new permissions are calculated during bdrv_reopen(), they need to
be based on the state of the graph as it will be after the reopen has
completed, not on the current state of the involved nodes.

This patch makes bdrv_is_writable() optionally accept a BlockReopenQueue
from which the new flags are taken. This is then used for determining
the new bs->file permissions of format drivers as soon as we add the
code to actually pass a non-NULL reopen queue to the .bdrv_child_perm
callbacks.

While moving bdrv_is_writable(), make it static. It isn't used outside
block.c.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 include/block/block.h |  1 -
 block.c   | 52 ---
 2 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 2ad18775af..082eb2cd9c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -435,7 +435,6 @@ int bdrv_is_allocated_above(BlockDriverState *top, 
BlockDriverState *base,
 int64_t offset, int64_t bytes, int64_t *pnum);
 
 bool bdrv_is_read_only(BlockDriverState *bs);
-bool bdrv_is_writable(BlockDriverState *bs);
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
bool ignore_allow_rdw, Error **errp);
 int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
diff --git a/block.c b/block.c
index 0b499fda4c..ed8d51dd42 100644
--- a/block.c
+++ b/block.c
@@ -239,12 +239,6 @@ bool bdrv_is_read_only(BlockDriverState *bs)
 return bs->read_only;
 }
 
-/* Returns whether the image file can be written to right now */
-bool bdrv_is_writable(BlockDriverState *bs)
-{
-return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
-}
-
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
bool ignore_allow_rdw, Error **errp)
 {
@@ -1537,6 +1531,41 @@ static int bdrv_child_check_perm(BdrvChild *c, 
BlockReopenQueue *q,
 static void bdrv_child_abort_perm_update(BdrvChild *c);
 static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared);
 
+typedef struct BlockReopenQueueEntry {
+ bool prepared;
+ BDRVReopenState state;
+ QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
+} BlockReopenQueueEntry;
+
+/*
+ * Return the flags that @bs will have after the reopens in @q have
+ * successfully completed. If @q is NULL (or @bs is not contained in @q),
+ * return the current flags.
+ */
+static int bdrv_reopen_get_flags(BlockReopenQueue *q, BlockDriverState *bs)
+{
+BlockReopenQueueEntry *entry;
+
+if (q != NULL) {
+QSIMPLEQ_FOREACH(entry, q, entry) {
+if (entry->state.bs == bs) {
+return entry->state.flags;
+}
+}
+}
+
+return bs->open_flags;
+}
+
+/* Returns whether the image file can be written to after the reopen queue @q
+ * has been successfully applied, or right now if @q is NULL. */
+static bool bdrv_is_writable(BlockDriverState *bs, BlockReopenQueue *q)
+{
+int flags = bdrv_reopen_get_flags(q, bs);
+
+return (flags & (BDRV_O_RDWR | BDRV_O_INACTIVE)) == BDRV_O_RDWR;
+}
+
 static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
 BdrvChild *c, const BdrvChildRole *role,
 BlockReopenQueue *reopen_queue,
@@ -1574,7 +1603,7 @@ static int bdrv_check_perm(BlockDriverState *bs, 
BlockReopenQueue *q,
 
 /* Write permissions never work with read-only images */
 if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
-!bdrv_is_writable(bs))
+!bdrv_is_writable(bs, q))
 {
 error_setg(errp, "Block node is read-only");
 return -EPERM;
@@ -1864,8 +1893,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
   , );
 
 /* Format drivers may touch metadata even if the guest doesn't write */
-/* TODO Take flags from reopen_queue */
-if (bdrv_is_writable(bs)) {
+if (bdrv_is_writable(bs, reopen_queue)) {
 perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
 }
 
@@ -2642,12 +2670,6 @@ BlockDriverState *bdrv_open(const char *filename, const 
char *reference,
  NULL, errp);
 }
 
-typedef struct BlockReopenQueueEntry {
- bool prepared;
- BDRVReopenState state;
- QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
-} BlockReopenQueueEntry;
-
 /*
  * Adds a BlockDriverState to a simple queue for an atomic, transactional
  * reopen of multiple devices.
-- 
2.13.5




[Qemu-block] [PULL 18/24] qemu-iotests: Test change-backing-file command

2017-09-26 Thread Kevin Wolf
This involves a temporary read-write reopen if the backing file link in
the middle of a backing file chain should be changed and is therefore a
good test for the latest bdrv_reopen() vs. op blockers fixes.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/195 | 92 ++
 tests/qemu-iotests/195.out | 78 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 171 insertions(+)
 create mode 100755 tests/qemu-iotests/195
 create mode 100644 tests/qemu-iotests/195.out

diff --git a/tests/qemu-iotests/195 b/tests/qemu-iotests/195
new file mode 100755
index 00..05a239cbf5
--- /dev/null
+++ b/tests/qemu-iotests/195
@@ -0,0 +1,92 @@
+#!/bin/bash
+#
+# Test change-backing-file command
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1 # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+rm -f "$TEST_IMG.mid"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+function do_run_qemu()
+{
+echo Testing: "$@" | _filter_imgfmt
+$QEMU -nographic -qmp-pretty stdio -serial none "$@"
+echo
+}
+
+function run_qemu()
+{
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | _filter_qmp \
+  | _filter_qemu_io | _filter_generated_node_ids
+}
+
+size=64M
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+TEST_IMG="$TEST_IMG.mid" _make_test_img -b "$TEST_IMG.base"
+_make_test_img -b "$TEST_IMG.mid"
+
+echo
+echo "Change backing file of mid (opened read-only)"
+echo
+
+run_qemu -drive if=none,file="$TEST_IMG",backing.node-name=mid <

[Qemu-block] [PULL 06/24] iotests: use -ccw on s390x for 051

2017-09-26 Thread Kevin Wolf
From: Cornelia Huck 

The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.

Signed-off-by: Cornelia Huck 
Reviewed-by: QingFeng Hao 
Reviewed-by: David Hildenbrand 
Reviewed-by: Thomas Huth 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/051| 12 +++-
 tests/qemu-iotests/051.out|  2 +-
 tests/qemu-iotests/051.pc.out |  2 +-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index c8cfc764bc..dba8816c9f 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -103,7 +103,17 @@ echo
 echo === Device without drive ===
 echo
 
-run_qemu -device virtio-scsi-pci -device scsi-hd
+case "$QEMU_DEFAULT_MACHINE" in
+  s390-ccw-virtio)
+  virtio_scsi=virtio-scsi-ccw
+  ;;
+  *)
+  virtio_scsi=virtio-scsi-pci
+  ;;
+esac
+
+run_qemu -device $virtio_scsi -device scsi-hd |
+sed -e "s/$virtio_scsi/VIRTIO_SCSI/"
 
 echo
 echo === Overriding backing file ===
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 4d3b1ff316..e3c6eaba57 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -49,7 +49,7 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,format=qcow2: Cannot specif
 
 === Device without drive ===
 
-Testing: -device virtio-scsi-pci -device scsi-hd
+Testing: -device VIRTIO_SCSI -device scsi-hd
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -device scsi-hd: drive property not set
 
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 762fb9f42c..f2c5622cee 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -49,7 +49,7 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,format=qcow2: Cannot specif
 
 === Device without drive ===
 
-Testing: -device virtio-scsi-pci -device scsi-hd
+Testing: -device VIRTIO_SCSI -device scsi-hd
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -device scsi-hd: drive property not set
 
-- 
2.13.5




[Qemu-block] [PULL 09/24] throttle: Assert that bkt->max is valid in throttle_compute_wait()

2017-09-26 Thread Kevin Wolf
From: Alberto Garcia 

If bkt->max == 0 and bkt->burst_length > 1 then we could have a
division by 0 in throttle_do_compute_wait(). That configuration is
however not permitted and is already detected by throttle_is_valid(),
but let's assert it in throttle_compute_wait() to make it explicit.

Found by Coverity (CID: 1381016).

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 util/throttle.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/throttle.c b/util/throttle.c
index 06bf916adc..b38e742da5 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -124,6 +124,7 @@ int64_t throttle_compute_wait(LeakyBucket *bkt)
 /* If the main bucket is not full yet we still have to check the
  * burst bucket in order to enforce the burst limit */
 if (bkt->burst_length > 1) {
+assert(bkt->max > 0); /* see throttle_is_valid() */
 extra = bkt->burst_level - burst_bucket_size;
 if (extra > 0) {
 return throttle_do_compute_wait(bkt->max, extra);
-- 
2.13.5




[Qemu-block] [PULL 11/24] block: Clean up some bad code in the vvfat driver

2017-09-26 Thread Kevin Wolf
From: Thomas Huth 

Remove the unnecessary home-grown redefinition of the assert() macro here,
and remove the unusable debug code at the end of the checkpoint() function.
The code there uses assert() with side-effects (assignment to the "mapping"
variable), which should be avoided. Looking more closely, it seems as it is
apparently also only usable for one certain directory layout (with a file
named USB.H in it) and thus is of no use for the rest of the world.

Signed-off-by: Thomas Huth 
Reviewed-by: John Snow 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/vvfat.c | 26 ++
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index cbabb36f62..1d6e7087a8 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -57,15 +57,6 @@
 
 static void checkpoint(void);
 
-#ifdef __MINGW32__
-void nonono(const char* file, int line, const char* msg) {
-fprintf(stderr, "Nonono! %s:%d %s\n", file, line, msg);
-exit(-5);
-}
-#undef assert
-#define assert(a) do {if (!(a)) nonono(__FILE__, __LINE__, #a);}while(0)
-#endif
-
 #else
 
 #define DLOG(a)
@@ -3270,24 +3261,11 @@ static void bdrv_vvfat_init(void)
 block_init(bdrv_vvfat_init);
 
 #ifdef DEBUG
-static void checkpoint(void) {
+static void checkpoint(void)
+{
 assert(((mapping_t*)array_get(&(vvv->mapping), 0))->end == 2);
 check1(vvv);
 check2(vvv);
 assert(!vvv->current_mapping || vvv->current_fd || 
(vvv->current_mapping->mode & MODE_DIRECTORY));
-#if 0
-if (((direntry_t*)vvv->directory.pointer)[1].attributes != 0xf)
-fprintf(stderr, "Nonono!\n");
-mapping_t* mapping;
-direntry_t* direntry;
-assert(vvv->mapping.size >= vvv->mapping.item_size * vvv->mapping.next);
-assert(vvv->directory.size >= vvv->directory.item_size * 
vvv->directory.next);
-if (vvv->mapping.next<47)
-return;
-assert((mapping = array_get(&(vvv->mapping), 47)));
-assert(mapping->dir_index < vvv->directory.next);
-direntry = array_get(&(vvv->directory), mapping->dir_index);
-assert(!memcmp(direntry->name, "USB H  ", 11) || direntry->name[0]==0);
-#endif
 }
 #endif
-- 
2.13.5




[Qemu-block] [PULL 07/24] iotests: use virtio aliases for 067

2017-09-26 Thread Kevin Wolf
From: Cornelia Huck 

The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x.

Using virtio-scsi will implicitly pick the right device, so just
switch to that for simplicity.

Signed-off-by: Cornelia Huck 
Reviewed-by: QingFeng Hao 
Reviewed-by: David Hildenbrand 
Reviewed-by: Thomas Huth 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/067 | 2 +-
 tests/qemu-iotests/067.out | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index 5d4ca4bc61..ee9595f0da 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -141,7 +141,7 @@ echo
 echo === Empty drive with -device and device_del ===
 echo
 
-run_qemu -device virtio-scsi-pci -device scsi-cd,id=cd0 <

[Qemu-block] [PULL 08/24] iotests: Print full path of bad output if mismatch

2017-09-26 Thread Kevin Wolf
From: Fam Zheng 

So it is easier to copy paste the path.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/check | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index d504b6e455..4583a0c269 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -353,7 +353,7 @@ do
 else
 echo " - output mismatch (see $seq.out.bad)"
 mv $tmp.out $seq.out.bad
-$diff -w "$reference" $seq.out.bad
+$diff -w "$reference" $(realpath $seq.out.bad)
 err=true
 fi
 fi
-- 
2.13.5




[Qemu-block] [PULL 12/24] qemu-io: Drop write permissions before read-only reopen

2017-09-26 Thread Kevin Wolf
qemu-io provides a 'reopen' command that allows switching from writable
to read-only access. We need to make sure that we don't try to keep
write permissions to a BlockBackend that becomes read-only, otherwise
things are going to fail.

This requires a bdrv_drain() call because otherwise in-flight AIO
write requests could issue new internal requests while the permission
has already gone away, which would cause assertion failures. Draining
the queue doesn't break AIO requests in any new way, bdrv_reopen() would
drain it anyway only a few lines later.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 qemu-io-cmds.c | 12 
 tests/qemu-iotests/187.out |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 2811a89099..3727fb43f3 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -2010,6 +2010,18 @@ static int reopen_f(BlockBackend *blk, int argc, char 
**argv)
 return 0;
 }
 
+if (!(flags & BDRV_O_RDWR)) {
+uint64_t orig_perm, orig_shared_perm;
+
+bdrv_drain(bs);
+
+blk_get_perm(blk, _perm, _shared_perm);
+blk_set_perm(blk,
+ orig_perm & ~(BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED),
+ orig_shared_perm,
+ _abort);
+}
+
 qopts = qemu_opts_find(_opts, NULL);
 opts = qopts ? qemu_opts_to_qdict(qopts, NULL) : NULL;
 qemu_opts_reset(_opts);
diff --git a/tests/qemu-iotests/187.out b/tests/qemu-iotests/187.out
index 68fb944cd5..30b987f71f 100644
--- a/tests/qemu-iotests/187.out
+++ b/tests/qemu-iotests/187.out
@@ -12,7 +12,7 @@ Start from read-write
 
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-write failed: Operation not permitted
+Block node is read-only
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.13.5




[Qemu-block] [PULL 10/24] block/throttle-groups.c: allocate RestartData on the heap

2017-09-26 Thread Kevin Wolf
From: Manos Pitsidianakis 

RestartData is the opaque data of the throttle_group_restart_queue_entry
coroutine. By being stack allocated, it isn't available anymore if
aio_co_enter schedules the coroutine with a bottom half and runs after
throttle_group_restart_queue returns.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Manos Pitsidianakis 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/throttle-groups.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 6ba992c8d7..b291a88481 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -403,17 +403,19 @@ static void coroutine_fn 
throttle_group_restart_queue_entry(void *opaque)
 schedule_next_request(tgm, is_write);
 qemu_mutex_unlock(>lock);
 }
+
+g_free(data);
 }
 
 static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool 
is_write)
 {
 Coroutine *co;
-RestartData rd = {
-.tgm = tgm,
-.is_write = is_write
-};
+RestartData *rd = g_new0(RestartData, 1);
+
+rd->tgm = tgm;
+rd->is_write = is_write;
 
-co = qemu_coroutine_create(throttle_group_restart_queue_entry, );
+co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
 aio_co_enter(tgm->aio_context, co);
 }
 
-- 
2.13.5




[Qemu-block] [PULL 05/24] iotests: use -ccw on s390x for 040, 139, and 182

2017-09-26 Thread Kevin Wolf
From: Cornelia Huck 

The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.

Reviewed-by: Kevin Wolf 
Signed-off-by: Cornelia Huck 
Reviewed-by: QingFeng Hao 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/040 |  6 +-
 tests/qemu-iotests/139 | 12 ++--
 tests/qemu-iotests/182 | 13 +++--
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 95b7510571..c284d08796 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -82,7 +82,11 @@ class TestSingleDrive(ImageCommitTestCase):
 qemu_io('-f', 'raw', '-c', 'write -P 0xab 0 524288', backing_img)
 qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', 
mid_img)
 self.vm = iotests.VM().add_drive(test_img, 
"node-name=top,backing.node-name=mid,backing.backing.node-name=base", 
interface="none")
-self.vm.add_device("virtio-scsi-pci")
+if iotests.qemu_default_machine == 's390-ccw-virtio':
+self.vm.add_device("virtio-scsi-ccw")
+else:
+self.vm.add_device("virtio-scsi-pci")
+
 self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
 self.vm.launch()
 
diff --git a/tests/qemu-iotests/139 b/tests/qemu-iotests/139
index 50cf40fbd5..f8f02808a9 100644
--- a/tests/qemu-iotests/139
+++ b/tests/qemu-iotests/139
@@ -25,13 +25,21 @@ import time
 
 base_img = os.path.join(iotests.test_dir, 'base.img')
 new_img = os.path.join(iotests.test_dir, 'new.img')
+if iotests.qemu_default_machine == 's390-ccw-virtio':
+default_virtio_blk = 'virtio-blk-ccw'
+else:
+default_virtio_blk = 'virtio-blk-pci'
 
 class TestBlockdevDel(iotests.QMPTestCase):
 
 def setUp(self):
 iotests.qemu_img('create', '-f', iotests.imgfmt, base_img, '1M')
 self.vm = iotests.VM()
-self.vm.add_device("virtio-scsi-pci,id=virtio-scsi")
+if iotests.qemu_default_machine == 's390-ccw-virtio':
+self.vm.add_device("virtio-scsi-ccw,id=virtio-scsi")
+else:
+self.vm.add_device("virtio-scsi-pci,id=virtio-scsi")
+
 self.vm.launch()
 
 def tearDown(self):
@@ -87,7 +95,7 @@ class TestBlockdevDel(iotests.QMPTestCase):
 self.checkBlockDriverState(node, expect_error)
 
 # Add a device model
-def addDeviceModel(self, device, backend, driver = 'virtio-blk-pci'):
+def addDeviceModel(self, device, backend, driver = default_virtio_blk):
 result = self.vm.qmp('device_add', id = device,
  driver = driver, drive = backend)
 self.assert_qmp(result, 'return', {})
diff --git a/tests/qemu-iotests/182 b/tests/qemu-iotests/182
index 7ecbb22604..2e078ceed8 100755
--- a/tests/qemu-iotests/182
+++ b/tests/qemu-iotests/182
@@ -45,17 +45,26 @@ _supported_os Linux
 
 size=32M
 
+case "$QEMU_DEFAULT_MACHINE" in
+  s390-ccw-virtio)
+  virtioblk=virtio-blk-ccw
+  ;;
+  *)
+  virtioblk=virtio-blk-pci
+  ;;
+esac
+
 _make_test_img $size
 
 echo "Starting QEMU"
 _launch_qemu -drive file=$TEST_IMG,if=none,id=drive0,file.locking=on \
--device virtio-blk-pci,drive=drive0
+-device $virtioblk,drive=drive0
 
 echo
 echo "Starting a second QEMU using the same image should fail"
 echo 'quit' | $QEMU -monitor stdio \
 -drive file=$TEST_IMG,if=none,id=drive0,file.locking=on \
--device virtio-blk-pci,drive=drive0 2>&1 | _filter_testdir 2>&1 |
+-device $virtioblk,drive=drive0 2>&1 | _filter_testdir 2>&1 |
 _filter_qemu |
 sed -e '/falling back to POSIX file/d' \
 -e '/locks can be lost unexpectedly/d'
-- 
2.13.5




[Qemu-block] [PULL 04/24] docs: add qemu-block-drivers(7) man page

2017-09-26 Thread Kevin Wolf
From: Stefan Hajnoczi 

Block driver documentation is available in qemu-doc.html.  It would be
convenient to have documentation for formats, protocols, and filter
drivers in a man page.

Extract the relevant part of qemu-doc.html into a new file called
docs/qemu-block-drivers.texi.  This file can also be built as a
stand-alone document (man, html, etc).

Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 Makefile |   6 +-
 docs/qemu-block-drivers.texi | 804 +++
 qemu-doc.texi| 781 +
 3 files changed, 810 insertions(+), 781 deletions(-)
 create mode 100644 docs/qemu-block-drivers.texi

diff --git a/Makefile b/Makefile
index 2be61fcf1c..cee6e28659 100644
--- a/Makefile
+++ b/Makefile
@@ -209,6 +209,7 @@ ifdef BUILD_DOCS
 DOCS=qemu-doc.html qemu-doc.txt qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8
 DOCS+=docs/interop/qemu-qmp-ref.html docs/interop/qemu-qmp-ref.txt 
docs/interop/qemu-qmp-ref.7
 DOCS+=docs/interop/qemu-ga-ref.html docs/interop/qemu-ga-ref.txt 
docs/interop/qemu-ga-ref.7
+DOCS+=docs/qemu-block-drivers.7
 ifdef CONFIG_VIRTFS
 DOCS+=fsdev/virtfs-proxy-helper.1
 endif
@@ -532,6 +533,7 @@ distclean: clean
rm -f docs/interop/qemu-qmp-ref.txt docs/interop/qemu-ga-ref.txt
rm -f docs/interop/qemu-qmp-ref.pdf docs/interop/qemu-ga-ref.pdf
rm -f docs/interop/qemu-qmp-ref.html docs/interop/qemu-ga-ref.html
+   rm -f docs/qemu-block-drivers.7
for d in $(TARGET_DIRS); do \
rm -rf $$d || exit 1 ; \
 done
@@ -576,6 +578,7 @@ ifdef CONFIG_POSIX
$(INSTALL_DATA) qemu.1 "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man7"
$(INSTALL_DATA) docs/interop/qemu-qmp-ref.7 "$(DESTDIR)$(mandir)/man7"
+   $(INSTALL_DATA) docs/qemu-block-drivers.7 "$(DESTDIR)$(mandir)/man7"
 ifneq ($(TOOLS),)
$(INSTALL_DATA) qemu-img.1 "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man8"
@@ -721,6 +724,7 @@ qemu-img.1: qemu-img.texi qemu-option-trace.texi 
qemu-img-cmds.texi
 fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
 qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
 qemu-ga.8: qemu-ga.texi
+docs/qemu-block-drivers.7: docs/qemu-block-drivers.texi
 
 html: qemu-doc.html docs/interop/qemu-qmp-ref.html 
docs/interop/qemu-ga-ref.html
 info: qemu-doc.info docs/interop/qemu-qmp-ref.info 
docs/interop/qemu-ga-ref.info
@@ -730,7 +734,7 @@ txt: qemu-doc.txt docs/interop/qemu-qmp-ref.txt 
docs/interop/qemu-ga-ref.txt
 qemu-doc.html qemu-doc.info qemu-doc.pdf qemu-doc.txt: \
qemu-img.texi qemu-nbd.texi qemu-options.texi qemu-option-trace.texi \
qemu-monitor.texi qemu-img-cmds.texi qemu-ga.texi \
-   qemu-monitor-info.texi
+   qemu-monitor-info.texi docs/qemu-block-drivers.texi
 
 docs/interop/qemu-ga-ref.dvi docs/interop/qemu-ga-ref.html \
 docs/interop/qemu-ga-ref.info docs/interop/qemu-ga-ref.pdf \
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
new file mode 100644
index 00..1cb1e55686
--- /dev/null
+++ b/docs/qemu-block-drivers.texi
@@ -0,0 +1,804 @@
+@c man begin SYNOPSIS
+QEMU block driver reference manual
+@c man end
+
+@c man begin DESCRIPTION
+
+@node disk_images_formats
+@subsection Disk image file formats
+
+QEMU supports many image file formats that can be used with VMs as well as with
+any of the tools (like @code{qemu-img}). This includes the preferred formats
+raw and qcow2 as well as formats that are supported for compatibility with
+older QEMU versions or other hypervisors.
+
+Depending on the image format, different options can be passed to
+@code{qemu-img create} and @code{qemu-img convert} using the @code{-o} option.
+This section describes each format and the options that are supported for it.
+
+@table @option
+@item raw
+
+Raw disk image format. This format has the advantage of
+being simple and easily exportable to all other emulators. If your
+file system supports @emph{holes} (for example in ext2 or ext3 on
+Linux or NTFS on Windows), then only the written sectors will reserve
+space. Use @code{qemu-img info} to know the real size used by the
+image or @code{ls -ls} on Unix/Linux.
+
+Supported options:
+@table @code
+@item preallocation
+Preallocation mode (allowed values: @code{off}, @code{falloc}, @code{full}).
+@code{falloc} mode preallocates space for image by calling posix_fallocate().
+@code{full} mode preallocates space for image by writing zeros to underlying
+storage.
+@end table
+
+@item qcow2
+QEMU image format, the most versatile format. Use it to have smaller
+images (useful if your filesystem does not supports holes, for example
+on Windows), zlib based compression and support of multiple VM
+snapshots.
+
+Supported options:
+@table @code
+@item compat
+Determines the qcow2 version to use. @code{compat=0.10} uses the
+traditional 

[Qemu-block] [PULL 13/24] block: Add reopen_queue to bdrv_child_perm()

2017-09-26 Thread Kevin Wolf
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 include/block/block_int.h |  7 +++
 block.c   | 19 ---
 block/commit.c|  1 +
 block/mirror.c|  1 +
 block/replication.c   |  1 +
 block/vvfat.c |  1 +
 6 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index ba4c383393..99abe2ce74 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -411,9 +411,14 @@ struct BlockDriver {
  *
  * If @c is NULL, return the permissions for attaching a new child for the
  * given @role.
+ *
+ * If @reopen_queue is non-NULL, don't return the currently needed
+ * permissions, but those that will be needed after applying the
+ * @reopen_queue.
  */
  void (*bdrv_child_perm)(BlockDriverState *bs, BdrvChild *c,
  const BdrvChildRole *role,
+ BlockReopenQueue *reopen_queue,
  uint64_t parent_perm, uint64_t parent_shared,
  uint64_t *nperm, uint64_t *nshared);
 
@@ -983,6 +988,7 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, 
uint64_t shared,
  * all children */
 void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
+   BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared);
 
@@ -992,6 +998,7 @@ void bdrv_filter_default_perms(BlockDriverState *bs, 
BdrvChild *c,
  * CONSISTENT_READ and doesn't share WRITE. */
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
+   BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared);
 
diff --git a/block.c b/block.c
index 6dd47e414e..c7724c85e3 100644
--- a/block.c
+++ b/block.c
@@ -1537,16 +1537,17 @@ static void bdrv_child_abort_perm_update(BdrvChild *c);
 static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared);
 
 static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
-BdrvChild *c,
-const BdrvChildRole *role,
+BdrvChild *c, const BdrvChildRole *role,
+BlockReopenQueue *reopen_queue,
 uint64_t parent_perm, uint64_t parent_shared,
 uint64_t *nperm, uint64_t *nshared)
 {
 if (bs->drv && bs->drv->bdrv_child_perm) {
-bs->drv->bdrv_child_perm(bs, c, role,
+bs->drv->bdrv_child_perm(bs, c, role, reopen_queue,
  parent_perm, parent_shared,
  nperm, nshared);
 }
+/* TODO Take force_share from reopen_queue */
 if (child_bs && child_bs->force_share) {
 *nshared = BLK_PERM_ALL;
 }
@@ -1596,7 +1597,7 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t 
cumulative_perms,
 /* Check all children */
 QLIST_FOREACH(c, >children, next) {
 uint64_t cur_perm, cur_shared;
-bdrv_child_perm(bs, c->bs, c, c->role,
+bdrv_child_perm(bs, c->bs, c, c->role, NULL,
 cumulative_perms, cumulative_shared_perms,
 _perm, _shared);
 ret = bdrv_child_check_perm(c, cur_perm, cur_shared, ignore_children,
@@ -1658,7 +1659,7 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t 
cumulative_perms,
 /* Update all children */
 QLIST_FOREACH(c, >children, next) {
 uint64_t cur_perm, cur_shared;
-bdrv_child_perm(bs, c->bs, c, c->role,
+bdrv_child_perm(bs, c->bs, c, c->role, NULL,
 cumulative_perms, cumulative_shared_perms,
 _perm, _shared);
 bdrv_child_set_perm(c, cur_perm, cur_shared);
@@ -1827,6 +1828,7 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, 
uint64_t shared,
 
 void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
+   BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
@@ -1844,6 +1846,7 @@ void bdrv_filter_default_perms(BlockDriverState *bs, 
BdrvChild *c,
 
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const 

[Qemu-block] [PULL 02/24] qemu-img: Clarify about relative backing file options

2017-09-26 Thread Kevin Wolf
From: Fam Zheng 

It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes error when they differ. Though the error message has enough
information to infer the fact about the misunderstanding, it is better
if we document this explicitly, so that users don't have to learn from
mistakes.

Signed-off-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 qemu-img.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index 72dabd6b3e..90c7eab4a8 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size needs 
to be specified in
 this case. @var{backing_file} will never be modified unless you use the
 @code{commit} monitor command (or qemu-img commit).
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{filename}.
+
 Note that a given backing file will be opened to check that it is valid. Use
 the @code{-u} option to enable unsafe backing file mode, which means that the
 image will be created even if the associated backing file cannot be opened. A
@@ -343,6 +346,9 @@ created as a copy on write image of the specified base 
image; the
 @var{backing_file} should have the same content as the input's base image,
 however the path, image format, etc may differ.
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{output_filename}.
+
 If the @code{-n} option is specified, the target volume creation will be
 skipped. This is useful for formats such as @code{rbd} if the target
 volume has already been created with site specific options that cannot
@@ -490,6 +496,9 @@ The backing file is changed to @var{backing_file} and (if 
the image format of
 string), then the image is rebased onto no backing file (i.e. it will exist
 independently of any backing file).
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{filename}.
+
 @var{cache} specifies the cache mode to be used for @var{filename}, whereas
 @var{src_cache} specifies the cache mode for reading backing files.
 
-- 
2.13.5




[Qemu-block] [PULL 03/24] file-posix: Clear out first sector in hdev_create

2017-09-26 Thread Kevin Wolf
From: Fam Zheng 

People get surprised when, after "qemu-img create -f raw /dev/sdX", they
still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
header. While this is natural because raw doesn't need to write any
magic bytes during creation, hdev_create is free to clear out the first
sector to make sure the stale qcow2 header doesn't cause such confusion.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index ab12a2b591..36ee89e940 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2730,6 +2730,16 @@ static int hdev_create(const char *filename, QemuOpts 
*opts,
 ret = -ENOSPC;
 }
 
+if (!ret && total_size) {
+uint8_t buf[BDRV_SECTOR_SIZE] = { 0 };
+int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
+if (lseek(fd, 0, SEEK_SET) == -1) {
+ret = -errno;
+} else {
+ret = qemu_write_full(fd, buf, zero_size);
+ret = ret == zero_size ? 0 : -errno;
+}
+}
 qemu_close(fd);
 return ret;
 }
-- 
2.13.5




[Qemu-block] [PULL 01/24] qemu-iotests: Add missing -machine accel=qtest

2017-09-26 Thread Kevin Wolf
A basic set of qemu options is initialised in ./common:

export QEMU_OPTIONS="-nodefaults -machine accel=qtest"

However, two test cases (172 and 186) overwrite QEMU_OPTIONS and neglect
to manually set '-machine accel=qtest'. Add the missing option for 172.
186 probably only copied the code from 172, it doesn't actually need to
overwrite QEMU_OPTIONS, so remove that in 186.

Signed-off-by: Kevin Wolf 
Tested-by: Cornelia Huck 
Reviewed-by: Cornelia Huck 
---
 tests/qemu-iotests/172 | 2 +-
 tests/qemu-iotests/186 | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/172 b/tests/qemu-iotests/172
index 826d6fecd3..02c5f79bab 100755
--- a/tests/qemu-iotests/172
+++ b/tests/qemu-iotests/172
@@ -56,7 +56,7 @@ function do_run_qemu()
 done
 fi
 echo quit
-) | $QEMU -nographic -monitor stdio -serial none "$@"
+) | $QEMU -machine accel=qtest -nographic -monitor stdio -serial none "$@"
 echo
 }
 
diff --git a/tests/qemu-iotests/186 b/tests/qemu-iotests/186
index 2b9f618f90..44cc01ed87 100755
--- a/tests/qemu-iotests/186
+++ b/tests/qemu-iotests/186
@@ -56,15 +56,15 @@ function do_run_qemu()
 done
 fi
 echo quit
-) | $QEMU -S -nodefaults -display none -device virtio-scsi-pci -monitor 
stdio "$@" 2>&1
+) | $QEMU -S -display none -device virtio-scsi-pci -monitor stdio "$@" 2>&1
 echo
 }
 
 function check_info_block()
 {
 echo "info block" |
-QEMU_OPTIONS="" do_run_qemu "$@" | _filter_win32 | _filter_hmp |
-_filter_qemu | _filter_generated_node_ids
+do_run_qemu "$@" | _filter_win32 | _filter_hmp | _filter_qemu |
+_filter_generated_node_ids
 }
 
 
-- 
2.13.5




[Qemu-block] [PULL 00/24] Block layer patches

2017-09-26 Thread Kevin Wolf
The following changes since commit 1e3ee834083227f552179f6e43902cba5a866e6b:

  Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into 
staging (2017-09-25 20:31:24 +0100)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to b156d51b62e6970753e1f9f36f7c4d5fdbf4c619:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-09-26' into 
queue-block (2017-09-26 15:03:02 +0200)


Block layer patches


Alberto Garcia (1):
  throttle: Assert that bkt->max is valid in throttle_compute_wait()

Cornelia Huck (3):
  iotests: use -ccw on s390x for 040, 139, and 182
  iotests: use -ccw on s390x for 051
  iotests: use virtio aliases for 067

Fam Zheng (3):
  qemu-img: Clarify about relative backing file options
  file-posix: Clear out first sector in hdev_create
  iotests: Print full path of bad output if mismatch

Kevin Wolf (9):
  qemu-iotests: Add missing -machine accel=qtest
  qemu-io: Drop write permissions before read-only reopen
  block: Add reopen_queue to bdrv_child_perm()
  block: Add reopen queue to bdrv_check_perm()
  block: Base permissions on rw state after reopen
  block: reopen: Queue children after their parents
  block: Fix permissions after bdrv_reopen()
  qemu-iotests: Test change-backing-file command
  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-09-26' into 
queue-block

Manos Pitsidianakis (1):
  block/throttle-groups.c: allocate RestartData on the heap

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

Stefan Hajnoczi (1):
  docs: add qemu-block-drivers(7) man page

Thomas Huth (1):
  block: Clean up some bad code in the vvfat driver

Vladimir Sementsov-Ogievskiy (2):
  iotests: fix 181: enable postcopy-ram capability on target
  block/qcow2-bitmap: fix use of uninitialized pointer

 qapi/block-core.json  |   8 +-
 block/qcow2.h |  17 +
 include/block/block.h |   2 +-
 include/block/block_int.h |   7 +
 block.c   | 191 +++---
 block/commit.c|   1 +
 block/file-posix.c|  10 +
 block/mirror.c|   1 +
 block/qcow2-bitmap.c  |   2 +-
 block/qcow2-cache.c   |  26 ++
 block/qcow2-cluster.c |  50 +++
 block/qcow2-refcount.c| 140 +++-
 block/qcow2.c |  43 ++-
 block/replication.c   |   1 +
 block/throttle-groups.c   |  12 +-
 block/vvfat.c |  27 +-
 qemu-img.c|  23 ++
 qemu-io-cmds.c|  12 +
 util/throttle.c   |   1 +
 Makefile  |   6 +-
 docs/qemu-block-drivers.texi  | 804 ++
 qemu-doc.texi | 781 +---
 qemu-img-cmds.hx  |   4 +-
 qemu-img.texi |  15 +-
 tests/qemu-iotests/040|   6 +-
 tests/qemu-iotests/051|  12 +-
 tests/qemu-iotests/051.out|   2 +-
 tests/qemu-iotests/051.pc.out |   2 +-
 tests/qemu-iotests/067|   2 +-
 tests/qemu-iotests/067.out|   2 +-
 tests/qemu-iotests/102|   4 +-
 tests/qemu-iotests/106|   2 +-
 tests/qemu-iotests/139|  12 +-
 tests/qemu-iotests/163| 170 +
 tests/qemu-iotests/163.out|   5 +
 tests/qemu-iotests/172|   2 +-
 tests/qemu-iotests/181|   2 +
 tests/qemu-iotests/181.out|   1 -
 tests/qemu-iotests/182|  13 +-
 tests/qemu-iotests/186|   6 +-
 tests/qemu-iotests/187.out|   2 +-
 tests/qemu-iotests/195|  92 +
 tests/qemu-iotests/195.out|  78 
 tests/qemu-iotests/check  |   2 +-
 tests/qemu-iotests/group  |   2 +
 45 files changed, 1708 insertions(+), 895 deletions(-)
 create mode 100644 docs/qemu-block-drivers.texi
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out
 create mode 100755 tests/qemu-iotests/195
 create mode 100644 tests/qemu-iotests/195.out



Re: [Qemu-block] [Qemu-devel] [PATCH v1 2/8] tests: Replace fprintf(stderr, "*\n" with error_report()

2017-09-26 Thread Eric Blake
On 09/25/2017 07:55 PM, Emilio G. Cota wrote:

>> diff --git a/tests/check-qlit b/tests/check-qlit
>> new file mode 100755
>> index 
>> ..950429524e3eb07e6daed1fe01caad0f5d554809
>> GIT binary patch
>> literal 272776
>> zcmeEvdwf*Ywf~vPB$){zGeCghJ;($So{10*LNEgfoIs*MKvBRDLV(l`3b5QKOS6
> 
> ? I don't know what this is, I don't seem to have this binary in my
> checked out tree.

Ouch. We have a missing entry in .gitignore for a recent test addition,
which hits if you do an in-tree build and then an indiscriminate 'git
add tests/' (I've already mentioned it on-list).  It's a built
executable during 'make check', and doesn't belong in git.

https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg01067.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v1 1/8] Replace all occurances of __FUNCTION__ with __func__

2017-09-26 Thread Eric Blake
On 09/25/2017 07:08 PM, Alistair Francis wrote:
> Replace all occurs of __FUNCTION__ except for the check in checkpatch
> with the non GCC specific __func__.
> 
> One line in hcd-musb.c was manually tweaked to pass checkpatch.
> 
> Signed-off-by: Alistair Francis 
> Cc: Gerd Hoffmann 
> Cc: Andrzej Zaborowski 
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: John Snow 
> Cc: Aurelien Jarno 
> Cc: Yongbok Kim 
> Cc: Peter Crosthwaite 
> Cc: Stefan Hajnoczi  (supporter:Block
> Cc: Fam Zheng  (supporter:Block

That looks funny, with no closing ).  Something's breaking down between
get_maintainers.pl and your eventual email, although it's not fatal.

> Cc: Juan Quintela 
> Cc: "Dr. David Alan Gilbert" 
> Cc: qemu-...@nongnu.org
> Cc: qemu-block@nongnu.org
> Cc: xen-de...@lists.xenproject.org
> ---
> 

>  65 files changed, 273 insertions(+), 273 deletions(-)

Big but mechanical, so I'm okay without splitting it further.

> 
> diff --git a/audio/audio_int.h b/audio/audio_int.h
> index 5bcb1c60e1..543b1bd8d5 100644
> --- a/audio/audio_int.h
> +++ b/audio/audio_int.h
> @@ -253,7 +253,7 @@ static inline int audio_ring_dist (int dst, int src, int 
> len)
>  #define AUDIO_STRINGIFY(n) AUDIO_STRINGIFY_(n)
>  
>  #if defined _MSC_VER || defined __GNUC__
> -#define AUDIO_FUNC __FUNCTION__
> +#define AUDIO_FUNC __func__
>  #else
>  #define AUDIO_FUNC __FILE__ ":" AUDIO_STRINGIFY (__LINE__)
>  #endif

This can be further simplified.  We really aren't using _MSC_VER as our
compiler (can anyone prove me wrong?), and we DO require a C99 compiler
(per C99 6.4.2.2, __func__ support is mandatory), so we don't really
need the #else branch (or, for that matter, we probably don't even need
AUDIO_FUNC).  But to keep this patch mechanical, that can be a separate
followup.

> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
> index 58005b6619..32687afced 100644
> --- a/hw/arm/nseries.c
> +++ b/hw/arm/nseries.c
> @@ -463,7 +463,7 @@ static uint32_t mipid_txrx(void *opaque, uint32_t cmd, 
> int len)
>  uint8_t ret;
>  
>  if (len > 9) {
> -hw_error("%s: FIXME: bad SPI word width %i\n", __FUNCTION__, len);
> +hw_error("%s: FIXME: bad SPI word width %i\n", __func__, len);

Not this patch's problem, but it would probably be simpler if hw_error()
were a macro that automatically prefixed __func__, rather than making
every caller have to supply it themselves.

> +++ b/hw/arm/omap1.c

> @@ -1716,7 +1716,7 @@ static void omap_clkm_write(void *opaque, hwaddr addr,
>  case 0x18:   /* ARM_SYSST */
>  if ((s->clkm.clocking_scheme ^ (value >> 11)) & 7) {
>  s->clkm.clocking_scheme = (value >> 11) & 7;
> -printf("%s: clocking scheme set to %s\n", __FUNCTION__,
> +printf("%s: clocking scheme set to %s\n", __func__,
>  clkschemename[s->clkm.clocking_scheme]);

Worth fixing the indentation while you are here?

> @@ -2473,7 +2473,7 @@ static void omap_pwt_write(void *opaque, hwaddr addr,
>  case 0x04:   /* VRC */
>  if ((value ^ s->vrc) & 1) {
>  if (value & 1)
> -printf("%s: %iHz buzz on\n", __FUNCTION__, (int)
> +printf("%s: %iHz buzz on\n", __func__, (int)
>  /* 1.5 MHz from a 12-MHz or 13-MHz PWT_CLK */
>  ((omap_clk_getrate(s->clk) >> 3) /
>   /* Pre-multiplexer divider */

Likewise?

> @@ -3330,13 +3330,13 @@ static void omap_mcbsp_writeh(void *opaque, hwaddr 
> addr,
>  s->mcr[1] = value & 0x03e3;
>  if (value & 3)   /* XMCM */
>  printf("%s: Tx channel selection mode enable attempt\n",
> -__FUNCTION__);
> +__func__);
>  return;
>  case 0x1a:   /* MCR1 */
>  s->mcr[0] = value & 0x03e1;
>  if (value & 1)   /* RMCM */
>  printf("%s: Rx channel selection mode enable attempt\n",
> -__FUNCTION__);
> +__func__);

and again


> +++ b/hw/arm/omap2.c
> @@ -1312,7 +1312,7 @@ static void omap_prcm_apll_update(struct omap_prcm_s *s)
>  
>  if (mode[0] == 1 || mode[0] == 2 || mode[1] == 1 || mode[1] == 2)
>  fprintf(stderr, "%s: bad EN_54M_PLL or bad EN_96M_PLL\n",
> -__FUNCTION__);
> +__func__);

More of the same. I'll quit pointing it out.


> +++ b/hw/block/onenand.c
> @@ -661,12 +661,12 @@ static uint64_t onenand_read(void *opaque, hwaddr addr,
>  case 0xff02: /* ECC Result of spare area 

Re: [Qemu-block] [PATCH v3 0/3] add bdrv_co_drain_begin/end BlockDriver callbacks

2017-09-26 Thread Manos Pitsidianakis

On Tue, Sep 26, 2017 at 12:00:24PM +0100, Stefan Hajnoczi wrote:

On Sat, Sep 23, 2017 at 02:14:08PM +0300, Manos Pitsidianakis wrote:

This patch series renames bdrv_co_drain to bdrv_co_drain_begin and adds a new
bdrv_co_drain_end callback to match bdrv_drained_begin/end and
drained_begin/end of BdrvChild. This is needed because the throttle driver
(block/throttle.c) needs a way to mark the end of the drain in order to toggle
io_limits_disabled correctly.

Based-on: <20170918202529.28379-1-el13...@mail.ntua.gr>
"block/throttle-groups.c: allocate RestartData on the heap"
Which fixes a coroutine crash in block/throttle-groups.c

v3:
  fixed commit message typo in first patch [Fam]
  rephrased doc comment based on mailing discussion
v2:
  add doc for callbacks and change order of request polling for completion
  [Stefan]

Manos Pitsidianakis (3):
  block: add bdrv_co_drain_end callback
  block: rename bdrv_co_drain to bdrv_co_drain_begin
  block/throttle.c: add bdrv_co_drain_begin/end callbacks

 include/block/block_int.h | 13 ++---
 block/io.c| 48 +--
 block/qed.c   |  6 +++---
 block/throttle.c  | 18 ++
 4 files changed, 65 insertions(+), 20 deletions(-)


Oops, this seems to cause a qemu-iotests failure.  Please take a look:

$ ./check -qcow2 184
184 0s ... - output mismatch (see 184.out.bad)
--- /home/stefanha/qemu/tests/qemu-iotests/184.out  2017-09-19 
14:51:46.673854437 +0100
+++ 184.out.bad 2017-09-26 11:13:06.946610239 +0100
@@ -142,6 +142,9 @@
"guest": false
}
}
+./common.config: line 118:  9196 Segmentation fault  (core dumped) ( if [ -n 
"${QEMU_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
+fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )



Hey Stefan,

This is fixed in 

Based-on: <20170918202529.28379-1-el13...@mail.ntua.gr>
"block/throttle-groups.c: allocate RestartData on the heap"
Which fixes a coroutine crash in block/throttle-groups.c


Which I sent before this series and is in Kevin's branch.


signature.asc
Description: PGP signature


Re: [Qemu-block] ping Re: [PATCH v7 03/16] migration: split common postcopy out of ram postcopy

2017-09-26 Thread Kevin Wolf
Am 26.09.2017 um 12:21 hat Dr. David Alan Gilbert geschrieben:
> * Kevin Wolf (kw...@redhat.com) wrote:
> > Am 25.09.2017 um 17:27 hat Dr. David Alan Gilbert geschrieben:
> > > > > Whatever you think the preferred way to set up postcopy migration is: 
> > > > > If
> > > > > something worked before this patch and doesn't after it, that's a
> > > > > regression and breaks backwards compatibility.
> > > > > 
> > > > > If we were talking about a graceful failure, maybe we could discuss
> > > > > whether carefully and deliberately breaking compatibility could be
> > > > > justified in this specific case. But the breakage is neither mentioned
> > > > > in the commit message nor is it graceful, so I can only call it a bug.
> > > > > 
> > > > > Kevin
> > > > 
> > > > It's of course my fault, I don't mean "it's wrong test, so it's not my
> > > > problem") And I've already sent a patch.
> > > 
> > > Why does this fail so badly, asserts etc - I was hoping for something
> > > a bit more obvious from the migration code.
> > > 
> > > postcopy did originally work without the destination having the flag on
> > > but setting the flag on the destination was always good practice because
> > > it detected whether the host support was there early on.
> > 
> > So what does this mean for 2.11? Do you think it is acceptable breaking
> > cases where the flag isn't set on the destination?
> 
> I think so, because we've always recommended setting it on the
> destination for the early detection.

Okay, I'll include the test case patch in my pull request today then.

> > If so, just changing the test case is enough. But if not, I'd rather
> > keep the test case as it is and fix only the migration code.
> 
> I'd take the test case fix, but I also want to dig why it fails so
> badly; it would be nice just to have a clean failure telling you
> that postcopy was expected.

Yes, that would be nice.

Kevin



Re: [Qemu-block] [PATCH v3 0/3] add bdrv_co_drain_begin/end BlockDriver callbacks

2017-09-26 Thread Stefan Hajnoczi
On Sat, Sep 23, 2017 at 02:14:08PM +0300, Manos Pitsidianakis wrote:
> This patch series renames bdrv_co_drain to bdrv_co_drain_begin and adds a new 
> bdrv_co_drain_end callback to match bdrv_drained_begin/end and 
> drained_begin/end of BdrvChild. This is needed because the throttle driver 
> (block/throttle.c) needs a way to mark the end of the drain in order to 
> toggle 
> io_limits_disabled correctly.
> 
> Based-on: <20170918202529.28379-1-el13...@mail.ntua.gr>
> "block/throttle-groups.c: allocate RestartData on the heap"
> Which fixes a coroutine crash in block/throttle-groups.c
> 
> v3:
>   fixed commit message typo in first patch [Fam]
>   rephrased doc comment based on mailing discussion
> v2: 
>   add doc for callbacks and change order of request polling for completion 
>   [Stefan]
> 
> Manos Pitsidianakis (3):
>   block: add bdrv_co_drain_end callback
>   block: rename bdrv_co_drain to bdrv_co_drain_begin
>   block/throttle.c: add bdrv_co_drain_begin/end callbacks
> 
>  include/block/block_int.h | 13 ++---
>  block/io.c| 48 
> +--
>  block/qed.c   |  6 +++---
>  block/throttle.c  | 18 ++
>  4 files changed, 65 insertions(+), 20 deletions(-)

Oops, this seems to cause a qemu-iotests failure.  Please take a look:

$ ./check -qcow2 184
184 0s ... - output mismatch (see 184.out.bad)
--- /home/stefanha/qemu/tests/qemu-iotests/184.out  2017-09-19 
14:51:46.673854437 +0100
+++ 184.out.bad 2017-09-26 11:13:06.946610239 +0100
@@ -142,6 +142,9 @@
 "guest": false
 }
 }
+./common.config: line 118:  9196 Segmentation fault  (core dumped) ( if [ 
-n "${QEMU_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
+fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )



Re: [Qemu-block] ping Re: [PATCH v7 03/16] migration: split common postcopy out of ram postcopy

2017-09-26 Thread Dr. David Alan Gilbert
* Kevin Wolf (kw...@redhat.com) wrote:
> Am 25.09.2017 um 17:27 hat Dr. David Alan Gilbert geschrieben:
> > > > Whatever you think the preferred way to set up postcopy migration is: If
> > > > something worked before this patch and doesn't after it, that's a
> > > > regression and breaks backwards compatibility.
> > > > 
> > > > If we were talking about a graceful failure, maybe we could discuss
> > > > whether carefully and deliberately breaking compatibility could be
> > > > justified in this specific case. But the breakage is neither mentioned
> > > > in the commit message nor is it graceful, so I can only call it a bug.
> > > > 
> > > > Kevin
> > > 
> > > It's of course my fault, I don't mean "it's wrong test, so it's not my
> > > problem") And I've already sent a patch.
> > 
> > Why does this fail so badly, asserts etc - I was hoping for something
> > a bit more obvious from the migration code.
> > 
> > postcopy did originally work without the destination having the flag on
> > but setting the flag on the destination was always good practice because
> > it detected whether the host support was there early on.
> 
> So what does this mean for 2.11? Do you think it is acceptable breaking
> cases where the flag isn't set on the destination?

I think so, because we've always recommended setting it on the
destination for the early detection.

> If so, just changing the test case is enough. But if not, I'd rather
> keep the test case as it is and fix only the migration code.

I'd take the test case fix, but I also want to dig why it fails so
badly; it would be nice just to have a clean failure telling you
that postcopy was expected.

Dave

> 
> Kevin
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] ping Re: [PATCH v7 03/16] migration: split common postcopy out of ram postcopy

2017-09-26 Thread Kevin Wolf
Am 25.09.2017 um 17:27 hat Dr. David Alan Gilbert geschrieben:
> > > Whatever you think the preferred way to set up postcopy migration is: If
> > > something worked before this patch and doesn't after it, that's a
> > > regression and breaks backwards compatibility.
> > > 
> > > If we were talking about a graceful failure, maybe we could discuss
> > > whether carefully and deliberately breaking compatibility could be
> > > justified in this specific case. But the breakage is neither mentioned
> > > in the commit message nor is it graceful, so I can only call it a bug.
> > > 
> > > Kevin
> > 
> > It's of course my fault, I don't mean "it's wrong test, so it's not my
> > problem") And I've already sent a patch.
> 
> Why does this fail so badly, asserts etc - I was hoping for something
> a bit more obvious from the migration code.
> 
> postcopy did originally work without the destination having the flag on
> but setting the flag on the destination was always good practice because
> it detected whether the host support was there early on.

So what does this mean for 2.11? Do you think it is acceptable breaking
cases where the flag isn't set on the destination?

If so, just changing the test case is enough. But if not, I'd rather
keep the test case as it is and fix only the migration code.

Kevin



Re: [Qemu-block] [PATCH v3 0/3] add bdrv_co_drain_begin/end BlockDriver callbacks

2017-09-26 Thread Stefan Hajnoczi
On Sat, Sep 23, 2017 at 02:14:08PM +0300, Manos Pitsidianakis wrote:
> This patch series renames bdrv_co_drain to bdrv_co_drain_begin and adds a new 
> bdrv_co_drain_end callback to match bdrv_drained_begin/end and 
> drained_begin/end of BdrvChild. This is needed because the throttle driver 
> (block/throttle.c) needs a way to mark the end of the drain in order to 
> toggle 
> io_limits_disabled correctly.
> 
> Based-on: <20170918202529.28379-1-el13...@mail.ntua.gr>
> "block/throttle-groups.c: allocate RestartData on the heap"
> Which fixes a coroutine crash in block/throttle-groups.c
> 
> v3:
>   fixed commit message typo in first patch [Fam]
>   rephrased doc comment based on mailing discussion
> v2: 
>   add doc for callbacks and change order of request polling for completion 
>   [Stefan]
> 
> Manos Pitsidianakis (3):
>   block: add bdrv_co_drain_end callback
>   block: rename bdrv_co_drain to bdrv_co_drain_begin
>   block/throttle.c: add bdrv_co_drain_begin/end callbacks
> 
>  include/block/block_int.h | 13 ++---
>  block/io.c| 48 
> +--
>  block/qed.c   |  6 +++---
>  block/throttle.c  | 18 ++
>  4 files changed, 65 insertions(+), 20 deletions(-)
> 
> -- 
> 2.11.0
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-block] [PATCH 0/5] commit: Support multiple roots above top node

2017-09-26 Thread Kevin Wolf
Am 25.09.2017 um 22:02 hat John Snow geschrieben:
> On 09/25/2017 08:28 AM, Kevin Wolf wrote:
> > This is a step towards making the commit job flexible enough that it can
> > work with any kind of block graph. Currently, it requires that not only
> > the top and base node of the commit operation are specified, but also
> > the active layer of the backing file chain. Of course, the assumption
> > that a single active layer exists is invalid.
> > 
> > This series makes the commit job consider other roots as well so that
> > all parent nodes of the top node get their backing file updated and stay
> > valid after the commit job completes.
> > 
> > With this, we should have all of the prerequisites for a follow-up
> > series that adds a new and clean blockdev-commit QMP command which
> > doesn't require an option for the active layer and which accepts node
> > names instead of file names for base and top.
> > 
> > Kevin Wolf (5):
> >   block: Introduce BdrvChildRole.update_filename
> >   commit: Support multiple roots above top node
> >   qemu-iotests: Allow QMP pretty printing in common.qemu
> >   qemu-iotests: Test commit block job where top has two parents
> >   commit: Remove overlay_bs
> 
> Does this depend on another series?

It is based on my block branch. I think specifically the series "block:
Fix permissions after ro/rw reopen" might be needed for the patches to
work correctly.

Kevin