Re: [Qemu-block] Failed to get "consistent read" lock on a mirroring image

2017-11-20 Thread Kevin Wolf
[ Cc: qemu-block ]

Am 20.11.2017 um 09:47 hat Fam Zheng geschrieben:
> On Mon, 11/20 10:58, Han Han wrote:
> > Hello,
> > On qemu-2.10, I find 'qemu-img info' couldn't get the info of a mirroring
> > image:
> > # /usr/libexec/qemu-kvm -name A -machine pc,accel=kvm \
> > -vnc 0.0.0.0:1 \
> > -monitor stdio \
> > -qmp tcp:0:,server,nowait \
> > -serial unix:/tmp/monitor,server,nowait \
> > -drive
> > file=/var/lib/libvirt/images/V.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop
> > \
> > -device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0
> > 
> > QEMU 2.10.0 monitor - type 'help' for more information
> > (qemu) drive_mirror drive-virtio-blk0 /var/lib/libvirt/images/V.new
> > Formatting '/var/lib/libvirt/images/V.new', fmt=qcow2 size=10737418240
> > cluster_size=65536 lazy_refcounts=off refcount_bits=16
> > 
> > # qemu-img info /var/lib/libvirt/images/V.new -U
> > qemu-img: Could not open '/var/lib/libvirt/images/V.new': *Failed to get
> > "consistent read" lock*
> > Is another process using the image?
> 
> Cc Kevin. Looks like -U here is not strong enough to skip the "consistent 
> read"
> lock. The drive mirror target BB shared permissions are different from device
> BB, but I'm not sure if it is intentional.

I think there are two parts to this:

First, blocking consistent reads for the mirror target seems
unnecessary. I can send a patch to allow this in 2.11.

The other part is that 'qemu-img info' doesn't actually need
BLK_PERM_CONSISTENT_READ, but blk_new_open() automatically requests it.
Maybe we need another flag that allows 'qemu-img info' to do without
this permission. The concrete use case are intermediate nodes of a
commit job, where we do have to block this permission.

Hm... Or is BDRV_O_NO_IO already the right flag for this? :-)

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v8 04/14] block/dirty-bitmap: add bdrv_dirty_bitmap_set_frozen

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

18.11.2017 02:46, John Snow wrote:


On 11/17/2017 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:

17.11.2017 20:20, John Snow wrote:

On 11/17/2017 09:46 AM, Vladimir Sementsov-Ogievskiy wrote:

14.11.2017 02:32, John Snow wrote:

On 10/30/2017 12:32 PM, Vladimir Sementsov-Ogievskiy wrote:

Make it possible to set bitmap 'frozen' without a successor.
This is needed to protect the bitmap during outgoing bitmap postcopy
migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
    include/block/dirty-bitmap.h |  1 +
    block/dirty-bitmap.c | 22 --
    2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/block/dirty-bitmap.h
b/include/block/dirty-bitmap.h
index a9e2a92e4f..ae6d697850 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -39,6 +39,7 @@ uint32_t
bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
    uint32_t bdrv_dirty_bitmap_granularity(const BdrvDirtyBitmap
*bitmap);
    bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
    bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
+void bdrv_dirty_bitmap_set_frozen(BdrvDirtyBitmap *bitmap, bool
frozen);
    const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
    int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
    DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap
*bitmap);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 7578863aa1..67fc6bd6e0 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -40,6 +40,8 @@ struct BdrvDirtyBitmap {
    QemuMutex *mutex;
    HBitmap *bitmap;    /* Dirty bitmap implementation */
    HBitmap *meta;  /* Meta dirty bitmap */
+    bool frozen;    /* Bitmap is frozen, it can't be
modified
+   through QMP */

I hesitate, because this now means that we have two independent bits of
state we need to update and maintain consistency with.

it was your proposal)))

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg01172.html

"
Your new use case here sounds like Frozen to me, but it simply does not
have an anonymous successor to force it to be recognized as "frozen." We
can add a `bool protected` or `bool frozen` field to force recognition
of this status and adjust the json documentation accordingly.

I think then we'd have four recognized states:

FROZEN: Cannot be reset/deleted. Bitmap is in-use by a block job or
other internal process. Bitmap is otherwise ACTIVE.
DISABLED: Not recording any writes (by choice.)
READONLY: Not able to record any writes (by necessity.)
ACTIVE: Normal bitmap status.

Sound right?
"



I was afraid you'd say that :(

It's okay, anyway. I shouldn't let myself go so long between reviews
like this, because you catch me changing my mind. Anyway, please go
ahead with it. I don't want to delay you on something that works because
I can't make up *my* mind.

Hm, if you remember, reusing "frozen" state was strange for me too. And
it's not
late to move to
1. make a new state: MIGRATION, which disallows qmp operations on bitmap
2. or just make them unnamed, so they can't be touched by qmp

"Migrating" is fine as a state name. You could probably announce this by
having it be "frozen" in the usual way (it has a successor) and a new
bool that lets you do whatever special handling you need to do for it.


create a successor and merge it in before postcopy stage? it is possible 
but I don't like it, looks like cheat..





anything is ok for me as well as leaving it as is. It's all little
things, the core is patch 10.

"frozen" sounds like unchanged, but user will see dirty-count changing
in query-block.

I guess it's a strange misnomer now... or maybe just always was a bad
name, since it's not really "frozen" but rather "locked" in a way that
the QMP user cannot interfere with it -- but it's still a live,
functioning object.


so, may be the best way is to add LOCKED state? which mean:

bitmap is under some operation and is not operable by qmp. dirty-count may
not reflect current state of the bitmap until operation end. and than we 
will be
able to move to 'locked' instead of 'frozen' for backups to, and 
deprecate frozen

state.



I'm remembering what I was talking about, but I think my preference is
illustrably worse. I was trying to avoid boolean bloat by advocating
re-use, but the re-use is kind of confusing...

I think I was hoping that a bitmap on the receiving end could simply be
"frozen" in the usual way, in that it has a successor recording writes.

I think the way you want to handle it though is with different semantics
for cleanup and recovery which makes it not quite the same state, which
needs either a new bool or some such.

Go with what you think is cleanest at this point, including if you want
to leave it alone. I don't want to cause you to respin it over bikeshedding.

--js




--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH v8 02/14] block/dirty-bitmap: add locked version of bdrv_release_dirty_bitmap

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

17.11.2017 21:25, John Snow wrote:


On 11/17/2017 03:07 AM, Vladimir Sementsov-Ogievskiy wrote:

11.11.2017 01:52, John Snow wrote:

On 10/30/2017 12:32 PM, Vladimir Sementsov-Ogievskiy wrote:

It is needed to realize bdrv_dirty_bitmap_release_successor in
the following patch.


OK, but...


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   block/dirty-bitmap.c | 25 -
   1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 81adbeb6d4..981f99d362 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -326,13 +326,13 @@ static bool
bdrv_dirty_bitmap_has_name(BdrvDirtyBitmap *bitmap)
   return !!bdrv_dirty_bitmap_name(bitmap);
   }
   -/* Called with BQL taken.  */
-static void bdrv_do_release_matching_dirty_bitmap(
+/* Called within bdrv_dirty_bitmap_lock..unlock */

...Add this so it will compile:

how do you compile to get an error? and what is unused?


.../src/qemu/block/dirty-bitmap.c:368:13: error:
‘bdrv_release_dirty_bitmap_locked’ defined but not used
[-Werror=unused-function]
  static void bdrv_release_dirty_bitmap_locked(BlockDriverState *bs,
  ^~~~
cc1: all warnings being treated as errors


I commented on the wrong prototype. The ((__unused__)) attribute just
quiets this warning so it can compile without you having to refactor.



aha ok, you are right



--js



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] block/snapshot: dirty all dirty bitmaps on snapshot-switch

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

17.11.2017 23:40, John Snow wrote:


On 11/17/2017 01:25 PM, Kevin Wolf wrote:

Am 17.11.2017 um 19:15 hat John Snow geschrieben:


On 11/17/2017 10:01 AM, Max Reitz wrote:

On 2017-11-17 13:30, Kevin Wolf wrote:

Am 23.10.2017 um 11:29 hat Vladimir Sementsov-Ogievskiy geschrieben:

Snapshot-switch actually changes active state of disk so it should
reflect on dirty bitmaps. Otherwise next incremental backup using
these bitmaps will be invalid.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

We discussed this quite a while ago, and I'm still not convinced that
this approach makes sense.

I think it at least makes more sense than not handling this case at all.


Can you give just one example of a use case where dirtying the whole
bitmap while loading a snapshot is the desired behaviour?

I think the most useful behaviour would be something where the bitmaps
themselves are snapshotted, too.

Agreed.


  But for the time being, the easiest and
safest solution might just be to error out in any snapshot operations
if any bitmaps are in use.

Sounds OK, too.  I personally don't have an opinion either way.

But in any case, what we did before this patch was definitely wrong so I
consider it an improvement.


This is how I feel about it too. Erroring out entirely is an option, but
code-wise just dirtying everything is at least verifiably not-wrong and
pretty simple to implement.

But that's exactly the problem: If something is just plain wrong, you
can always replace it with something that makes sense. If it errors out,
you can still remove that error later. But if you have something that
doesn't make a whole lot of sense, but kinda sorta works (like after
this patch), it's ABI and you can't implement something more useful
later any more.


It's an improvement... Don't do it, but at least you won't get
something wrong after, just something heinously unoptimal.

It's a short-term improvement that may become a long-term burden.

Kevin


I see your point.

If we enable it now, we always have to.
If we disable it now, we *can* later if we wish.

...however, I think it's been the case that we haven't prohibited it
before, but also it's a pretty good case that nobody has been using this
feature in production because they're not yet persistent and migratable.

An error message asking the user to delete bitmaps (which will very
obviously invalidate them) would be fine, too. I was erring on the side
of "just let things work," but you have a point that making sure the
user knows that what they're trying to accomplish is not a good idea is
probably better than silently doing the very stupid thing.


Hi all. erroring  is ok for me too. And looks like it's a way which 
looks ok for all interested in. Should I make a patch?


--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Stefan Hajnoczi
On Sun, Nov 19, 2017 at 09:46:44PM -0500, Jeff Cody wrote:
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index 931cdc9..b071217 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -56,6 +56,8 @@ struct Coroutine {
>  
>  int scheduled;
>  
> +int sleeping;

s/int/bool/

BTW an alternative to adding individual bools is to implement a finite
state machine for the entire coroutine lifecycle.  A single function can
validate all state transitions:

  void check_state_transition(CoState old, CoState new,
  const char *action)
  {
  const char *errmsg = fsm[old][new];
  if (!errmsg) {
  return; /* valid transition! */
  }

  fprintf(stderr, "Cannot %s coroutine from %s state\n",
  action, state_name[old]);
  abort();
  }

Specifying fsm[][] forces us to think through all possible state
transitions.  This approach is proactive whereas adding bool flags is
reactive since it only covers a subset of states that were encountered
after crashes.  I'm not sure if it's worth it though :).


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] block/snapshot: dirty all dirty bitmaps on snapshot-switch

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2017 12:51, Vladimir Sementsov-Ogievskiy wrote:

17.11.2017 23:40, John Snow wrote:


On 11/17/2017 01:25 PM, Kevin Wolf wrote:

Am 17.11.2017 um 19:15 hat John Snow geschrieben:


On 11/17/2017 10:01 AM, Max Reitz wrote:

On 2017-11-17 13:30, Kevin Wolf wrote:

Am 23.10.2017 um 11:29 hat Vladimir Sementsov-Ogievskiy geschrieben:

Snapshot-switch actually changes active state of disk so it should
reflect on dirty bitmaps. Otherwise next incremental backup using
these bitmaps will be invalid.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

We discussed this quite a while ago, and I'm still not convinced 
that

this approach makes sense.
I think it at least makes more sense than not handling this case 
at all.



Can you give just one example of a use case where dirtying the whole
bitmap while loading a snapshot is the desired behaviour?

I think the most useful behaviour would be something where the 
bitmaps

themselves are snapshotted, too.

Agreed.


But for the time being, the easiest and
safest solution might just be to error out in any snapshot 
operations

if any bitmaps are in use.

Sounds OK, too.  I personally don't have an opinion either way.

But in any case, what we did before this patch was definitely 
wrong so I

consider it an improvement.

This is how I feel about it too. Erroring out entirely is an 
option, but
code-wise just dirtying everything is at least verifiably not-wrong 
and

pretty simple to implement.

But that's exactly the problem: If something is just plain wrong, you
can always replace it with something that makes sense. If it errors 
out,

you can still remove that error later. But if you have something that
doesn't make a whole lot of sense, but kinda sorta works (like after
this patch), it's ABI and you can't implement something more useful
later any more.


It's an improvement... Don't do it, but at least you won't get
something wrong after, just something heinously unoptimal.

It's a short-term improvement that may become a long-term burden.

Kevin


I see your point.

If we enable it now, we always have to.
If we disable it now, we *can* later if we wish.

...however, I think it's been the case that we haven't prohibited it
before, but also it's a pretty good case that nobody has been using this
feature in production because they're not yet persistent and migratable.

An error message asking the user to delete bitmaps (which will very
obviously invalidate them) would be fine, too. I was erring on the side
of "just let things work," but you have a point that making sure the
user knows that what they're trying to accomplish is not a good idea is
probably better than silently doing the very stupid thing.


Hi all. erroring  is ok for me too. And looks like it's a way which 
looks ok for all interested in. Should I make a patch?




Hmm, it looks not so easy: errp infrastructure is absent here, and 
returning just EINVAL is not very informative.


--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-11-20 Thread Denis V. Lunev
On 11/17/2017 06:10 AM, John Snow wrote:
>
> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 16.11.2017 00:20, John Snow wrote:
>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
 Hi all.

 There are three qmp commands, needed to implement external backup API.

 Using these three commands, client may do all needed bitmap
 management by
 hand:

 on backup start we need to do a transaction:
   {disable old bitmap, create new bitmap}

 on backup success:
   drop old bitmap

 on backup fail:
   enable old bitmap
   merge new bitmap to old bitmap
   drop new bitmap

>>> Can you give me an example of how you expect these commands to be used,
>>> and why they're required?
>>>
>>> I'm a little weary about how error-prone these commands might be and the
>>> potential for incorrect usage seems... high. Why do we require them?
>> It is needed for incremental backup. It looks like bad idea to export
>> abdicate/reclaim functionality, it is simpler
>> and clearer to allow user to merge/enable/disable bitmaps by hand.
>>
>> usage is like this:
>>
>> 1. we have dirty bitmap bitmap0 for incremental backup.
>>
>> 2. prepare image fleecing (create temporary image with backing=our_disk)
>> 3. in qmp transaction:
>>     - disable bitmap0
>>     - create bitmap1
>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
> This could probably just be its own command, though:
>
> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
>
> Could handle forking the bitmap. I'm not sure what the arguments would
> look like, but we could name the NBD export here, too. (Assuming the
> server is already started and we just need to create the share.)
>
> Then, we can basically do what mirror does:
>
> (1) Cancel
> (2) Complete
>
> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
> and Complete would instruct QEMU to discard the changes.
>
> This way we don't need to expose commands like split or merge that will
> almost always be dangerous to use over QMP.
>
> In fact, a fleecing job would be really convenient even without a
> bitmap, because it'd still be nice to have a convenience command for it.
> Using an existing infrastructure and understood paradigm is just a bonus.
>
actually this is a very good question about safety/simplicity...

We actually have spent a bit of time on design and have not
come to a good solution. Yes, technically for now we can put
all into fleecing command and rely on its semantics. Though
I am not convinced with that approach. We can think that this
can be reused on snapshot operations (may be, may be not).
Also technically there are other cases.

This is actually a matter that QEMU should provide low level
API while management software should make decisions.
Providing merge etc operations is done using exactly that
approach. We can also consider this in a similar way we have
recently fixed "revert to snapshot" operation. Management
can make and should make a decision.

Den




Re: [Qemu-block] [PATCH for-2.11 3/3] block: Error out on load_vm with active dirty bitmaps

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2017 17:50, Kevin Wolf wrote:

Loading a snapshot invalidates the bitmap. Just marking all blocks dirty
is not a useful response in practice, instead the user needs to be aware
that we switch to a completely different state. If they are okay with
losing the dirty bitmap, they can just explicitly delete it.

This effectively reverts commit 04dec3c3ae5.

Signed-off-by: Kevin Wolf 



Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  block/snapshot.c | 15 +++
  1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index 13ec3b1c8c..6b338978c5 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -182,25 +182,16 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
  {
  BlockDriver *drv = bs->drv;
  int ret, open_ret;
-int64_t len;
  
  if (!drv) {

  error_setg(errp, "Block driver is closed");
  return -ENOMEDIUM;
  }
  
-len = bdrv_getlength(bs);

-if (len < 0) {
-error_setg_errno(errp, -len, "Cannot get block device size");
-return len;
+if (!QLIST_EMPTY(>dirty_bitmaps)) {
+error_setg(errp, "Device has active dirty bitmaps");
+return -EBUSY;
  }
-/* We should set all bits in all enabled dirty bitmaps, because dirty
- * bitmaps reflect active state of disk and snapshot switch operation
- * actually dirties active state.
- * TODO: It may make sense not to set all bits but analyze block status of
- * current state and destination snapshot and do not set bits corresponding
- * to both-zero or both-unallocated areas. */
-bdrv_set_dirty(bs, 0, len);
  
  if (drv->bdrv_snapshot_goto) {

  ret = drv->bdrv_snapshot_goto(bs, snapshot_id);



--
Best regards,
Vladimir




[Qemu-block] [PATCH 1/7] qapi: add unmap to BlockDeviceStats

2017-11-20 Thread Anton Nefedov
Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json   | 20 
 include/block/accounting.h |  1 +
 block/qapi.c   |  6 ++
 3 files changed, 27 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 76bf50f..ba2705d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -730,6 +730,23 @@
 # @timed_stats: Statistics specific to the set of previously defined
 #   intervals of time (Since 2.5)
 #
+# @unmap_bytes: The number of bytes unmapped by the device (Since 2.12)
+#
+# @unmap_operations: The number of unmap operations performed by the device
+#(Since 2.12)
+#
+# @unmap_total_time_ns: Total time spent on unmap operations in nano-seconds
+#   (Since 2.12)
+#
+# @unmap_merged: Number of unmap requests that have been merged into another
+#request (Since 2.12)
+#
+# @failed_unmap_operations: The number of failed unmap operations performed
+#   by the device (Since 2.12)
+#
+# @invalid_unmap_operations: The number of invalid unmap operations performed
+#by the device (Since 2.12)
+#
 # Since: 0.14.0
 ##
 { 'struct': 'BlockDeviceStats',
@@ -741,6 +758,9 @@
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
'failed_flush_operations': 'int', 'invalid_rd_operations': 'int',
'invalid_wr_operations': 'int', 'invalid_flush_operations': 'int',
+   'unmap_bytes' : 'int', 'unmap_operations': 'int',
+   'unmap_total_time_ns': 'int', 'unmap_merged': 'int',
+   'failed_unmap_operations': 'int', 'invalid_unmap_operations': 'int',
'account_invalid': 'bool', 'account_failed': 'bool',
'timed_stats': ['BlockDeviceTimedStats'] } }
 
diff --git a/include/block/accounting.h b/include/block/accounting.h
index b833d26..4e53c4a 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -35,6 +35,7 @@ enum BlockAcctType {
 BLOCK_ACCT_READ,
 BLOCK_ACCT_WRITE,
 BLOCK_ACCT_FLUSH,
+BLOCK_ACCT_UNMAP,
 BLOCK_MAX_IOTYPE,
 };
 
diff --git a/block/qapi.c b/block/qapi.c
index fc10f0a..6e110f2 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -396,24 +396,30 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, 
BlockBackend *blk)
 
 ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
 ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
+ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
 ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
 ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
+ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
 
 ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
 ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
 ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
+ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
 
 ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
 ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
 ds->invalid_flush_operations =
 stats->invalid_ops[BLOCK_ACCT_FLUSH];
+ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
 
 ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
 ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
+ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
 ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
 ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
 ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
 ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
+ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
 
 ds->has_idle_time_ns = stats->last_access_time_ns > 0;
 if (ds->has_idle_time_ns) {
-- 
2.7.4




[Qemu-block] [PATCH 2/7] ide: account UNMAP (TRIM) operations

2017-11-20 Thread Anton Nefedov
Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 hw/ide/core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 471d0c9..2e4dea7 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -389,6 +389,7 @@ typedef struct TrimAIOCB {
 QEMUIOVector *qiov;
 BlockAIOCB *aiocb;
 int i, j;
+BlockAcctCookie acct;
 } TrimAIOCB;
 
 static void trim_aio_cancel(BlockAIOCB *acb)
@@ -426,6 +427,14 @@ static void ide_trim_bh_cb(void *opaque)
 static void ide_issue_trim_cb(void *opaque, int ret)
 {
 TrimAIOCB *iocb = opaque;
+if (iocb->i >= 0) {
+if (ret >= 0) {
+block_acct_done(blk_get_stats(iocb->blk), >acct);
+} else {
+block_acct_failed(blk_get_stats(iocb->blk), >acct);
+}
+}
+
 if (ret >= 0) {
 while (iocb->j < iocb->qiov->niov) {
 int j = iocb->j;
@@ -442,6 +451,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
 continue;
 }
 
+block_acct_start(blk_get_stats(iocb->blk), >acct,
+ count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
+
 /* Got an entry! Submit and exit.  */
 iocb->aiocb = blk_aio_pdiscard(iocb->blk,
sector << BDRV_SECTOR_BITS,
-- 
2.7.4




Re: [Qemu-block] [PATCH for-2.11 1/3] block: Add errp to bdrv_snapshot_goto()

2017-11-20 Thread Kevin Wolf
Am 20.11.2017 um 17:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 20.11.2017 17:50, Kevin Wolf wrote:
> > Signed-off-by: Kevin Wolf 
> > ---
> >   include/block/snapshot.h |  3 ++-
> >   block/snapshot.c | 21 -
> >   qemu-img.c   |  6 +++---
> >   3 files changed, 21 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/block/snapshot.h b/include/block/snapshot.h
> > index e5c0553115..aeb80405e8 100644
> > --- a/include/block/snapshot.h
> > +++ b/include/block/snapshot.h
> > @@ -57,7 +57,8 @@ int bdrv_can_snapshot(BlockDriverState *bs);
> >   int bdrv_snapshot_create(BlockDriverState *bs,
> >QEMUSnapshotInfo *sn_info);
> >   int bdrv_snapshot_goto(BlockDriverState *bs,
> > -   const char *snapshot_id);
> > +   const char *snapshot_id,
> > +   Error **errp);
> >   int bdrv_snapshot_delete(BlockDriverState *bs,
> >const char *snapshot_id,
> >const char *name,
> > diff --git a/block/snapshot.c b/block/snapshot.c
> > index be0743abac..9c941e638d 100644
> > --- a/block/snapshot.c
> > +++ b/block/snapshot.c
> > @@ -177,18 +177,21 @@ int bdrv_snapshot_create(BlockDriverState *bs,
> >   }
> >   int bdrv_snapshot_goto(BlockDriverState *bs,
> > -   const char *snapshot_id)
> > +   const char *snapshot_id,
> > +   Error **errp)
> >   {
> >   BlockDriver *drv = bs->drv;
> >   int ret, open_ret;
> >   int64_t len;
> >   if (!drv) {
> > +error_setg(errp, "Block driver is closed");
> >   return -ENOMEDIUM;
> >   }
> >   len = bdrv_getlength(bs);
> >   if (len < 0) {
> > +error_setg_errno(errp, -len, "Cannot get block device size");
> >   return len;
> >   }
> >   /* We should set all bits in all enabled dirty bitmaps, because dirty
> > @@ -200,13 +203,18 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
> >   bdrv_set_dirty(bs, 0, len);
> >   if (drv->bdrv_snapshot_goto) {
> > -return drv->bdrv_snapshot_goto(bs, snapshot_id);
> > +ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
> > +if (ret < 0) {
> > +error_setg_errno(errp, -ret, "Failed to load snapshot");
> > +}
> > +return ret;
> >   }
> >   if (bs->file) {
> >   BlockDriverState *file;
> >   QDict *options = qdict_clone_shallow(bs->options);
> >   QDict *file_options;
> > +Error *local_err = NULL;
> >   file = bs->file->bs;
> >   /* Prevent it from getting deleted when detached from bs */
> > @@ -220,12 +228,14 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
> >   bdrv_unref_child(bs, bs->file);
> >   bs->file = NULL;
> > -ret = bdrv_snapshot_goto(file, snapshot_id);
> > -open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
> > +ret = bdrv_snapshot_goto(file, snapshot_id, errp);
> > +open_ret = drv->bdrv_open(bs, options, bs->open_flags, _err);
> >   QDECREF(options);
> >   if (open_ret < 0) {
> >   bdrv_unref(file);
> >   bs->drv = NULL;
> > +/* A bdrv_snapshot_goto() error takes precedence */
> 
> you return open_ret from bdrv_open and err msg from bdrv_snapshot_goto.
> the may not correspond to each other.

Hm, yes. Is this a problem, though?

If it is, I guess we can change the return value below:

> > +error_propagate(errp, local_err);
> >   return open_ret;

return ret < 0 ? ret : open_ret;

Would people prefer this?

Kevin



[Qemu-block] [PATCH 3/7] scsi: store unmap offset and nb_sectors in request struct

2017-11-20 Thread Anton Nefedov
it allows to report it in the error handler

Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 hw/scsi/scsi-disk.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 1243117..3882052 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1625,8 +1625,6 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, 
int ret)
 {
 SCSIDiskReq *r = data->r;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-uint64_t sector_num;
-uint32_t nb_sectors;
 
 assert(r->req.aiocb == NULL);
 if (scsi_disk_req_check_error(r, ret, false)) {
@@ -1634,16 +1632,16 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, 
int ret)
 }
 
 if (data->count > 0) {
-sector_num = ldq_be_p(>inbuf[0]);
-nb_sectors = ldl_be_p(>inbuf[8]) & 0xULL;
-if (!check_lba_range(s, sector_num, nb_sectors)) {
+r->sector = ldq_be_p(>inbuf[0]);
+r->sector_count = ldl_be_p(>inbuf[8]) & 0xULL;
+if (!check_lba_range(s, r->sector, r->sector_count)) {
 scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
 goto done;
 }
 
 r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
-sector_num * s->qdev.blocksize,
-nb_sectors * s->qdev.blocksize,
+r->sector * s->qdev.blocksize,
+r->sector_count * s->qdev.blocksize,
 scsi_unmap_complete, data);
 data->count--;
 data->inbuf += 16;
-- 
2.7.4




Re: [Qemu-block] [PATCH for-2.11 1/3] block: Add errp to bdrv_snapshot_goto()

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2017 17:50, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf 
---
  include/block/snapshot.h |  3 ++-
  block/snapshot.c | 21 -
  qemu-img.c   |  6 +++---
  3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index e5c0553115..aeb80405e8 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -57,7 +57,8 @@ int bdrv_can_snapshot(BlockDriverState *bs);
  int bdrv_snapshot_create(BlockDriverState *bs,
   QEMUSnapshotInfo *sn_info);
  int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id);
+   const char *snapshot_id,
+   Error **errp);
  int bdrv_snapshot_delete(BlockDriverState *bs,
   const char *snapshot_id,
   const char *name,
diff --git a/block/snapshot.c b/block/snapshot.c
index be0743abac..9c941e638d 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -177,18 +177,21 @@ int bdrv_snapshot_create(BlockDriverState *bs,
  }
  
  int bdrv_snapshot_goto(BlockDriverState *bs,

-   const char *snapshot_id)
+   const char *snapshot_id,
+   Error **errp)
  {
  BlockDriver *drv = bs->drv;
  int ret, open_ret;
  int64_t len;
  
  if (!drv) {

+error_setg(errp, "Block driver is closed");
  return -ENOMEDIUM;
  }
  
  len = bdrv_getlength(bs);

  if (len < 0) {
+error_setg_errno(errp, -len, "Cannot get block device size");
  return len;
  }
  /* We should set all bits in all enabled dirty bitmaps, because dirty
@@ -200,13 +203,18 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
  bdrv_set_dirty(bs, 0, len);
  
  if (drv->bdrv_snapshot_goto) {

-return drv->bdrv_snapshot_goto(bs, snapshot_id);
+ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to load snapshot");
+}
+return ret;
  }
  
  if (bs->file) {

  BlockDriverState *file;
  QDict *options = qdict_clone_shallow(bs->options);
  QDict *file_options;
+Error *local_err = NULL;
  
  file = bs->file->bs;

  /* Prevent it from getting deleted when detached from bs */
@@ -220,12 +228,14 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
  bdrv_unref_child(bs, bs->file);
  bs->file = NULL;
  
-ret = bdrv_snapshot_goto(file, snapshot_id);

-open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
+ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+open_ret = drv->bdrv_open(bs, options, bs->open_flags, _err);
  QDECREF(options);
  if (open_ret < 0) {
  bdrv_unref(file);
  bs->drv = NULL;
+/* A bdrv_snapshot_goto() error takes precedence */


you return open_ret from bdrv_open and err msg from bdrv_snapshot_goto.
the may not correspond to each other.


+error_propagate(errp, local_err);
  return open_ret;
  }
  
@@ -234,6 +244,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,

  return ret;
  }
  
+error_setg(errp, "Block driver does not support snapshots");

  return -ENOTSUP;
  }
  
@@ -467,7 +478,7 @@ int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
  
  aio_context_acquire(ctx);

  if (bdrv_can_snapshot(bs)) {
-err = bdrv_snapshot_goto(bs, name);
+err = bdrv_snapshot_goto(bs, name, NULL);
  }
  aio_context_release(ctx);
  if (err < 0) {
diff --git a/qemu-img.c b/qemu-img.c
index 02a6e27beb..68b375f998 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2989,10 +2989,10 @@ static int img_snapshot(int argc, char **argv)
  break;
  
  case SNAPSHOT_APPLY:

-ret = bdrv_snapshot_goto(bs, snapshot_name);
+ret = bdrv_snapshot_goto(bs, snapshot_name, );
  if (ret) {
-error_report("Could not apply snapshot '%s': %d (%s)",
-snapshot_name, ret, strerror(-ret));
+error_reportf_err(err, "Could not apply snapshot '%s': ",
+  snapshot_name);
  }
  break;
  



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH for-2.11 1/3] block: Add errp to bdrv_snapshot_goto()

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2017 19:23, Kevin Wolf wrote:

Am 20.11.2017 um 17:07 hat Vladimir Sementsov-Ogievskiy geschrieben:

20.11.2017 17:50, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf 
---
   include/block/snapshot.h |  3 ++-
   block/snapshot.c | 21 -
   qemu-img.c   |  6 +++---
   3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index e5c0553115..aeb80405e8 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -57,7 +57,8 @@ int bdrv_can_snapshot(BlockDriverState *bs);
   int bdrv_snapshot_create(BlockDriverState *bs,
QEMUSnapshotInfo *sn_info);
   int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id);
+   const char *snapshot_id,
+   Error **errp);
   int bdrv_snapshot_delete(BlockDriverState *bs,
const char *snapshot_id,
const char *name,
diff --git a/block/snapshot.c b/block/snapshot.c
index be0743abac..9c941e638d 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -177,18 +177,21 @@ int bdrv_snapshot_create(BlockDriverState *bs,
   }
   int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id)
+   const char *snapshot_id,
+   Error **errp)
   {
   BlockDriver *drv = bs->drv;
   int ret, open_ret;
   int64_t len;
   if (!drv) {
+error_setg(errp, "Block driver is closed");
   return -ENOMEDIUM;
   }
   len = bdrv_getlength(bs);
   if (len < 0) {
+error_setg_errno(errp, -len, "Cannot get block device size");
   return len;
   }
   /* We should set all bits in all enabled dirty bitmaps, because dirty
@@ -200,13 +203,18 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
   bdrv_set_dirty(bs, 0, len);
   if (drv->bdrv_snapshot_goto) {
-return drv->bdrv_snapshot_goto(bs, snapshot_id);
+ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to load snapshot");
+}
+return ret;
   }
   if (bs->file) {
   BlockDriverState *file;
   QDict *options = qdict_clone_shallow(bs->options);
   QDict *file_options;
+Error *local_err = NULL;
   file = bs->file->bs;
   /* Prevent it from getting deleted when detached from bs */
@@ -220,12 +228,14 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
   bdrv_unref_child(bs, bs->file);
   bs->file = NULL;
-ret = bdrv_snapshot_goto(file, snapshot_id);
-open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
+ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+open_ret = drv->bdrv_open(bs, options, bs->open_flags, _err);
   QDECREF(options);
   if (open_ret < 0) {
   bdrv_unref(file);
   bs->drv = NULL;
+/* A bdrv_snapshot_goto() error takes precedence */

you return open_ret from bdrv_open and err msg from bdrv_snapshot_goto.
the may not correspond to each other.

Hm, yes. Is this a problem, though?

If it is, I guess we can change the return value below:


+error_propagate(errp, local_err);
   return open_ret;

return ret < 0 ? ret : open_ret;

Would people prefer this?


looks good,

Reviewed-by: Vladimir Sementsov-Ogievskiy 



Kevin



--
Best regards,
Vladimir




[Qemu-block] [PATCH 7/7] qapi: query-blockstat: add driver specific file-posix stats

2017-11-20 Thread Anton Nefedov
A block driver can provide a callback to report driver-specific
statistics.

file-posix driver now reports discard statistics

Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json  | 37 +
 include/block/block.h |  1 +
 include/block/block_int.h |  1 +
 block.c   |  9 +
 block/file-posix.c| 21 +
 block/qapi.c  |  5 +
 6 files changed, 74 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ba2705d..aefccf6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -765,6 +765,40 @@
'timed_stats': ['BlockDeviceTimedStats'] } }
 
 ##
+# @BlockDriverStatsFile:
+#
+# File driver statistics
+#
+# @discard_nb_ok: The number of succeeded discard operations performed by
+# the driver.
+#
+# @discard_nb_failed: The number of failed discard operations performed by
+# the driver.
+#
+# @discard_bytes_ok: The number of bytes discarded by the driver.
+#
+# Since 2.12
+##
+{ 'struct': 'BlockDriverStatsFile',
+  'data': {
+  'discard_nb_ok': 'int',
+  'discard_nb_failed': 'int',
+  'discard_bytes_ok': 'int'
+  } }
+
+##
+# @BlockDriverStats:
+#
+# Statistics of a block driver (driver-specific)
+#
+# Since: 2.12
+##
+{ 'union': 'BlockDriverStats',
+  'data': {
+  'file': 'BlockDriverStatsFile'
+  } }
+
+##
 # @BlockStats:
 #
 # Statistics of a virtual block device or a block backing device.
@@ -776,6 +810,8 @@
 #
 # @stats:  A @BlockDeviceStats for the device.
 #
+# @driver-stats: Optional driver-specific statistics. (Since 2.12)
+#
 # @parent: This describes the file block device if it has one.
 #  Contains recursively the statistics of the underlying
 #  protocol (e.g. the host file for a qcow2 image). If there is
@@ -789,6 +825,7 @@
 { 'struct': 'BlockStats',
   'data': {'*device': 'str', '*node-name': 'str',
'stats': 'BlockDeviceStats',
+   '*driver-stats': 'BlockDriverStats',
'*parent': 'BlockStats',
'*backing': 'BlockStats'} }
 
diff --git a/include/block/block.h b/include/block/block.h
index c05cac5..e6baead 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -473,6 +473,7 @@ const char *bdrv_get_device_or_node_name(const 
BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
+BlockDriverStats *bdrv_get_driver_stats(BlockDriverState *bs);
 void bdrv_round_to_clusters(BlockDriverState *bs,
 int64_t offset, int64_t bytes,
 int64_t *cluster_offset,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a548277..a127861 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -269,6 +269,7 @@ struct BlockDriver {
   Error **errp);
 int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
+BlockDriverStats *(*bdrv_get_stats)(BlockDriverState *bs);
 
 int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
   QEMUIOVector *qiov,
diff --git a/block.c b/block.c
index 6c8ef98..7e5822f 100644
--- a/block.c
+++ b/block.c
@@ -4016,6 +4016,15 @@ ImageInfoSpecific 
*bdrv_get_specific_info(BlockDriverState *bs)
 return NULL;
 }
 
+BlockDriverStats *bdrv_get_driver_stats(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+if (!drv || !drv->bdrv_get_stats) {
+return NULL;
+}
+return drv->bdrv_get_stats(bs);
+}
+
 void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
 if (!bs || !bs->drv || !bs->drv->bdrv_debug_event) {
diff --git a/block/file-posix.c b/block/file-posix.c
index 544ae58..3ab92e6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2240,6 +2240,25 @@ static int raw_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 return 0;
 }
 
+static BlockDriverStats *raw_get_stats(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+BlockDriverStats *stats = g_new(BlockDriverStats, 1);
+
+*stats = (BlockDriverStats){
+.type  = BLOCK_DRIVER_STATS_KIND_FILE,
+.u.file.data = g_new(BlockDriverStatsFile, 1),
+};
+
+*stats->u.file.data = (BlockDriverStatsFile){
+.discard_nb_ok = s->stats.discard_nb_ok,
+.discard_nb_failed = s->stats.discard_nb_failed,
+.discard_bytes_ok = s->stats.discard_bytes_ok,
+};
+
+return stats;
+}
+
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -2312,6 +2331,7 @@ BlockDriver bdrv_file = {
 .bdrv_get_info = 

[Qemu-block] [PATCH 6/7] file-posix: account discard operations

2017-11-20 Thread Anton Nefedov
This will help to identify how many of the user-issued discard operations
(accounted on a device level) have actually suceeded down on the host file
(even though the numbers will not be exactly the same if non-raw format
driver is used (e.g. qcow2 sending metadata discards)).

Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/file-posix.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 36ee89e..544ae58 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -158,6 +158,11 @@ typedef struct BDRVRawState {
 bool page_cache_inconsistent:1;
 bool has_fallocate;
 bool needs_alignment;
+struct {
+int64_t discard_nb_ok;
+int64_t discard_nb_failed;
+int64_t discard_bytes_ok;
+} stats;
 
 PRManager *pr_mgr;
 } BDRVRawState;
@@ -1458,6 +1463,16 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData 
*aiocb)
 return ret;
 }
 
+static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
+{
+if (ret) {
+s->stats.discard_nb_failed++;
+} else {
+s->stats.discard_nb_ok++;
+s->stats.discard_bytes_ok += nbytes;
+}
+}
+
 static int aio_worker(void *arg)
 {
 RawPosixAIOData *aiocb = arg;
@@ -1494,6 +1509,7 @@ static int aio_worker(void *arg)
 break;
 case QEMU_AIO_DISCARD:
 ret = handle_aiocb_discard(aiocb);
+raw_account_discard(aiocb->bs->opaque, aiocb->aio_nbytes, ret);
 break;
 case QEMU_AIO_WRITE_ZEROES:
 ret = handle_aiocb_write_zeroes(aiocb);
@@ -2654,8 +2670,9 @@ static coroutine_fn BlockAIOCB 
*hdev_aio_pdiscard(BlockDriverState *bs,
 BlockCompletionFunc *cb, void *opaque)
 {
 BDRVRawState *s = bs->opaque;
-
-if (fd_open(bs) < 0) {
+int ret = fd_open(bs);
+if (ret < 0) {
+raw_account_discard(s, bytes, ret);
 return NULL;
 }
 return paio_submit(bs, s->fd, offset, NULL, bytes,
-- 
2.7.4




[Qemu-block] [PATCH 5/7] scsi: account unmap operations

2017-11-20 Thread Anton Nefedov
Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 hw/scsi/scsi-disk.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index eca6a15..6c33418 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1636,6 +1636,10 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, 
int ret)
 goto done;
 }
 
+block_acct_start(blk_get_stats(s->qdev.conf.blk), >acct,
+ r->sector_count * s->qdev.blocksize,
+ BLOCK_ACCT_UNMAP);
+
 r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
 r->sector * s->qdev.blocksize,
 r->sector_count * s->qdev.blocksize,
@@ -1662,10 +1666,11 @@ static void scsi_unmap_complete(void *opaque, int ret)
 r->req.aiocb = NULL;
 
 aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-if (scsi_disk_req_check_error(r, ret, false)) {
+if (scsi_disk_req_check_error(r, ret, true)) {
 scsi_req_unref(>req);
 g_free(data);
 } else {
+block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
 scsi_unmap_complete_noio(data, ret);
 }
 aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
@@ -1712,10 +1717,12 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, 
uint8_t *inbuf)
 return;
 
 invalid_param_len:
+block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
 scsi_check_condition(r, SENSE_CODE(INVALID_PARAM_LEN));
 return;
 
 invalid_field:
+block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
 scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
 }
 
-- 
2.7.4




[Qemu-block] [PATCH 4/7] scsi: move unmap error checking to the complete callback

2017-11-20 Thread Anton Nefedov
This will help to account the operation in the following commit.

The difference is that we don't call scsi_disk_req_check_error() before
the 1st discard iteration anymore. That function also checks if
the request is cancelled, however it shouldn't get canceled until it
yields in blk_aio() functions anyway.
Same approach is already used for emulate_write_same.

Signed-off-by: Anton Nefedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 hw/scsi/scsi-disk.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 3882052..eca6a15 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1627,9 +1627,6 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, 
int ret)
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
 assert(r->req.aiocb == NULL);
-if (scsi_disk_req_check_error(r, ret, false)) {
-goto done;
-}
 
 if (data->count > 0) {
 r->sector = ldq_be_p(>inbuf[0]);
@@ -1665,7 +1662,12 @@ static void scsi_unmap_complete(void *opaque, int ret)
 r->req.aiocb = NULL;
 
 aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-scsi_unmap_complete_noio(data, ret);
+if (scsi_disk_req_check_error(r, ret, false)) {
+scsi_req_unref(>req);
+g_free(data);
+} else {
+scsi_unmap_complete_noio(data, ret);
+}
 aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
-- 
2.7.4




Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Paolo Bonzini
On 21/11/2017 00:08, Jeff Cody wrote:
> @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
> QEMUClockType type,
>  CoSleepCB sleep_cb = {
>  .co = qemu_coroutine_self(),
>  };
> +if (sleep_cb.co->sleeping == 1 || sleep_cb.co->scheduled == 1) {
> +   fprintf(stderr, "Cannot sleep a co-routine that is already sleeping "
> +   " or scheduled\n");
> +   abort();
> +}
> +sleep_cb.co->sleeping = 1;
>  sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, _cb);
>  timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
>  qemu_coroutine_yield();

I understand that this was just an example and not the actual patch, but
I'll still point out that this loses the benefit (better error message)
of keeping the flags separate.

What do you think about making "scheduled" a const char * and assigning
__func__ to it (i.e. either "aio_co_schedule" or "co_aio_sleep_ns")?

Thanks,

Paolo



Re: [Qemu-block] [PATCH for-2.11 2/3] block: Add errp to bdrv_all_goto_snapshot()

2017-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2017 17:50, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf 
---
  include/block/snapshot.h |  3 ++-
  block/snapshot.c | 11 ++-
  migration/savevm.c   |  6 +++---
  3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index aeb80405e8..9407799941 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -84,7 +84,8 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
  bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs);
  int bdrv_all_delete_snapshot(const char *name, BlockDriverState 
**first_bsd_bs,
   Error **err);
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bsd_bs);
+int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
+   Error **errp);
  int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs);
  int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
   BlockDriverState *vm_state_bs,
diff --git a/block/snapshot.c b/block/snapshot.c
index 9c941e638d..13ec3b1c8c 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -467,9 +467,10 @@ fail:
  }
  
  
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)

+int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
+   Error **errp)
  {
-int err = 0;
+int ret = 0;
  BlockDriverState *bs;
  BdrvNextIterator it;
  
@@ -478,10 +479,10 @@ int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
  
  aio_context_acquire(ctx);

  if (bdrv_can_snapshot(bs)) {
-err = bdrv_snapshot_goto(bs, name, NULL);
+ret = bdrv_snapshot_goto(bs, name, errp);
  }
  aio_context_release(ctx);
-if (err < 0) {
+if (ret < 0) {
  bdrv_next_cleanup();
  goto fail;
  }
@@ -489,7 +490,7 @@ int bdrv_all_goto_snapshot(const char *name, 
BlockDriverState **first_bad_bs)
  
  fail:

  *first_bad_bs = bs;
-return err;
+return ret;
  }
  
  int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)

diff --git a/migration/savevm.c b/migration/savevm.c
index 4a88228614..192f2d82cd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2346,10 +2346,10 @@ int load_snapshot(const char *name, Error **errp)
  /* Flush all IO requests so they don't interfere with the new state.  */
  bdrv_drain_all_begin();
  
-ret = bdrv_all_goto_snapshot(name, );

+ret = bdrv_all_goto_snapshot(name, , errp);
  if (ret < 0) {
-error_setg(errp, "Error %d while activating snapshot '%s' on '%s'",
- ret, name, bdrv_get_device_name(bs));
+error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
+  name, bdrv_get_device_name(bs));


possible resulting msg: "Could not load snapshot 'snap0' on 'dev0': 
Failed to load snapshot". a bit weird. with this fixed or not:


Reviewed-by: Vladimir Sementsov-Ogievskiy 


  goto err_drain;
  }
  



--
Best regards,
Vladimir




[Qemu-block] [PATCH 0/7] discard blockstats

2017-11-20 Thread Anton Nefedov
qmp query-blockstats provides stats info for write/read/flush ops.

Patches 1-5 implement the similar for discard (unmap) command for scsi
and ide disks.
Discard stat "unmap_ops / unmap_bytes" is supposed to account the ops that
have completed without an error.

However, discard operation is advisory. Specifically,
 - common block layer ignores ENOTSUP error code.
   That might be returned if the block driver does not support discard,
   or discard has been configured to be ignored.
 - format drivers such as qcow2 may ignore discard if they were configured
   to ignore that, or if the corresponding area is already marked unused
   (unallocated / zero clusters).

And what is actually useful is the number of bytes actually discarded
down on the host filesystem.
To achieve that, driver-specific statistics has been added to blockstats
(patch 7).
With patch 6, file-posix driver accounts discard operations on its level too.

query-blockstat result:

(note the difference between blockdevice unmap and file discard stats. qcow2
sends a few ops down to the file as the clusters are actually unallocated
on qcow2 level)

{"return": [
{"device": "drive-scsi0-0-0-0"
  "parent": {
"stats": {
>  "unmap_operations": 0
>  "unmap_merged": 0
  "flush_total_time_ns": 0
  "wr_highest_offset": 8111718400
  [..]
  "invalid_wr_operations": 0
  "invalid_rd_operations": 0}
"node-name": "#block047"
>"driver_stats": {
>  "type": "file"
>  "data": {
>"discard_bytes_ok": 1572864
>"discard_nb_failed": 0
>"discard_nb_ok": 5}}}
  "stats": {
>   "unmap_operations": 472
>   "unmap_merged": 0
"flush_total_time_ns": 44530540
"wr_highest_offset": 7106662400
"wr_total_time_ns": 45518856
"failed_wr_operations": 0
"failed_rd_operations": 0
"wr_merged": 0
"wr_bytes": 889856
"timed_stats": []
>"failed_unmap_operations": 0
"failed_flush_operations": 0
"account_invalid": true
"rd_total_time_ns": 3306264098
>"invalid_unmap_operations": 0
"flush_operations": 18
"wr_operations": 120
>"unmap_bytes": 12312014848
"rd_merged": 0
"rd_bytes": 137103360
>"unmap_total_time_ns": 22664692
"invalid_flush_operations": 0
"account_failed": true
"idle_time_ns": 437316567
"rd_operations": 5636
"invalid_wr_operations": 0
"invalid_rd_operations": 0}
  "node-name": "#block128"}

  {"device": "drive-ide0-0-0"
  [..]

Anton Nefedov (7):
  qapi: add unmap to BlockDeviceStats
  ide: account UNMAP (TRIM) operations
  scsi: store unmap offset and nb_sectors in request struct
  scsi: move unmap error checking to the complete callback
  scsi: account unmap operations
  file-posix: account discard operations
  qapi: query-blockstat: add driver specific file-posix stats

 qapi/block-core.json   | 57 ++
 include/block/accounting.h |  1 +
 include/block/block.h  |  1 +
 include/block/block_int.h  |  1 +
 block.c|  9 
 block/file-posix.c | 42 --
 block/qapi.c   | 11 +
 hw/ide/core.c  | 12 ++
 hw/scsi/scsi-disk.c| 29 ++-
 9 files changed, 150 insertions(+), 13 deletions(-)

-- 
2.7.4




Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Paolo Bonzini
On 20/11/2017 03:46, Jeff Cody wrote:
> Once a coroutine is "sleeping", the timer callback will either enter the
> coroutine, or schedule it for the next AioContext if using iothreads.
> 
> It is illegal to enter that coroutine while waiting for this timer
> event and subsequent callback.  This patch will catch such an attempt,
> and abort QEMU with an error.
> 
> Like with the previous patch, we cannot rely solely on the co->caller
> check for recursive entry.  The prematurely entered coroutine may exit
> with COROUTINE_TERMINATE before the timer expires, making co->caller no
> longer valid.
> 
> We can clear co->sleeping in in co_sleep_cb(), because any doubly entry
> attempt after point should be caught by either the co->scheduled or
> co->caller checks.
> 
> Signed-off-by: Jeff Cody 
> ---
>  include/qemu/coroutine_int.h | 2 ++
>  util/qemu-coroutine-sleep.c  | 3 +++
>  util/qemu-coroutine.c| 5 +
>  3 files changed, 10 insertions(+)
> 
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index 931cdc9..b071217 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -56,6 +56,8 @@ struct Coroutine {
>  
>  int scheduled;
>  
> +int sleeping;

Is this a different "state" (in Stefan's parlance) than scheduled?  In
practice both means that someone may call qemu_(aio_)coroutine_enter
concurrently, so you'd better not do it yourself.

Paolo

> +
>  QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
>  QSLIST_ENTRY(Coroutine) co_scheduled_next;
>  };
> diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
> index 9c56550..11ae95a 100644
> --- a/util/qemu-coroutine-sleep.c
> +++ b/util/qemu-coroutine-sleep.c
> @@ -13,6 +13,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/coroutine.h"
> +#include "qemu/coroutine_int.h"
>  #include "qemu/timer.h"
>  #include "block/aio.h"
>  
> @@ -25,6 +26,7 @@ static void co_sleep_cb(void *opaque)
>  {
>  CoSleepCB *sleep_cb = opaque;
>  
> +sleep_cb->co->sleeping = 0;
>  aio_co_wake(sleep_cb->co);
>  }
>  
> @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
> QEMUClockType type,
>  CoSleepCB sleep_cb = {
>  .co = qemu_coroutine_self(),
>  };
> +sleep_cb.co->sleeping = 1;
>  sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, _cb);
>  timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
>  qemu_coroutine_yield();
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 2edab63..1d9f93d 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -118,6 +118,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine 
> *co)
>  abort();
>  }
>  
> +if (co->sleeping == 1) {
> +fprintf(stderr, "Cannot enter a co-routine that is still 
> sleeping\n");
> +abort();
> +}
> +
>  if (co->caller) {
>  fprintf(stderr, "Co-routine re-entered recursively\n");
>  abort();
> 




Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Paolo Bonzini
On 20/11/2017 23:35, Jeff Cody wrote:
>> Is this a different "state" (in Stefan's parlance) than scheduled?  In
>> practice both means that someone may call qemu_(aio_)coroutine_enter
>> concurrently, so you'd better not do it yourself.
>>
> It is slightly different; it is from sleeping with a timer via
> co_aio_sleep_ns and waking via co_sleep_cb.  Whereas the 'co->scheduled' is
> specifically from being scheduled for a specific AioContext, via
> aio_co_schedule().

Right; however, that would only make a difference if we allowed
canceling a co_aio_sleep_ns.  Since we don't want that, they have the
same transitions.

> In practice, 'co->schedule' and 'co->sleeping' certainly rhyme, at the very
> least.
> 
> But having them separate will put the abort closer to where the problem lies,
> so it should make debugging a bit easier if we hit it.

What do you mean by closer?  It would print a slightly more informative
message, but the message is in qemu_aio_coroutine_for both cases.

In fact, unifying co->scheduled and co->sleeping means that you can
easily abort when co_aio_sleep_ns is called on a scheduled coroutine, like

/* This is valid. */
aio_co_schedule(qemu_get_current_aio_context(),
qemu_coroutine_self());

/* But only if there was a qemu_coroutine_yield here.  */
co_aio_sleep_ns(qemu_get_current_aio_context(), 1000);

Thanks,

Paolo



Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Jeff Cody
On Mon, Nov 20, 2017 at 11:47:09PM +0100, Paolo Bonzini wrote:
> On 20/11/2017 23:35, Jeff Cody wrote:
> >> Is this a different "state" (in Stefan's parlance) than scheduled?  In
> >> practice both means that someone may call qemu_(aio_)coroutine_enter
> >> concurrently, so you'd better not do it yourself.
> >>
> > It is slightly different; it is from sleeping with a timer via
> > co_aio_sleep_ns and waking via co_sleep_cb.  Whereas the 'co->scheduled' is
> > specifically from being scheduled for a specific AioContext, via
> > aio_co_schedule().
> 
> Right; however, that would only make a difference if we allowed
> canceling a co_aio_sleep_ns.  Since we don't want that, they have the
> same transitions.
> 
> > In practice, 'co->schedule' and 'co->sleeping' certainly rhyme, at the very
> > least.
> > 
> > But having them separate will put the abort closer to where the problem 
> > lies,
> > so it should make debugging a bit easier if we hit it.
> 
> What do you mean by closer?  It would print a slightly more informative
> message, but the message is in qemu_aio_coroutine_for both cases.
> 

Sorry, sloppy wording; I meant what you said above, that the error message
is more informative, so by tracking down where co->sleeping is set the
developer is closer to where the problem lies.

> In fact, unifying co->scheduled and co->sleeping means that you can
> easily abort when co_aio_sleep_ns is called on a scheduled coroutine, like
> 
> /* This is valid. */
> aio_co_schedule(qemu_get_current_aio_context(),
> qemu_coroutine_self());
> 
> /* But only if there was a qemu_coroutine_yield here.  */
> co_aio_sleep_ns(qemu_get_current_aio_context(), 1000);
>

That is true.  But we could also check (co->sleeping || co->scheduled) in
co_aio_sleep_ns() though, as well.

Hmm... not checking co->sleeping in co_aio_sleep_ns() is a bug in my
patch.  We don't want to schedule a coroutine on two different timers,
either.

So what do you think about adding this to the patch:

@@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
QEMUClockType type,
 CoSleepCB sleep_cb = {
 .co = qemu_coroutine_self(),
 };
+if (sleep_cb.co->sleeping == 1 || sleep_cb.co->scheduled == 1) {
+   fprintf(stderr, "Cannot sleep a co-routine that is already sleeping "
+   " or scheduled\n");
+   abort();
+}
+sleep_cb.co->sleeping = 1;
 sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, _cb);
 timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();


Jeff



Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Jeff Cody
On Tue, Nov 21, 2017 at 12:13:46AM +0100, Paolo Bonzini wrote:
> On 21/11/2017 00:08, Jeff Cody wrote:
> > @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
> > QEMUClockType type,
> >  CoSleepCB sleep_cb = {
> >  .co = qemu_coroutine_self(),
> >  };
> > +if (sleep_cb.co->sleeping == 1 || sleep_cb.co->scheduled == 1) {
> > +   fprintf(stderr, "Cannot sleep a co-routine that is already sleeping 
> > "
> > +   " or scheduled\n");
> > +   abort();
> > +}
> > +sleep_cb.co->sleeping = 1;
> >  sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, 
> > _cb);
> >  timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
> >  qemu_coroutine_yield();
> 
> I understand that this was just an example and not the actual patch, but
> I'll still point out that this loses the benefit (better error message)
> of keeping the flags separate.
> 
> What do you think about making "scheduled" a const char * and assigning
> __func__ to it (i.e. either "aio_co_schedule" or "co_aio_sleep_ns")?
> 

Ohhh, nice.  I'll spin a v2 with that, and merge patches 3 and 5 together.
And then maybe for 2.12 we can look at making it a fsm, like Stefan
suggested (or somehow make coroutine entry thread safe and idempotent).

Jeff



Re: [Qemu-block] [Libguestfs] [Qemu-devel] [qemu-img] support for XVA

2017-11-20 Thread Gandalf Corvotempesta
I did something different, that will build a raw image directly from a
xenserver export, on the fly.
Compared the resulting file (via MD5) with xenmygrate.py and there is a match.

Currently, this is the faster way to convert a XenServer image to a
raw file. Don't need to wait
for export, tar extract and conversion. It does all of that, at the
same time during the VM export in a single pass.

If someone interested in testing it, it would be apreciated.

2017-11-16 15:02 GMT+01:00 Max Reitz :
> On 2017-11-16 11:08, Gandalf Corvotempesta wrote:
>> 2017-11-15 23:55 GMT+01:00 Max Reitz :
>>> https://xanclic.moe/convert-xva.rb -- does this work?
>>> (It seems to works on the two example images I found...)
>>>
>>> An example is in the code, you use it like this:
>>>
>>> $ ./convert-xva.rb ~/Downloads/stats-appliance-2.36.020502.xva Ref:73
>>
>>
>> It doesn't work on huge images:
>>
>> # time convert-xva.rb 20171115_193814_efff_.xva Ref:10 -O qcow2 disk1.qcow2
>> /usr/bin/convert-xva.rb:119:in `exec': Argument list too long -
>> qemu-img (Errno::E2BIG)
>> from /usr/bin/convert-xva.rb:119:in `'
>>
>> real 9m41.414s
>> user 0m19.280s
>> sys 0m3.260s
>
> Well, there is this:
>
> https://github.com/XanClic/qemu.rb/blob/master/examples/convert-xva-online.rb
>
> That should get around that restriction, but currently it is stupidly
> slow (because it just issues all 1 MB copy operations simultaneously),
> and it doesn't use qemu-img convert.  Instead, it uses a real qemu with
> block jobs.  Ideally, you'd use it like this:
>
> $ ./convert-xva-online.rb ~/Downloads/stats-appliance-2.36.020502.xva
> Ref:73  81936 M
> $ qemu-img create -f qcow2 [whatever options you need] disk1.qcow2 \
> 81936M
> Formatting 'disk1.qcow2', fmt=qcow2 size=85916123136 encryption=off
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> $ ./convert-xva-online.rb ~/Downloads/stats-appliance-2.36.020502.xva \
> Ref:73 \
> '{"driver":"qcow2",
>   "detect-zeroes":"unmap",
>   "discard":"unmap",
>   "file":{"driver":"file","filename":"disk1.qcow2"}}'
> Adding block devices...
> Starting block jobs...
> 100.00 % of jobs completed
>
> (Maybe I can get it faster.)
>
> Max
>


xva_conv.sh
Description: Bourne shell script


Re: [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-20 Thread Paolo Bonzini
On 20/11/2017 12:16, Stefan Hajnoczi wrote:
> This raises questions about the ability to cancel sleep:
> 
> 1. Does something depend on cancelling sleep?

block_job_cancel does, but in practice the sleep time is so small
(smaller than SLICE_TIME, which is 100 ms) that we probably don't care.

I agree with Jeff that canceling the sleep by force-entering the
coroutine seemed clever but is probably a very bad idea.

Paolo



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Jeff Cody
On Mon, Nov 20, 2017 at 11:30:39PM +0100, Paolo Bonzini wrote:
> On 20/11/2017 03:46, Jeff Cody wrote:
> > Once a coroutine is "sleeping", the timer callback will either enter the
> > coroutine, or schedule it for the next AioContext if using iothreads.
> > 
> > It is illegal to enter that coroutine while waiting for this timer
> > event and subsequent callback.  This patch will catch such an attempt,
> > and abort QEMU with an error.
> > 
> > Like with the previous patch, we cannot rely solely on the co->caller
> > check for recursive entry.  The prematurely entered coroutine may exit
> > with COROUTINE_TERMINATE before the timer expires, making co->caller no
> > longer valid.
> > 
> > We can clear co->sleeping in in co_sleep_cb(), because any doubly entry
> > attempt after point should be caught by either the co->scheduled or
> > co->caller checks.
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  include/qemu/coroutine_int.h | 2 ++
> >  util/qemu-coroutine-sleep.c  | 3 +++
> >  util/qemu-coroutine.c| 5 +
> >  3 files changed, 10 insertions(+)
> > 
> > diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> > index 931cdc9..b071217 100644
> > --- a/include/qemu/coroutine_int.h
> > +++ b/include/qemu/coroutine_int.h
> > @@ -56,6 +56,8 @@ struct Coroutine {
> >  
> >  int scheduled;
> >  
> > +int sleeping;
> 
> Is this a different "state" (in Stefan's parlance) than scheduled?  In
> practice both means that someone may call qemu_(aio_)coroutine_enter
> concurrently, so you'd better not do it yourself.
> 

It is slightly different; it is from sleeping with a timer via
co_aio_sleep_ns and waking via co_sleep_cb.  Whereas the 'co->scheduled' is
specifically from being scheduled for a specific AioContext, via
aio_co_schedule().

In practice, 'co->schedule' and 'co->sleeping' certainly rhyme, at the very
least.

But having them separate will put the abort closer to where the problem lies,
so it should make debugging a bit easier if we hit it.

> 
> > +
> >  QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
> >  QSLIST_ENTRY(Coroutine) co_scheduled_next;
> >  };
> > diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
> > index 9c56550..11ae95a 100644
> > --- a/util/qemu-coroutine-sleep.c
> > +++ b/util/qemu-coroutine-sleep.c
> > @@ -13,6 +13,7 @@
> >  
> >  #include "qemu/osdep.h"
> >  #include "qemu/coroutine.h"
> > +#include "qemu/coroutine_int.h"
> >  #include "qemu/timer.h"
> >  #include "block/aio.h"
> >  
> > @@ -25,6 +26,7 @@ static void co_sleep_cb(void *opaque)
> >  {
> >  CoSleepCB *sleep_cb = opaque;
> >  
> > +sleep_cb->co->sleeping = 0;
> >  aio_co_wake(sleep_cb->co);
> >  }
> >  
> > @@ -34,6 +36,7 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
> > QEMUClockType type,
> >  CoSleepCB sleep_cb = {
> >  .co = qemu_coroutine_self(),
> >  };
> > +sleep_cb.co->sleeping = 1;
> >  sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, 
> > _cb);
> >  timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
> >  qemu_coroutine_yield();
> > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > index 2edab63..1d9f93d 100644
> > --- a/util/qemu-coroutine.c
> > +++ b/util/qemu-coroutine.c
> > @@ -118,6 +118,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, 
> > Coroutine *co)
> >  abort();
> >  }
> >  
> > +if (co->sleeping == 1) {
> > +fprintf(stderr, "Cannot enter a co-routine that is still 
> > sleeping\n");
> > +abort();
> > +}
> > +
> >  if (co->caller) {
> >  fprintf(stderr, "Co-routine re-entered recursively\n");
> >  abort();
> > 
> 



[Qemu-block] [PATCH v7 for-2.12 25/25] block/null: Generate filename even with latency-ns

2017-11-20 Thread Max Reitz
While we cannot represent the latency-ns option in a filename, it is not
a significant option so not being able to should not stop us from
generating a filename nonetheless.

Signed-off-by: Max Reitz 
---
 block/null.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/null.c b/block/null.c
index 65f5d681c0..ca0aa3d528 100644
--- a/block/null.c
+++ b/block/null.c
@@ -250,7 +250,8 @@ static void null_refresh_filename(BlockDriverState *bs)
 {
 /* These options can be ignored */
 if (strcmp(qdict_entry_key(e), "filename") &&
-strcmp(qdict_entry_key(e), "driver"))
+strcmp(qdict_entry_key(e), "driver") &&
+strcmp(qdict_entry_key(e), NULL_OPT_LATENCY))
 {
 return;
 }
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 06/25] block: Make path_combine() return the path

2017-11-20 Thread Max Reitz
Besides being safe for arbitrary path lengths, after some follow-up
patches all callers will want a freshly allocated buffer anyway.

In the meantime, path_combine_deprecated() is added which has the same
interface as path_combine() had before this patch. All callers to that
function will be converted in follow-up patches.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 include/block/block.h |  4 +--
 block.c   | 85 ---
 block/vmdk.c  |  3 +-
 3 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index c05cac57e5..4584893c78 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -490,9 +490,7 @@ void bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
-void path_combine(char *dest, int dest_size,
-  const char *base_path,
-  const char *filename);
+char *path_combine(const char *base_path, const char *filename);
 
 int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
diff --git a/block.c b/block.c
index 2dc06cb9cb..8786b9b520 100644
--- a/block.c
+++ b/block.c
@@ -147,53 +147,62 @@ int path_is_absolute(const char *path)
 #endif
 }
 
-/* if filename is absolute, just copy it to dest. Otherwise, build a
+/* if filename is absolute, just return its duplicate. Otherwise, build a
path to it by considering it is relative to base_path. URL are
supported. */
-void path_combine(char *dest, int dest_size,
-  const char *base_path,
-  const char *filename)
+char *path_combine(const char *base_path, const char *filename)
 {
+const char *protocol_stripped = NULL;
 const char *p, *p1;
+char *result;
 int len;
 
-if (dest_size <= 0)
-return;
 if (path_is_absolute(filename)) {
-pstrcpy(dest, dest_size, filename);
-} else {
-const char *protocol_stripped = NULL;
+return g_strdup(filename);
+}
 
-if (path_has_protocol(base_path)) {
-protocol_stripped = strchr(base_path, ':');
-if (protocol_stripped) {
-protocol_stripped++;
-}
+if (path_has_protocol(base_path)) {
+protocol_stripped = strchr(base_path, ':');
+if (protocol_stripped) {
+protocol_stripped++;
 }
-p = protocol_stripped ?: base_path;
+}
+p = protocol_stripped ?: base_path;
 
-p1 = strrchr(base_path, '/');
+p1 = strrchr(base_path, '/');
 #ifdef _WIN32
-{
-const char *p2;
-p2 = strrchr(base_path, '\\');
-if (!p1 || p2 > p1)
-p1 = p2;
+{
+const char *p2;
+p2 = strrchr(base_path, '\\');
+if (!p1 || p2 > p1) {
+p1 = p2;
 }
+}
 #endif
-if (p1)
-p1++;
-else
-p1 = base_path;
-if (p1 > p)
-p = p1;
-len = p - base_path;
-if (len > dest_size - 1)
-len = dest_size - 1;
-memcpy(dest, base_path, len);
-dest[len] = '\0';
-pstrcat(dest, dest_size, filename);
+if (p1) {
+p1++;
+} else {
+p1 = base_path;
+}
+if (p1 > p) {
+p = p1;
 }
+len = p - base_path;
+
+result = g_malloc(len + strlen(filename) + 1);
+memcpy(result, base_path, len);
+strcpy(result + len, filename);
+
+return result;
+}
+
+static void path_combine_deprecated(char *dest, int dest_size,
+const char *base_path,
+const char *filename)
+{
+char *combined = path_combine(base_path, filename);
+pstrcpy(dest, dest_size, combined);
+g_free(combined);
 }
 
 /*
@@ -292,7 +301,7 @@ void bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 error_setg(errp, "Cannot use relative backing file names for '%s'",
backed);
 } else {
-path_combine(dest, sz, backed, backing);
+path_combine_deprecated(dest, sz, backed, backing);
 }
 }
 
@@ -4136,8 +4145,8 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 } else {
 /* If not an absolute filename path, make it relative to the 
current
  * image's filename path */
-path_combine(filename_tmp, PATH_MAX, curr_bs->filename,
- backing_file);
+path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
+backing_file);
 
 /* We are going to compare absolute pathnames */
 if (!realpath(filename_tmp, filename_full)) {
@@ -4146,8 +4155,8 @@ BlockDriverState 

[Qemu-block] [PATCH v7 for-2.12 15/25] block/nfs: Implement bdrv_dirname()

2017-11-20 Thread Max Reitz
While the basic idea is obvious and could be handled by the default
bdrv_dirname() implementation, we cannot generate a directory name if
the gid or uid are set, so we have to explicitly return NULL in those
cases.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/nfs.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 337fcd9c84..0152a9bd32 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -875,6 +875,19 @@ static void nfs_refresh_filename(BlockDriverState *bs, 
QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *nfs_dirname(BlockDriverState *bs, Error **errp)
+{
+NFSClient *client = bs->opaque;
+
+if (client->uid || client->gid) {
+error_setg(errp, "Cannot generate a base directory for NBD node '%s'",
+   bs->filename);
+return NULL;
+}
+
+return g_strdup_printf("nfs://%s%s/", client->server->host, client->path);
+}
+
 #ifdef LIBNFS_FEATURE_PAGECACHE
 static void nfs_invalidate_cache(BlockDriverState *bs,
  Error **errp)
@@ -908,6 +921,7 @@ static BlockDriver bdrv_nfs = {
 .bdrv_detach_aio_context= nfs_detach_aio_context,
 .bdrv_attach_aio_context= nfs_attach_aio_context,
 .bdrv_refresh_filename  = nfs_refresh_filename,
+.bdrv_dirname   = nfs_dirname,
 
 #ifdef LIBNFS_FEATURE_PAGECACHE
 .bdrv_invalidate_cache  = nfs_invalidate_cache,
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 13/25] quorum: Make bdrv_dirname() return NULL

2017-11-20 Thread Max Reitz
While the common implementation for bdrv_dirname() should return NULL
for quorum BDSs already (because they do not have a file node and their
exact_filename field should be empty), there is no reason not to make
that explicit.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block/quorum.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 2f1a628449..e5a844335e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1095,6 +1095,16 @@ static void quorum_refresh_filename(BlockDriverState 
*bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *quorum_dirname(BlockDriverState *bs, Error **errp)
+{
+/* In general, there are multiple BDSs with different dirnames below this
+ * one; so there is no unique dirname we could return (unless all are equal
+ * by chance, or there is only one). Therefore, to be consistent, just
+ * always return NULL. */
+error_setg(errp, "Cannot generate a base directory for quorum nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1104,6 +1114,7 @@ static BlockDriver bdrv_quorum = {
 .bdrv_file_open = quorum_open,
 .bdrv_close = quorum_close,
 .bdrv_refresh_filename  = quorum_refresh_filename,
+.bdrv_dirname   = quorum_dirname,
 
 .bdrv_co_flush_to_disk  = quorum_co_flush,
 
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 12/25] blkverify: Make bdrv_dirname() return NULL

2017-11-20 Thread Max Reitz
blkverify's BDSs have a file BDS, but we do not want this to be
preferred over the raw node. There is no way to decide between the two
(and not really a reason to, either), so just return NULL in blkverify's
implementation of bdrv_dirname().

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block/blkverify.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index b2ed8cd70d..d5233eeaf9 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -309,6 +309,15 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs, QDict *options)
 }
 }
 
+static char *blkverify_dirname(BlockDriverState *bs, Error **errp)
+{
+/* In general, there are two BDSs with different dirnames below this one;
+ * so there is no unique dirname we could return (unless both are equal by
+ * chance). Therefore, to be consistent, just always return NULL. */
+error_setg(errp, "Cannot generate a base directory for blkverify nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_blkverify = {
 .format_name  = "blkverify",
 .protocol_name= "blkverify",
@@ -320,6 +329,7 @@ static BlockDriver bdrv_blkverify = {
 .bdrv_child_perm  = bdrv_filter_default_perms,
 .bdrv_getlength   = blkverify_getlength,
 .bdrv_refresh_filename= blkverify_refresh_filename,
+.bdrv_dirname = blkverify_dirname,
 
 .bdrv_co_preadv   = blkverify_co_preadv,
 .bdrv_co_pwritev  = blkverify_co_pwritev,
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 07/25] block: bdrv_get_full_backing_filename_from_...'s ret. val.

2017-11-20 Thread Max Reitz
Make bdrv_get_full_backing_filename_from_filename() return an allocated
string instead of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 include/block/block.h |  7 +++
 block.c   | 51 +++
 block/vmdk.c  |  9 -
 3 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 4584893c78..46c6190414 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -483,10 +483,9 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
 char *dest, size_t sz, Error **errp);
-void bdrv_get_full_backing_filename_from_filename(const char *backed,
-  const char *backing,
-  char *dest, size_t sz,
-  Error **errp);
+char *bdrv_get_full_backing_filename_from_filename(const char *backed,
+   const char *backing,
+   Error **errp);
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
diff --git a/block.c b/block.c
index 8786b9b520..f8b3edd5bb 100644
--- a/block.c
+++ b/block.c
@@ -288,20 +288,28 @@ int bdrv_set_read_only(BlockDriverState *bs, bool 
read_only, Error **errp)
 return 0;
 }
 
-void bdrv_get_full_backing_filename_from_filename(const char *backed,
-  const char *backing,
-  char *dest, size_t sz,
-  Error **errp)
+/* If @backing is empty, this function returns NULL without setting
+ * @errp.  In all other cases, NULL will only be returned with @errp
+ * set.
+ *
+ * Therefore, a return value of NULL without @errp set means that
+ * there is no backing file; if @errp is set, there is one but its
+ * absolute filename cannot be generated.
+ */
+char *bdrv_get_full_backing_filename_from_filename(const char *backed,
+   const char *backing,
+   Error **errp)
 {
-if (backing[0] == '\0' || path_has_protocol(backing) ||
-path_is_absolute(backing))
-{
-pstrcpy(dest, sz, backing);
+if (backing[0] == '\0') {
+return NULL;
+} else if (path_has_protocol(backing) || path_is_absolute(backing)) {
+return g_strdup(backing);
 } else if (backed[0] == '\0' || strstart(backed, "json:", NULL)) {
 error_setg(errp, "Cannot use relative backing file names for '%s'",
backed);
+return NULL;
 } else {
-path_combine_deprecated(dest, sz, backed, backing);
+return path_combine(backed, backing);
 }
 }
 
@@ -309,9 +317,20 @@ void bdrv_get_full_backing_filename(BlockDriverState *bs, 
char *dest, size_t sz,
 Error **errp)
 {
 char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+char *full_name;
+Error *local_error = NULL;
 
-bdrv_get_full_backing_filename_from_filename(backed, bs->backing_file,
- dest, sz, errp);
+full_name = bdrv_get_full_backing_filename_from_filename(backed,
+ bs->backing_file,
+ _error);
+if (full_name) {
+pstrcpy(dest, sz, full_name);
+g_free(full_name);
+} else if (local_error) {
+error_propagate(errp, local_error);
+} else if (sz > 0) {
+*dest = '\0';
+}
 }
 
 void bdrv_register(BlockDriver *bdrv)
@@ -4593,17 +4612,17 @@ void bdrv_img_create(const char *filename, const char 
*fmt,
 size = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, img_size);
 if (backing_file && !(flags & BDRV_O_NO_BACKING)) {
 BlockDriverState *bs;
-char *full_backing = g_new0(char, PATH_MAX);
+char *full_backing;
 int back_flags;
 QDict *backing_options = NULL;
 
-bdrv_get_full_backing_filename_from_filename(filename, backing_file,
- full_backing, PATH_MAX,
- _err);
+full_backing =
+bdrv_get_full_backing_filename_from_filename(filename, 
backing_file,
+ _err);
 if (local_err) {
-g_free(full_backing);
 goto out;
 }
+assert(full_backing);
 
 /* backing files always opened 

[Qemu-block] [PATCH v7 for-2.12 05/25] block: Respect backing bs in bdrv_refresh_filename

2017-11-20 Thread Max Reitz
Basically, bdrv_refresh_filename() should respect all children of a
BlockDriverState. However, generally those children are driver-specific,
so this function cannot handle the general case. On the other hand,
there are only few drivers which use other children than @file and
@backing (that being vmdk, quorum, and blkverify).

Most block drivers only use @file and/or @backing (if they use any
children at all). Both can be implemented directly in
bdrv_refresh_filename.

The user overriding the file's filename is already handled, however, the
user overriding the backing file is not. If this is done, opening the
BDS with the plain filename of its file will not be correct, so we may
not set bs->exact_filename in that case.

iotests 051 and 191 contain test cases for overwriting the backing file,
and so their output changes with this patch applied (which I consider a
good thing). Note that in the case of 191, the implicitly opened
(non-overridden) base file is included in the json:{} filename as well
-- this will be remedied by a later patch.

Signed-off-by: Max Reitz 
---
 block.c   | 12 +++-
 tests/qemu-iotests/051.out|  8 
 tests/qemu-iotests/051.pc.out |  8 
 tests/qemu-iotests/191.out| 24 
 4 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index eb67dfbcc0..2dc06cb9cb 100644
--- a/block.c
+++ b/block.c
@@ -5016,6 +5016,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 
 opts = qdict_new();
 has_open_options = append_open_options(opts, bs);
+has_open_options |= bs->backing_overridden;
 
 /* If no specific options have been given for this BDS, the filename of
  * the underlying file should suffice for this one as well */
@@ -5027,11 +5028,20 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  * file BDS. The full options QDict of that file BDS should somehow
  * contain a representation of the filename, therefore the following
  * suffices without querying the (exact_)filename of this BDS. */
-if (bs->file->bs->full_open_options) {
+if (bs->file->bs->full_open_options &&
+(!bs->backing || bs->backing->bs->full_open_options))
+{
 qdict_put_str(opts, "driver", drv->format_name);
 QINCREF(bs->file->bs->full_open_options);
 qdict_put(opts, "file", bs->file->bs->full_open_options);
 
+if (bs->backing) {
+QINCREF(bs->backing->bs->full_open_options);
+qdict_put(opts, "backing", bs->backing->bs->full_open_options);
+} else if (bs->backing_overridden && !bs->backing) {
+qdict_put(opts, "backing", qstring_new());
+}
+
 bs->full_open_options = opts;
 } else {
 QDECREF(opts);
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index e3c6eaba57..50d5cd07c8 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -59,7 +59,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,backing.file.filename=TEST_DIR/t.qcow2.orig,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) info block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.orig"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.orig (chain depth: 1)
@@ -148,7 +148,7 @@ QEMU_PROG: -drive driver=null-co,cache=invalid_value: 
invalid cache option
 Testing: -drive 
file=TEST_DIR/t.qcow2,cache=writeback,backing.file.filename=TEST_DIR/t.qcow2.base,backing.cache.no-flush=on,backing.node-name=backing,backing.file.node-name=backing-file,file.node-name=file,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) info block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.base"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.base (chain depth: 1)
@@ -168,7 +168,7 @@ backing-file: TEST_DIR/t.qcow2.base (file, read-only)
 Testing: -drive 
file=TEST_DIR/t.qcow2,cache=writethrough,backing.file.filename=TEST_DIR/t.qcow2.base,backing.cache.no-flush=on,backing.node-name=backing,backing.file.node-name=backing-file,file.node-name=file,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) info block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 

[Qemu-block] [PATCH v7 for-2.12 24/25] block/curl: Implement bdrv_refresh_filename()

2017-11-20 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 block/curl.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/curl.c b/block/curl.c
index 11318a9a29..fe57223fda 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -957,6 +957,20 @@ static int64_t curl_getlength(BlockDriverState *bs)
 return s->len;
 }
 
+static void curl_refresh_filename(BlockDriverState *bs)
+{
+BDRVCURLState *s = bs->opaque;
+
+if (!s->sslverify || s->cookie ||
+s->username || s->password || s->proxyusername || s->proxypassword)
+{
+return;
+}
+
+pstrcpy(bs->exact_filename, sizeof(bs->exact_filename), s->url);
+}
+
+
 static const char *const curl_sgfnt_runtime_opts[] = {
 CURL_BLOCK_OPT_URL,
 CURL_BLOCK_OPT_SSLVERIFY,
@@ -985,6 +999,7 @@ static BlockDriver bdrv_http = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
@@ -1003,6 +1018,7 @@ static BlockDriver bdrv_https = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
@@ -1021,6 +1037,7 @@ static BlockDriver bdrv_ftp = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
@@ -1039,6 +1056,7 @@ static BlockDriver bdrv_ftps = {
 .bdrv_detach_aio_context= curl_detach_aio_context,
 .bdrv_attach_aio_context= curl_attach_aio_context,
 
+.bdrv_refresh_filename  = curl_refresh_filename,
 .sgfnt_runtime_opts = curl_sgfnt_runtime_opts,
 };
 
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 20/25] block: Generically refresh runtime options

2017-11-20 Thread Max Reitz
Instead of having every block driver which implements
bdrv_refresh_filename() copy all of the significant runtime options over
to bs->full_open_options, implement this process generically in
bdrv_refresh_filename().

This patch only adds this new generic implementation, it does not remove
the old functionality. This is done in a follow-up patch.

With this patch, some superfluous information (that should never have
been there) may be removed from some JSON filenames, as can be seen in
the change to iotest 110's reference output.  In case of 191, backing
nodes that have not been overridden are now removed from the filename.

Signed-off-by: Max Reitz 
---
 block.c| 116 -
 tests/qemu-iotests/110.out |   2 +-
 tests/qemu-iotests/191.out |  24 +-
 3 files changed, 128 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index 2e59e77f9d..64ef30bbac 100644
--- a/block.c
+++ b/block.c
@@ -4944,6 +4944,92 @@ out:
 return to_replace_bs;
 }
 
+/**
+ * Iterates through the list of runtime option keys that are said to be
+ * "significant" for a BDS. An option is called "significant" if it changes a
+ * BDS's data. For example, the null block driver's "size" and "read-zeroes"
+ * options are significant, but its "latency-ns" option is not.
+ *
+ * If a key returned by this function ends with a dot, all options starting 
with
+ * that prefix are significant.
+ */
+static const char *const *significant_options(BlockDriverState *bs,
+  const char *const *curopt)
+{
+static const char *const global_options[] = {
+"driver", "filename", "base-directory", NULL
+};
+
+if (!curopt) {
+return _options[0];
+}
+
+curopt++;
+if (curopt == _options[ARRAY_SIZE(global_options) - 1] && bs->drv) {
+curopt = bs->drv->sgfnt_runtime_opts;
+}
+
+return (curopt && *curopt) ? curopt : NULL;
+}
+
+/**
+ * Copies all significant runtime options from bs->options to the given QDict.
+ * The set of significant option keys is determined by invoking
+ * significant_options().
+ *
+ * Returns true iff any significant option was present in bs->options (and thus
+ * copied to the target QDict) with the exception of "filename" and "driver".
+ * The caller is expected to use this value to decide whether the existence of
+ * significant options prevents the generation of a plain filename.
+ */
+static bool append_significant_runtime_options(QDict *d, BlockDriverState *bs)
+{
+bool found_any = false;
+const char *const *option_name = NULL;
+
+if (!bs->drv) {
+return false;
+}
+
+while ((option_name = significant_options(bs, option_name))) {
+bool option_given = false;
+
+assert(strlen(*option_name) > 0);
+if ((*option_name)[strlen(*option_name) - 1] != '.') {
+QObject *entry = qdict_get(bs->options, *option_name);
+if (!entry) {
+continue;
+}
+
+qobject_incref(entry);
+qdict_put_obj(d, *option_name, entry);
+option_given = true;
+} else {
+const QDictEntry *entry;
+for (entry = qdict_first(bs->options); entry;
+ entry = qdict_next(bs->options, entry))
+{
+if (strstart(qdict_entry_key(entry), *option_name, NULL)) {
+qobject_incref(qdict_entry_value(entry));
+qdict_put_obj(d, qdict_entry_key(entry),
+  qdict_entry_value(entry));
+option_given = true;
+}
+}
+}
+
+/* While "driver" and "filename" need to be included in a JSON 
filename,
+ * their existence does not prohibit generation of a plain filename. */
+if (!found_any && option_given &&
+strcmp(*option_name, "driver") && strcmp(*option_name, "filename"))
+{
+found_any = true;
+}
+}
+
+return found_any;
+}
+
 static bool append_open_options(QDict *d, BlockDriverState *bs)
 {
 const QDictEntry *entry;
@@ -5098,9 +5184,37 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 bs->full_open_options = opts;
 }
 
+/* Gather the options QDict */
+opts = qdict_new();
+append_significant_runtime_options(opts, bs);
+
+if (drv->bdrv_gather_child_options) {
+/* Some block drivers may not want to present all of their children's
+ * options, or name them differently from BdrvChild.name */
+drv->bdrv_gather_child_options(bs, opts);
+} else {
+QLIST_FOREACH(child, >children, next) {
+if (child->role == _backing && !bs->backing_overridden) {
+/* We can skip the backing BDS if it has not been overridden */
+continue;
+}
+
+QINCREF(child->bs->full_open_options);
+

[Qemu-block] [PATCH v7 for-2.12 23/25] block: Fix FIXME from "Add BDS.backing_overridden"

2017-11-20 Thread Max Reitz
Said commit introduced a FIXME stating that bdrv_open_backing_file()
should set bs->backing_overridden to true not only if the file.filename
option was set, but if the "options" QDict contained any option that is
significant for any node in the BDS tree emerging from the backing BDS.
This behavior is implemented by this patch.

Signed-off-by: Max Reitz 
---
 block.c | 81 -
 1 file changed, 75 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 526ce43e9b..40007544e5 100644
--- a/block.c
+++ b/block.c
@@ -73,6 +73,9 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
const BdrvChildRole *child_role,
Error **errp);
 
+static bool has_significant_runtime_options(BlockDriverState *bs,
+const QDict *d);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -2214,6 +2217,42 @@ out:
 bdrv_refresh_limits(bs, NULL);
 }
 
+/**
+ * Checks whether @options contains any significant option for any of the nodes
+ * in the BDS tree emerging from @bs.
+ */
+static bool is_significant_option_tree(BlockDriverState *bs, QDict *options)
+{
+BdrvChild *child;
+
+if (!qdict_size(options)) {
+/* No need to recurse */
+return false;
+}
+
+if (has_significant_runtime_options(bs, options)) {
+return true;
+}
+
+QLIST_FOREACH(child, >children, next) {
+QDict *child_options;
+char *option_prefix;
+
+option_prefix = g_strdup_printf("%s.", child->name);
+qdict_extract_subqdict(options, _options, option_prefix);
+g_free(option_prefix);
+
+if (is_significant_option_tree(child->bs, child_options)) {
+QDECREF(child_options);
+return true;
+}
+
+QDECREF(child_options);
+}
+
+return false;
+}
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -2232,7 +2271,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 const char *reference = NULL;
 int ret = 0;
 BlockDriverState *backing_hd;
-QDict *options;
+QDict *options, *cloned_options = NULL;
 QDict *tmp_parent_options = NULL;
 Error *local_err = NULL;
 
@@ -2262,11 +2301,6 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
 /* keep backing_filename NULL */
-
-/* FIXME: Should also be set to true if @options contains other runtime
- *options which control the data that is read from the backing
- *BDS */
-bs->backing_overridden = true;
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
 goto free_exit;
@@ -2287,6 +2321,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 goto free_exit;
 }
 
+cloned_options = qdict_clone_shallow(options);
+
 if (!reference &&
 bs->backing_format[0] != '\0' && !qdict_haskey(options, "driver")) {
 qdict_put_str(options, "driver", bs->backing_format);
@@ -2302,6 +2338,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 }
 bdrv_set_aio_context(backing_hd, bdrv_get_aio_context(bs));
 
+if (reference || is_significant_option_tree(backing_hd, cloned_options)) {
+bs->backing_overridden = true;
+}
+
 /* Hook up the backing file link; drop our reference, bs owns the
  * backing_hd reference now */
 bdrv_set_backing_hd(bs, backing_hd, _err);
@@ -2317,6 +2357,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 free_exit:
 g_free(backing_filename);
 QDECREF(tmp_parent_options);
+QDECREF(cloned_options);
 return ret;
 }
 
@@ -4973,6 +5014,34 @@ static const char *const 
*significant_options(BlockDriverState *bs,
 }
 
 /**
+ * Returns true if @d contains any options the block driver of @bs considers to
+ * be significant runtime options.
+ */
+static bool has_significant_runtime_options(BlockDriverState *bs,
+const QDict *d)
+{
+const char *const *option_name = NULL;
+
+while ((option_name = significant_options(bs, option_name))) {
+assert(strlen(*option_name) > 0);
+if ((*option_name)[strlen(*option_name) - 1] != '.') {
+if (qdict_haskey(d, *option_name)) {
+return true;
+}
+} else {
+const QDictEntry *entry;
+for (entry = qdict_first(d); entry; entry = qdict_next(d, entry)) {
+if (strstart(qdict_entry_key(entry), *option_name, NULL)) {
+return true;
+}

[Qemu-block] [PATCH v7 for-2.12 22/25] block: Do not copy exact_filename from format file

2017-11-20 Thread Max Reitz
If the a format BDS's file BDS is in turn a format BDS, we cannot simply
use the same filename, because when opening a BDS tree based on a
filename alone, qemu will create only one format node on top of one
protocol node (disregarding a potential backing file).

Signed-off-by: Max Reitz 
---
 block.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 275cae460a..526ce43e9b 100644
--- a/block.c
+++ b/block.c
@@ -5104,9 +5104,20 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 
 bs->exact_filename[0] = '\0';
 
-/* If no specific options have been given for this BDS, the filename of
- * the underlying file should suffice for this one as well */
-if (bs->file->bs->exact_filename[0] && !generate_json_filename) {
+/* We can use the underlying file's filename if:
+ * - it has a filename,
+ * - the file is a protocol BDS, and
+ * - opening that file (as this BDS's format) will automatically create
+ *   the BDS tree we have right now, that is:
+ *   - the user did not significantly change this BDS's behavior with
+ * some explicit options
+ *   - no non-file child of this BDS has been overridden by the user
+ *   Both of these conditions are represented by 
generate_json_filename.
+ */
+if (bs->file->bs->exact_filename[0] &&
+bs->file->bs->drv->bdrv_file_open &&
+!generate_json_filename)
+{
 strcpy(bs->exact_filename, bs->file->bs->exact_filename);
 }
 }
-- 
2.13.6




Re: [Qemu-block] [PATCH] block: Close a BlockDriverState completely even when bs->drv is NULL

2017-11-20 Thread Max Reitz
On 2017-11-06 15:53, Alberto Garcia wrote:
> bdrv_close() skips much of its logic when bs->drv is NULL. This is
> fine when we're closing a BlockDriverState that has just been created
> (because e.g the initialization process failed), but it's not enough
> in other cases.
> 
> For example, when a valid qcow2 image is found to be corrupted then
> QEMU marks it as such in the file header and then sets bs->drv to
> NULL in order to make the BlockDriverState unusable. When that BDS is
> later closed then many of its data structures are not freed (leaking
> their memory) and none of its children are detached. This results in
> bdrv_close_all() failing to close all BDSs and making this assertion
> fail when QEMU is being shut down:
> 
>bdrv_close_all: Assertion `QTAILQ_EMPTY(_bdrv_states)' failed.
> 
> This patch makes bdrv_close() do the full uninitialization process
> in all cases. This fixes the problem with corrupted images and still
> works fine with freshly created BDSs.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block.c| 57 
> +++---
>  tests/qemu-iotests/060 | 13 +++
>  tests/qemu-iotests/060.out | 12 ++
>  3 files changed, 53 insertions(+), 29 deletions(-)

Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v7 for-2.12 02/25] block: Use children list in bdrv_refresh_filename

2017-11-20 Thread Max Reitz
bdrv_refresh_filename() should invoke itself recursively on all
children, not just on file.

With that change, we can remove the manual invocations in blkverify,
quorum, commit, and mirror.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c   | 9 +
 block/blkverify.c | 3 ---
 block/commit.c| 1 -
 block/mirror.c| 1 -
 block/quorum.c| 1 -
 5 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index 6c8ef98dfa..4aebf590d8 100644
--- a/block.c
+++ b/block.c
@@ -4965,16 +4965,17 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 void bdrv_refresh_filename(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+BdrvChild *child;
 QDict *opts;
 
 if (!drv) {
 return;
 }
 
-/* This BDS's file name will most probably depend on its file's name, so
- * refresh that first */
-if (bs->file) {
-bdrv_refresh_filename(bs->file->bs);
+/* This BDS's file name may depend on any of its children's file names, so
+ * refresh those first */
+QLIST_FOREACH(child, >children, next) {
+bdrv_refresh_filename(child->bs);
 }
 
 if (drv->bdrv_refresh_filename) {
diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9eac..b2ed8cd70d 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -281,9 +281,6 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs, QDict *options)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-/* bs->file->bs has already been refreshed */
-bdrv_refresh_filename(s->test_file->bs);
-
 if (bs->file->bs->full_open_options
 && s->test_file->bs->full_open_options)
 {
diff --git a/block/commit.c b/block/commit.c
index 5036eec434..cb529cc2aa 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -241,7 +241,6 @@ static int coroutine_fn 
bdrv_commit_top_preadv(BlockDriverState *bs,
 
 static void bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 {
-bdrv_refresh_filename(bs->backing->bs);
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
 bs->backing->bs->filename);
 }
diff --git a/block/mirror.c b/block/mirror.c
index f995924032..f059981f19 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1061,7 +1061,6 @@ static void 
bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
  * bdrv_set_backing_hd */
 return;
 }
-bdrv_refresh_filename(bs->backing->bs);
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
 bs->backing->bs->filename);
 }
diff --git a/block/quorum.c b/block/quorum.c
index 272f9a5b77..2f1a628449 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1074,7 +1074,6 @@ static void quorum_refresh_filename(BlockDriverState *bs, 
QDict *options)
 int i;
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_refresh_filename(s->children[i]->bs);
 if (!s->children[i]->bs->full_open_options) {
 return;
 }
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 17/25] iotests: Add quorum case to test 110

2017-11-20 Thread Max Reitz
Test 110 tests relative backing filenames for complex BDS trees.  Now
that the originally supposedly failing test passes, let us add a new
failing test: Quorum can never work automatically (without detecting
whether all child nodes have the same base directory, but that would be
rather inconsistent behavior).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/110 | 26 ++
 tests/qemu-iotests/110.out |  7 +++
 2 files changed, 33 insertions(+)

diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index ba1b3c6c7d..08976c8a61 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -30,6 +30,7 @@ status=1  # failure is the default!
 _cleanup()
 {
_cleanup_test_img
+rm -f "$TEST_IMG.copy"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -87,6 +88,31 @@ echo
 # omit the image size; it should work anyway
 _make_test_img -b "$TEST_IMG_REL.base"
 
+echo
+echo '=== Nodes without a common directory ==='
+echo
+
+cp "$TEST_IMG" "$TEST_IMG.copy"
+
+# Should inform us that the actual path of the backing file cannot be 
determined
+TEST_IMG="json:{
+'driver': '$IMGFMT',
+'file': {
+'driver': 'quorum',
+'vote-threshold': 1,
+'children': [
+{
+'driver': 'file',
+'filename': '$TEST_IMG'
+},
+{
+'driver': 'file',
+'filename': '$TEST_IMG.copy'
+}
+]
+}
+}" _img_info | _filter_img_info
+
 
 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out
index 5370bc1d26..1d0b2475cc 100644
--- a/tests/qemu-iotests/110.out
+++ b/tests/qemu-iotests/110.out
@@ -19,4 +19,11 @@ backing file: t.IMGFMT.base (actual path: 
TEST_DIR/t.IMGFMT.base)
 === Backing name is always relative to the backed image ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
backing_file=t.IMGFMT.base
+
+=== Nodes without a common directory ===
+
+image: json:{"driver": "IMGFMT", "file": {"children": [{"driver": "file", 
"filename": "TEST_DIR/t.IMGFMT"}, {"driver": "file", "filename": 
"TEST_DIR/t.IMGFMT.copy"}], "driver": "quorum", "blkverify": false, 
"rewrite-corrupted": false, "vote-threshold": 1}}
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+backing file: t.IMGFMT.base (cannot determine actual path)
 *** done
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 14/25] block/nbd: Make bdrv_dirname() return NULL

2017-11-20 Thread Max Reitz
The generic bdrv_dirname() implementation would be able to generate some
form of directory name for many NBD nodes, but it would be always wrong.
Therefore, we have to explicitly make it an error (until NBD has some
form of specification for export paths, if it ever will).

Signed-off-by: Max Reitz 
---
 block/nbd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index a50d24b50a..ed921fa333 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -565,6 +565,16 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *nbd_dirname(BlockDriverState *bs, Error **errp)
+{
+/* The generic bdrv_dirname() implementation is able to work out some
+ * directory name for NBD nodes, but that would be wrong. So far there is 
no
+ * specification for how "export paths" would work, so NBD does not have
+ * directory names. */
+error_setg(errp, "Cannot generate a base directory for NBD nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_nbd = {
 .format_name= "nbd",
 .protocol_name  = "nbd",
@@ -582,6 +592,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -601,6 +612,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -620,6 +632,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 11/25] block: Add bdrv_dirname()

2017-11-20 Thread Max Reitz
This function may be implemented by block drivers to derive a directory
name from a BDS. Concatenating this g_free()-able string with a relative
filename must result in a valid (not necessarily existing) filename, so
this is a function that should generally be not implemented by format
drivers, because this is protocol-specific.

If a BDS's driver does not implement this function, bdrv_dirname() will
fall through to the BDS's file if it exists. If it does not, the
exact_filename field will be used to generate a directory name.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 include/block/block.h |  1 +
 include/block/block_int.h |  7 +++
 block.c   | 26 ++
 3 files changed, 34 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 3e2ea6c879..a053a27148 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -485,6 +485,7 @@ char *bdrv_get_full_backing_filename(BlockDriverState *bs, 
Error **errp);
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp);
+char *bdrv_dirname(BlockDriverState *bs, Error **errp);
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5e9734d8b5..c07882962d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -135,6 +135,13 @@ struct BlockDriver {
 
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
 
+/*
+ * Returns an allocated string which is the directory name of this BDS: It
+ * will be used to make relative filenames absolute by prepending this
+ * function's return value to them.
+ */
+char *(*bdrv_dirname)(BlockDriverState *bs, Error **errp);
+
 /* aio */
 BlockAIOCB *(*bdrv_aio_readv)(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
diff --git a/block.c b/block.c
index ecea78dae0..20e45c85c6 100644
--- a/block.c
+++ b/block.c
@@ -5098,6 +5098,32 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 }
 }
 
+char *bdrv_dirname(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (!drv) {
+error_setg(errp, "Node '%s' is ejected", bs->node_name);
+return NULL;
+}
+
+if (drv->bdrv_dirname) {
+return drv->bdrv_dirname(bs, errp);
+}
+
+if (bs->file) {
+return bdrv_dirname(bs->file->bs, errp);
+}
+
+if (bs->exact_filename[0] != '\0') {
+return path_combine(bs->exact_filename, "");
+}
+
+error_setg(errp, "Cannot generate a base directory for %s nodes",
+   drv->format_name);
+return NULL;
+}
+
 /*
  * Hot add/remove a BDS's child. So the user can take a child offline when
  * it is broken and take a new child online
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 16/25] block: Use bdrv_dirname() for relative filenames

2017-11-20 Thread Max Reitz
bdrv_get_full_backing_filename_from_filename() breaks down when it comes
to JSON filenames. Using bdrv_dirname() as the basis is better because
since we have BDS, we can descend through the BDS tree to the protocol
layer, which gives us a greater probability of finding a non-JSON name;
also, bdrv_dirname() is more correct as it allows block drivers to
override the generation of that directory name in a protocol-specific
way.

We still need to keep bdrv_get_full_backing_filename_from_filename(),
though, because it has valid callers which need it during image creation
when no BDS is available yet.

This makes a test case in qemu-iotest 110, which was supposed to fail,
work. That is actually good, but we need to change the reference output
(and the comment in 110) accordingly.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c| 20 +++-
 tests/qemu-iotests/110 |  3 ++-
 tests/qemu-iotests/110.out |  2 +-
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 20e45c85c6..2e59e77f9d 100644
--- a/block.c
+++ b/block.c
@@ -311,12 +311,22 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
  const char *filename, Error **errp)
 {
-char *bs_filename = relative_to->exact_filename[0]
-? relative_to->exact_filename
-: relative_to->filename;
+char *dir, *full_name;
 
-return bdrv_get_full_backing_filename_from_filename(bs_filename,
-filename ?: "", errp);
+if (!filename || filename[0] == '\0') {
+return NULL;
+} else if (path_has_protocol(filename) || path_is_absolute(filename)) {
+return g_strdup(filename);
+}
+
+dir = bdrv_dirname(relative_to, errp);
+if (!dir) {
+return NULL;
+}
+
+full_name = g_strconcat(dir, filename, NULL);
+g_free(dir);
+return full_name;
 }
 
 char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index 9de7369f3a..ba1b3c6c7d 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -61,7 +61,8 @@ echo '=== Non-reconstructable filename ==='
 echo
 
 # Across blkdebug without a config file, you cannot reconstruct filenames, so
-# qemu is incapable of knowing the directory of the top image
+# qemu is incapable of knowing the directory of the top image from the filename
+# alone. However, using bdrv_dirname(), it should still work.
 TEST_IMG="json:{
 'driver': '$IMGFMT',
 'file': {
diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out
index b3584ff87f..5370bc1d26 100644
--- a/tests/qemu-iotests/110.out
+++ b/tests/qemu-iotests/110.out
@@ -14,7 +14,7 @@ backing file: t.IMGFMT.base (actual path: 
TEST_DIR/t.IMGFMT.base)
 image: json:{"driver": "IMGFMT", "file": {"set-state.0.event": "read_aio", 
"image": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}, "driver": 
"blkdebug", "set-state.0.new_state": 42}}
 file format: IMGFMT
 virtual size: 64M (67108864 bytes)
-backing file: t.IMGFMT.base (cannot determine actual path)
+backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base)
 
 === Backing name is always relative to the backed image ===
 
-- 
2.13.6




Re: [Qemu-block] [PATCH for-2.11] iotests: Fix 176 on 32-bit host

2017-11-20 Thread Max Reitz
On 2017-11-17 20:04, Eric Blake wrote:
> The contents of a qcow2 bitmap are rounded up to a size that
> matches the number of bits available for the granularity, but
> that granularity differs for 32-bit hosts (our default 64k
> cluster allows for 2M bitmap coverage per 'long') and 64-bit
> hosts (4M bitmap per 'long').  If the image is a multiple of
> 2M but not 4M, then the number of bytes occupied by the array
> of longs in memory differs between architecture, thus
> resulting in different SHA256 hashes.
> 
> Furthermore (but untested by me), if our computation of the
> SHA256 hash is at all endian-dependent because of how we store
> data in memory, that's another variable we'd have to account
> for (ideally, we specified the bitmap stored in qcow2 as
> fixed-endian on disk, because the same qcow2 file must be
> usable across any architecture; but that says nothing about
> how we represent things in memory).  But we already have test
> 165 to validate that bitmaps are stored correctly on disk,
> while this test is merely testing that the bitmap exists.
> 
> So for this test, the easiest solution is to filter out the
> actual hash value.  Broken in commit 4096974e.
> 
> Reported-by: Max Reitz 
> Signed-off-by: Eric Blake 
> ---
>  tests/qemu-iotests/176 | 3 ++-
>  tests/qemu-iotests/176.out | 8 
>  2 files changed, 6 insertions(+), 5 deletions(-)

Now that I've failed to keep my tree empty anyway:

Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

(No offense taken if **someone** *cough* *cough* were to take it from me)

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v7 for-2.12 21/25] block: Purify .bdrv_refresh_filename()

2017-11-20 Thread Max Reitz
Currently, BlockDriver.bdrv_refresh_filename() is supposed to both
refresh the filename (BDS.exact_filename) and set BDS.full_open_options.
Now that we have generic code in the central bdrv_refresh_filename() for
creating BDS.full_open_options, we can drop the latter part from all
BlockDriver.bdrv_refresh_filename() implementations.

This also means that we can drop all of the existing default code for
this from the global bdrv_refresh_filename() itself.

Furthermore, we now have to call BlockDriver.bdrv_refresh_filename()
after having set BDS.full_open_options, because the block driver's
implementation should now be allowed to depend on BDS.full_open_options
being set correctly.

Finally, with this patch we can drop the @options parameter from
BlockDriver.bdrv_refresh_filename(); also, add a comment on this
function's purpose in block/block_int.h while touching its interface.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |   6 +-
 block.c   | 145 +++---
 block/blkdebug.c  |  52 ++---
 block/blkverify.c |  16 +
 block/commit.c|   2 +-
 block/mirror.c|   2 +-
 block/nbd.c   |  23 +---
 block/nfs.c   |  36 +---
 block/null.c  |  23 +---
 block/quorum.c|  30 --
 10 files changed, 63 insertions(+), 272 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index f81516b615..b67f3af599 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -133,7 +133,11 @@ struct BlockDriver {
 int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
-void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
+/*
+ * Refreshes the bs->exact_filename field. If that is impossible,
+ * bs->exact_filename has to be left empty.
+ */
+void (*bdrv_refresh_filename)(BlockDriverState *bs);
 
 /*
  * Gathers the open options for all children into @target.
diff --git a/block.c b/block.c
index 64ef30bbac..275cae460a 100644
--- a/block.c
+++ b/block.c
@@ -5030,47 +5030,6 @@ static bool append_significant_runtime_options(QDict *d, 
BlockDriverState *bs)
 return found_any;
 }
 
-static bool append_open_options(QDict *d, BlockDriverState *bs)
-{
-const QDictEntry *entry;
-QemuOptDesc *desc;
-BdrvChild *child;
-bool found_any = false;
-const char *p;
-
-for (entry = qdict_first(bs->options); entry;
- entry = qdict_next(bs->options, entry))
-{
-/* Exclude options for children */
-QLIST_FOREACH(child, >children, next) {
-if (strstart(qdict_entry_key(entry), child->name, )
-&& (!*p || *p == '.'))
-{
-break;
-}
-}
-if (child) {
-continue;
-}
-
-/* And exclude all non-driver-specific options */
-for (desc = bdrv_runtime_opts.desc; desc->name; desc++) {
-if (!strcmp(qdict_entry_key(entry), desc->name)) {
-break;
-}
-}
-if (desc->name) {
-continue;
-}
-
-qobject_incref(qdict_entry_value(entry));
-qdict_put_obj(d, qdict_entry_key(entry), qdict_entry_value(entry));
-found_any = true;
-}
-
-return found_any;
-}
-
 /* Updates the following BDS fields:
  *  - exact_filename: A filename which may be used for opening a block device
  *which (mostly) equals the given BDS (even without any
@@ -5088,6 +5047,8 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 BlockDriver *drv = bs->drv;
 BdrvChild *child;
 QDict *opts;
+bool generate_json_filename; /* Whether our default implementation should
+fill exact_filename (false) or not (true) 
*/
 
 if (!drv) {
 return;
@@ -5103,90 +5064,10 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 }
 }
 
-if (drv->bdrv_refresh_filename) {
-/* Obsolete information is of no use here, so drop the old file name
- * information before refreshing it */
-bs->exact_filename[0] = '\0';
-if (bs->full_open_options) {
-QDECREF(bs->full_open_options);
-bs->full_open_options = NULL;
-}
-
-opts = qdict_new();
-append_open_options(opts, bs);
-drv->bdrv_refresh_filename(bs, opts);
-QDECREF(opts);
-} else if (bs->file) {
-/* Try to reconstruct valid information from the underlying file */
-bool has_open_options;
-
-bs->exact_filename[0] = '\0';
-if (bs->full_open_options) {
-QDECREF(bs->full_open_options);
-bs->full_open_options = NULL;
-}
-
-opts = qdict_new();
-has_open_options = append_open_options(opts, bs);
- 

[Qemu-block] [PATCH v7 for-2.12 00/25] block: Fix some filename generation issues

2017-11-20 Thread Max Reitz
I'm sparing myself writing this cover letter again, and I'll just give
you a link to the previous version:

http://lists.nongnu.org/archive/html/qemu-block/2017-09/msg01030.html

The only difference is that I dropped patch 16 which added a QAPI
@base-directory option for any node that could be used to override the
base directory to be used for resolving relative filenames.  (Because
Berto and Kevin convinced me it's not that useful now -- and the patch
is pretty stand-alone, so we can always add it later if needed.)

In turn, I had to add patch 4 because test 191 was broken by this
series.


v7: Rebased on block-next, and:
- Patch 4: Added (iotest 191 tries to both override a node's backing
   file and use filenames for node identification at the same
   time -- not a very good idea, but with a simple modification
   we can still get away with it)
- Patch 5: Changes 191's output now
- Patch 10 (was 9): Shorten the code [Berto]
- Old patch 16: Dropped [Berto/Kevin]
- Patch 17: Remove test case for @base-directory
- Patch 18:
  - Never use inline definition of the significant runtime options array
[Kevin]
  - Add encrypt.key-secret options for qcow/qcow2
  - Add transfer limit options for blkdebug
  - s/uid/user/ and s/gid/group/ for NFS
  - Add offset/size for raw
- Patch 19: Sprinkle a couple of notes in about how we actually don't
want this new function [Kevin, kind of]
- Patch 20: Changes 191's output now (and also, there is a rebase
conflict for 110 due to patch 16 being removed and thus
patch 17 having been modified)


git-backport-diff against v6:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/25:[] [--] 'block/mirror: Small absolute-paths simplification'
002/25:[] [--] 'block: Use children list in bdrv_refresh_filename'
003/25:[] [--] 'block: Add BDS.backing_overridden'
004/25:[down] 'iotests: Drop explicit base blockdev in 191'
005/25:[0024] [FC] 'block: Respect backing bs in bdrv_refresh_filename'
006/25:[] [--] 'block: Make path_combine() return the path'
007/25:[] [-C] 'block: bdrv_get_full_backing_filename_from_...'s ret. val.'
008/25:[] [--] 'block: bdrv_get_full_backing_filename's ret. val.'
009/25:[] [--] 'block: Add bdrv_make_absolute_filename()'
010/25:[0014] [FC] 'block: Fix bdrv_find_backing_image()'
011/25:[] [--] 'block: Add bdrv_dirname()'
012/25:[] [--] 'blkverify: Make bdrv_dirname() return NULL'
013/25:[] [--] 'quorum: Make bdrv_dirname() return NULL'
014/25:[] [--] 'block/nbd: Make bdrv_dirname() return NULL'
015/25:[] [--] 'block/nfs: Implement bdrv_dirname()'
016/25:[] [--] 'block: Use bdrv_dirname() for relative filenames'
017/25:[0027] [FC] 'iotests: Add quorum case to test 110'
018/25:[0188] [FC] 'block: Add sgfnt_runtime_opts to BlockDriver'
019/25:[0044] [FC] 'block: Add BlockDriver.bdrv_gather_child_options'
020/25:[0026] [FC] 'block: Generically refresh runtime options'
021/25:[] [-C] 'block: Purify .bdrv_refresh_filename()'
022/25:[] [--] 'block: Do not copy exact_filename from format file'
023/25:[] [-C] 'block: Fix FIXME from "Add BDS.backing_overridden"'
024/25:[] [--] 'block/curl: Implement bdrv_refresh_filename()'
025/25:[] [--] 'block/null: Generate filename even with latency-ns'


Max Reitz (25):
  block/mirror: Small absolute-paths simplification
  block: Use children list in bdrv_refresh_filename
  block: Add BDS.backing_overridden
  iotests: Drop explicit base blockdev in 191
  block: Respect backing bs in bdrv_refresh_filename
  block: Make path_combine() return the path
  block: bdrv_get_full_backing_filename_from_...'s ret. val.
  block: bdrv_get_full_backing_filename's ret. val.
  block: Add bdrv_make_absolute_filename()
  block: Fix bdrv_find_backing_image()
  block: Add bdrv_dirname()
  blkverify: Make bdrv_dirname() return NULL
  quorum: Make bdrv_dirname() return NULL
  block/nbd: Make bdrv_dirname() return NULL
  block/nfs: Implement bdrv_dirname()
  block: Use bdrv_dirname() for relative filenames
  iotests: Add quorum case to test 110
  block: Add sgfnt_runtime_opts to BlockDriver
  block: Add BlockDriver.bdrv_gather_child_options
  block: Generically refresh runtime options
  block: Purify .bdrv_refresh_filename()
  block: Do not copy exact_filename from format file
  block: Fix FIXME from "Add BDS.backing_overridden"
  block/curl: Implement bdrv_refresh_filename()
  block/null: Generate filename even with latency-ns

 include/block/block.h |  15 +-
 include/block/block_int.h |  38 +++-
 block.c   | 507 --
 block/blkdebug.c  |  69 +++---
 block/blkverify.c |  29 +--
 block/commit.c|   3 +-
 block/crypto.c|   

[Qemu-block] [PATCH v7 for-2.12 01/25] block/mirror: Small absolute-paths simplification

2017-11-20 Thread Max Reitz
When invoking drive-mirror in absolute-paths mode, the target's backing
BDS is assigned to it in mirror_exit(). The current logic only does so
if the target does not have that backing BDS already; but it actually
cannot have a backing BDS at all (the BDS is opened with O_NO_BACKING in
qmp_drive_mirror()), so just assert that and assign the new backing BDS
unconditionally.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/mirror.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 307b6391a8..f995924032 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -523,12 +523,12 @@ static void mirror_exit(BlockJob *job, void *opaque)
 _abort);
 if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
 BlockDriverState *backing = s->is_none_mode ? src : s->base;
-if (backing_bs(target_bs) != backing) {
-bdrv_set_backing_hd(target_bs, backing, _err);
-if (local_err) {
-error_report_err(local_err);
-data->ret = -EPERM;
-}
+
+assert(!target_bs->backing);
+bdrv_set_backing_hd(target_bs, backing, _err);
+if (local_err) {
+error_report_err(local_err);
+data->ret = -EPERM;
 }
 }
 
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 03/25] block: Add BDS.backing_overridden

2017-11-20 Thread Max Reitz
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS. Therefore, we will need to consider this in
bdrv_refresh_filename().

Adding a new field to the BDS is not nice, but it is very simple and
exactly keeps track of whether the backing file has been overridden.

This commit adds a FIXME which will be remedied by a follow-up commit.
Until then, the respective piece of code will not result in any behavior
that is worse than what we currently have.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 include/block/block_int.h |  1 +
 block.c   | 13 +
 block/mirror.c|  4 
 blockdev.c| 16 
 4 files changed, 34 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index a5482775ec..5e9734d8b5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -619,6 +619,7 @@ struct BlockDriverState {
 char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
 this file image */
 char backing_format[16]; /* if non-zero and backing_file exists */
+bool backing_overridden; /* backing file has been specified by the user */
 
 QDict *full_open_options;
 char exact_filename[PATH_MAX];
diff --git a/block.c b/block.c
index 4aebf590d8..eb67dfbcc0 100644
--- a/block.c
+++ b/block.c
@@ -2233,6 +2233,11 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
 backing_filename[0] = '\0';
+
+/* FIXME: Should also be set to true if @options contains other runtime
+ *options which control the data that is read from the backing
+ *BDS */
+bs->backing_overridden = true;
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
 goto free_exit;
@@ -2434,6 +2439,9 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 goto out;
 }
 
+bs_snapshot->backing_overridden = true;
+bdrv_refresh_filename(bs_snapshot);
+
 out:
 QDECREF(snapshot_options);
 g_free(tmp_filename);
@@ -2564,6 +2572,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 backing = qdict_get_try_str(options, "backing");
 if (backing && *backing == '\0') {
 flags |= BDRV_O_NO_BACKING;
+bs->backing_overridden = true;
 qdict_del(options, "backing");
 }
 
@@ -4976,6 +4985,10 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  * refresh those first */
 QLIST_FOREACH(child, >children, next) {
 bdrv_refresh_filename(child->bs);
+
+if (child->role == _backing && child->bs->backing_overridden) {
+bs->backing_overridden = true;
+}
 }
 
 if (drv->bdrv_refresh_filename) {
diff --git a/block/mirror.c b/block/mirror.c
index f059981f19..d6e487fb2e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -530,6 +530,10 @@ static void mirror_exit(BlockJob *job, void *opaque)
 error_report_err(local_err);
 data->ret = -EPERM;
 }
+
+/* The target image's file already has been created with the backing
+ * file we just set, so there is no need to set backing_overridden or
+ * call bdrv_refresh_filename(). */
 }
 
 if (s->to_replace) {
diff --git a/blockdev.c b/blockdev.c
index 56a6b24a0b..b0393e1786 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1781,6 +1781,8 @@ static void external_snapshot_commit(BlkActionState 
*common)
 {
 ExternalSnapshotState *state =
  DO_UPCAST(ExternalSnapshotState, common, common);
+TransactionAction *action = common->action;
+bool image_was_existing = false;
 
 /* We don't need (or want) to use the transactional
  * bdrv_reopen_multiple() across all the entries at once, because we
@@ -1789,6 +1791,20 @@ static void external_snapshot_commit(BlkActionState 
*common)
 bdrv_reopen(state->old_bs, state->old_bs->open_flags & ~BDRV_O_RDWR,
 NULL);
 }
+
+if (action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
+BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
+if (s->has_mode && s->mode == NEW_IMAGE_MODE_EXISTING) {
+image_was_existing = true;
+}
+} else {
+image_was_existing = true;
+}
+
+if (image_was_existing) {
+state->new_bs->backing_overridden = true;
+bdrv_refresh_filename(state->new_bs);
+}
 }
 
 static void external_snapshot_abort(BlkActionState *common)
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 10/25] block: Fix bdrv_find_backing_image()

2017-11-20 Thread Max Reitz
bdrv_find_backing_image() should use bdrv_get_full_backing_filename() or
bdrv_make_absolute_filename() instead of trying to do what those
functions do by itself.

path_combine_deprecated() can now be dropped, so let's do that.

Signed-off-by: Max Reitz 
---
 block.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index a11cebf0cb..ecea78dae0 100644
--- a/block.c
+++ b/block.c
@@ -196,15 +196,6 @@ char *path_combine(const char *base_path, const char 
*filename)
 return result;
 }
 
-static void path_combine_deprecated(char *dest, int dest_size,
-const char *base_path,
-const char *filename)
-{
-char *combined = path_combine(base_path, filename);
-pstrcpy(dest, dest_size, combined);
-g_free(combined);
-}
-
 /*
  * Helper function for bdrv_parse_filename() implementations to remove optional
  * protocol prefixes (especially "file:") from a filename and for putting the
@@ -4133,7 +4124,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 filename_full = g_malloc(PATH_MAX);
 backing_file_full = g_malloc(PATH_MAX);
-filename_tmp  = g_malloc(PATH_MAX);
 
 is_protocol = path_has_protocol(backing_file);
 
@@ -4162,22 +4152,23 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 } else {
 /* If not an absolute filename path, make it relative to the 
current
  * image's filename path */
-path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
-backing_file);
-
-/* We are going to compare absolute pathnames */
-if (!realpath(filename_tmp, filename_full)) {
+filename_tmp = bdrv_make_absolute_filename(curr_bs, backing_file,
+   NULL);
+/* We are going to compare canonicalized absolute pathnames */
+if (!filename_tmp || !realpath(filename_tmp, filename_full)) {
+g_free(filename_tmp);
 continue;
 }
+g_free(filename_tmp);
 
 /* We need to make sure the backing filename we are comparing 
against
  * is relative to the current image filename (or absolute) */
-path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
-curr_bs->backing_file);
-
-if (!realpath(filename_tmp, backing_file_full)) {
+filename_tmp = bdrv_get_full_backing_filename(curr_bs, NULL);
+if (!filename_tmp || !realpath(filename_tmp, backing_file_full)) {
+g_free(filename_tmp);
 continue;
 }
+g_free(filename_tmp);
 
 if (strcmp(backing_file_full, filename_full) == 0) {
 retval = curr_bs->backing->bs;
@@ -4188,7 +4179,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 g_free(filename_full);
 g_free(backing_file_full);
-g_free(filename_tmp);
 return retval;
 }
 
-- 
2.13.6




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.11] iotests: Fix 176 on 32-bit host

2017-11-20 Thread Eric Blake
On 11/17/2017 04:59 PM, John Snow wrote:
>>> So for this test, the easiest solution is to filter out the
>>> actual hash value.  Broken in commit 4096974e.
>>
>> Of course, if Kevin sends a v2 pull, it's probably better to just squash
>> this in to my original testsuite change (since a v2 would probably
>> invalidate that commit id).
>>
> 
> Whichever way you go,

Commit 4096974e has already landed in master, so this commit message is
correct as-is and belongs in the next pull request (whether for -rc2 or
-rc3 depends on timing).

> 
> Reviewed-by: John Snow 
> 
> I don't know how important it is to nail this down, but the purpose of
> this command is primarily for sanity testing and not necessarily between
> architectures, so this might just be a limitation to document.
> 
> Also, don't use this for anything except testing.

Indeed, the name of x-debug-block-dirty-bitmap-sha256 already implies
its limited portability; but it also gives us free reign to change it's
behavior if we find something we like better for the same ease of bitmap
testing/debugging.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v2 for-2.11 0/4] Fix segfault in blockjob race condition

2017-11-20 Thread Jeff Cody
Changes from v1 -> v2:

Patch 1: Updated docs in blockjob_int.h (Thanks Stefan)

Patch 2/3: Squashed, and used const char * to hold the __func__ name of
   the original scheduler (Thanks Paolo)

Patch 4: Unchanged.

Patch 5: Dropped qcow format for the test, it was so slow the test times
 out, and it doesn't add any new dimension to the test.


# git-backport-diff -r qemu/master.. -u github/bz1508708

001/4:[0003] [FC] 'blockjob: do not allow coroutine double entry or 
entry-after-completion'
002/4:[down] 'coroutine: abort if we try to schedule or enter a pending 
coroutine'
003/4:[] [--] 'qemu-iotests: add option in common.qemu for mismatch only'
004/4:[0002] [FC] 'qemu-iotest: add test for blockjob coroutine race condition'


This series fixes a race condition segfault when using iothreads with
blockjobs.

The qemu iotest in this series is a reproducer, as is the reproducer
script attached in this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=1508708

There are two additional patches to try and catch this sort of scenario
with an abort, before a segfault or memory corruption occurs.


Jeff Cody (4):
  blockjob: do not allow coroutine double entry or
entry-after-completion
  coroutine: abort if we try to schedule or enter a pending coroutine
  qemu-iotests: add option in common.qemu for mismatch only
  qemu-iotest: add test for blockjob coroutine race condition

 blockjob.c |  9 ++--
 include/block/blockjob_int.h   |  3 +-
 include/qemu/coroutine_int.h   |  6 +++
 tests/qemu-iotests/200 | 99 ++
 tests/qemu-iotests/200.out | 14 ++
 tests/qemu-iotests/common.qemu |  8 +++-
 tests/qemu-iotests/group   |  1 +
 util/async.c   | 11 +
 util/qemu-coroutine-sleep.c| 11 +
 util/qemu-coroutine.c  | 11 +
 10 files changed, 168 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

-- 
2.9.5




[Qemu-block] [PATCH v2 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-20 Thread Jeff Cody
The previous patch fixed a race condition, in which there were
coroutines being executing doubly, or after coroutine deletion.

We can detect common scenarios when this happens, and print an error
message and abort before we corrupt memory / data, or segfault.

This patch will abort if an attempt to enter a coroutine is made while
it is currently pending execution, either in a specific AioContext bh,
or pending execution via a timer.  It will also abort if a coroutine
is scheduled, before a prior scheduled run has occured.

We cannot rely on the existing co->caller check for recursive re-entry
to catch this, as the coroutine may run and exit with
COROUTINE_TERMINATE before the scheduled coroutine executes.

(This is the scenario that was occuring and fixed in the previous
patch).

Signed-off-by: Jeff Cody 
---
 include/qemu/coroutine_int.h |  6 ++
 util/async.c | 11 +++
 util/qemu-coroutine-sleep.c  | 11 +++
 util/qemu-coroutine.c| 11 +++
 4 files changed, 39 insertions(+)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892..56e4c48 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -53,6 +53,12 @@ struct Coroutine {
 
 /* Only used when the coroutine has yielded.  */
 AioContext *ctx;
+
+/* Used to catch and abort on illegal co-routine entry.
+ * Will contain the name of the function that had first
+ * scheduled the coroutine. */
+const char *scheduled;
+
 QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
 QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
diff --git a/util/async.c b/util/async.c
index 0e1bd87..49174b3 100644
--- a/util/async.c
+++ b/util/async.c
@@ -388,6 +388,7 @@ static void co_schedule_bh_cb(void *opaque)
 QSLIST_REMOVE_HEAD(, co_scheduled_next);
 trace_aio_co_schedule_bh_cb(ctx, co);
 aio_context_acquire(ctx);
+atomic_set(>scheduled, NULL);
 qemu_coroutine_enter(co);
 aio_context_release(ctx);
 }
@@ -438,6 +439,16 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
 trace_aio_co_schedule(ctx, co);
+const char *scheduled = atomic_read(>scheduled);
+
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
+atomic_set(>scheduled, __func__);
+
 QSLIST_INSERT_HEAD_ATOMIC(>scheduled_coroutines,
   co, co_scheduled_next);
 qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 9c56550..38dc4c8 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
+#include "qemu/coroutine_int.h"
 #include "qemu/timer.h"
 #include "block/aio.h"
 
@@ -25,6 +26,7 @@ static void co_sleep_cb(void *opaque)
 {
 CoSleepCB *sleep_cb = opaque;
 
+atomic_set(_cb->co->scheduled, NULL);
 aio_co_wake(sleep_cb->co);
 }
 
@@ -34,6 +36,15 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
QEMUClockType type,
 CoSleepCB sleep_cb = {
 .co = qemu_coroutine_self(),
 };
+const char *scheduled = atomic_read(_cb.co->scheduled);
+
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
+atomic_set(_cb.co->scheduled, __func__);
 sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, _cb);
 timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1..fbfd0ad 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -106,9 +106,20 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine 
*co)
 {
 Coroutine *self = qemu_coroutine_self();
 CoroutineAction ret;
+const char *scheduled = atomic_read(>scheduled);
 
 trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
 
+/* if the Coroutine has already been scheduled, entering it again will
+ * cause us to enter it twice, potentially even after the coroutine has
+ * been deleted */
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
+
 if (co->caller) {
 fprintf(stderr, "Co-routine re-entered recursively\n");
 abort();
-- 
2.9.5




[Qemu-block] [PATCH v2 for-2.11 1/4] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-20 Thread Jeff Cody
When block_job_sleep_ns() is called, the co-routine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we present a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

The job->busy flag is used by blockjobs when a coroutine is busy
executing. The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if set.  If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Also, in block_job_start(), set the relevant job flags (.busy, .paused)
before creating the coroutine, not just before executing it.

Signed-off-by: Jeff Cody 
---
 blockjob.c   | 9 ++---
 include/block/blockjob_int.h | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 3a0c491..e181295 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
 {
 assert(job && !block_job_started(job) && job->paused &&
job->driver && job->driver->start);
-job->co = qemu_coroutine_create(block_job_co_entry, job);
 job->pause_count--;
 job->busy = true;
 job->paused = false;
+job->co = qemu_coroutine_create(block_job_co_entry, job);
 bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType 
type, int64_t ns)
 return;
 }
 
-job->busy = false;
+/* We need to leave job->busy set here, because when we have
+ * put a coroutine to 'sleep', we have scheduled it to run in
+ * the future.  We cannot enter that same coroutine again before
+ * it wakes and runs, otherwise we risk double-entry or entry after
+ * completion. */
 if (!block_job_should_pause(job)) {
 co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
 }
-job->busy = true;
 
 block_job_pause_point(job);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..43f3be2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,7 +143,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ * nanoseconds.  Canceling the job will not interrupt the wait, so the
+ * cancel will not process until the coroutine wakes up.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
 
-- 
2.9.5




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.11 0/3] block: Error out on load_vm with active dirty bitmaps

2017-11-20 Thread John Snow


On 11/20/2017 09:50 AM, Kevin Wolf wrote:
> Following the discussing at
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03572.html
> this implements the error return for loading a snapshot while dirty
> bitmaps are active.
> 
> Kevin Wolf (3):
>   block: Add errp to bdrv_snapshot_goto()
>   block: Add errp to bdrv_all_goto_snapshot()
>   block: Error out on load_vm with active dirty bitmaps
> 
>  include/block/snapshot.h |  6 --
>  block/snapshot.c | 43 +++
>  migration/savevm.c   |  6 +++---
>  qemu-img.c   |  6 +++---
>  4 files changed, 33 insertions(+), 28 deletions(-)
> 

You have enough reviews, but just so it's unambiguous, I'm fine with
this approach in lieu of Vladimir's; so:

Reviewed-by: John Snow 



[Qemu-block] [PATCH v2 for-2.11 4/4] qemu-iotest: add test for blockjob coroutine race condition

2017-11-20 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/200 | 99 ++
 tests/qemu-iotests/200.out | 14 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 114 insertions(+)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
new file mode 100755
index 000..d8787dd
--- /dev/null
+++ b/tests/qemu-iotests/200
@@ -0,0 +1,99 @@
+#!/bin/bash
+#
+# Block job co-routine race condition test.
+#
+# See: https://bugzilla.redhat.com/show_bug.cgi?id=1508708
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=jc...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+rm -f "${TEST_IMG}" "${BACKING_IMG}"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 qed
+_supported_proto file
+_supported_os Linux
+
+BACKING_IMG="${TEST_DIR}/backing.img"
+TEST_IMG="${TEST_DIR}/test.img"
+
+${QEMU_IMG} create -f $IMGFMT "${BACKING_IMG}" 512M | _filter_img_create
+${QEMU_IMG} create -f $IMGFMT -F $IMGFMT "${TEST_IMG}" -b "${BACKING_IMG}" 
512M | _filter_img_create
+
+${QEMU_IO} -c "write -P 0xa5 512 300M" "${BACKING_IMG}" | _filter_qemu_io
+
+echo
+echo === Starting QEMU VM ===
+echo
+qemu_comm_method="qmp"
+_launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
+ -object iothread,id=iothread0 \
+ -device 
virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
+ -drive 
file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT
 \
+ -device 
scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
+h1=$QEMU_HANDLE
+
+_send_qemu_cmd $h1 "{ 'execute': 'qmp_capabilities' }" 'return'
+
+echo
+echo === Sending stream/cancel, checking for SIGSEGV only  ===
+echo
+for (( i=1;i<500;i++ ))
+do
+mismatch_only='y' qemu_error_no_exit='n' _send_qemu_cmd $h1 \
+   "{
+'execute': 'block-stream',
+'arguments': {
+'device': 'drive_sysdisk',
+'speed': 1000,
+'on-error': 'report',
+'job-id': 'job-$i'
+ }
+}
+{
+'execute': 'block-job-cancel',
+'arguments': {
+'device': 'job-$i'
+ }
+}" \
+   "{.*{.*}.*}"  # should match all well-formed QMP 
responses
+done
+
+silent='y' _send_qemu_cmd $h1  "{ 'execute': 'quit' }" 'return'
+
+echo "$i iterations performed"
+
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
new file mode 100644
index 000..af6a809
--- /dev/null
+++ b/tests/qemu-iotests/200.out
@@ -0,0 +1,14 @@
+QA output created by 200
+Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 
backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+wrote 314572800/314572800 bytes at offset 512
+300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Starting QEMU VM ===
+
+{"return": {}}
+
+=== Sending stream/cancel, checking for SIGSEGV only ===
+
+500 iterations performed
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 24e5ad1..25d9adf 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -194,3 +194,4 @@
 194 rw auto migration quick
 195 rw auto quick
 197 rw auto quick
+200 rw auto
-- 
2.9.5




[Qemu-block] [PATCH v2 for-2.11 3/4] qemu-iotests: add option in common.qemu for mismatch only

2017-11-20 Thread Jeff Cody
Add option to echo response to QMP / HMP command only on mismatch.

Useful for ignore all normal responses, but catching things like
segfaults.

Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/common.qemu | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 7b3052d..85f66b8 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -50,6 +50,8 @@ _in_fd=4
 #
 # If $silent is set to anything but an empty string, then
 # response is not echoed out.
+# If $mismatch_only is set, only non-matching responses will
+# be echoed.
 function _timed_wait_for()
 {
 local h=${1}
@@ -58,14 +60,18 @@ function _timed_wait_for()
 QEMU_STATUS[$h]=0
 while IFS= read -t ${QEMU_COMM_TIMEOUT} resp <&${QEMU_OUT[$h]}
 do
-if [ -z "${silent}" ]; then
+if [ -z "${silent}" ] && [ -z "${mismatch_only}" ]; then
 echo "${resp}" | _filter_testdir | _filter_qemu \
| _filter_qemu_io | _filter_qmp | _filter_hmp
 fi
 grep -q "${*}" < <(echo "${resp}")
 if [ $? -eq 0 ]; then
 return
+elif [ -z "${silent}" ] && [ -n "${mismatch_only}" ]; then
+echo "${resp}" | _filter_testdir | _filter_qemu \
+   | _filter_qemu_io | _filter_qmp | _filter_hmp
 fi
+
 done
 QEMU_STATUS[$h]=-1
 if [ -z "${qemu_error_no_exit}" ]; then
-- 
2.9.5




[Qemu-block] [PATCH v7 for-2.12 09/25] block: Add bdrv_make_absolute_filename()

2017-11-20 Thread Max Reitz
This is a general function for making a filename that is relative to a
certain BDS absolute.

It calls bdrv_get_full_backing_filename_from_filename() for now, but
that will be changed in a follow-up patch.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 804adceafa..a11cebf0cb 100644
--- a/block.c
+++ b/block.c
@@ -313,13 +313,24 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 }
 }
 
-char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
+/* If @filename is empty or NULL, this function returns NULL without
+ * setting @errp.  In all other cases, NULL will only be returned with
+ * @errp set.
+ */
+static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
+ const char *filename, Error **errp)
 {
-char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+char *bs_filename = relative_to->exact_filename[0]
+? relative_to->exact_filename
+: relative_to->filename;
+
+return bdrv_get_full_backing_filename_from_filename(bs_filename,
+filename ?: "", errp);
+}
 
-return bdrv_get_full_backing_filename_from_filename(backed,
-bs->backing_file,
-errp);
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
+{
+return bdrv_make_absolute_filename(bs, bs->backing_file, errp);
 }
 
 void bdrv_register(BlockDriver *bdrv)
-- 
2.13.6




[Qemu-block] [PATCH v7 for-2.12 19/25] block: Add BlockDriver.bdrv_gather_child_options

2017-11-20 Thread Max Reitz
Some follow-up patches will rework the way bs->full_open_options is
refreshed in bdrv_refresh_filename(). The new implementation will remove
the need for the block drivers' bdrv_refresh_filename() implementations
to set bs->full_open_options; instead, it will be generic and use static
information from each block driver.

However, by implementing bdrv_gather_child_options(), block drivers will
still be able to override the way the full_open_options of their
children are incorporated into their own.

We need to implement this function for VMDK because we have to prevent
the generic implementation from gathering the options of all children:
It is not possible to specify options for the extents through the
runtime options.

For quorum, the child names that would be used by the generic
implementation and the ones that we actually (currently) want to use
differ. See quorum_gather_child_options() for more information.

Note that both of these are cases which are not ideal: In case of VMDK
it would probably be nice to be able to specify options for all extents.
In case of quorum, the current runtime option structure is simply broken
and needs to be fixed (but that is left for another patch).

Signed-off-by: Max Reitz 
---
 include/block/block_int.h | 17 +
 block/quorum.c| 38 ++
 block/vmdk.c  | 13 +
 3 files changed, 68 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d99ded8891..f81516b615 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -136,6 +136,23 @@ struct BlockDriver {
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
 
 /*
+ * Gathers the open options for all children into @target.
+ * A simple format driver (without backing file support) might
+ * implement this function like this:
+ *
+ * QINCREF(bs->file->bs->full_open_options);
+ * qdict_put(target, "file", bs->file->bs->full_open_options);
+ *
+ * If not specified, the generic implementation will simply put
+ * all children's options under their respective name.
+ *
+ * Note that ideally this function would not be needed.  Every
+ * block driver which implements it is probably doing something
+ * shady regarding its runtime option structure.
+ */
+void (*bdrv_gather_child_options)(BlockDriverState *bs, QDict *target);
+
+/*
  * Returns an allocated string which is the directory name of this BDS: It
  * will be used to make relative filenames absolute by prepending this
  * function's return value to them.
diff --git a/block/quorum.c b/block/quorum.c
index 4b38201aa2..9d680addc6 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1095,6 +1095,43 @@ static void quorum_refresh_filename(BlockDriverState 
*bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static void quorum_gather_child_options(BlockDriverState *bs, QDict *target)
+{
+BDRVQuorumState *s = bs->opaque;
+QList *children_list;
+int i;
+
+/* The generic implementation for gathering child options in
+ * bdrv_refresh_filename() would use the names of the children
+ * as specified for bdrv_open_child() or bdrv_attach_child(),
+ * which is "children.%u" with %u being a value
+ * (s->next_child_index) that is incremented each time a new child
+ * is added (and never decremented).  Since children can be
+ * deleted at runtime, there may be gaps in that enumeration.
+ * When creating a new quorum BDS and specifying the children for
+ * it through runtime options, the enumeration used there may not
+ * have any gaps, though.
+ *
+ * Therefore, we have to create a new gap-less enumeration here
+ * (which we can achieve by simply putting all of the children's
+ * full_open_options into a QList).
+ *
+ * XXX: Note that there are issues with the current child option
+ *  structure quorum uses (such as the fact that children do
+ *  not really have unique permanent names).  Therefore, this
+ *  is going to have to change in the future and ideally we
+ *  want quorum to be covered by the generic implementation.
+ */
+
+children_list = qlist_new();
+qdict_put(target, "children", children_list);
+
+for (i = 0; i < s->num_children; i++) {
+QINCREF(s->children[i]->bs->full_open_options);
+qlist_append(children_list, s->children[i]->bs->full_open_options);
+}
+}
+
 static char *quorum_dirname(BlockDriverState *bs, Error **errp)
 {
 /* In general, there are multiple BDSs with different dirnames below this
@@ -1123,6 +1160,7 @@ static BlockDriver bdrv_quorum = {
 .bdrv_file_open = quorum_open,
 .bdrv_close = quorum_close,
 .bdrv_refresh_filename  = quorum_refresh_filename,
+.bdrv_gather_child_options 

[Qemu-block] [PATCH v7 for-2.12 04/25] iotests: Drop explicit base blockdev in 191

2017-11-20 Thread Max Reitz
Overriding the backing image should result in a json:{} pseudo-filename.
Then, you can no longer use the commit block job with filename
parameters.  Therefore, do not explicitly add the base and override the
middle image in iotest 191, since we do not need to anyway.  This will
allow us to continue to use the middle image's filename to identify it.

In the long run, we want block-commit to accept node names for base and
top (just like block-stream does).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/191 |  3 +-
 tests/qemu-iotests/191.out | 70 +++---
 2 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/tests/qemu-iotests/191 b/tests/qemu-iotests/191
index ad785e10b1..92805e9ca2 100755
--- a/tests/qemu-iotests/191
+++ b/tests/qemu-iotests/191
@@ -67,8 +67,7 @@ qemu_comm_method="qmp"
 qmp_pretty="y"
 
 _launch_qemu \
--blockdev 
"driver=${IMGFMT},file.driver=file,file.filename=${TEST_IMG}.base,node-name=base"
 \
--blockdev 
"driver=${IMGFMT},file.driver=file,file.filename=${TEST_IMG}.mid,node-name=mid,backing=base"
 \
+-blockdev 
"driver=${IMGFMT},file.driver=file,file.filename=${TEST_IMG}.mid,node-name=mid" 
\
 -blockdev 
"driver=${IMGFMT},file.driver=file,file.filename=${TEST_IMG},node-name=top,backing=mid"
 \
 -blockdev 
"driver=${IMGFMT},file.driver=file,file.filename=${TEST_IMG}.ovl2,node-name=top2,backing=mid"
 h=$QEMU_HANDLE
diff --git a/tests/qemu-iotests/191.out b/tests/qemu-iotests/191.out
index 73c0ed454c..26ff0a315d 100644
--- a/tests/qemu-iotests/191.out
+++ b/tests/qemu-iotests/191.out
@@ -220,26 +220,8 @@ wrote 65536/65536 bytes at offset 1048576
 "iops_rd": 0,
 "detect_zeroes": "off",
 "image": {
-"backing-image": {
-"virtual-size": 67108864,
-"filename": "TEST_DIR/t.qcow2.base",
-"cluster-size": 65536,
-"format": "qcow2",
-"actual-size": SIZE,
-"format-specific": {
-"type": "qcow2",
-"data": {
-"compat": "1.1",
-"lazy-refcounts": false,
-"refcount-bits": 16,
-"corrupt": false
-}
-},
-"dirty-flag": false
-},
-"backing-filename-format": "qcow2",
 "virtual-size": 67108864,
-"filename": "TEST_DIR/t.qcow2.mid",
+"filename": "TEST_DIR/t.qcow2.base",
 "cluster-size": 65536,
 "format": "qcow2",
 "actual-size": SIZE,
@@ -252,19 +234,16 @@ wrote 65536/65536 bytes at offset 1048576
 "corrupt": false
 }
 },
-"full-backing-filename": "TEST_DIR/t.qcow2.base",
-"backing-filename": "TEST_DIR/t.qcow2.base",
 "dirty-flag": false
 },
 "iops_wr": 0,
-"ro": false,
-"node-name": "mid",
-"backing_file_depth": 1,
+"ro": true,
+"node-name": "NODE_NAME",
+"backing_file_depth": 0,
 "drv": "qcow2",
 "iops": 0,
 "bps_wr": 0,
 "write_threshold": 0,
-"backing_file": "TEST_DIR/t.qcow2.base",
 "encrypted": false,
 "bps": 0,
 "bps_rd": 0,
@@ -273,7 +252,7 @@ wrote 65536/65536 bytes at offset 1048576
 "direct": false,
 "writeback": true
 },
-"file": "TEST_DIR/t.qcow2.mid",
+"file": "TEST_DIR/t.qcow2.base",
 "encryption_key_missing": false
 },
 {
@@ -281,13 +260,13 @@ wrote 65536/65536 bytes at offset 1048576
 "detect_zeroes": "off",
 "image": {
 "virtual-size": 393216,
-"filename": "TEST_DIR/t.qcow2.mid",
+"filename": "TEST_DIR/t.qcow2.base",
 "format": "file",
 "actual-size": SIZE,
 "dirty-flag": false
 },
 "iops_wr": 0,
-"ro": false,
+"ro": true,
 "node-name": "NODE_NAME",
 "backing_file_depth": 0,
 "drv": "file",
@@ -302,15 +281,33 @@ wrote 65536/65536 bytes at offset 1048576
 "direct": false,
 "writeback": true
 },
-"file": "TEST_DIR/t.qcow2.mid",
+"file": "TEST_DIR/t.qcow2.base",
 "encryption_key_missing": false
 },
 {
 "iops_rd": 0,
 "detect_zeroes": "off",
 "image": {
+"backing-image": {
+"virtual-size": 67108864,
+"filename": 

[Qemu-block] [PATCH v7 for-2.12 08/25] block: bdrv_get_full_backing_filename's ret. val.

2017-11-20 Thread Max Reitz
Make bdrv_get_full_backing_filename() return an allocated string instead
of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 include/block/block.h |  3 +--
 block.c   | 47 +--
 block/qapi.c  | 12 ++--
 3 files changed, 20 insertions(+), 42 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 46c6190414..3e2ea6c879 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -481,8 +481,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
-void bdrv_get_full_backing_filename(BlockDriverState *bs,
-char *dest, size_t sz, Error **errp);
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp);
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp);
diff --git a/block.c b/block.c
index f8b3edd5bb..804adceafa 100644
--- a/block.c
+++ b/block.c
@@ -313,24 +313,13 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 }
 }
 
-void bdrv_get_full_backing_filename(BlockDriverState *bs, char *dest, size_t 
sz,
-Error **errp)
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
 {
 char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
-char *full_name;
-Error *local_error = NULL;
 
-full_name = bdrv_get_full_backing_filename_from_filename(backed,
- bs->backing_file,
- _error);
-if (full_name) {
-pstrcpy(dest, sz, full_name);
-g_free(full_name);
-} else if (local_error) {
-error_propagate(errp, local_error);
-} else if (sz > 0) {
-*dest = '\0';
-}
+return bdrv_get_full_backing_filename_from_filename(backed,
+bs->backing_file,
+errp);
 }
 
 void bdrv_register(BlockDriver *bdrv)
@@ -2226,7 +2215,7 @@ out:
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
const char *bdref_key, Error **errp)
 {
-char *backing_filename = g_malloc0(PATH_MAX);
+char *backing_filename = NULL;
 char *bdref_key_dot;
 const char *reference = NULL;
 int ret = 0;
@@ -2260,7 +2249,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
  */
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
-backing_filename[0] = '\0';
+/* keep backing_filename NULL */
 
 /* FIXME: Should also be set to true if @options contains other runtime
  *options which control the data that is read from the backing
@@ -2270,8 +2259,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 QDECREF(options);
 goto free_exit;
 } else {
-bdrv_get_full_backing_filename(bs, backing_filename, PATH_MAX,
-   _err);
+backing_filename = bdrv_get_full_backing_filename(bs, _err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
@@ -2292,9 +2280,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 qdict_put_str(options, "driver", bs->backing_format);
 }
 
-backing_hd = bdrv_open_inherit(*backing_filename ? backing_filename : NULL,
-   reference, options, 0, bs, _backing,
-   errp);
+backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
+   _backing, errp);
 if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
@@ -4128,7 +4115,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 int is_protocol = 0;
 BlockDriverState *curr_bs = NULL;
 BlockDriverState *retval = NULL;
-Error *local_error = NULL;
 
 if (!bs || !bs->drv || !backing_file) {
 return NULL;
@@ -4145,21 +4131,22 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 /* If either of the filename paths is actually a protocol, then
  * compare unmodified paths; otherwise make paths relative */
 if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+char 

Re: [Qemu-block] [PATCH for-2.11] iotests: Fix 176 on 32-bit host

2017-11-20 Thread Max Reitz
On 2017-11-17 20:04, Eric Blake wrote:
> The contents of a qcow2 bitmap are rounded up to a size that
> matches the number of bits available for the granularity, but
> that granularity differs for 32-bit hosts (our default 64k
> cluster allows for 2M bitmap coverage per 'long') and 64-bit
> hosts (4M bitmap per 'long').  If the image is a multiple of
> 2M but not 4M, then the number of bytes occupied by the array
> of longs in memory differs between architecture, thus
> resulting in different SHA256 hashes.
> 
> Furthermore (but untested by me), if our computation of the
> SHA256 hash is at all endian-dependent because of how we store
> data in memory, that's another variable we'd have to account
> for (ideally, we specified the bitmap stored in qcow2 as
> fixed-endian on disk, because the same qcow2 file must be
> usable across any architecture; but that says nothing about
> how we represent things in memory).  But we already have test
> 165 to validate that bitmaps are stored correctly on disk,
> while this test is merely testing that the bitmap exists.
> 
> So for this test, the easiest solution is to filter out the
> actual hash value.  Broken in commit 4096974e.
> 
> Reported-by: Max Reitz 
> Signed-off-by: Eric Blake 
> ---
>  tests/qemu-iotests/176 | 3 ++-
>  tests/qemu-iotests/176.out | 8 
>  2 files changed, 6 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] Failed to get "consistent read" lock on a mirroring image

2017-11-20 Thread Kevin Wolf
Am 20.11.2017 um 13:39 hat Eric Blake geschrieben:
> On 11/20/2017 04:37 AM, Kevin Wolf wrote:
> > [ Cc: qemu-block ]
> > 
> > Am 20.11.2017 um 09:47 hat Fam Zheng geschrieben:
> >> On Mon, 11/20 10:58, Han Han wrote:
> >>> Hello,
> >>> On qemu-2.10, I find 'qemu-img info' couldn't get the info of a mirroring
> >>> image:
> 
> >>
> >> Cc Kevin. Looks like -U here is not strong enough to skip the "consistent 
> >> read"
> >> lock. The drive mirror target BB shared permissions are different from 
> >> device
> >> BB, but I'm not sure if it is intentional.
> > 
> > I think there are two parts to this:
> > 
> > First, blocking consistent reads for the mirror target seems
> > unnecessary. I can send a patch to allow this in 2.11.
> 
> Doesn't the consistent read property mean that reading the image will
> see contents that a guest should have (had) access to at some point in
> time?  Until the mirroring reaches the READY state, the data is
> inconsistent (it is a mix of uninitialized data and partial guest data,
> while we continue to synchronize the rest of the guest data over the
> mirror), so it still makes sense to me that the consistent read blocker
> should be set at least until the mirror job switches into the second
> sync phase.

Yes, you're right. There may be some special cases where we actually get
a consistent view (if the target has the source as its backing file),
but generally speaking it's not consistent.

Okay, so let's leave this as it is for now.

> > The other part is that 'qemu-img info' doesn't actually need
> > BLK_PERM_CONSISTENT_READ, but blk_new_open() automatically requests it.
> > Maybe we need another flag that allows 'qemu-img info' to do without
> > this permission. The concrete use case are intermediate nodes of a
> > commit job, where we do have to block this permission.
> > 
> > Hm... Or is BDRV_O_NO_IO already the right flag for this? :-)
> 
> That's a good observation - as long as we aren't reading any
> guest-visible data from the image, we don't really care whether the
> guest-visible data is consistent.  BDRV_O_NO_IO is our promise that we
> are reading only metadata, not guest-visible data.  So that change makes
> sense.

I'll prepare a patch for this one then.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx

2017-11-20 Thread Jeff Cody
On Mon, Nov 20, 2017 at 11:28:26AM +, Stefan Hajnoczi wrote:
> On Sun, Nov 19, 2017 at 09:46:43PM -0500, Jeff Cody wrote:
> > diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> > index cb98892..931cdc9 100644
> > --- a/include/qemu/coroutine_int.h
> > +++ b/include/qemu/coroutine_int.h
> > @@ -53,6 +53,9 @@ struct Coroutine {
> >  
> >  /* Only used when the coroutine has yielded.  */
> >  AioContext *ctx;
> > +
> > +int scheduled;
> 
> s/int/bool/
> 

OK.

> > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > index d6095c1..2edab63 100644
> > --- a/util/qemu-coroutine.c
> > +++ b/util/qemu-coroutine.c
> > @@ -109,6 +109,15 @@ void qemu_aio_coroutine_enter(AioContext *ctx, 
> > Coroutine *co)
> >  
> >  trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
> >  
> > +/* if the Coroutine has already been scheduled, entering it again will
> > + * cause us to enter it twice, potentially even after the coroutine has
> > + * been deleted */
> > +if (co->scheduled == 1) {
> > +fprintf(stderr, "Cannot enter a co-routine that has already "
> > +"been scheduled to run in a different 
> > AioContext\n");
> 
> This error message is too specific, the AioContext doesn't have to be
> different from the current one:
> 
> block/blkdebug.c:aio_co_schedule(qemu_get_current_aio_context(), 
> qemu_coroutine_self());
> 
> If something calls qemu_aio_coroutine_enter() on the coroutine it might
> be from the same AioContext - but still an error condition worth failing
> loudly on.
> 

Good point.


> I suggest simplifying the error message:
> 
>   fprintf(stderr, "Cannot enter a co-routine that has already "
>   "been scheduled\n");

OK. I'll also change the wording here to what you have above, as well:

  void aio_co_schedule(AioContext *ctx, Coroutine *co)
  {
   trace_aio_co_schedule(ctx, co);
  +if (co->scheduled == 1) {
  +fprintf(stderr,
  +"Cannot schedule a co-routine that is already scheduled\n");
  +abort();
  +}
  +co->scheduled = 1;
   QSLIST_INSERT_HEAD_ATOMIC(>scheduled_coroutines,
 co, co_scheduled_next);


Jeff



Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-20 Thread Jeff Cody
On Mon, Nov 20, 2017 at 11:43:34AM +, Stefan Hajnoczi wrote:
> On Sun, Nov 19, 2017 at 09:46:44PM -0500, Jeff Cody wrote:
> > diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> > index 931cdc9..b071217 100644
> > --- a/include/qemu/coroutine_int.h
> > +++ b/include/qemu/coroutine_int.h
> > @@ -56,6 +56,8 @@ struct Coroutine {
> >  
> >  int scheduled;
> >  
> > +int sleeping;
> 
> s/int/bool/
> 

OK.

> BTW an alternative to adding individual bools is to implement a finite
> state machine for the entire coroutine lifecycle.  A single function can
> validate all state transitions:
> 
>   void check_state_transition(CoState old, CoState new,
>   const char *action)
>   {
>   const char *errmsg = fsm[old][new];
>   if (!errmsg) {
>   return; /* valid transition! */
>   }
> 
>   fprintf(stderr, "Cannot %s coroutine from %s state\n",
>   action, state_name[old]);
>   abort();
>   }
> 
> Specifying fsm[][] forces us to think through all possible state
> transitions.  This approach is proactive whereas adding bool flags is
> reactive since it only covers a subset of states that were encountered
> after crashes.  I'm not sure if it's worth it though :).

Interesting idea; maybe more for 2.12 instead of 2.11, though?

Jeff



Re: [Qemu-block] Failed to get "consistent read" lock on a mirroring image

2017-11-20 Thread Eric Blake
On 11/20/2017 04:37 AM, Kevin Wolf wrote:
> [ Cc: qemu-block ]
> 
> Am 20.11.2017 um 09:47 hat Fam Zheng geschrieben:
>> On Mon, 11/20 10:58, Han Han wrote:
>>> Hello,
>>> On qemu-2.10, I find 'qemu-img info' couldn't get the info of a mirroring
>>> image:

>>
>> Cc Kevin. Looks like -U here is not strong enough to skip the "consistent 
>> read"
>> lock. The drive mirror target BB shared permissions are different from device
>> BB, but I'm not sure if it is intentional.
> 
> I think there are two parts to this:
> 
> First, blocking consistent reads for the mirror target seems
> unnecessary. I can send a patch to allow this in 2.11.

Doesn't the consistent read property mean that reading the image will
see contents that a guest should have (had) access to at some point in
time?  Until the mirroring reaches the READY state, the data is
inconsistent (it is a mix of uninitialized data and partial guest data,
while we continue to synchronize the rest of the guest data over the
mirror), so it still makes sense to me that the consistent read blocker
should be set at least until the mirror job switches into the second
sync phase.

> 
> The other part is that 'qemu-img info' doesn't actually need
> BLK_PERM_CONSISTENT_READ, but blk_new_open() automatically requests it.
> Maybe we need another flag that allows 'qemu-img info' to do without
> this permission. The concrete use case are intermediate nodes of a
> commit job, where we do have to block this permission.
> 
> Hm... Or is BDRV_O_NO_IO already the right flag for this? :-)

That's a good observation - as long as we aren't reading any
guest-visible data from the image, we don't really care whether the
guest-visible data is consistent.  BDRV_O_NO_IO is our promise that we
are reading only metadata, not guest-visible data.  So that change makes
sense.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-20 Thread Stefan Hajnoczi
On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>  {
>  assert(job && !block_job_started(job) && job->paused &&
> job->driver && job->driver->start);
> -job->co = qemu_coroutine_create(block_job_co_entry, job);
>  job->pause_count--;
>  job->busy = true;
>  job->paused = false;
> +job->co = qemu_coroutine_create(block_job_co_entry, job);
>  bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }
>  

This hunk makes no difference.  The coroutine is only entered by
bdrv_coroutine_enter() so the order of job field initialization doesn't
matter.

> @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType 
> type, int64_t ns)
>  return;
>  }
>  
> -job->busy = false;
> +/* We need to leave job->busy set here, because when we have
> + * put a coroutine to 'sleep', we have scheduled it to run in
> + * the future.  We cannot enter that same coroutine again before
> + * it wakes and runs, otherwise we risk double-entry or entry after
> + * completion. */
>  if (!block_job_should_pause(job)) {
>  co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
>  }
> -job->busy = true;
>  
>  block_job_pause_point(job);

This leaves a stale doc comment in include/block/blockjob_int.h:

  /**
   * block_job_sleep_ns:
   * @job: The job that calls the function.
   * @clock: The clock to sleep on.
   * @ns: How many nanoseconds to stop for.
   *
   * Put the job to sleep (assuming that it wasn't canceled) for @ns
   * nanoseconds.  Canceling the job will interrupt the wait immediately.
   ~~
   */
  void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);

This raises questions about the ability to cancel sleep:

1. Does something depend on cancelling sleep?

2. Did cancellation work properly in commit
   4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
   block_job_sleep_ns") and was it broken afterwards?

It is possible to fix the recursive coroutine entry without losing sleep
cancellation.  Whether it's worth the trouble depends on the answers to
the above questions.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH 2/5] coroutine: abort if we try to enter coroutine scheduled for another ctx

2017-11-20 Thread Stefan Hajnoczi
On Sun, Nov 19, 2017 at 09:46:43PM -0500, Jeff Cody wrote:
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index cb98892..931cdc9 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -53,6 +53,9 @@ struct Coroutine {
>  
>  /* Only used when the coroutine has yielded.  */
>  AioContext *ctx;
> +
> +int scheduled;

s/int/bool/

> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index d6095c1..2edab63 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -109,6 +109,15 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine 
> *co)
>  
>  trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
>  
> +/* if the Coroutine has already been scheduled, entering it again will
> + * cause us to enter it twice, potentially even after the coroutine has
> + * been deleted */
> +if (co->scheduled == 1) {
> +fprintf(stderr, "Cannot enter a co-routine that has already "
> +"been scheduled to run in a different AioContext\n");

This error message is too specific, the AioContext doesn't have to be
different from the current one:

block/blkdebug.c:aio_co_schedule(qemu_get_current_aio_context(), 
qemu_coroutine_self());

If something calls qemu_aio_coroutine_enter() on the coroutine it might
be from the same AioContext - but still an error condition worth failing
loudly on.

I suggest simplifying the error message:

  fprintf(stderr, "Cannot enter a co-routine that has already "
  "been scheduled\n");


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH for-2.11 0/2] Fix 'qemu-img info' on mirror target

2017-11-20 Thread Fam Zheng
On Mon, 11/20 15:11, Kevin Wolf wrote:
> There is no reason to forbid 'qemu-img info' for an image that is used
> as a mirror target (or intermediate image of a commit job) at the same
> time.
> 
> This fixes the problem reported on qemu-discuss:
> https://lists.gnu.org/archive/html/qemu-discuss/2017-11/msg00039.html
> 
> Kevin Wolf (2):
>   block: Don't use BLK_PERM_CONSISTENT_READ for format probing
>   block: Don't request I/O permission with BDRV_O_NO_IO
> 
>  block.c   |  5 -
>  block/block-backend.c | 10 ++
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> -- 
> 2.13.6
> 

Reviewed-by: Fam Zheng 




Re: [Qemu-block] [PATCH for-2.11] qcow2: fix image corruption on commit with persistent snapshot

2017-11-20 Thread Kevin Wolf
Am 17.11.2017 um 19:46 hat Eric Blake geschrieben:
> > -- and I'm not really a fan of testing this on every possible
> > architecture and then adding different reference outputs.
> > 
> > Therefore, the best fix is probably to just filter the hashes out (you
> > don't need the exact value anyway, do you?), and I think it's fine to do
> > this as a follow-up.
> 
> At any rate, I concur with this conclusion; I'll post a followup that
> filters out the hash (for this test, we only care that the existence of
> a hash proves the bitmap exists; unlike 165 where we want to validate
> that it is actually tracking correct information).
> 
> I missed Kevin's -rc2 pull, unless he wants to send a v2; but we also
> have time (it's not the end of the world if the fix goes in -rc3).

There is no rule that a maintainer can only send one pull request per
release candidate. I already missed -rc1, so I wanted to make sure to
get things merged definitely in time for -rc2, but it's not too unlikely
that I'll send another small pull request for tomorrow.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH for-2.11 2/2] block: Don't request I/O permission with BDRV_O_NO_IO

2017-11-20 Thread Alberto Garcia
On Mon 20 Nov 2017 03:11:41 PM CET, Kevin Wolf wrote:
> 'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be
> granted because of a block job in a running qemu process. It already
> sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible
> data at all.
>
> Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related
> permissions are not unnecessarily requested and 'qemu-img info' can work
> even if BLK_PERM_CONSISTENT_READ cannot be granted.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



[Qemu-block] [PATCH for-2.11 2/2] block: Don't request I/O permission with BDRV_O_NO_IO

2017-11-20 Thread Kevin Wolf
'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be
granted because of a block job in a running qemu process. It already
sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible
data at all.

Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related
permissions are not unnecessarily requested and 'qemu-img info' can work
even if BLK_PERM_CONSISTENT_READ cannot be granted.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5836cb3087..baef8e7abc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -299,7 +299,7 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 {
 BlockBackend *blk;
 BlockDriverState *bs;
-uint64_t perm;
+uint64_t perm = 0;
 
 /* blk_new_open() is mainly used in .bdrv_create implementations and the
  * tools where sharing isn't a concern because the BDS stays private, so we
@@ -309,9 +309,11 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
  * caller of blk_new_open() doesn't make use of the permissions, but they
  * shouldn't hurt either. We can still share everything here because the
  * guest devices will add their own blockers if they can't share. */
-perm = BLK_PERM_CONSISTENT_READ;
-if (flags & BDRV_O_RDWR) {
-perm |= BLK_PERM_WRITE;
+if ((flags & BDRV_O_NO_IO) == 0) {
+perm |= BLK_PERM_CONSISTENT_READ;
+if (flags & BDRV_O_RDWR) {
+perm |= BLK_PERM_WRITE;
+}
 }
 if (flags & BDRV_O_RESIZE) {
 perm |= BLK_PERM_RESIZE;
-- 
2.13.6




[Qemu-block] [PATCH for-2.11 0/2] Fix 'qemu-img info' on mirror target

2017-11-20 Thread Kevin Wolf
There is no reason to forbid 'qemu-img info' for an image that is used
as a mirror target (or intermediate image of a commit job) at the same
time.

This fixes the problem reported on qemu-discuss:
https://lists.gnu.org/archive/html/qemu-discuss/2017-11/msg00039.html

Kevin Wolf (2):
  block: Don't use BLK_PERM_CONSISTENT_READ for format probing
  block: Don't request I/O permission with BDRV_O_NO_IO

 block.c   |  5 -
 block/block-backend.c | 10 ++
 2 files changed, 10 insertions(+), 5 deletions(-)

-- 
2.13.6




[Qemu-block] [PATCH for-2.11 1/2] block: Don't use BLK_PERM_CONSISTENT_READ for format probing

2017-11-20 Thread Kevin Wolf
For format probing, we don't really care whether all of the image
content is consistent. The only thing we're looking at is the image
header, and specifically the magic numbers that are expected to never
change, no matter how inconsistent the guest visible disk content is.

Therefore, don't request BLK_PERM_CONSISTENT_READ. This allows to use
format probing, e.g. in the context of 'qemu-img info', even while the
guest visible data in the image is inconsistent during a running block
job.

Signed-off-by: Kevin Wolf 
---
 block.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 6c8ef98dfa..68b724206d 100644
--- a/block.c
+++ b/block.c
@@ -2579,7 +2579,10 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 goto fail;
 }
 if (file_bs != NULL) {
-file = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
+/* Not requesting BLK_PERM_CONSISTENT_READ because we're only
+ * looking at the header to guess the image format. This works even
+ * in cases where a guest would not see a consistent state. */
+file = blk_new(0, BLK_PERM_ALL);
 blk_insert_bs(file, file_bs, _err);
 bdrv_unref(file_bs);
 if (local_err) {
-- 
2.13.6




Re: [Qemu-block] [Qemu-devel] [PATCH v2 4/4] iotests: 030: add compressed block-stream test

2017-11-20 Thread Stefan Hajnoczi
On Thu, Nov 16, 2017 at 07:54:58PM +0300, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov 
> ---
>  tests/qemu-iotests/030 | 66 
> +-
>  tests/qemu-iotests/030.out |  4 +--
>  2 files changed, 67 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 457984b..831d6f3 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -21,7 +21,7 @@
>  import time
>  import os
>  import iotests
> -from iotests import qemu_img, qemu_io
> +from iotests import qemu_img, qemu_img_pipe, qemu_io
>  
>  backing_img = os.path.join(iotests.test_dir, 'backing.img')
>  mid_img = os.path.join(iotests.test_dir, 'mid.img')
> @@ -804,5 +804,69 @@ class TestSetSpeed(iotests.QMPTestCase):
>  
>  self.cancel_and_wait(resume=True)
>  
> +class TestCompressed(iotests.QMPTestCase):
> +supported_fmts = ['qcow2']
> +cluster_size = 64 * 1024;
> +image_len = 1 * 1024 * 1024;

Please drop the unnecessary semicolons

> +
> +def setUp(self):
> +qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d' % 
> TestCompressed.cluster_size, backing_img, str(TestCompressed.image_len))
> +qemu_io('-c', 'write -P 1 0 ' + str(TestCompressed.image_len), 
> backing_img)
> +qemu_img('create', '-f', iotests.imgfmt, '-o', 
> 'backing_file=%s,cluster_size=%d' % (backing_img, 
> TestCompressed.cluster_size), test_img)
> +
> +# write '3' in every 3rd cluster
> +step = 3
> +for off in range(0, TestCompressed.image_len, 
> TestCompressed.cluster_size * step):
> +qemu_io('-c', 'write -P %d %d %d' %
> +(step, off, TestCompressed.cluster_size), test_img)
> +
> +self.vm = iotests.VM().add_drive(test_img)
> +self.vm.launch()
> +
> +def tearDown(self):
> +os.remove(test_img)
> +os.remove(backing_img)
> +
> +def _first_divider(self, x, divs):

"Divisor" or "factor":
https://en.wikipedia.org/wiki/Divisor

> +return divs[0] if not x%divs[0] else self._first_divider(x, divs[1:])

An alternative that I find easier to read than conditional expressions:

  for divisor in divs:
  if x % divisor == 0:
  return divisor
  raise ValueError('No suitable divisor found')

> +
> +def test_compressed(self):
> +self.assert_no_active_block_jobs()
> +
> +result = self.vm.qmp('block-stream', device='drive0', compress=True)
> +if iotests.imgfmt not in TestCompressed.supported_fmts:
> +self.assert_qmp(
> +result, 'error/desc',
> +'Compression is not supported for this drive drive0')
> +return
> +self.assert_qmp(result, 'return', {})
> +
> +# write '4' in every 4th cluster
> +step = 4
> +for off in range(0, TestCompressed.image_len, 
> TestCompressed.cluster_size * step):
> +result = self.vm.qmp('human-monitor-command',
> + command_line=
> + 'qemu-io drive0 "write -P %d %d %d"' %
> + (step, off, TestCompressed.cluster_size))
> +self.assert_qmp(result, 'return', "")
> +
> +self.wait_until_completed()
> +self.assert_no_active_block_jobs()
> +
> +self.vm.shutdown()

It is safe to call self.vm.shutdown() multiple times.  Please call it
from tearDown() too so the QEMU process is cleaned up on failure.


signature.asc
Description: PGP signature


[Qemu-block] [PATCH for-2.11 1/3] block: Add errp to bdrv_snapshot_goto()

2017-11-20 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 include/block/snapshot.h |  3 ++-
 block/snapshot.c | 21 -
 qemu-img.c   |  6 +++---
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index e5c0553115..aeb80405e8 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -57,7 +57,8 @@ int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_snapshot_create(BlockDriverState *bs,
  QEMUSnapshotInfo *sn_info);
 int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id);
+   const char *snapshot_id,
+   Error **errp);
 int bdrv_snapshot_delete(BlockDriverState *bs,
  const char *snapshot_id,
  const char *name,
diff --git a/block/snapshot.c b/block/snapshot.c
index be0743abac..9c941e638d 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -177,18 +177,21 @@ int bdrv_snapshot_create(BlockDriverState *bs,
 }
 
 int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id)
+   const char *snapshot_id,
+   Error **errp)
 {
 BlockDriver *drv = bs->drv;
 int ret, open_ret;
 int64_t len;
 
 if (!drv) {
+error_setg(errp, "Block driver is closed");
 return -ENOMEDIUM;
 }
 
 len = bdrv_getlength(bs);
 if (len < 0) {
+error_setg_errno(errp, -len, "Cannot get block device size");
 return len;
 }
 /* We should set all bits in all enabled dirty bitmaps, because dirty
@@ -200,13 +203,18 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 bdrv_set_dirty(bs, 0, len);
 
 if (drv->bdrv_snapshot_goto) {
-return drv->bdrv_snapshot_goto(bs, snapshot_id);
+ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to load snapshot");
+}
+return ret;
 }
 
 if (bs->file) {
 BlockDriverState *file;
 QDict *options = qdict_clone_shallow(bs->options);
 QDict *file_options;
+Error *local_err = NULL;
 
 file = bs->file->bs;
 /* Prevent it from getting deleted when detached from bs */
@@ -220,12 +228,14 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 bdrv_unref_child(bs, bs->file);
 bs->file = NULL;
 
-ret = bdrv_snapshot_goto(file, snapshot_id);
-open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
+ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+open_ret = drv->bdrv_open(bs, options, bs->open_flags, _err);
 QDECREF(options);
 if (open_ret < 0) {
 bdrv_unref(file);
 bs->drv = NULL;
+/* A bdrv_snapshot_goto() error takes precedence */
+error_propagate(errp, local_err);
 return open_ret;
 }
 
@@ -234,6 +244,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 return ret;
 }
 
+error_setg(errp, "Block driver does not support snapshots");
 return -ENOTSUP;
 }
 
@@ -467,7 +478,7 @@ int bdrv_all_goto_snapshot(const char *name, 
BlockDriverState **first_bad_bs)
 
 aio_context_acquire(ctx);
 if (bdrv_can_snapshot(bs)) {
-err = bdrv_snapshot_goto(bs, name);
+err = bdrv_snapshot_goto(bs, name, NULL);
 }
 aio_context_release(ctx);
 if (err < 0) {
diff --git a/qemu-img.c b/qemu-img.c
index 02a6e27beb..68b375f998 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2989,10 +2989,10 @@ static int img_snapshot(int argc, char **argv)
 break;
 
 case SNAPSHOT_APPLY:
-ret = bdrv_snapshot_goto(bs, snapshot_name);
+ret = bdrv_snapshot_goto(bs, snapshot_name, );
 if (ret) {
-error_report("Could not apply snapshot '%s': %d (%s)",
-snapshot_name, ret, strerror(-ret));
+error_reportf_err(err, "Could not apply snapshot '%s': ",
+  snapshot_name);
 }
 break;
 
-- 
2.13.6




[Qemu-block] [PATCH for-2.11 0/3] block: Error out on load_vm with active dirty bitmaps

2017-11-20 Thread Kevin Wolf
Following the discussing at
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03572.html
this implements the error return for loading a snapshot while dirty
bitmaps are active.

Kevin Wolf (3):
  block: Add errp to bdrv_snapshot_goto()
  block: Add errp to bdrv_all_goto_snapshot()
  block: Error out on load_vm with active dirty bitmaps

 include/block/snapshot.h |  6 --
 block/snapshot.c | 43 +++
 migration/savevm.c   |  6 +++---
 qemu-img.c   |  6 +++---
 4 files changed, 33 insertions(+), 28 deletions(-)

-- 
2.13.6




[Qemu-block] [PATCH for-2.11 3/3] block: Error out on load_vm with active dirty bitmaps

2017-11-20 Thread Kevin Wolf
Loading a snapshot invalidates the bitmap. Just marking all blocks dirty
is not a useful response in practice, instead the user needs to be aware
that we switch to a completely different state. If they are okay with
losing the dirty bitmap, they can just explicitly delete it.

This effectively reverts commit 04dec3c3ae5.

Signed-off-by: Kevin Wolf 
---
 block/snapshot.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index 13ec3b1c8c..6b338978c5 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -182,25 +182,16 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 {
 BlockDriver *drv = bs->drv;
 int ret, open_ret;
-int64_t len;
 
 if (!drv) {
 error_setg(errp, "Block driver is closed");
 return -ENOMEDIUM;
 }
 
-len = bdrv_getlength(bs);
-if (len < 0) {
-error_setg_errno(errp, -len, "Cannot get block device size");
-return len;
+if (!QLIST_EMPTY(>dirty_bitmaps)) {
+error_setg(errp, "Device has active dirty bitmaps");
+return -EBUSY;
 }
-/* We should set all bits in all enabled dirty bitmaps, because dirty
- * bitmaps reflect active state of disk and snapshot switch operation
- * actually dirties active state.
- * TODO: It may make sense not to set all bits but analyze block status of
- * current state and destination snapshot and do not set bits corresponding
- * to both-zero or both-unallocated areas. */
-bdrv_set_dirty(bs, 0, len);
 
 if (drv->bdrv_snapshot_goto) {
 ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
-- 
2.13.6




Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/4] qcow2: multiple clusters write compressed

2017-11-20 Thread Stefan Hajnoczi
On Thu, Nov 16, 2017 at 07:54:55PM +0300, Anton Nefedov wrote:
> From: Pavel Butsykin 
> 
> At the moment, qcow2_co_pwritev_compressed can process the requests size
> less than or equal to one cluster. This patch added possibility to write
> compressed data in the QCOW2 more than one cluster. The implementation
> is simple, we just split large requests into separate clusters and write
> using existing functionality.
> 
> Signed-off-by: Anton Nefedov 
> ---
>  block/qcow2.c | 73 
> ++-
>  1 file changed, 52 insertions(+), 21 deletions(-)

Kevin: qcow2_alloc_compressed_cluster_offset() sets up an L2 table entry
before doing the compressed data write.  Can the following scenario
occur if there is another request racing with the compressed write?

1. Compressed cluster L2 table entry added to qcow2 in-memory cache
2. Another request forces cached L2 table to be written to disk
3. Power failure or crash before compressed data is written

Now there is an L2 table entry pointing to garbage data.  This violates
qcow2's data integrity model.

I'm not sure if compressed writes are safe...  It may have been okay for
qemu-img convert but the risk is increased when running a VM.

> diff --git a/block/qcow2.c b/block/qcow2.c
> index c276b24..f1e2759 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3332,11 +3332,9 @@ static int qcow2_truncate(BlockDriverState *bs, 
> int64_t offset,
>  return 0;
>  }
>  
> -/* XXX: put compressed sectors first, then all the cluster aligned
> -   tables to avoid losing bytes in alignment */
>  static coroutine_fn int
> -qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
> -uint64_t bytes, QEMUIOVector *qiov)
> +qcow2_co_pwritev_cluster_compressed(BlockDriverState *bs, uint64_t offset,
> +uint64_t bytes, QEMUIOVector *qiov)
>  {
>  BDRVQcow2State *s = bs->opaque;
>  QEMUIOVector hd_qiov;
> @@ -3346,25 +3344,12 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
> uint64_t offset,
>  uint8_t *buf, *out_buf;
>  int64_t cluster_offset;
>  
> -if (bytes == 0) {
> -/* align end of file to a sector boundary to ease reading with
> -   sector based I/Os */
> -cluster_offset = bdrv_getlength(bs->file->bs);
> -if (cluster_offset < 0) {
> -return cluster_offset;
> -}
> -return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
> NULL);
> -}
> -
> -if (offset_into_cluster(s, offset)) {
> -return -EINVAL;
> -}
> +assert(bytes <= s->cluster_size);
> +assert(!offset_into_cluster(s, offset));
>  
>  buf = qemu_blockalign(bs, s->cluster_size);
> -if (bytes != s->cluster_size) {
> -if (bytes > s->cluster_size ||
> -offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
> -{
> +if (bytes < s->cluster_size) {
> +if (offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS) {
>  qemu_vfree(buf);
>  return -EINVAL;
>  }
> @@ -3444,6 +3429,52 @@ fail:
>  return ret;
>  }
>  
> +/* XXX: put compressed sectors first, then all the cluster aligned
> +   tables to avoid losing bytes in alignment */
> +static coroutine_fn int
> +qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
> +uint64_t bytes, QEMUIOVector *qiov)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +QEMUIOVector hd_qiov;
> +uint64_t curr_off = 0;
> +int ret;
> +
> +if (bytes == 0) {
> +/* align end of file to a sector boundary to ease reading with
> +   sector based I/Os */
> +int64_t cluster_offset = bdrv_getlength(bs->file->bs);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
> +return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
> NULL);
> +}
> +
> +if (offset_into_cluster(s, offset)) {
> +return -EINVAL;
> +}
> +
> +qemu_iovec_init(_qiov, qiov->niov);
> +do {
> +uint32_t chunk_size;
> +
> +qemu_iovec_reset(_qiov);
> +chunk_size = MIN(bytes, s->cluster_size);
> +qemu_iovec_concat(_qiov, qiov, curr_off, chunk_size);
> +
> +ret = qcow2_co_pwritev_cluster_compressed(bs, offset + curr_off,
> +  chunk_size, _qiov);
> +if (ret < 0) {
> +break;
> +}
> +curr_off += chunk_size;
> +bytes -= chunk_size;
> +} while (bytes);
> +qemu_iovec_destroy(_qiov);
> +
> +return ret;
> +}
> +
>  static int make_completely_empty(BlockDriverState *bs)
>  {
>  BDRVQcow2State *s = bs->opaque;
> -- 
> 2.7.4
> 
> 


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PULL 00/25] Block layer patches for 2.11.0-rc2

2017-11-20 Thread Peter Maydell
On 17 November 2017 at 18:16, Kevin Wolf  wrote:
> The following changes since commit fec035a53fa15c4c8c4e62bfef56a35df4161e38:
>
>   Merge remote-tracking branch 'remotes/kraxel/tags/ui-20171117-pull-request' 
> into staging (2017-11-17 10:18:41 +)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to d5a49c6e7d9e42059450674ec845b7bc0d62cb7e:
>
>   Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-17' into 
> queue-block (2017-11-17 18:24:30 +0100)
>
> 
> Block layer patches for 2.11.0-rc2
>
> 

Hi. I'm afraid this fails 'make check' (Linux x86-64, gcc, debug build):

  GTESTER tests/test-replication
**
ERROR:/home/petmay01/linaro/qemu-for-merges/tests/test-replication.c:117:test_blk_write:
assertion failed: (async_ret == 0)
GTester: last random seed: R02S7bcadec8b2ecdf71fa8abd7f833c90f5
/home/petmay01/linaro/qemu-for-merges/tests/Makefile.include:863:
recipe for target 'check-tests/test-replication' failed

thanks
-- PMM



Re: [Qemu-block] [PATCH v2 3/4] block-stream: add compress option

2017-11-20 Thread Stefan Hajnoczi
On Thu, Nov 16, 2017 at 07:54:57PM +0300, Anton Nefedov wrote:
> It might be useful to compress images during block-stream;
> this way the user can merge compressed images of a backing chain and
> the result will remain compressed.
> 
> Signed-off-by: Anton Nefedov 
> ---
>  qapi/block-core.json  |  4 
>  include/block/block_int.h |  4 +++-
>  block/stream.c| 16 
>  blockdev.c| 13 -
>  hmp.c |  2 ++
>  hmp-commands.hx   |  4 ++--
>  6 files changed, 35 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v2 2/4] block: support compressed write for copy-on-read

2017-11-20 Thread Stefan Hajnoczi
On Thu, Nov 16, 2017 at 07:54:56PM +0300, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Max Reitz 
> ---
>  block/io.c | 23 +--
>  block/trace-events |  2 +-
>  2 files changed, 18 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/4] qcow2: multiple clusters write compressed

2017-11-20 Thread Denis V. Lunev
On 11/20/2017 05:53 PM, Stefan Hajnoczi wrote:
> On Thu, Nov 16, 2017 at 07:54:55PM +0300, Anton Nefedov wrote:
>> From: Pavel Butsykin 
>>
>> At the moment, qcow2_co_pwritev_compressed can process the requests size
>> less than or equal to one cluster. This patch added possibility to write
>> compressed data in the QCOW2 more than one cluster. The implementation
>> is simple, we just split large requests into separate clusters and write
>> using existing functionality.
>>
>> Signed-off-by: Anton Nefedov 
>> ---
>>  block/qcow2.c | 73 
>> ++-
>>  1 file changed, 52 insertions(+), 21 deletions(-)
> Kevin: qcow2_alloc_compressed_cluster_offset() sets up an L2 table entry
> before doing the compressed data write.  Can the following scenario
> occur if there is another request racing with the compressed write?
>
> 1. Compressed cluster L2 table entry added to qcow2 in-memory cache
> 2. Another request forces cached L2 table to be written to disk
> 3. Power failure or crash before compressed data is written
>
> Now there is an L2 table entry pointing to garbage data.  This violates
> qcow2's data integrity model.
>
> I'm not sure if compressed writes are safe...  It may have been okay for
> qemu-img convert but the risk is increased when running a VM.
qemu-img is now multi-coroutine, thus the danger is exactly the same.

Den



>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index c276b24..f1e2759 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -3332,11 +3332,9 @@ static int qcow2_truncate(BlockDriverState *bs, 
>> int64_t offset,
>>  return 0;
>>  }
>>  
>> -/* XXX: put compressed sectors first, then all the cluster aligned
>> -   tables to avoid losing bytes in alignment */
>>  static coroutine_fn int
>> -qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
>> -uint64_t bytes, QEMUIOVector *qiov)
>> +qcow2_co_pwritev_cluster_compressed(BlockDriverState *bs, uint64_t offset,
>> +uint64_t bytes, QEMUIOVector *qiov)
>>  {
>>  BDRVQcow2State *s = bs->opaque;
>>  QEMUIOVector hd_qiov;
>> @@ -3346,25 +3344,12 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
>> uint64_t offset,
>>  uint8_t *buf, *out_buf;
>>  int64_t cluster_offset;
>>  
>> -if (bytes == 0) {
>> -/* align end of file to a sector boundary to ease reading with
>> -   sector based I/Os */
>> -cluster_offset = bdrv_getlength(bs->file->bs);
>> -if (cluster_offset < 0) {
>> -return cluster_offset;
>> -}
>> -return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
>> NULL);
>> -}
>> -
>> -if (offset_into_cluster(s, offset)) {
>> -return -EINVAL;
>> -}
>> +assert(bytes <= s->cluster_size);
>> +assert(!offset_into_cluster(s, offset));
>>  
>>  buf = qemu_blockalign(bs, s->cluster_size);
>> -if (bytes != s->cluster_size) {
>> -if (bytes > s->cluster_size ||
>> -offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
>> -{
>> +if (bytes < s->cluster_size) {
>> +if (offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS) {
>>  qemu_vfree(buf);
>>  return -EINVAL;
>>  }
>> @@ -3444,6 +3429,52 @@ fail:
>>  return ret;
>>  }
>>  
>> +/* XXX: put compressed sectors first, then all the cluster aligned
>> +   tables to avoid losing bytes in alignment */
>> +static coroutine_fn int
>> +qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
>> +uint64_t bytes, QEMUIOVector *qiov)
>> +{
>> +BDRVQcow2State *s = bs->opaque;
>> +QEMUIOVector hd_qiov;
>> +uint64_t curr_off = 0;
>> +int ret;
>> +
>> +if (bytes == 0) {
>> +/* align end of file to a sector boundary to ease reading with
>> +   sector based I/Os */
>> +int64_t cluster_offset = bdrv_getlength(bs->file->bs);
>> +if (cluster_offset < 0) {
>> +return cluster_offset;
>> +}
>> +return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
>> NULL);
>> +}
>> +
>> +if (offset_into_cluster(s, offset)) {
>> +return -EINVAL;
>> +}
>> +
>> +qemu_iovec_init(_qiov, qiov->niov);
>> +do {
>> +uint32_t chunk_size;
>> +
>> +qemu_iovec_reset(_qiov);
>> +chunk_size = MIN(bytes, s->cluster_size);
>> +qemu_iovec_concat(_qiov, qiov, curr_off, chunk_size);
>> +
>> +ret = qcow2_co_pwritev_cluster_compressed(bs, offset + curr_off,
>> +  chunk_size, _qiov);
>> +if (ret < 0) {
>> +break;
>> +}
>> +curr_off += chunk_size;
>> +bytes -= chunk_size;
>> +} while (bytes);
>> +qemu_iovec_destroy(_qiov);
>> +
>> +return ret;
>> +}
>> +
>>  static int 

Re: [Qemu-block] [PATCH for-2.11 0/3] block: Error out on load_vm with active dirty bitmaps

2017-11-20 Thread Denis V. Lunev
On 11/20/2017 05:50 PM, Kevin Wolf wrote:
> Following the discussing at
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03572.html
> this implements the error return for loading a snapshot while dirty
> bitmaps are active.
>
> Kevin Wolf (3):
>   block: Add errp to bdrv_snapshot_goto()
>   block: Add errp to bdrv_all_goto_snapshot()
>   block: Error out on load_vm with active dirty bitmaps
>
>  include/block/snapshot.h |  6 --
>  block/snapshot.c | 43 +++
>  migration/savevm.c   |  6 +++---
>  qemu-img.c   |  6 +++---
>  4 files changed, 33 insertions(+), 28 deletions(-)
>
Reviewed-by: Denis V. Lunev 



Re: [Qemu-block] [Qemu-devel] [PULL 00/25] Block layer patches for 2.11.0-rc2

2017-11-20 Thread Kevin Wolf
Am 20.11.2017 um 15:53 hat Peter Maydell geschrieben:
> On 17 November 2017 at 18:16, Kevin Wolf  wrote:
> > The following changes since commit fec035a53fa15c4c8c4e62bfef56a35df4161e38:
> >
> >   Merge remote-tracking branch 
> > 'remotes/kraxel/tags/ui-20171117-pull-request' into staging (2017-11-17 
> > 10:18:41 +)
> >
> > are available in the git repository at:
> >
> >   git://repo.or.cz/qemu/kevin.git tags/for-upstream
> >
> > for you to fetch changes up to d5a49c6e7d9e42059450674ec845b7bc0d62cb7e:
> >
> >   Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-17' into 
> > queue-block (2017-11-17 18:24:30 +0100)
> >
> > 
> > Block layer patches for 2.11.0-rc2
> >
> > 
> 
> Hi. I'm afraid this fails 'make check' (Linux x86-64, gcc, debug build):
> 
>   GTESTER tests/test-replication
> **
> ERROR:/home/petmay01/linaro/qemu-for-merges/tests/test-replication.c:117:test_blk_write:
> assertion failed: (async_ret == 0)
> GTester: last random seed: R02S7bcadec8b2ecdf71fa8abd7f833c90f5
> /home/petmay01/linaro/qemu-for-merges/tests/Makefile.include:863:
> recipe for target 'check-tests/test-replication' failed

I'll try to reproduce this, but so far I don't seem to be able to get it
to fail on my main laptop.

Kevin