Re: [Qemu-block] [PATCH v2] blkdebug: Add support for latency rules

2018-09-14 Thread Marc Olson

On 09/14/2018 01:46 PM, John Snow wrote:

On 09/13/2018 12:48 PM, Marc Olson wrote:

Are there further thoughts on this patch?


The CI tools may have missed it since it appears to have been sent in
reply to the V1 instead of as a new thread. When you send a revision
out, can you send it as its own thread?

Ah, sorry. Happy to do that for v3.

(We're also in a bit of a logjam at the moment with no pull requests
having been processed recently, so we're a bit gummed up.)

Some comments below.


On 09/04/2018 05:24 PM, Marc Olson wrote:

Sometimes storage devices can be slow to respond, due to media errors,
firmware issues, SSD garbage collection, etc. This patch adds a new rule
type to blkdebug that allows injection of latency to I/O operations.
Similar to error injection rules, latency rules can be specified with or
without an offset, and can also apply upon state transitions.

Signed-off-by: Marc Olson 

---
v2:
- Change so that delay rules are executed before error rules
- Add QMP support
- Add tests
---
   block/blkdebug.c   | 119
+++--
   docs/devel/blkdebug.txt    |  30 +---
   qapi/block-core.json   |  39 +--
   tests/qemu-iotests/071 |  63 
   tests/qemu-iotests/071.out |  31 
   5 files changed, 244 insertions(+), 38 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 0759452..1785fe3 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -65,6 +65,7 @@ typedef struct BlkdebugSuspendedReq {
     enum {
   ACTION_INJECT_ERROR,
+    ACTION_DELAY,
   ACTION_SET_STATE,
   ACTION_SUSPEND,
   };
@@ -73,14 +74,17 @@ typedef struct BlkdebugRule {
   BlkdebugEvent event;
   int action;
   int state;
+    int once;
+    int64_t offset;

Hm, I guess we're just promoting these to universal values that might
exist for any of the rule types, but not allowing the user to set them
for rules where it doesn't make sense.
It seemed a bit odd to me to duplicate them in the union, but happy to 
move them back if you're concerned about it.



   union {
   struct {
   int error;
   int immediately;
-    int once;
-    int64_t offset;
   } inject;
   struct {
+    int64_t latency;
+    } delay;
+    struct {
   int new_state;
   } set_state;
   struct {
@@ -123,6 +127,33 @@ static QemuOptsList inject_error_opts = {
   },
   };
   +static QemuOptsList delay_opts = {
+    .name = "delay",
+    .head = QTAILQ_HEAD_INITIALIZER(delay_opts.head),
+    .desc = {
+    {
+    .name = "event",
+    },
+    {
+    .name = "state",
+    .type = QEMU_OPT_NUMBER,
+    },
+    {
+    .name = "latency",
+    .type = QEMU_OPT_NUMBER,
+    },
+    {
+    .name = "sector",
+    .type = QEMU_OPT_NUMBER,
+    },
+    {
+    .name = "once",
+    .type = QEMU_OPT_BOOL,
+    },
+    { /* end of list */ }
+    },
+};
+
   static QemuOptsList set_state_opts = {
   .name = "set-state",
   .head = QTAILQ_HEAD_INITIALIZER(set_state_opts.head),
@@ -145,6 +176,7 @@ static QemuOptsList set_state_opts = {
     static QemuOptsList *config_groups[] = {
   &inject_error_opts,
+    &delay_opts,
   &set_state_opts,
   NULL
   };
@@ -182,16 +214,21 @@ static int add_rule(void *opaque, QemuOpts
*opts, Error **errp)
   .state  = qemu_opt_get_number(opts, "state", 0),
   };
   +    rule->once  = qemu_opt_get_bool(opts, "once", 0);
+    sector = qemu_opt_get_number(opts, "sector", -1);
+    rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
+
   /* Parse action-specific options */
   switch (d->action) {
   case ACTION_INJECT_ERROR:
   rule->options.inject.error = qemu_opt_get_number(opts,
"errno", EIO);
-    rule->options.inject.once  = qemu_opt_get_bool(opts, "once", 0);
   rule->options.inject.immediately =
   qemu_opt_get_bool(opts, "immediately", 0);
-    sector = qemu_opt_get_number(opts, "sector", -1);
-    rule->options.inject.offset =
-    sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
+    break;
+
+    case ACTION_DELAY:
+    rule->options.delay.latency =
+    qemu_opt_get_number(opts, "latency", 100) * SCALE_US;
   break;
     case ACTION_SET_STATE:
@@ -264,6 +301,14 @@ static int read_config(BDRVBlkdebugState *s,
const char *filename,
   goto fail;
   }
   +    d.action = ACTION_DELAY;
+    qemu_opts_foreach(&delay_opts, add_rule, &d, &local_err);
+    if (local_err) {
+    error_propagate(errp, local_err);
+    ret = -EINVAL;
+    goto fail;
+    }
+
   d.action = ACTION_SET_STATE;
   qemu_opts_foreach(&set_state_opts, add_rule, &d, &local_err);
   if (local_err) {
@@ -275,6 +320,7 @@ static int read_config(BDRVBlkdebugState *s, const
char 
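For context on how such a rule would be used: together with the delay_opts
table in the patch above, a latency rule is written in a blkdebug config file
much like the existing inject-error rules. A minimal sketch (the option names
come from this patch; the values, the config file name, and the legacy
blkdebug:<config>:<image> spelling are illustrative assumptions, not taken
from the patch):

  [delay]
  event = "read_aio"
  latency = "50000"
  sector = "2048"
  once = "on"

  qemu-system-x86_64 -drive if=virtio,file=blkdebug:blkdebug.conf:disk.img

Per add_rule() above, the latency value appears to be parsed as microseconds
(it is multiplied by SCALE_US) and defaults to 100.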

Re: [Qemu-block] [PATCH v2] blkdebug: Add support for latency rules

2018-09-14 Thread John Snow



On 09/13/2018 12:48 PM, Marc Olson wrote:
> Are there further thoughts on this patch?
> 

The CI tools may have missed it since it appears to have been sent in
reply to the V1 instead of as a new thread. When you send a revision
out, can you send it as its own thread?

(We're also in a bit of a logjam at the moment with no pull requests
having been processed recently, so we're a bit gummed up.)

Some comments below.

> 
> On 09/04/2018 05:24 PM, Marc Olson wrote:
>> Sometimes storage devices can be slow to respond, due to media errors,
>> firmware issues, SSD garbage collection, etc. This patch adds a new
>> rule type to blkdebug that allows injection of latency to I/O
>> operations. Similar to error injection rules, latency rules can be
>> specified with or without an offset, and can also apply upon state
>> transitions.
>>
>> Signed-off-by: Marc Olson 
>>
>> ---
>> v2:
>> - Change so that delay rules are executed before error rules
>> - Add QMP support
>> - Add tests
>> ---
>>   block/blkdebug.c   | 119
>> +++--
>>   docs/devel/blkdebug.txt    |  30 +---
>>   qapi/block-core.json   |  39 +--
>>   tests/qemu-iotests/071 |  63 
>>   tests/qemu-iotests/071.out |  31 
>>   5 files changed, 244 insertions(+), 38 deletions(-)
>>
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index 0759452..1785fe3 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
>> @@ -65,6 +65,7 @@ typedef struct BlkdebugSuspendedReq {
>>     enum {
>>   ACTION_INJECT_ERROR,
>> +    ACTION_DELAY,
>>   ACTION_SET_STATE,
>>   ACTION_SUSPEND,
>>   };
>> @@ -73,14 +74,17 @@ typedef struct BlkdebugRule {
>>   BlkdebugEvent event;
>>   int action;
>>   int state;
>> +    int once;
>> +    int64_t offset;

Hm, I guess we're just promoting these to universal values that might
exist for any of the rule types, but not allowing the user to set them
for rules where it doesn't make sense.

>>   union {
>>   struct {
>>   int error;
>>   int immediately;
>> -    int once;
>> -    int64_t offset;
>>   } inject;
>>   struct {
>> +    int64_t latency;
>> +    } delay;
>> +    struct {
>>   int new_state;
>>   } set_state;
>>   struct {
>> @@ -123,6 +127,33 @@ static QemuOptsList inject_error_opts = {
>>   },
>>   };
>>   +static QemuOptsList delay_opts = {
>> +    .name = "delay",
>> +    .head = QTAILQ_HEAD_INITIALIZER(delay_opts.head),
>> +    .desc = {
>> +    {
>> +    .name = "event",
>> +    },
>> +    {
>> +    .name = "state",
>> +    .type = QEMU_OPT_NUMBER,
>> +    },
>> +    {
>> +    .name = "latency",
>> +    .type = QEMU_OPT_NUMBER,
>> +    },
>> +    {
>> +    .name = "sector",
>> +    .type = QEMU_OPT_NUMBER,
>> +    },
>> +    {
>> +    .name = "once",
>> +    .type = QEMU_OPT_BOOL,
>> +    },
>> +    { /* end of list */ }
>> +    },
>> +};
>> +
>>   static QemuOptsList set_state_opts = {
>>   .name = "set-state",
>>   .head = QTAILQ_HEAD_INITIALIZER(set_state_opts.head),
>> @@ -145,6 +176,7 @@ static QemuOptsList set_state_opts = {
>>     static QemuOptsList *config_groups[] = {
>>   &inject_error_opts,
>> +    &delay_opts,
>>   &set_state_opts,
>>   NULL
>>   };
>> @@ -182,16 +214,21 @@ static int add_rule(void *opaque, QemuOpts
>> *opts, Error **errp)
>>   .state  = qemu_opt_get_number(opts, "state", 0),
>>   };
>>   +    rule->once  = qemu_opt_get_bool(opts, "once", 0);
>> +    sector = qemu_opt_get_number(opts, "sector", -1);
>> +    rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>> +
>>   /* Parse action-specific options */
>>   switch (d->action) {
>>   case ACTION_INJECT_ERROR:
>>   rule->options.inject.error = qemu_opt_get_number(opts,
>> "errno", EIO);
>> -    rule->options.inject.once  = qemu_opt_get_bool(opts, "once", 0);
>>   rule->options.inject.immediately =
>>   qemu_opt_get_bool(opts, "immediately", 0);
>> -    sector = qemu_opt_get_number(opts, "sector", -1);
>> -    rule->options.inject.offset =
>> -    sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>> +    break;
>> +
>> +    case ACTION_DELAY:
>> +    rule->options.delay.latency =
>> +    qemu_opt_get_number(opts, "latency", 100) * SCALE_US;
>>   break;
>>     case ACTION_SET_STATE:
>> @@ -264,6 +301,14 @@ static int read_config(BDRVBlkdebugState *s,
>> const char *filename,
>>   goto fail;
>>   }
>>   +    d.action = ACTION_DELAY;
>> +    qemu_opts_foreach(&delay_opts, add_rule, &d, &local_err);
>> +    if (local_err) {
>> +    error_propagate(errp, local_err);
>> +    ret = -EINVAL;
>> +    goto fail;
>> +    }
>> +
>>   d.action = ACTION_SET_STATE;
>> 

Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero

2018-09-14 Thread John Snow



On 09/14/2018 01:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> 14.09.2018 20:39, John Snow wrote:
>>
>> On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.09.2018 19:55, Eric Blake wrote:
 On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start);
>>> +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start,
>>> int64_t end);
>> The interface looks weird because we can define a 'start' that's
>> beyond
>> the 'end'.
>>
>> I realize that you need a signed integer for 'end' to signify EOF...
>> should we do a 'bytes' parameter instead? (Did you already do that
>> in an
>> earlier version and we changed it?)
>>
>> Well, it's not a big deal to me personally.
> interface with constant end parameter is more comfortable for loop:
> we don't need to update 'bytes' parameter on each iteration
 But there's still the question of WHO should be calculating end. Your
 interface argues for the caller:

 hbitmap_next_zero(start, start + bytes)

 int64_t hbitmap_next_zero(...)
 {
  while (offset != end) ...
 }

 while we're asking about a consistent interface for the caller (if
 most callers already have a 'bytes' rather than an 'end' computed):

 hbitmap_next_zero(start, bytes)

 int64_t hbitmap_next_zero(...)
 {
  int64_t end = start + bytes;
  while (offset != end) ...
 }

>>> Yes, that's an issue. Ok, if you are not comfortable with start,end, I
>>> can switch to start,bytes.
>>>
>> The series looks pretty close, I can merge the next version if you think
>> it's worth changing the interface.
>>
>> --js
> 
> I've started to change interface and found a bug in bitmap_to_extents
> (patch sent
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html).
> So, next version will be based on this patch, which will go through
> Eric's tree..
> 

OK, I spoke with Eric and if you resend and I R-B & ACK the patches,
he'll stage them through NBD.

Thanks,
--js



Re: [Qemu-block] [PATCH v7 3/6] qcow2: Reduce REFT_OFFSET_MASK

2018-09-14 Thread Eric Blake

[revisiting this thread]

On 6/29/18 10:43 AM, Kevin Wolf wrote:

Am 29.06.2018 um 17:22 hat Eric Blake geschrieben:

On 06/29/2018 03:44 AM, Kevin Wolf wrote:

Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:

Match our code to the spec change in the previous patch - there's
no reason for the refcount table to allow larger offsets than the
L1/L2 tables.


What about internal snapshots? And anyway, because of the metadata
overhead, the physical image size of a fully allocated image is always
going to be at least minimally larger than the virtual disk size.

I'm not necessarily opposed to making the change if there is a good
reason to make it, but I don't see a real need for it and the
justification used here and also in the previous patch is incorrect.


The fact that ext4 cannot hold an image this large is already an indication
that setting this limit on the refcount table is NOT going to bite real
users.

Yes, you can argue that with lots of metadata, including internal snapshots,
and on a capable file system (such as tmpfs) that can even hold files this
large to begin with, then yes, allowing the refcount to exceed this limit
will allow slightly more metadata to be crammed into a single image.  But
will it actually help anyone?

Do I need to respin the series to split patch 2 into the obvious changes
(stuff unrelated to capping refcount size) vs. the controversial stuff
(refcount cap and this code change)?



Kevin


In practice, no image has more than 64PB of
allocated clusters anyways, as anything beyond that can't be
expressed via L2 mappings to host offsets.


If you're opposed to an exact 56-bit limit on the grounds that 56-bit guest
data plus minimal metadata should still be expressible, would it be better
to cap refcount at 57-bits?
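(A quick check of the arithmetic here: a 56-bit byte-offset limit corresponds
to 2^56 bytes = 64 PiB, which is where the 64PB figure above comes from; a
57-bit cap would double that to 128 PiB.)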


You're arguing that the patch doesn't hurt in practice, but what I'm
really looking for is what do we gain from it?


I guess one thing to be gained by having qemu's implementation of qcow2 
pull a hard-stop at 56 bits is that it becomes easier to detect errant 
refcount entries beyond the end of the file as being errant, and not 
attempt to expand the file during 'qemu-img check' just because a 
refcount pointed out that far.  But that's still a pretty weak argument 
(we've already mentioned that qemu-img check should do a better job of 
handling/flagging refcount entries that point way beyond the end of the 
image - but that's true regardless of what limits may be placed on the 
host image size)




We generally don't apply patches just because they don't hurt, but
because they are helpful for something. In the simple, fully compatible
cases "helpful" could just mean "we like the code better".

This is a slightly different category: "it's technically incompatible,
but we don't think anyone uses this". We have done such changes before
when they allowed us to improve something substantially, but I think the
requirements for this are at least a little higher than just "because we
like it better". And I'm not even sure yet if I like it better because
it seems like an arbitrary restriction.

So apart from not hurting in practice, what does the restriction buy us?


As I'm not coming up with a solid reason for tightening the bounds, I 
will (eventually) respin the series without the spec change.  At this 
point, it's probably easier for me to wait until after the pull request 
queue has started flushing again.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Eric Blake

On 9/14/18 12:24 PM, Eric Blake wrote:

On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote:

bitmap_to_extents function is broken: it switches dirty variable after
every iteration, however it can process only part of dirty (or zero)
area during one iteration in case when this area is too large for one
extent.

Fortunately, the bug don't produce wrong extents: it just inserts
zero-length extents between sequential extents representing large dirty
(or zero) area. However, zero-length extents are abandoned by NBD


s/abandoned by/forbidden by the/


protocol. So, careful client should consider such replay as server


s/replay/reply/


fault and not-careful will likely ignore zero-length extents.


Which camp is qemu 3.0 as client in? Does it tolerate the zero-length 
extent, and still manage to see correct information overall, or does it 
crash?


Hmm - I think we're "safe" with qemu as client - right now, the only way 
qemu 3.0 accesses the qemu dirty bitmap over NBD is with my 
x-dirty-bitmap hack (commit 216ee3657), which uses 
block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, and 
that always passes NBD_CMD_FLAG_REQ_ONE.  qemu will assert() if 
nbd_client_co_block_status() doesn't make any progress, but from what 
I'm reading of your bug report, qemu as client never permits the server 
to answer with more than one extent, and the bug of a zero-length extent 
is triggered only after the first extent has been sent.


A further mitigation - the bug can only occur for a client that requests 
block status with a length in the range (4G-bitmap_granularity, 4G). 
Otherwise, the loop terminates after the extent that got truncated to 
4G-bitmap_granularity, preventing the next iteration of the loop from 
detecting a 0-length extent.




Thus, the primary reason to accept this patch is not because of qemu 3.0 
as client, but for interoperability with other clients.  I'm planning on 
updating the commit message to add these additional details.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero

2018-09-14 Thread John Snow



On 09/14/2018 01:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> 14.09.2018 20:39, John Snow wrote:
>>
>> On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.09.2018 19:55, Eric Blake wrote:
 On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start);
>>> +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start,
>>> int64_t end);
>> The interface looks weird because we can define a 'start' that's
>> beyond
>> the 'end'.
>>
>> I realize that you need a signed integer for 'end' to signify EOF...
>> should we do a 'bytes' parameter instead? (Did you already do that
>> in an
>> earlier version and we changed it?)
>>
>> Well, it's not a big deal to me personally.
> interface with constant end parameter is more comfortable for loop:
> we don't need to update 'bytes' parameter on each iteration
 But there's still the question of WHO should be calculating end. Your
 interface argues for the caller:

 hbitmap_next_zero(start, start + bytes)

 int64_t hbitmap_next_zero(...)
 {
  while (offset != end) ...
 }

 while we're asking about a consistent interface for the caller (if
 most callers already have a 'bytes' rather than an 'end' computed):

 hbitmap_next_zero(start, bytes)

 int64_t hbitmap_next_zero(...)
 {
  int64_t end = start + bytes;
  while (offset != end) ...
 }

>>> Yes, that's an issue. Ok, if you are not comfortable with start,end, I
>>> can switch to start,bytes.
>>>
>> The series looks pretty close, I can merge the next version if you think
>> it's worth changing the interface.
>>
>> --js
> 
> I've started to change interface and found a bug in bitmap_to_extents
> (patch sent
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html).
> So, next version will be based on this patch, which will go through
> Eric's tree..
> 

Ah, I see. You can send it to the list anyway with the requires: header
and I can have Eric stage it to make the eventual merge easier for Peter.

--js



Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero

2018-09-14 Thread Vladimir Sementsov-Ogievskiy

14.09.2018 20:39, John Snow wrote:


On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote:

10.09.2018 19:55, Eric Blake wrote:

On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote:


-int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start);
+int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start,
int64_t end);

The interface looks weird because we can define a 'start' that's beyond
the 'end'.

I realize that you need a signed integer for 'end' to signify EOF...
should we do a 'bytes' parameter instead? (Did you already do that
in an
earlier version and we changed it?)

Well, it's not a big deal to me personally.

interface with constant end parameter is more comfortable for loop:
we don't need to update 'bytes' parameter on each iteration

But there's still the question of WHO should be calculating end. Your
interface argues for the caller:

hbitmap_next_zero(start, start + bytes)

int64_t hbitmap_next_zero(...)
{
     while (offset != end) ...
}

while we're asking about a consistent interface for the caller (if
most callers already have a 'bytes' rather than an 'end' computed):

hbitmap_next_zero(start, bytes)

int64_t hbitmap_next_zero(...)
{
     int64_t end = start + bytes;
     while (offset != end) ...
}


Yes, that's an issue. Ok, if you are not comfortable with start,end, I
can switch to start,bytes.


The series looks pretty close, I can merge the next version if you think
it's worth changing the interface.

--js


I've started to change interface and found a bug in bitmap_to_extents 
(patch sent 
https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html). 
So, next version will be based on this patch, which will go through 
Eric's tree..


--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PULL 2/4] block/rbd: Attempt to parse legacy filenames

2018-09-14 Thread Eric Blake

On 9/12/18 8:18 AM, Jeff Cody wrote:

When we converted rbd to get rid of the older key/value-centric
encoding format, we broke compatibility with image files with backing
file strings encoded in the old format.

This leaves a bit of an ugly conundrum, and a hacky solution.




+/* Take care whenever deciding to actually deprecate; once this ability
+ * is removed, we will not be able to open any images with 
legacy-styled
+ * backing image strings. */
+error_report("RBD options encoded in the filename as keyvalue pairs "
+ "is deprecated.  Future versions may cease to parse "
+ "these options in the future.");


Fam spotted offline that " in the future" is redundant with "Future 
versions", if you want to respin the pull request.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Vladimir Sementsov-Ogievskiy

14.09.2018 20:35, Eric Blake wrote:

On 9/14/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:


  while (begin < overall_end && i < nb_extents) {
+    bool next_dirty = !dirty;
+
  if (dirty) {
  end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
  } else {
@@ -1962,6 +1964,7 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

  end = MIN(bdrv_dirty_bitmap_size(bitmap),
    begin + UINT32_MAX + 1 -
bdrv_dirty_bitmap_granularity(bitmap));
+    next_dirty = dirty;
  }


Rather than introducing next_dirty, couldn't you just make this:

if (end == -1 || end - begin > UINT32_MAX) {
    /* Cap ... */
    end = MIN(...);
} else {
    dirty = !dirty;
}


no, dirty variable is used after it, we can't change it here.


Ah, right. But we could also hoist the extents[i].flags = 
cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0) line to occur prior to the 
'if' doing the end capping calculation.  However, splitting the two 
assignments into extents[i].* to no longer be consecutive statements, 
just to avoid the use of a temporary variable, starts to get less 
aesthetically pleasing.


Thus, I'm fine with your version (with commit message improved), 
unless you want to send a v2.


Ok, I don't want :) Feel free to update the commit message and pull it.



Reviewed-by: Eric Blake 




--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback

2018-09-14 Thread Paolo Bonzini
On 14/09/2018 19:14, Kevin Wolf wrote:
>> As you mention, you could have a nested aio_poll() in the main thread,
>> for example invoked from a bottom half, but in that case I'd rather
>> track the caller that is creating the bottom half and see if it lacks a
>> bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
>> missing).
> I went back to the commit where I first added the patch (it already
> contained the ref/unref pair) and tried if I could reproduce a bug with
> the pair removed. I couldn't.
> 
> I'm starting to think that maybe I was just overly cautious with the
> ref/unref. I may have confused the nested aio_poll() crash with a
> different situation. I've dealt with so many crashes and hangs while
> working on this series that it's quite possible.

Are you going to drop the patch then?

Thanks,

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero

2018-09-14 Thread John Snow



On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote:
> 10.09.2018 19:55, Eric Blake wrote:
>> On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>>
> -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start);
> +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start,
> int64_t end);
 The interface looks weird because we can define a 'start' that's beyond
 the 'end'.

 I realize that you need a signed integer for 'end' to signify EOF...
 should we do a 'bytes' parameter instead? (Did you already do that
 in an
 earlier version and we changed it?)

 Well, it's not a big deal to me personally.
>>>
>>> interface with constant end parameter is more comfortable for loop:
>>> we don't need to update 'bytes' parameter on each iteration
>>
>> But there's still the question of WHO should be calculating end. Your
>> interface argues for the caller:
>>
>> hbitmap_next_zero(start, start + bytes)
>>
>> int64_t hbitmap_next_zero(...)
>> {
>>     while (offset != end) ...
>> }
>>
>> while we're asking about a consistent interface for the caller (if
>> most callers already have a 'bytes' rather than an 'end' computed):
>>
>> hbitmap_next_zero(start, bytes)
>>
>> int64_t hbitmap_next_zero(...)
>> {
>>     int64_t end = start + bytes;
>>     while (offset != end) ...
>> }
>>
> 
> Yes, that's an issue. Ok, if you are not comfortable with start,end, I
> can switch to start,bytes.
> 

The series looks pretty close, I can merge the next version if you think
it's worth changing the interface.

--js



Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Vladimir Sementsov-Ogievskiy

14.09.2018 20:30, Vladimir Sementsov-Ogievskiy wrote:

14.09.2018 20:24, Eric Blake wrote:

On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote:

bitmap_to_extents function is broken: it switches dirty variable after
every iteration, however it can process only part of dirty (or zero)
area during one iteration in case when this area is too large for one
extent.

Fortunately, the bug don't produce wrong extents: it just inserts
zero-length extents between sequential extents representing large dirty
(or zero) area. However, zero-length extents are abandoned by NBD


s/abandoned by/forbidden by the/


protocol. So, careful client should consider such replay as server


s/replay/reply/


fault and not-careful will likely ignore zero-length extents.


Which camp is qemu 3.0 as client in? Does it tolerate the zero-length 
extent, and still manage to see correct information overall, or does 
it crash?


Hmm - I think we're "safe" with qemu as client - right now, the only 
way qemu 3.0 accesses the qemu dirty bitmap over NBD is with my 
x-dirty-bitmap hack (commit 216ee3657), which uses 
block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, 
and that always passes NBD_CMD_FLAG_REQ_ONE.  qemu will assert() if 
nbd_client_co_block_status() doesn't make any progress, but from what 
I'm reading of your bug report, qemu as client never permits the 
server to answer with more than one extent, and the bug of a 
zero-length extent is triggered only after the first extent has been 
sent.


Thus, the primary reason to accept this patch is not because of qemu 
3.0 as client, but for interoperability with other clients. I'm 
planning on updating the commit message to add these additional details.




Fix this by more careful handling of dirty variable.

Bug was introduced in 3d068aff16
  "nbd/server: implement dirty bitmap export", with the whole function.
and presents in v3.0.0 release.


s/presents/present/



Signed-off-by: Vladimir Sementsov-Ogievskiy 


CC: qemu-sta...@nongnu.org


---
  nbd/server.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb33..12f721482d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1951,6 +1951,8 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

    assert(begin < overall_end && nb_extents);
  while (begin < overall_end && i < nb_extents) {
+    bool next_dirty = !dirty;
+
  if (dirty) {
  end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
  } else {
@@ -1962,6 +1964,7 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

  end = MIN(bdrv_dirty_bitmap_size(bitmap),
    begin + UINT32_MAX + 1 -
bdrv_dirty_bitmap_granularity(bitmap));
+    next_dirty = dirty;
  }


Rather than introducing next_dirty, couldn't you just make this:

if (end == -1 || end - begin > UINT32_MAX) {
    /* Cap ... */
    end = MIN(...);
} else {
    dirty = !dirty;
}


no, dirty variable is used after it, we can't change it here.


however, it can be done if we move

extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);

to be the first line of the loop body, so, with this movement, I'm ok 
with your changes and turning it into a pull.







  if (dont_fragment && end > overall_end) {
  end = overall_end;
@@ -1971,7 +1974,7 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

  extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
  i++;
  begin = end;
-    dirty = !dirty;
+    dirty = next_dirty;
  }


and then you merely delete the assignment to 'dirty' here.


    bdrv_dirty_iter_free(it);



At any rate, the fix makes sense to me.  Should I go ahead and 
incorporate the changes I've suggested and turn it into a pull 
request through my NBD tree, or would you like to send a v2?








--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/4] transaction support for bitmap merge

2018-09-14 Thread John Snow



On 09/14/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is a last brick, necessary to play with nbd bitmap export in
> conjunction with image fleecing.
> 
> v3:
> 01: fix typo in commit message, add John's r-b
> 02: splitted refactoring
> 03: improve commit message, split some refactoring to 02
> 04: add John's r-b
> 05: drop extra state variable, make it local instead. John's r-b.
> 
> 
> v2: don't compare with v1, it is changed a lot, to do the whole thing
> in .prepare instead of .commit. It is needed to be compatible with
> backup block job transaction actions [John]
> 
> Vladimir Sementsov-Ogievskiy (5):
>   dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap
>   dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap
>   dirty-bitmap: make it possible to restore bitmap after merge
>   blockdev: rename block-dirty-bitmap-clear transaction handlers
>   qapi: add transaction support for x-block-dirty-bitmap-merge
> 
>  qapi/transaction.json|  2 ++
>  include/block/block_int.h|  2 +-
>  include/block/dirty-bitmap.h |  2 +-
>  include/qemu/hbitmap.h   | 25 +--
>  block/dirty-bitmap.c | 36 +-
>  blockdev.c   | 59 +---
>  util/hbitmap.c   | 11 +--
>  7 files changed, 99 insertions(+), 38 deletions(-)
> 

Thanks, applied to my bitmaps tree:

https://github.com/jnsnow/qemu/commits/bitmaps
https://github.com/jnsnow/qemu.git

--js



Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Eric Blake

On 9/14/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:


  while (begin < overall_end && i < nb_extents) {
+    bool next_dirty = !dirty;
+
  if (dirty) {
  end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
  } else {
@@ -1962,6 +1964,7 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

  end = MIN(bdrv_dirty_bitmap_size(bitmap),
    begin + UINT32_MAX + 1 -
    bdrv_dirty_bitmap_granularity(bitmap));
+    next_dirty = dirty;
  }


Rather than introducing next_dirty, couldn't you just make this:

if (end == -1 || end - begin > UINT32_MAX) {
    /* Cap ... */
    end = MIN(...);
} else {
    dirty = !dirty;
}


no, dirty variable is used after it, we can't change it here.


Ah, right. But we could also hoist the extents[i].flags = 
cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0) line to occur prior to the 
'if' doing the end capping calculation.  However, splitting the two 
assignments into extents[i].* to no longer be consecutive statements, 
just to avoid the use of a temporary variable, starts to get less 
aesthetically pleasing.


Thus, I'm fine with your version (with commit message improved), unless 
you want to send a v2.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Vladimir Sementsov-Ogievskiy

14.09.2018 20:24, Eric Blake wrote:

On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote:

bitmap_to_extents function is broken: it switches dirty variable after
every iteration, however it can process only part of dirty (or zero)
area during one iteration in case when this area is too large for one
extent.

Fortunately, the bug don't produce wrong extents: it just inserts
zero-length extents between sequential extents representing large dirty
(or zero) area. However, zero-length extents are abandoned by NBD


s/abandoned by/forbidden by the/


protocol. So, careful client should consider such replay as server


s/replay/reply/


fault and not-careful will likely ignore zero-length extents.


Which camp is qemu 3.0 as client in? Does it tolerate the zero-length 
extent, and still manage to see correct information overall, or does 
it crash?


Hmm - I think we're "safe" with qemu as client - right now, the only 
way qemu 3.0 accesses the qemu dirty bitmap over NBD is with my 
x-dirty-bitmap hack (commit 216ee3657), which uses 
block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, 
and that always passes NBD_CMD_FLAG_REQ_ONE.  qemu will assert() if 
nbd_client_co_block_status() doesn't make any progress, but from what 
I'm reading of your bug report, qemu as client never permits the 
server to answer with more than one extent, and the bug of a 
zero-length extent is triggered only after the first extent has been 
sent.


Thus, the primary reason to accept this patch is not because of qemu 
3.0 as client, but for interoperability with other clients. I'm 
planning on updating the commit message to add these additional details.




Fix this by more careful handling of dirty variable.

Bug was introduced in 3d068aff16
  "nbd/server: implement dirty bitmap export", with the whole function.
and presents in v3.0.0 release.


s/presents/present/



Signed-off-by: Vladimir Sementsov-Ogievskiy 


CC: qemu-sta...@nongnu.org


---
  nbd/server.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb33..12f721482d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1951,6 +1951,8 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

    assert(begin < overall_end && nb_extents);
  while (begin < overall_end && i < nb_extents) {
+    bool next_dirty = !dirty;
+
  if (dirty) {
  end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
  } else {
@@ -1962,6 +1964,7 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

  end = MIN(bdrv_dirty_bitmap_size(bitmap),
    begin + UINT32_MAX + 1 -
    bdrv_dirty_bitmap_granularity(bitmap));
+    next_dirty = dirty;
  }


Rather than introducing next_dirty, couldn't you just make this:

if (end == -1 || end - begin > UINT32_MAX) {
    /* Cap ... */
    end = MIN(...);
} else {
    dirty = !dirty;
}


no, dirty variable is used after it, we can't change it here.




  if (dont_fragment && end > overall_end) {
  end = overall_end;
@@ -1971,7 +1974,7 @@ static unsigned int 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,

  extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
  i++;
  begin = end;
-    dirty = !dirty;
+    dirty = next_dirty;
  }


and then you merely delete the assignment to 'dirty' here.


    bdrv_dirty_iter_free(it);



At any rate, the fix makes sense to me.  Should I go ahead and 
incorporate the changes I've suggested and turn it into a pull request 
through my NBD tree, or would you like to send a v2?





--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Eric Blake

On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote:

bitmap_to_extents function is broken: it switches dirty variable after
every iteration, however it can process only part of dirty (or zero)
area during one iteration in case when this area is too large for one
extent.

Fortunately, the bug don't produce wrong extents: it just inserts
zero-length extents between sequential extents representing large dirty
(or zero) area. However, zero-length extents are abandoned by NBD


s/abandoned by/forbidden by the/


protocol. So, careful client should consider such replay as server


s/replay/reply/


fault and not-careful will likely ignore zero-length extents.


Which camp is qemu 3.0 as client in? Does it tolerate the zero-length 
extent, and still manage to see correct information overall, or does it 
crash?


Hmm - I think we're "safe" with qemu as client - right now, the only way 
qemu 3.0 accesses the qemu dirty bitmap over NBD is with my 
x-dirty-bitmap hack (commit 216ee3657), which uses 
block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, and 
that always passes NBD_CMD_FLAG_REQ_ONE.  qemu will assert() if 
nbd_client_co_block_status() doesn't make any progress, but from what 
I'm reading of your bug report, qemu as client never permits the server 
to answer with more than one extent, and the bug of a zero-length extent 
is triggered only after the first extent has been sent.


Thus, the primary reason to accept this patch is not because of qemu 3.0 
as client, but for interoperability with other clients.  I'm planning on 
updating the commit message to add these additional details.




Fix this by more careful handling of dirty variable.

Bug was introduced in 3d068aff16
  "nbd/server: implement dirty bitmap export", with the whole function.
and presents in v3.0.0 release.


s/presents/present/



Signed-off-by: Vladimir Sementsov-Ogievskiy 


CC: qemu-sta...@nongnu.org


---
  nbd/server.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb33..12f721482d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1951,6 +1951,8 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
  
  assert(begin < overall_end && nb_extents);

  while (begin < overall_end && i < nb_extents) {
+bool next_dirty = !dirty;
+
  if (dirty) {
  end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
  } else {
@@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
  end = MIN(bdrv_dirty_bitmap_size(bitmap),
begin + UINT32_MAX + 1 -
bdrv_dirty_bitmap_granularity(bitmap));
+next_dirty = dirty;
  }


Rather than introducing next_dirty, couldn't you just make this:

if (end == -1 || end - begin > UINT32_MAX) {
/* Cap ... */
end = MIN(...);
} else {
dirty = !dirty;
}


  if (dont_fragment && end > overall_end) {
  end = overall_end;
@@ -1971,7 +1974,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
  extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
  i++;
  begin = end;
-dirty = !dirty;
+dirty = next_dirty;
  }


and then you merely delete the assignment to 'dirty' here.

  
  bdrv_dirty_iter_free(it);




At any rate, the fix makes sense to me.  Should I go ahead and 
incorporate the changes I've suggested and turn it into a pull request 
through my NBD tree, or would you like to send a v2?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/4] transaction support for bitmap merge

2018-09-14 Thread John Snow



On 09/14/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is a last brick, necessary to play with nbd bitmap export in
> conjunction with image fleecing.
> 
> v3:
> 01: fix typo in commit message, add John's r-b
> 02: splitted refactoring
> 03: improve commit message, split some refactoring to 02
> 04: add John's r-b
> 05: drop extra state variable, make it local instead. John's r-b.
> 
> 
> v2: don't compare with v1, it is changed a lot, to do the whole thing
> in .prepare instead of .commit. It is needed to be compatible with
> backup block job transaction actions [John]
> 
> Vladimir Sementsov-Ogievskiy (5):
>   dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap
>   dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap
>   dirty-bitmap: make it possible to restore bitmap after merge
>   blockdev: rename block-dirty-bitmap-clear transaction handlers
>   qapi: add transaction support for x-block-dirty-bitmap-merge
> 
>  qapi/transaction.json|  2 ++
>  include/block/block_int.h|  2 +-
>  include/block/dirty-bitmap.h |  2 +-
>  include/qemu/hbitmap.h   | 25 +--
>  block/dirty-bitmap.c | 36 +-
>  blockdev.c   | 59 +---
>  util/hbitmap.c   | 11 +--
>  7 files changed, 99 insertions(+), 38 deletions(-)
> 

Reviewed-by: John Snow 




Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback

2018-09-14 Thread Kevin Wolf
Am 14.09.2018 um 17:12 hat Paolo Bonzini geschrieben:
> On 13/09/2018 18:59, Kevin Wolf wrote:
> > Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
> >> On 13/09/2018 14:52, Kevin Wolf wrote:
> >>> + if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> >>> + /* If we are in the main thread, the callback is allowed to unref
> >>> + * the BlockBackend, so we have to hold an additional reference */
> >>> + blk_ref(acb->rwco.blk);
> >>> + }
> >>> acb->common.cb(acb->common.opaque, acb->rwco.ret);
> >>> + blk_dec_in_flight(acb->rwco.blk);
> >>> + if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> >>> + blk_unref(acb->rwco.blk);
> >>> + }
> >>
> >> Is this something that happens only for some specific callers?  That is,
> >> which callers are sure that the callback is invoked from the main thread?
> > 
> > I can't seem to reproduce the problem I saw any more even when reverting
> > the bdrv_ref/unref pair. If I remember correctly it was actually a
> > nested aio_poll() that was running a block job completion or something
> > like that - which would obviously only happen on the main thread because
> > the job intentionally defers to the main thread.
> > 
> > The only reason I made this conditional is that I think bdrv_unref()
> > still isn't safe outside the main thread, is it?
> 
> Yes, making it conditional is correct, but it is quite fishy even with
> the conditional.
> 
> As you mention, you could have a nested aio_poll() in the main thread,
> for example invoked from a bottom half, but in that case I'd rather
> track the caller that is creating the bottom half and see if it lacks a
> bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
> missing).

I went back to the commit where I first added the patch (it already
contained the ref/unref pair) and tried if I could reproduce a bug with
the pair removed. I couldn't.

I'm starting to think that maybe I was just overly cautious with the
ref/unref. I may have confused the nested aio_poll() crash with a
different situation. I've dealt with so many crashes and hangs while
working on this series that it's quite possible.

Kevin



[Qemu-block] [PATCH] nbd/server: fix bitmap export

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
bitmap_to_extents function is broken: it switches dirty variable after
every iteration, however it can process only part of dirty (or zero)
area during one iteration in case when this area is too large for one
extent.

Fortunately, the bug don't produce wrong extents: it just inserts
zero-length extents between sequential extents representing large dirty
(or zero) area. However, zero-length extents are abandoned by NBD
protocol. So, careful client should consider such replay as server
fault and not-careful will likely ignore zero-length extents.

Fix this by more careful handling of dirty variable.

Bug was introduced in 3d068aff16
 "nbd/server: implement dirty bitmap export", with the whole function.
and presents in v3.0.0 release.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/server.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb33..12f721482d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1951,6 +1951,8 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 
 assert(begin < overall_end && nb_extents);
 while (begin < overall_end && i < nb_extents) {
+bool next_dirty = !dirty;
+
 if (dirty) {
 end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
 } else {
@@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 end = MIN(bdrv_dirty_bitmap_size(bitmap),
   begin + UINT32_MAX + 1 -
   bdrv_dirty_bitmap_granularity(bitmap));
+next_dirty = dirty;
 }
 if (dont_fragment && end > overall_end) {
 end = overall_end;
@@ -1971,7 +1974,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
 i++;
 begin = end;
-dirty = !dirty;
+dirty = next_dirty;
 }
 
 bdrv_dirty_iter_free(it);
-- 
2.18.0
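To make the failure mode concrete, here is a small standalone sketch (plain C
with the bitmap queries stubbed out; this is not QEMU code, and the simplified
cap condition and granularity are assumptions for illustration). One dirty run
larger than UINT32_MAX is capped to a single extent, but 'dirty' still flips,
so the very next iteration emits a zero-length extent:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t begin = 0;
    uint64_t overall_end = 3 * (uint64_t)UINT32_MAX; /* one huge dirty run */
    uint64_t granularity = 512;
    bool dirty = true;

    for (int i = 0; i < 4 && begin < overall_end; i++) {
        /* stand-in for the next_zero/next_dirty lookups: the whole range is
         * dirty, so the next zero bit is at overall_end and, once we are in
         * the "zero" phase, the next dirty bit is begin itself */
        uint64_t end = dirty ? overall_end : begin;

        if (end - begin > UINT32_MAX) {
            /* cap the extent: NBD extent lengths are 32-bit */
            end = begin + UINT32_MAX + 1 - granularity;
        }
        printf("extent %d: length=%" PRIu64 ", dirty=%d\n",
               i, end - begin, (int)dirty);
        begin = end;
        dirty = !dirty; /* the bug: flips even when the run was only capped */
    }
    return 0;
}

This prints a ~4G dirty extent, then a zero-length clean extent, then another
~4G dirty extent, and so on. With the patch applied, next_dirty keeps the old
value whenever the area was merely capped rather than exhausted, so the server
emits consecutive extents of the same dirtiness instead of zero-length
separators.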




Re: [Qemu-block] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit

2018-09-14 Thread Kevin Wolf
Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
> On 13.09.18 14:52, Kevin Wolf wrote:
> > When starting an active commit job, other callbacks can run before
> > mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> > go away. Add another pair of bdrv_ref/unref() around it to protect
> > against this case.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block/mirror.c | 11 +++
> >  1 file changed, 11 insertions(+)
> 
> Reviewed-by: Max Reitz 
> 
> But...  How?

Tried to reproduce the other bug Paolo was concerned about (good there
is something like 'git reflog'!) and dug up this one instead.

So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.

The backtrace when the BDS is deleted is the following:

(rr) bt
#0  0x7faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
#1  0x7faeaf60414e in _int_free () at /lib64/libc.so.6
#2  0x7faeaf6093be in free () at /lib64/libc.so.6
#3  0x7faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
#4  0x55c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
#5  0x55c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
#6  0x55c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at 
block.c:2188
#7  0x55c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at 
blockjob.c:200
#8  0x55c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
#9  0x55c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
#10 0x55c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
#11 0x55c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
#12 0x55c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
#13 0x55c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
#14 0x55c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
#15 0x55c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at 
job.c:872
#16 0x55c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) 
at job.c:892
#17 0x55c0c92b572c in stream_complete (job=0x55c0ccf9e220, 
opaque=0x55c0cc050bc0) at block/stream.c:96
#18 0x55c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) at 
job.c:981
#19 0x55c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90
#20 0x55c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118
#21 0x55c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at 
util/aio-posix.c:690
#22 0x55c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693
#23 0x55c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, 
queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206
#24 0x55c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, 
bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032
#25 0x55c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, 
errp=0x7ffd65617220) at block.c:3075
#26 0x55c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, 
base=0x55c0cbe2c250, creation_flags=0, speed=0, 
on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, 
auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687
#27 0x55c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, 
device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, 
has_base=true, base=0x55c0cbe38a00 
"/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", 
has_top_node=false, top_node=0x0, has_top=false, top=0x0, 
has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, 
has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at 
blockdev.c:3325
#28 0x55c0c928bb08 in qmp_marshal_block_commit (args=<optimized out>, ret=<optimized out>, errp=0x7ffd656173b8) at qapi/qapi-commands-block-core.c:409
#29 0x55c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, allow_oob=false, request=<optimized out>, cmds=0x55c0c9f56f80 <qmp_commands>) at qapi/qmp-dispatch.c:129
#30 0x55c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:171
#31 0x55c0c9111623 in monitor_qmp_dispatch (mon=0x55c0cbdbb930, 
req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168
#32 0x55c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at 
/home/kwolf/source/qemu/monitor.c:4237
#33 0x55c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90
#34 0x55c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118
#35 0x55c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at 
util/aio-posix.c:436
#36 0x55c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, 
callback=0x0, user_data=0x0) at util/async.c:261
#37 0x7faecaea1257 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#38 0x55c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215
#39 0x55c0c961a2aa in os_host_main_loop_wait (timeout=0) at 
util/main-loop.c:238
#40 0x55c0c961a2aa in main_loop_wait 
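In short, the backtrace shows: qmp_block_commit() calls commit_active_start(),
which calls bdrv_reopen() (frames #27 to #25); bdrv_reopen_prepare() flushes
and runs a nested aio_poll() (frames #23 to #21); that nested poll dispatches
stream_complete() for the concurrent stream job (frame #17), whose completion
unrefs and deletes a node (frames #7 to #4) that commit_active_start() still
needs. That is exactly the window the extra bdrv_ref()/bdrv_unref() pair in
this patch closes.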

Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-09-14 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 21:09, Eric Blake wrote:

On 06/20/2018 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

20.06.2018 19:27, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Handle new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. New public function
nbd_export_bitmap selects bitmap to export. For now, only one bitmap
may be exported.
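(Concretely: a client that negotiates a metadata context such as
"qemu:dirty-bitmap:bitmap0", where the bitmap name is illustrative, then
receives NBD_REPLY_TYPE_BLOCK_STATUS chunks whose extents carry the
NBD_STATE_DIRTY flag for that bitmap, alongside any "base:allocation" context
it also negotiated.)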




 static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
-   NBDExtent *extents, unsigned 
nb_extents,
+   NBDExtent *extents, unsigned int 
nb_extents,

+   uint64_t length, bool last,
    uint32_t context_id, Error **errp)
 {
 NBDStructuredMeta chunk;
@@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient 
*client, uint64_t handle,
 {.iov_base = extents, .iov_len = nb_extents * 
sizeof(extents[0])}

 };

-    set_be_chunk(&chunk, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
+    trace_nbd_co_send_extents(handle, nb_extents, context_id, 
length, last);


hm, length variable added only to be traced.. Ok, but a bit strange.


Yes. It doesn't affect what goes over the wire, but when it comes to 
tracing, knowing the sum of the extents can be quite helpful 
(especially knowing if the server's reply is shorter or longer than 
the client's request, and the fact that when two or more contexts are 
negotiated by the client, the different contexts can have different 
length replies)



+static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, 
uint64_t offset,
+  uint64_t *length, NBDExtent 
*extents,


length - in-out? worth comment?


Sure.




+ unsigned int nb_extents,
+  bool dont_fragment)
 {
 uint64_t begin = offset, end;
-    uint64_t overall_end = offset + length;
-    unsigned i = 0;
+    uint64_t overall_end = offset + *length;
+    unsigned int i = 0;
 BdrvDirtyBitmapIter *it;
 bool dirty;

@@ -1983,23 +1994,25 @@ static unsigned 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,


 bdrv_dirty_bitmap_unlock(bitmap);

+    *length = end - begin;


hm, it will always be zero, as at the end of the previous loop we 
have "begin = end;"


Ah, then it should be end - offset. Thanks for the careful review.

In fact, since ONLY the final extent can be larger than 2G (all 
earlier extents were short of the overall_end, and the incoming length 
is 32-bit), but the NBD protocol says that at most one extent can go 
beyond the original request, we do NOT want to split a >2G extent into 
multiple extents, but rather cap it to just shy of 4G at the 
granularity offered by the bitmap itself.  At which point 
add_extents() never returns more than 1, and can just be inlined.





return i;
 }

 static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
   BdrvDirtyBitmap *bitmap, uint64_t 
offset,

-  uint64_t length, bool dont_fragment,
+  uint32_t length, bool dont_fragment,
   uint32_t context_id, Error **errp)
 {
 int ret;
-    unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
+    unsigned int nb_extents = dont_fragment ? 1 : 
NBD_MAX_BITMAP_EXTENTS;

 NBDExtent *extents = g_new(NBDExtent, nb_extents);
+    uint64_t final_length = length;

-    nb_extents = bitmap_to_extents(bitmap, offset, length, extents, 
nb_extents,

-   dont_fragment);
+    nb_extents = bitmap_to_extents(bitmap, offset, &final_length, 
extents,

+   nb_extents, dont_fragment);

-    ret = nbd_co_send_extents(client, handle, extents, nb_extents, 
context_id,

-  errp);
+    ret = nbd_co_send_extents(client, handle, extents, nb_extents,
+  final_length, true, context_id, errp);


hmm in-out pointer field only to trace final_length? may be just 
trace it in bitmap_to_extents?


No, because base:allocation also benefits from tracing final_length.



also, it's not trivial, that the function now sends FLAG_DONE. I 
think, worth add a comment, or
a parameter like for nbd_co_send_block_status. It will simplify 
introducing new contexts in future.


Do we anticipate adding any in the near future? But adding a parameter 
that is always true so that the callsite becomes more obvious on when 
to pass the done flag may indeed make future additions easier to spot 
in one place.


So here's what else I'll squash in:

diff --git i/nbd/server.c w/nbd/server.c
index e84aea6cf03..7a96b296839 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -1881,9 +1881,10 @@ static int 
blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,


 /* nbd_co_send_extents
  *
- * @last controls 

[Qemu-block] [PATCH v3 5/5] qapi: add transaction support for x-block-dirty-bitmap-merge

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
New action is like clean action: do the whole thing in .prepare and
undo in .abort. This behavior for bitmap-changing actions is needed
because backup job actions use bitmap in .prepare.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 qapi/transaction.json |  2 ++
 blockdev.c| 37 -
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/qapi/transaction.json b/qapi/transaction.json
index d7e4274550..5875cdb16c 100644
--- a/qapi/transaction.json
+++ b/qapi/transaction.json
@@ -48,6 +48,7 @@
 # - @block-dirty-bitmap-clear: since 2.5
 # - @x-block-dirty-bitmap-enable: since 3.0
 # - @x-block-dirty-bitmap-disable: since 3.0
+# - @x-block-dirty-bitmap-merge: since 3.1
 # - @blockdev-backup: since 2.3
 # - @blockdev-snapshot: since 2.5
 # - @blockdev-snapshot-internal-sync: since 1.7
@@ -63,6 +64,7 @@
'block-dirty-bitmap-clear': 'BlockDirtyBitmap',
'x-block-dirty-bitmap-enable': 'BlockDirtyBitmap',
'x-block-dirty-bitmap-disable': 'BlockDirtyBitmap',
+   'x-block-dirty-bitmap-merge': 'BlockDirtyBitmapMerge',
'blockdev-backup': 'BlockdevBackup',
'blockdev-snapshot': 'BlockdevSnapshot',
'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal',
diff --git a/blockdev.c b/blockdev.c
index 5348e8ba9b..2917bf6e09 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2112,6 +2112,35 @@ static void block_dirty_bitmap_disable_abort(BlkActionState *common)
     }
 }
 
+static void block_dirty_bitmap_merge_prepare(BlkActionState *common,
+                                             Error **errp)
+{
+    BlockDirtyBitmapMerge *action;
+    BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
+                                             common, common);
+    BdrvDirtyBitmap *merge_source;
+
+    if (action_check_completion_mode(common, errp) < 0) {
+        return;
+    }
+
+    action = common->action->u.x_block_dirty_bitmap_merge.data;
+    state->bitmap = block_dirty_bitmap_lookup(action->node,
+                                              action->dst_name,
+                                              &state->bs,
+                                              errp);
+    if (!state->bitmap) {
+        return;
+    }
+
+    merge_source = bdrv_find_dirty_bitmap(state->bs, action->src_name);
+    if (!merge_source) {
+        return;
+    }
+
+    bdrv_merge_dirty_bitmap(state->bitmap, merge_source, &state->backup, errp);
+}
+
 static void abort_prepare(BlkActionState *common, Error **errp)
 {
 error_setg(errp, "Transaction aborted using Abort action");
@@ -2182,7 +2211,13 @@ static const BlkActionOps actions[] = {
         .instance_size = sizeof(BlockDirtyBitmapState),
         .prepare = block_dirty_bitmap_disable_prepare,
         .abort = block_dirty_bitmap_disable_abort,
-    }
+    },
+    [TRANSACTION_ACTION_KIND_X_BLOCK_DIRTY_BITMAP_MERGE] = {
+        .instance_size = sizeof(BlockDirtyBitmapState),
+        .prepare = block_dirty_bitmap_merge_prepare,
+        .commit = block_dirty_bitmap_free_backup,
+        .abort = block_dirty_bitmap_restore,
+    }
 };
 
 /**
-- 
2.18.0
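
For context, the reason the merge must happen in .prepare is visible in
how the transaction machinery drives these hooks. A simplified sketch
(not the verbatim qmp_transaction() code; run_transaction and the array
argument are illustrative) of that control flow:

/* Simplified sketch: run every action's .prepare first; if one fails,
 * unwind the already-prepared actions via .abort, otherwise run every
 * .commit. Because the whole merge happens in .prepare, a later backup
 * action in the same transaction already sees the merged bitmap. */
static void run_transaction(BlkActionState *states[], size_t n, Error **errp)
{
    size_t i, prepared = 0;
    Error *local_err = NULL;

    for (i = 0; i < n; i++) {
        states[i]->ops->prepare(states[i], &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            break;
        }
        prepared++;
    }

    for (i = 0; i < prepared; i++) {
        BlkActionState *s = states[i];
        if (local_err) {
            if (s->ops->abort) {
                s->ops->abort(s);   /* e.g. block_dirty_bitmap_restore */
            }
        } else if (s->ops->commit) {
            s->ops->commit(s);      /* e.g. block_dirty_bitmap_free_backup */
        }
    }
}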




[Qemu-block] [PATCH v3 3/5] dirty-bitmap: make it possible to restore bitmap after merge

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
Add a backup parameter to bdrv_merge_dirty_bitmap(), to be used with
bdrv_restore_dirty_bitmap() if the bitmap needs to be restored after a
merge operation.

This is needed to implement the bitmap merge transaction action in a
further commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h |  2 +-
 include/qemu/hbitmap.h   | 25 -
 block/dirty-bitmap.c | 17 ++---
 blockdev.c   |  2 +-
 util/hbitmap.c   | 11 ---
 5 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 259bd27c40..201ff7f20b 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -71,7 +71,7 @@ void bdrv_dirty_bitmap_set_persistance(BdrvDirtyBitmap *bitmap,
                                        bool persistent);
 void bdrv_dirty_bitmap_set_qmp_locked(BdrvDirtyBitmap *bitmap, bool qmp_locked);
 void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
-                             Error **errp);
+                             HBitmap **backup, Error **errp);
 
 /* Functions that require manual locking.  */
 void bdrv_dirty_bitmap_lock(BdrvDirtyBitmap *bitmap);
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index ddca52c48e..a7cb780592 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -73,16 +73,23 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size);
 
 /**
  * hbitmap_merge:
- * @a: The bitmap to store the result in.
- * @b: The bitmap to merge into @a.
- * @return true if the merge was successful,
- * false if it was not attempted.
- *
- * Merge two bitmaps together.
- * A := A (BITOR) B.
- * B is left unmodified.
+ *
+ * Store result of merging @a and @b into @result.
+ * @result is allowed to be equal to @a or @b.
+ *
+ * Return true if the merge was successful,
+ *        false if it was not attempted.
+ */
+bool hbitmap_merge(const HBitmap *a, const HBitmap *b, HBitmap *result);
+
+/**
+ * hbitmap_can_merge:
+ *
+ * hbitmap_can_merge(a, b) && hbitmap_can_merge(a, result) is necessary and
+ * sufficient for hbitmap_merge(a, b, result) to succeed.
+ *
  */
-bool hbitmap_merge(HBitmap *a, const HBitmap *b);
+bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b);
 
 /**
  * hbitmap_empty:
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 017ee9db46..8ac933cf1c 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -314,7 +314,7 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BlockDriverState *bs,
         return NULL;
     }
 
-    if (!hbitmap_merge(parent->bitmap, successor->bitmap)) {
+    if (!hbitmap_merge(parent->bitmap, successor->bitmap, parent->bitmap)) {
         error_setg(errp, "Merging of parent and successor bitmap failed");
         return NULL;
     }
@@ -791,8 +791,10 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset)
 }
 
 void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
-                             Error **errp)
+                             HBitmap **backup, Error **errp)
 {
+    bool ret;
+
     /* only bitmaps from one bds are supported */
     assert(dest->mutex == src->mutex);
 
@@ -810,11 +812,20 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
         goto out;
     }
 
-    if (!hbitmap_merge(dest->bitmap, src->bitmap)) {
+    if (!hbitmap_can_merge(dest->bitmap, src->bitmap)) {
         error_setg(errp, "Bitmaps are incompatible and can't be merged");
         goto out;
     }
 
+    if (backup) {
+        *backup = dest->bitmap;
+        dest->bitmap = hbitmap_alloc(dest->size, hbitmap_granularity(*backup));
+        ret = hbitmap_merge(*backup, src->bitmap, dest->bitmap);
+    } else {
+        ret = hbitmap_merge(dest->bitmap, src->bitmap, dest->bitmap);
+    }
+    assert(ret);
+
 out:
     qemu_mutex_unlock(dest->mutex);
 }
diff --git a/blockdev.c b/blockdev.c
index f133e87414..9cb29ca63e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2961,7 +2961,7 @@ void qmp_x_block_dirty_bitmap_merge(const char *node, const char *dst_name,
         return;
     }
 
-    bdrv_merge_dirty_bitmap(dst, src, errp);
+    bdrv_merge_dirty_bitmap(dst, src, NULL, errp);
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
diff --git a/util/hbitmap.c b/util/hbitmap.c
index bcd304041a..d5aca5159f 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -723,6 +723,10 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size)
 }
 }
 
+bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b)
+{
+    return (a->size == b->size) && (a->granularity == b->granularity);
+}
 
 /**
  * Given HBitmaps A and B, let A := A (BITOR) B.
@@ -731,14 +735,15 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size)
  * @return true if the merge was successful,
  *         false if it was not attempted.
  */
-bool hbitmap_merge(HBitmap *a, const HBitmap *b)
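
The message is truncated by the archive above, but the resulting API is
visible in the dirty-bitmap.c hunk. A hedged usage sketch (dest, src and
need_rollback are assumed to be set up elsewhere): the merge hands back
an undo handle that the caller either frees on commit or restores on
abort.

    HBitmap *backup = NULL;
    Error *local_err = NULL;

    bdrv_merge_dirty_bitmap(dest, src, &backup, &local_err);
    if (local_err) {
        error_report_err(local_err);             /* merge refused; nothing to undo */
    } else if (need_rollback) {
        bdrv_restore_dirty_bitmap(dest, backup); /* abort: the old bits come back */
    } else {
        hbitmap_free(backup);                    /* commit: drop the undo copy */
    }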

[Qemu-block] [PATCH v3 1/5] dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
Move the checks from qmp_x_block_dirty_bitmap_merge() to
bdrv_merge_dirty_bitmap(), to share them with the dirty bitmap merge
transaction action in a future commit.

Note: for now, only qmp_x_block_dirty_bitmap_merge() calls
bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 block/dirty-bitmap.c | 15 +--
 blockdev.c   | 10 --
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index c9b8a6fd52..6c8761e027 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -798,12 +798,23 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
 
     qemu_mutex_lock(dest->mutex);
 
-    assert(bdrv_dirty_bitmap_enabled(dest));
-    assert(!bdrv_dirty_bitmap_readonly(dest));
+    if (bdrv_dirty_bitmap_frozen(dest)) {
+        error_setg(errp, "Bitmap '%s' is frozen and cannot be modified",
+                   dest->name);
+        goto out;
+    }
+
+    if (bdrv_dirty_bitmap_readonly(dest)) {
+        error_setg(errp, "Bitmap '%s' is readonly and cannot be modified",
+                   dest->name);
+        goto out;
+    }
 
     if (!hbitmap_merge(dest->bitmap, src->bitmap)) {
         error_setg(errp, "Bitmaps are incompatible and can't be merged");
+        goto out;
     }
 
+out:
     qemu_mutex_unlock(dest->mutex);
 }
diff --git a/blockdev.c b/blockdev.c
index 72f5347df5..902338e815 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2955,16 +2955,6 @@ void qmp_x_block_dirty_bitmap_merge(const char *node, const char *dst_name,
         return;
     }
 
-    if (bdrv_dirty_bitmap_frozen(dst)) {
-        error_setg(errp, "Bitmap '%s' is frozen and cannot be modified",
-                   dst_name);
-        return;
-    } else if (bdrv_dirty_bitmap_readonly(dst)) {
-        error_setg(errp, "Bitmap '%s' is readonly and cannot be modified",
-                   dst_name);
-        return;
-    }
-
     src = bdrv_find_dirty_bitmap(bs, src_name);
     if (!src) {
         error_setg(errp, "Dirty bitmap '%s' not found", src_name);
-- 
2.18.0
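
The reshaped function above follows a common qemu pattern: once the mutex
is held, every failure path must leave through the unlock. A distilled
sketch of that shape (do_locked_op and the bad flag are illustrative):

/* Pattern sketch: error_setg() + goto out instead of an early return,
 * so the mutex is always released on the single exit path. */
static void do_locked_op(QemuMutex *mutex, bool bad, Error **errp)
{
    qemu_mutex_lock(mutex);

    if (bad) {
        error_setg(errp, "refusing operation");
        goto out;               /* never return with the lock held */
    }

    /* ... the actual work ... */

out:
    qemu_mutex_unlock(mutex);
}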




[Qemu-block] [PATCH v3 2/5] dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
Use more generic names to reuse the function for bitmap merge in the
following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h | 2 +-
 block/dirty-bitmap.c  | 4 ++--
 blockdev.c| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 903b9c1034..677065ae30 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1149,7 +1149,7 @@ bool blk_dev_is_medium_locked(BlockBackend *blk);
 void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes);
 
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
-void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in);
+void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup);
 
 void bdrv_inc_in_flight(BlockDriverState *bs);
 void bdrv_dec_in_flight(BlockDriverState *bs);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 6c8761e027..017ee9db46 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -633,12 +633,12 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
     bdrv_dirty_bitmap_unlock(bitmap);
 }
 
-void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
+void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup)
 {
     HBitmap *tmp = bitmap->bitmap;
     assert(bdrv_dirty_bitmap_enabled(bitmap));
     assert(!bdrv_dirty_bitmap_readonly(bitmap));
-    bitmap->bitmap = in;
+    bitmap->bitmap = backup;
     hbitmap_free(tmp);
 }
 
diff --git a/blockdev.c b/blockdev.c
index 902338e815..f133e87414 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2032,7 +2032,7 @@ static void block_dirty_bitmap_clear_abort(BlkActionState *common)
                                              common, common);
 
     if (state->backup) {
-        bdrv_undo_clear_dirty_bitmap(state->bitmap, state->backup);
+        bdrv_restore_dirty_bitmap(state->bitmap, state->backup);
     }
 }
 }
 
-- 
2.18.0




[Qemu-block] [PATCH v3 0/4] transaction support for bitmap merge

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
This is a last brick, necessary to play with nbd bitmap export in
conjunction with image fleecing.

v3:
01: fix typo in commit message, add John's r-b
02: split out refactoring
03: improve commit message, split some refactoring into 02
04: add John's r-b
05: drop extra state variable, make it local instead. John's r-b.


v2: don't compare with v1; it has changed a lot, to do the whole thing
in .prepare instead of .commit. This is needed to be compatible with
backup block job transaction actions [John]

Vladimir Sementsov-Ogievskiy (5):
  dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap
  dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap
  dirty-bitmap: make it possible to restore bitmap after merge
  blockdev: rename block-dirty-bitmap-clear transaction handlers
  qapi: add transaction support for x-block-dirty-bitmap-merge

 qapi/transaction.json|  2 ++
 include/block/block_int.h|  2 +-
 include/block/dirty-bitmap.h |  2 +-
 include/qemu/hbitmap.h   | 25 +--
 block/dirty-bitmap.c | 36 +-
 blockdev.c   | 59 +---
 util/hbitmap.c   | 11 +--
 7 files changed, 99 insertions(+), 38 deletions(-)

-- 
2.18.0




[Qemu-block] [PATCH v3 4/5] blockdev: rename block-dirty-bitmap-clear transaction handlers

2018-09-14 Thread Vladimir Sementsov-Ogievskiy
Rename block-dirty-bitmap-clear transaction handlers to reuse them for
x-block-dirty-bitmap-merge transaction in the following patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 blockdev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 9cb29ca63e..5348e8ba9b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2026,7 +2026,7 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
     bdrv_clear_dirty_bitmap(state->bitmap, &state->backup);
 }
 
-static void block_dirty_bitmap_clear_abort(BlkActionState *common)
+static void block_dirty_bitmap_restore(BlkActionState *common)
 {
     BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
                                              common, common);
@@ -2036,7 +2036,7 @@ static void block_dirty_bitmap_clear_abort(BlkActionState *common)
     }
 }
 
-static void block_dirty_bitmap_clear_commit(BlkActionState *common)
+static void block_dirty_bitmap_free_backup(BlkActionState *common)
 {
     BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
                                              common, common);
@@ -2170,8 +2170,8 @@ static const BlkActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_BLOCK_DIRTY_BITMAP_CLEAR] = {
         .instance_size = sizeof(BlockDirtyBitmapState),
         .prepare = block_dirty_bitmap_clear_prepare,
-        .commit = block_dirty_bitmap_clear_commit,
-        .abort = block_dirty_bitmap_clear_abort,
+        .commit = block_dirty_bitmap_free_backup,
+        .abort = block_dirty_bitmap_restore,
     },
     [TRANSACTION_ACTION_KIND_X_BLOCK_DIRTY_BITMAP_ENABLE] = {
         .instance_size = sizeof(BlockDirtyBitmapState),
-- 
2.18.0




Re: [Qemu-block] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread

2018-09-14 Thread Paolo Bonzini
On 13/09/2018 19:21, Kevin Wolf wrote:
> Am 13.09.2018 um 17:11 hat Paolo Bonzini geschrieben:
>> On 13/09/2018 14:52, Kevin Wolf wrote:
>>> Even if AIO_WAIT_WHILE() is called in the home context of the
>>> AioContext, we still want to allow the condition to change depending on
other threads as long as they kick the AioWait. Specifically block jobs
>>> can be running in an I/O thread and should then be able to kick a drain
>>> in the main loop context.
>>
>> I don't understand the scenario very well.  Why hasn't the main loop's
>> drain incremented num_waiters?
> 
> We are in this path (that didn't increase num_waiters before this patch)
> because drain, and therefore AIO_WAIT_WHILE(), was called from the main
> thread. But draining depends on a job in a different thread, so we need
> to be able to be kicked when that job finally is quiesced.

Ah, because AIO_WAIT_WHILE() is invoked with ctx ==
qemu_get_aio_context(), but the condition is triggered *outside* the
main context?  Tricky, but seems correct.  AIO_WAIT_WHILE() anyway is
not a fast path.

Paolo

> If I revert this, the test /bdrv-drain/blockjob/iothread/drain hangs.
> This is a block job that works on two nodes in two different contexts.
> 
> (I think I saw this with mirror, which doesn't take additional locks
> when it issues a request, so maybe there's a bit more wrong there... We
> clearly need more testing with iothreads, this series probably only
> scratches the surface.)
> 
>> Note I'm not against the patch---though I would hoist the
>> atomic_inc/atomic_dec outside the if, since it's done in both branches.
> 
> Ok.
> 
> Kevin
> 
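
For readers skimming the thread, a heavily simplified sketch of what the
patch changes in AIO_WAIT_WHILE(); this is reduced from the real macro in
include/block/aio-wait.h and elides the waited_ bookkeeping:

/* Simplified sketch: num_waiters is now bumped on both paths, so
 * aio_wait_kick() from another thread (e.g. a block job in an I/O
 * thread) can wake a drain that polls from the main loop. */
#define AIO_WAIT_WHILE_SKETCH(wait, ctx, cond) do {                 \
    AioWait *wait_ = (wait);                                        \
    AioContext *ctx_ = (ctx);                                       \
    atomic_inc(&wait_->num_waiters);                                \
    if (in_aio_context_home_thread(ctx_)) {                         \
        while ((cond)) {                                            \
            aio_poll(ctx_, true);     /* home thread: poll here */  \
        }                                                           \
    } else {                                                        \
        while ((cond)) {                                            \
            aio_context_release(ctx_);                              \
            aio_poll(qemu_get_aio_context(), true);                 \
            aio_context_acquire(ctx_);                              \
        }                                                           \
    }                                                               \
    atomic_dec(&wait_->num_waiters);                                \
} while (0)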




Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback

2018-09-14 Thread Paolo Bonzini
On 13/09/2018 18:59, Kevin Wolf wrote:
> Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
>> On 13/09/2018 14:52, Kevin Wolf wrote:
>>> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
>>> +        /* If we are in the main thread, the callback is allowed to unref
>>> +         * the BlockBackend, so we have to hold an additional reference */
>>> +        blk_ref(acb->rwco.blk);
>>> +    }
>>>      acb->common.cb(acb->common.opaque, acb->rwco.ret);
>>> +    blk_dec_in_flight(acb->rwco.blk);
>>> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
>>> +        blk_unref(acb->rwco.blk);
>>> +    }
>>
>> Is this something that happens only for some specific callers?  That is,
>> which callers are sure that the callback is invoked from the main thread?
> 
> I can't seem to reproduce the problem I saw any more even when reverting
> the bdrv_ref/unref pair. If I remember correctly it was actually a
> nested aio_poll() that was running a block job completion or something
> like that - which would obviously only happen on the main thread because
> the job intentionally defers to the main thread.
> 
> The only reason I made this conditional is that I think bdrv_unref()
> still isn't safe outside the main thread, is it?

Yes, making it conditional is correct, but it is quite fishy even with
the conditional.

As you mention, you could have a nested aio_poll() in the main thread,
for example invoked from a bottom half, but in that case I'd rather
track the caller that is creating the bottom half and see if it lacks a
bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
missing).

Paolo



Re: [Qemu-block] [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?

2018-09-14 Thread Eric Blake

On 9/13/18 9:19 PM, lampahome wrote:

Sorry, I need to explain what case I want to do

Todo: I want to *backup a block device into qcow2 format image.*
I met a problem which is the *file size limit of filesystem* ex: Max is
16TB for any file in ext4, but the block device maybe 32TB or more.

I figure out one way is to *divide data of device into 1TB chunk* and save
every chunk into qcow2 image cuz I don't change filesystem, and  connect
with backing chain.


A better way would be to use a different filesystem that does not have 
those limits, or even better to just directly use a raw block device 
with the size you need instead of worrying about storing a file system 
on top of the block device just to introduce artificial size limitations 
into the mix.  LVM is great for that.



*(That's what I said range is different)*
Ex: 1st chunk of device will save into image.000
2nd chunk of device will save into image.001
Nth chunk of device will save into image.(N-1)
...etc

I can see all block device data when I mount image.(N-1) by qemu-nbd cuz
the chunk doesn't overlap and all chunks connect by backing chain.


How exactly did you create those images?  I'm trying to verify the steps 
you used to split the image. I know the concept of the split, but 
without seeing actual commands used, I don't know that you actually 
accomplished the split in the manner desired.  (It's okay if a 
reproduction uses smaller scales for speed, such as splitting a 32M 
image across 1M qcow2 files - the point remains that seeing the actual 
steps used may offer additional insights into your usage scenario).


Or are you trying to ask if it is possible to create such a fragmented 
design with current tools?  (The answer that we've given you is that no, 
it is not easy to do, because no one has needed it so far).  There's no 
way to tell a running qemu that writes to offsets 0-1M go into one file, 
while writes to offsets 1M to 2M go into another - ALL writes go into 
the currently active layer, regardless of the offset represented by the 
write.


It would be possible to come up with a new driver (or to add yet another 
mode to the existing quorum driver) that DOES allow runtime 
concatenation of multiple subsidiary devices, in order to present a 
linear view of those images as a single guest device.  To an extent, 
that's what 'qemu-img convert image1 image2 imageout' is doing, except 
that qemu-img is doing it via manual hacks, rather than something baked 
into the internal qemu block layer (we'd need it in the qemu block layer 
for it to work with a running guest with random access, rather than just 
a one-time conversion pass).  But no one has submitted patches for that yet.




Now I want to do next thing: *Incremental backup*
When I modify data of 1st chunk, what I thought is to write new 1st chunk
to new image *image.N* and let *imgae.(N-1)* be the backing file of
*image.N* .
That's cuz I want to store the data before modified to roll back anytime.


Qemu DOES support incremental backups via persistent bitmaps coupled 
with NBD exports.  See 
https://bugzilla.redhat.com/show_bug.cgi?id=1207657#c27 for a 
demonstration of all the steps involved, but it is quite possible to 
create an NBD export of a point-in-time incremental of a running guest, 
where you can then query over NBD which portions of the backup represent 
deltas from your earlier point in time (by using a bitmap to track which 
clusters were written from the earlier point in time), and where you can 
read the data from NBD in ANY manner you see fit (including reading 
dirty clusters from 0-1M to write into backup file .000, reading dirty 
clusters from 1M-2M to write into backup file .001, and so on).  So if 
you want to split your backing file into ranges (which I already 
questioned as to how you plan to do that, given that the subsequent 
writes are not split), you can at least create incremental backups that 
are also split.




So now I have two *version of block device(like concept of snapshot)*:
One is image.000 to image.(N-1). I can access the data before modify by
mount image.(N-1) through qemu-nbd
The other one is image.000 to image.N.  I can access the data after modify
by mount image.N through qemu-nbd(cuz the visible 1st chunk are in the
image.N)

Consider about the situation:
000   A - - - - - - - -  <---  store the 1st chunk of block device
001   - B - - - - - - -
002   - - C - - - - - - (1st state of block device)
003   A' - - - - - - - - <--- store the 1st chunk of block device, but
data is different
004   - - - D - - - - - (2nd state of block device)
005   - - - - E - - - -  (3rd state of block device)

The original problem is If I want to remove the 2nd state(003 and 004) but
I need to keep the data of 003 and 004.
If I just commit 003, the A' of 003 must be committed into 002 cuz 002 is
the backing file of 003.
I try to figure out some way to let it only commit from 003 into 000.



I'm not quite following your diagram. My naive read 

Re: [Qemu-block] [Qemu-devel] [PATCH v2 4/4] qapi: add transaction support for x-block-dirty-bitmap-merge

2018-09-14 Thread Vladimir Sementsov-Ogievskiy

11.09.2018 22:45, John Snow wrote:


On 07/06/2018 07:36 AM, Vladimir Sementsov-Ogievskiy wrote:

The new action is like the clear action: do the whole thing in .prepare
and undo in .abort. This behavior is needed for bitmap-changing actions
because backup job actions use the bitmap in .prepare.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/transaction.json |  2 ++
  blockdev.c| 38 +-
  2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/qapi/transaction.json b/qapi/transaction.json
index d7e4274550..5875cdb16c 100644
--- a/qapi/transaction.json
+++ b/qapi/transaction.json
@@ -48,6 +48,7 @@
  # - @block-dirty-bitmap-clear: since 2.5
  # - @x-block-dirty-bitmap-enable: since 3.0
  # - @x-block-dirty-bitmap-disable: since 3.0
+# - @x-block-dirty-bitmap-merge: since 3.1
  # - @blockdev-backup: since 2.3
  # - @blockdev-snapshot: since 2.5
  # - @blockdev-snapshot-internal-sync: since 1.7
@@ -63,6 +64,7 @@
 'block-dirty-bitmap-clear': 'BlockDirtyBitmap',
 'x-block-dirty-bitmap-enable': 'BlockDirtyBitmap',
 'x-block-dirty-bitmap-disable': 'BlockDirtyBitmap',
+   'x-block-dirty-bitmap-merge': 'BlockDirtyBitmapMerge',
 'blockdev-backup': 'BlockdevBackup',
 'blockdev-snapshot': 'BlockdevSnapshot',
 'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal',
diff --git a/blockdev.c b/blockdev.c
index 5348e8ba9b..feebbb9a9a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1940,6 +1940,7 @@ static void blockdev_backup_clean(BlkActionState *common)
 typedef struct BlockDirtyBitmapState {
     BlkActionState common;
     BdrvDirtyBitmap *bitmap;
+    BdrvDirtyBitmap *merge_source;

Is this necessary?


looks like it isn't, will drop it.




     BlockDriverState *bs;
     HBitmap *backup;
     bool prepared;
@@ -2112,6 +2113,35 @@ static void block_dirty_bitmap_disable_abort(BlkActionState *common)
     }
 }
  
+static void block_dirty_bitmap_merge_prepare(BlkActionState *common,
+                                             Error **errp)
+{
+    BlockDirtyBitmapMerge *action;
+    BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
+                                             common, common);
+
+    if (action_check_completion_mode(common, errp) < 0) {
+        return;
+    }
+
+    action = common->action->u.x_block_dirty_bitmap_merge.data;
+    state->bitmap = block_dirty_bitmap_lookup(action->node,
+                                              action->dst_name,
+                                              &state->bs,
+                                              errp);
+    if (!state->bitmap) {
+        return;
+    }
+
+    state->merge_source = bdrv_find_dirty_bitmap(state->bs, action->src_name);
+    if (!state->merge_source) {
+        return;
+    }
+
+    bdrv_merge_dirty_bitmap(state->bitmap, state->merge_source, &state->backup,
+                            errp);
+}
+
 static void abort_prepare(BlkActionState *common, Error **errp)
 {
     error_setg(errp, "Transaction aborted using Abort action");
@@ -2182,7 +2212,13 @@ static const BlkActionOps actions[] = {
         .instance_size = sizeof(BlockDirtyBitmapState),
         .prepare = block_dirty_bitmap_disable_prepare,
         .abort = block_dirty_bitmap_disable_abort,
-    }
+    },
+    [TRANSACTION_ACTION_KIND_X_BLOCK_DIRTY_BITMAP_MERGE] = {
+        .instance_size = sizeof(BlockDirtyBitmapState),
+        .prepare = block_dirty_bitmap_merge_prepare,
+        .commit = block_dirty_bitmap_free_backup,
+        .abort = block_dirty_bitmap_restore,
+    }
 };
  
  /**



If the new state is not necessary and you remove it:

Reviewed-by: John Snow 



--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?

2018-09-14 Thread lampahome
Sorry, I need to explain what case I want to do

Todo: I want to *backup a block device into qcow2 format image.*
I met a problem which is the *file size limit of filesystem* ex: Max is
16TB for any file in ext4, but the block device maybe 32TB or more.

I figure out one way is to *divide data of device into 1TB chunk* and save
every chunk into qcow2 image cuz I don't change filesystem, and  connect
with backing chain.
*(That's what I said range is different)*
Ex: 1st chunk of device will save into image.000
2nd chunk of device will save into image.001
Nth chunk of device will save into image.(N-1)
...etc

I can see all block device data when I mount image.(N-1) by qemu-nbd cuz
the chunk doesn't overlap and all chunks connect by backing chain.

Now I want to do next thing: *Incremental backup*
When I modify data of 1st chunk, what I thought is to write new 1st chunk
to new image *image.N* and let *image.(N-1)* be the backing file of
*image.N* .
That's cuz I want to store the data before modified to roll back anytime.

So now I have two *version of block device(like concept of snapshot)*:
One is image.000 to image.(N-1). I can access the data before modify by
mount image.(N-1) through qemu-nbd
The other one is image.000 to image.N.  I can access the data after modify
by mount image.N through qemu-nbd(cuz the visible 1st chunk are in the
image.N)

Consider about the situation:
000   A - - - - - - - -  <---  store the 1st chunk of block device
001   - B - - - - - - -
002   - - C - - - - - - (1st state of block device)
003   A' - - - - - - - - <--- store the 1st chunk of block device, but
data is different
004   - - - D - - - - - (2nd state of block device)
005   - - - - E - - - -  (3rd state of block device)

The original problem is If I want to remove the 2nd state(003 and 004) but
I need to keep the data of 003 and 004.
If I just commit 003, the A' of 003 must be committed into 002 cuz 002 is
the backing file of 003.
I try to figure out some way to let it only commit from 003 into 000.


[Qemu-block] [PATCH] curl: Make sslverify=off disable host as well as peer verification.

2018-09-14 Thread Richard W.M. Jones
The sslverify setting is supposed to turn off all TLS certificate
checks in libcurl.  However because of the way we use it, it only
turns off peer certificate authenticity checks
(CURLOPT_SSL_VERIFYPEER).  This patch makes it also turn off the check
that the server name in the certificate is the same as the server
you're connecting to (CURLOPT_SSL_VERIFYHOST).

We can use Google's server at 8.8.8.8 which happens to have a bad TLS
certificate to demonstrate this:

$ ./qemu-img create -q -f qcow2 -b 'json: { "file.sslverify": "off",
"file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2
qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: SSL: no alternative 
certificate subject name matches target host name '8.8.8.8'
Could not open backing image to determine size.

With this patch applied, qemu-img connects to the server regardless of
the bad certificate:

$ ./qemu-img create -q -f qcow2 -b 'json: { "file.sslverify": "off",
"file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2
qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: The requested URL 
returned error: 404 Not Found

(The 404 error is expected because 8.8.8.8 is not actually serving a
file called "/foo".)

Of course the default (without sslverify=off) remains to always check
the certificate:

$ ./qemu-img create -q -f qcow2 -b 'json: { "file.driver": "https",
"file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2
qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: SSL: no alternative 
certificate subject name matches target host name '8.8.8.8'
Could not open backing image to determine size.

Further information about the two settings is available here:

https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html
https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html

Signed-off-by: Richard W.M. Jones 
---
 block/curl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/curl.c b/block/curl.c
index 229bb84a27..fabb2b4da7 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -483,6 +483,8 @@ static int curl_init_state(BDRVCURLState *s, CURLState *state)
     curl_easy_setopt(state->curl, CURLOPT_URL, s->url);
     curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER,
                      (long) s->sslverify);
+    curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYHOST,
+                     s->sslverify ? 2L : 0L);
     if (s->cookie) {
         curl_easy_setopt(state->curl, CURLOPT_COOKIE, s->cookie);
     }
-- 
2.19.0.rc0
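
As a standalone illustration of the pairing (independent of the qemu code
above; assumes only libcurl): the two options are separate checks, and
CURLOPT_SSL_VERIFYHOST takes 2, not 1, as its enabled value.

#include <stdbool.h>
#include <curl/curl.h>

/* Toggle both TLS checks together; VERIFYPEER checks the certificate
 * chain, VERIFYHOST checks the name in the certificate. */
static void set_tls_verification(CURL *curl, bool verify)
{
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, verify ? 1L : 0L);
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, verify ? 2L : 0L);
}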




Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback

2018-09-14 Thread Fam Zheng
On Thu, 09/13 18:59, Kevin Wolf wrote:
> Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
> > On 13/09/2018 14:52, Kevin Wolf wrote:
> > > +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> > > +        /* If we are in the main thread, the callback is allowed to unref
> > > +         * the BlockBackend, so we have to hold an additional reference */
> > > +        blk_ref(acb->rwco.blk);
> > > +    }
> > >      acb->common.cb(acb->common.opaque, acb->rwco.ret);
> > > +    blk_dec_in_flight(acb->rwco.blk);
> > > +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> > > +        blk_unref(acb->rwco.blk);
> > > +    }
> > 
> > Is this something that happens only for some specific callers?  That is,
> > which callers are sure that the callback is invoked from the main thread?
> 
> I can't seem to reproduce the problem I saw any more even when reverting
> the bdrv_ref/unref pair. If I remember correctly it was actually a
> nested aio_poll() that was running a block job completion or something
> like that - which would obviously only happen on the main thread because
> the job intentionally defers to the main thread.
> 
> The only reason I made this conditional is that I think bdrv_unref()
> still isn't safe outside the main thread, is it?

I think that is correct.

Fam
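
A small sketch of the constraint Kevin and Fam agree on here; the
unref_bh helper and safe_blk_unref wrapper are hypothetical names, and
whether deferral is needed at all depends on the caller:

/* Dropping the last reference must happen in the main thread; from any
 * other context, defer the unref, e.g. via a one-shot bottom half. */
static void unref_bh(void *opaque)
{
    blk_unref(opaque);
}

static void safe_blk_unref(BlockBackend *blk)
{
    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
        blk_unref(blk);
    } else {
        aio_bh_schedule_oneshot(qemu_get_aio_context(), unref_bh, blk);
    }
}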