Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-01 Thread Kevin Wolf
Am 30.09.2015 um 16:45 hat Max Reitz geschrieben:
> On 29.09.2015 15:51, Kevin Wolf wrote:
> > Am 23.09.2015 um 18:36 hat Max Reitz geschrieben:
> >> On 17.09.2015 15:48, Kevin Wolf wrote:
> >>>  void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> >>>  {
> >>> -bdrv_swap(bs_new, bs_top);
> >>> +assert(!bdrv_requests_pending(bs_top));
> >>> +assert(!bdrv_requests_pending(bs_new));
> >>> +
> >>> +bdrv_ref(bs_top);
> >>> +change_parent_backing_link(bs_top, bs_new);
> >>> +
> >>> +/* Some fields always stay on top of the backing file chain */
> >>> +swap_feature_fields(bs_top, bs_new);
> >>> +
> >>> +bdrv_set_backing_hd(bs_new, bs_top);
> >>> +bdrv_unref(bs_top);
> >>>  
> >>> -/* The contents of 'tmp' will become bs_top, as we are
> >>> - * swapping bs_new and bs_top contents. */
> >>> -bdrv_set_backing_hd(bs_top, bs_new);
> >>> +/* bs_new is now referenced by its new parents, we don't need the
> >>> + * additional reference any more. */
> >>> +bdrv_unref(bs_new);
> >>>  }
> >>
> >> Before, all pointers to @bs_new were moved to @bs_top. Now, they stay at
> >> @bs_new. I suppose we are assuming there are no pointers to @bs_new,
> >> should we assert that, and/or point it out in the documentation?
> > 
> > How would you assert something like this?
> 
> Of course, I have no idea.
> 
> >   Also, I think it's currently
> > true, but there's no reason why it should stay so. The important part is
> > just that it's true while applying the patch because the semantics
> > changes. Once it's applied, we have sane behaviour and can make use of
> > it.
> 
> The thing is that this is exactly the reason for the bug Berto found.
> external_snapshot_commit() keeps a reference to state->new_bs (@bs_new)
> and uses it afterwards for bdrv_reopen(), whereas it should be using
> state->old_bs (@bs_top) after this series.

Yes, the interface changes. That means that you need to review all
callers. There aren't that many, so it's doable.

Kevin


pgpcRDwR6Ej22.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-01 Thread Max Reitz
On 29.09.2015 15:51, Kevin Wolf wrote:
> Am 23.09.2015 um 18:36 hat Max Reitz geschrieben:
>> On 17.09.2015 15:48, Kevin Wolf wrote:
>>> Remember all parent nodes and just change the pointers there instead of
>>> swapping the contents of the BlockDriverState.
>>>
>>> Handling of snapshot=on must be moved further down in bdrv_open()
>>> because *pbs (which is the bs pointer in the BlockBackend) must already
>>> be set before bdrv_append() is called. Otherwise bdrv_append() changes
>>> the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
>>> it with the read-only original image.
>>>
>>> Signed-off-by: Kevin Wolf 
> 
>>> @@ -2155,6 +2157,42 @@ void bdrv_swap(BlockDriverState *bs_new, 
>>> BlockDriverState *bs_old)
>>>  bdrv_rebind(bs_old);
>>>  }
>>>  
>>> +static void change_parent_backing_link(BlockDriverState *from,
>>> +   BlockDriverState *to)
>>> +{
>>> +BdrvChild *c, *next;
>>> +
>>> +QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
>>> +assert(c->role != _backing);
>>> +c->bs = to;
>>> +QLIST_REMOVE(c, next_parent);
>>> +QLIST_INSERT_HEAD(>parents, c, next_parent);
>>
>> This drops a reference from the parent BDS to @from, and adds a new one
>> from the parent BDS to @to. However, this is not reflected here.
> 
> You mean bdrv_ref(to); bdrv_unref(from); ?

Yes.

>>> +}
>>> +if (from->blk) {
>>> +blk_set_bs(from->blk, to);
>>> +if (!to->device_list.tqe_prev) {
>>> +QTAILQ_INSERT_BEFORE(from, to, device_list);
>>> +}
>>> +QTAILQ_REMOVE(_states, from, device_list);
>>> +}
>>> +}
>>> +
>>> +static void swap_feature_fields(BlockDriverState *bs_top,
>>> +BlockDriverState *bs_new)
>>> +{
>>> +BlockDriverState tmp;
>>> +
>>> +bdrv_move_feature_fields(, bs_top);
>>> +bdrv_move_feature_fields(bs_top, bs_new);
>>> +bdrv_move_feature_fields(bs_new, );
>>> +
>>> +assert(!bs_new->io_limits_enabled);
>>> +if (bs_top->io_limits_enabled) {
>>> +bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
>>> +bdrv_io_limits_disable(bs_top);
>>> +}
>>> +}
>>> +
>>>  /*
>>>   * Add new bs contents at the top of an image chain while the chain is
>>>   * live, while keeping required fields on the top layer.
>>> @@ -2165,14 +2203,29 @@ void bdrv_swap(BlockDriverState *bs_new, 
>>> BlockDriverState *bs_old)
>>>   * bs_new must not be attached to a BlockBackend.
>>>   *
>>>   * This function does not create any image files.
>>> + *
>>> + * bdrv_append() takes ownership of a bs_new reference and unrefs it 
>>> because
>>> + * that's what the callers commonly need. bs_new will be referenced by the 
>>> old
>>> + * parents of bs_top after bdrv_append() returns. If the caller needs to 
>>> keep a
>>> + * reference of its own, it must call bdrv_ref().
>>>   */
>>>  void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>>>  {
>>> -bdrv_swap(bs_new, bs_top);
>>> +assert(!bdrv_requests_pending(bs_top));
>>> +assert(!bdrv_requests_pending(bs_new));
>>> +
>>> +bdrv_ref(bs_top);
>>> +change_parent_backing_link(bs_top, bs_new);
>>> +
>>> +/* Some fields always stay on top of the backing file chain */
>>> +swap_feature_fields(bs_top, bs_new);
>>> +
>>> +bdrv_set_backing_hd(bs_new, bs_top);
>>> +bdrv_unref(bs_top);
>>>  
>>> -/* The contents of 'tmp' will become bs_top, as we are
>>> - * swapping bs_new and bs_top contents. */
>>> -bdrv_set_backing_hd(bs_top, bs_new);
>>> +/* bs_new is now referenced by its new parents, we don't need the
>>> + * additional reference any more. */
>>> +bdrv_unref(bs_new);
>>>  }
>>
>> Before, all pointers to @bs_new were moved to @bs_top. Now, they stay at
>> @bs_new. I suppose we are assuming there are no pointers to @bs_new,
>> should we assert that, and/or point it out in the documentation?
> 
> How would you assert something like this?

Of course, I have no idea.

>   Also, I think it's currently
> true, but there's no reason why it should stay so. The important part is
> just that it's true while applying the patch because the semantics
> changes. Once it's applied, we have sane behaviour and can make use of
> it.

The thing is that this is exactly the reason for the bug Berto found.
external_snapshot_commit() keeps a reference to state->new_bs (@bs_new)
and uses it afterwards for bdrv_reopen(), whereas it should be using
state->old_bs (@bs_top) after this series.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-09-29 Thread Kevin Wolf
Am 23.09.2015 um 18:36 hat Max Reitz geschrieben:
> On 17.09.2015 15:48, Kevin Wolf wrote:
> > Remember all parent nodes and just change the pointers there instead of
> > swapping the contents of the BlockDriverState.
> > 
> > Handling of snapshot=on must be moved further down in bdrv_open()
> > because *pbs (which is the bs pointer in the BlockBackend) must already
> > be set before bdrv_append() is called. Otherwise bdrv_append() changes
> > the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
> > it with the read-only original image.
> > 
> > Signed-off-by: Kevin Wolf 

> > @@ -2155,6 +2157,42 @@ void bdrv_swap(BlockDriverState *bs_new, 
> > BlockDriverState *bs_old)
> >  bdrv_rebind(bs_old);
> >  }
> >  
> > +static void change_parent_backing_link(BlockDriverState *from,
> > +   BlockDriverState *to)
> > +{
> > +BdrvChild *c, *next;
> > +
> > +QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
> > +assert(c->role != _backing);
> > +c->bs = to;
> > +QLIST_REMOVE(c, next_parent);
> > +QLIST_INSERT_HEAD(>parents, c, next_parent);
> 
> This drops a reference from the parent BDS to @from, and adds a new one
> from the parent BDS to @to. However, this is not reflected here.

You mean bdrv_ref(to); bdrv_unref(from); ?

> > +}
> > +if (from->blk) {
> > +blk_set_bs(from->blk, to);
> > +if (!to->device_list.tqe_prev) {
> > +QTAILQ_INSERT_BEFORE(from, to, device_list);
> > +}
> > +QTAILQ_REMOVE(_states, from, device_list);
> > +}
> > +}
> > +
> > +static void swap_feature_fields(BlockDriverState *bs_top,
> > +BlockDriverState *bs_new)
> > +{
> > +BlockDriverState tmp;
> > +
> > +bdrv_move_feature_fields(, bs_top);
> > +bdrv_move_feature_fields(bs_top, bs_new);
> > +bdrv_move_feature_fields(bs_new, );
> > +
> > +assert(!bs_new->io_limits_enabled);
> > +if (bs_top->io_limits_enabled) {
> > +bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
> > +bdrv_io_limits_disable(bs_top);
> > +}
> > +}
> > +
> >  /*
> >   * Add new bs contents at the top of an image chain while the chain is
> >   * live, while keeping required fields on the top layer.
> > @@ -2165,14 +2203,29 @@ void bdrv_swap(BlockDriverState *bs_new, 
> > BlockDriverState *bs_old)
> >   * bs_new must not be attached to a BlockBackend.
> >   *
> >   * This function does not create any image files.
> > + *
> > + * bdrv_append() takes ownership of a bs_new reference and unrefs it 
> > because
> > + * that's what the callers commonly need. bs_new will be referenced by the 
> > old
> > + * parents of bs_top after bdrv_append() returns. If the caller needs to 
> > keep a
> > + * reference of its own, it must call bdrv_ref().
> >   */
> >  void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> >  {
> > -bdrv_swap(bs_new, bs_top);
> > +assert(!bdrv_requests_pending(bs_top));
> > +assert(!bdrv_requests_pending(bs_new));
> > +
> > +bdrv_ref(bs_top);
> > +change_parent_backing_link(bs_top, bs_new);
> > +
> > +/* Some fields always stay on top of the backing file chain */
> > +swap_feature_fields(bs_top, bs_new);
> > +
> > +bdrv_set_backing_hd(bs_new, bs_top);
> > +bdrv_unref(bs_top);
> >  
> > -/* The contents of 'tmp' will become bs_top, as we are
> > - * swapping bs_new and bs_top contents. */
> > -bdrv_set_backing_hd(bs_top, bs_new);
> > +/* bs_new is now referenced by its new parents, we don't need the
> > + * additional reference any more. */
> > +bdrv_unref(bs_new);
> >  }
> 
> Before, all pointers to @bs_new were moved to @bs_top. Now, they stay at
> @bs_new. I suppose we are assuming there are no pointers to @bs_new,
> should we assert that, and/or point it out in the documentation?

How would you assert something like this? Also, I think it's currently
true, but there's no reason why it should stay so. The important part is
just that it's true while applying the patch because the semantics
changes. Once it's applied, we have sane behaviour and can make use of
it.

Kevin


pgpwEG52kibrV.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-09-23 Thread Max Reitz
On 17.09.2015 15:48, Kevin Wolf wrote:
> Remember all parent nodes and just change the pointers there instead of
> swapping the contents of the BlockDriverState.
> 
> Handling of snapshot=on must be moved further down in bdrv_open()
> because *pbs (which is the bs pointer in the BlockBackend) must already
> be set before bdrv_append() is called. Otherwise bdrv_append() changes
> the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
> it with the read-only original image.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 109 
> +++-
>  1 file changed, 81 insertions(+), 28 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c196f83..98fc17c 100644
> --- a/block.c
> +++ b/block.c
> @@ -1516,15 +1516,6 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
> const char *filename,
>  
>  bdrv_refresh_filename(bs);
>  
> -/* For snapshot=on, create a temporary qcow2 overlay. bs points to the
> - * temporary snapshot afterwards. */
> -if (snapshot_flags) {
> -ret = bdrv_append_temp_snapshot(bs, snapshot_flags, _err);
> -if (local_err) {
> -goto close_and_fail;
> -}
> -}
> -
>  /* Check if any unknown options were used */
>  if (options && (qdict_size(options) != 0)) {
>  const QDictEntry *entry = qdict_first(options);
> @@ -1556,6 +1547,16 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
> const char *filename,
>  
>  QDECREF(options);
>  *pbs = bs;
> +
> +/* For snapshot=on, create a temporary qcow2 overlay. bs points to the
> + * temporary snapshot afterwards. */
> +if (snapshot_flags) {
> +ret = bdrv_append_temp_snapshot(bs, snapshot_flags, _err);
> +if (local_err) {
> +goto close_and_fail;
> +}
> +}
> +
>  return 0;
>  
>  fail:
> @@ -1999,20 +2000,6 @@ static void bdrv_move_feature_fields(BlockDriverState 
> *bs_dest,
>  
>  bs_dest->enable_write_cache = bs_src->enable_write_cache;
>  
> -/* i/o throttled req */
> -bs_dest->throttle_state = bs_src->throttle_state,
> -bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
> -bs_dest->pending_reqs[0]= bs_src->pending_reqs[0];
> -bs_dest->pending_reqs[1]= bs_src->pending_reqs[1];
> -bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
> -bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
> -memcpy(_dest->round_robin,
> -   _src->round_robin,
> -   sizeof(bs_dest->round_robin));
> -memcpy(_dest->throttle_timers,
> -   _src->throttle_timers,
> -   sizeof(ThrottleTimers));
> -
>  /* r/w error */
>  bs_dest->on_read_error  = bs_src->on_read_error;
>  bs_dest->on_write_error = bs_src->on_write_error;
> @@ -2026,10 +2013,25 @@ static void bdrv_move_feature_fields(BlockDriverState 
> *bs_dest,
>  }
>  
>  /* Fields that only need to be swapped if the contents of BDSes is swapped
> - * rather than pointers being changed in the parents. */
> + * rather than pointers being changed in the parents, and throttling fields
> + * because only bdrv_swap() messes with internals of throttling. */
>  static void bdrv_move_reference_fields(BlockDriverState *bs_dest,
> BlockDriverState *bs_src)
>  {
> +/* i/o throttled req */
> +bs_dest->throttle_state = bs_src->throttle_state,
> +bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
> +bs_dest->pending_reqs[0]= bs_src->pending_reqs[0];
> +bs_dest->pending_reqs[1]= bs_src->pending_reqs[1];
> +bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
> +bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
> +memcpy(_dest->round_robin,
> +   _src->round_robin,
> +   sizeof(bs_dest->round_robin));
> +memcpy(_dest->throttle_timers,
> +   _src->throttle_timers,
> +   sizeof(ThrottleTimers));
> +
>  /* reference count */
>  bs_dest->refcnt = bs_src->refcnt;
>  
> @@ -2155,6 +2157,42 @@ void bdrv_swap(BlockDriverState *bs_new, 
> BlockDriverState *bs_old)
>  bdrv_rebind(bs_old);
>  }
>  
> +static void change_parent_backing_link(BlockDriverState *from,
> +   BlockDriverState *to)
> +{
> +BdrvChild *c, *next;
> +
> +QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
> +assert(c->role != _backing);
> +c->bs = to;
> +QLIST_REMOVE(c, next_parent);
> +QLIST_INSERT_HEAD(>parents, c, next_parent);

This drops a reference from the parent BDS to @from, and adds a new one
from the parent BDS to @to. However, this is not reflected here.

> +}
> +if (from->blk) {
> +blk_set_bs(from->blk, to);
> +if (!to->device_list.tqe_prev) {
> +QTAILQ_INSERT_BEFORE(from, to, device_list);
> +}
> +QTAILQ_REMOVE(_states, from, 

Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-09-18 Thread Alberto Garcia
On Thu 17 Sep 2015 03:48:17 PM CEST, Kevin Wolf wrote:

> +static void swap_feature_fields(BlockDriverState *bs_top,
> +BlockDriverState *bs_new)
> +{
> +BlockDriverState tmp;
> +
> +bdrv_move_feature_fields(, bs_top);
> +bdrv_move_feature_fields(bs_top, bs_new);
> +bdrv_move_feature_fields(bs_new, );
> +
> +assert(!bs_new->io_limits_enabled);
> +if (bs_top->io_limits_enabled) {
> +bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
> +bdrv_io_limits_disable(bs_top);
> +}
> +}

I would use 'if (bs_top->throttle_state != NULL)' instead. That pointer
is set if and only if the BDS has I/O limits set.

bs->io_limits_enabled can be set to false in order to bypass the limits
temporarily without removing the BDS's throttling settings (e.g in
bdrv_read_unthrottled()).

I actually think that we can get rid of io_limits_enabled in several
places (if not completely), but I will take care of that in a separate
patch.

The only scenario that could cause problems is if bs->throttle_state is
set but bs->io_limits_enabled is false when swap_feature_fields() is
called, but I don't think that's possible in this case.

Berto



Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-09-18 Thread Kevin Wolf
Am 18.09.2015 um 13:45 hat Alberto Garcia geschrieben:
> On Thu 17 Sep 2015 03:48:17 PM CEST, Kevin Wolf wrote:
> 
> > +static void swap_feature_fields(BlockDriverState *bs_top,
> > +BlockDriverState *bs_new)
> > +{
> > +BlockDriverState tmp;
> > +
> > +bdrv_move_feature_fields(, bs_top);
> > +bdrv_move_feature_fields(bs_top, bs_new);
> > +bdrv_move_feature_fields(bs_new, );
> > +
> > +assert(!bs_new->io_limits_enabled);
> > +if (bs_top->io_limits_enabled) {
> > +bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
> > +bdrv_io_limits_disable(bs_top);
> > +}
> > +}
> 
> I would use 'if (bs_top->throttle_state != NULL)' instead. That pointer
> is set if and only if the BDS has I/O limits set.
> 
> bs->io_limits_enabled can be set to false in order to bypass the limits
> temporarily without removing the BDS's throttling settings (e.g in
> bdrv_read_unthrottled()).
> 
> I actually think that we can get rid of io_limits_enabled in several
> places (if not completely), but I will take care of that in a separate
> patch.
> 
> The only scenario that could cause problems is if bs->throttle_state is
> set but bs->io_limits_enabled is false when swap_feature_fields() is
> called, but I don't think that's possible in this case.

So something like this? (Specifically, is the assertion right?)

assert(!bs_new->throttle_state);
if (bs_top->throttle_state) {
assert(bs_top->io_limits_enabled);
bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
bdrv_io_limits_disable(bs_top);
}

If the assertion doesn't hold true, I guess we would need to call
throttle_group_register() directly for the cases where io_limits_enabled
wasn't true.

Kevin



Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-09-18 Thread Alberto Garcia
On Fri 18 Sep 2015 02:24:21 PM CEST, Kevin Wolf wrote:

>> > +static void swap_feature_fields(BlockDriverState *bs_top,
>> > +BlockDriverState *bs_new)
>> > +{
>> > +BlockDriverState tmp;
>> > +
>> > +bdrv_move_feature_fields(, bs_top);
>> > +bdrv_move_feature_fields(bs_top, bs_new);
>> > +bdrv_move_feature_fields(bs_new, );
>> > +
>> > +assert(!bs_new->io_limits_enabled);
>> > +if (bs_top->io_limits_enabled) {
>> > +bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
>> > +bdrv_io_limits_disable(bs_top);
>> > +}
>> > +}
>> 
>> I would use 'if (bs_top->throttle_state != NULL)' instead. That pointer
>> is set if and only if the BDS has I/O limits set.
>> 
>> bs->io_limits_enabled can be set to false in order to bypass the limits
>> temporarily without removing the BDS's throttling settings (e.g in
>> bdrv_read_unthrottled()).
>> 
>> I actually think that we can get rid of io_limits_enabled in several
>> places (if not completely), but I will take care of that in a separate
>> patch.
>> 
>> The only scenario that could cause problems is if bs->throttle_state is
>> set but bs->io_limits_enabled is false when swap_feature_fields() is
>> called, but I don't think that's possible in this case.
>
> So something like this? (Specifically, is the assertion right?)
>
> assert(!bs_new->throttle_state);
> if (bs_top->throttle_state) {
> assert(bs_top->io_limits_enabled);
> bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
> bdrv_io_limits_disable(bs_top);
> }

Yes, I think that's fine.

> If the assertion doesn't hold true, I guess we would need to call
> throttle_group_register() directly for the cases where
> io_limits_enabled wasn't true.

If io_limits_enabled is not true but throttle_state is set it means that
there's an ongoing request that needs to bypass the I/O limits.

There's two places where that can happen:

  1) bdrv_start_throttled_reqs(), that's either a result of
 bdrv_io_limits_disable() or bdrv_flush_io_queue()
 (i.e. bdrv_drain()).

  2) blk_read_unthrottled() (that comes from hd_geometry_guess()).

Is there any scenario where the feature fields can be swapped in the
middle of one of these two operations? I understand that the BDS must be
drained first, that's why I don't think there's any risk.

Berto