Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-04-13 Thread Vladimir Sementsov-Ogievskiy
13.04.2019 19:08, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 19:02, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Backup-top filter does copy-before-write operation. It should be
>>> inserted above active disk and has a target node for CBW, like the
>>> following:
>>>
>>>  +---+
>>>  | Guest |
>>>  +---+---+
>>>  |r,w
>>>  v
>>>  +---+---+  target   +---+
>>>  | backup_top    |-->| target(qcow2) |
>>>  +---+---+   CBW +---+---+
>>>  |
>>> backing |r,w
>>>  v
>>>  +---+-+
>>>  | Active disk |
>>>  +-+
>>>
>>> The driver will be used in backup instead of write-notifiers.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup-top.h  |  43 +++
>>>   block/backup-top.c  | 306 
>>>   block/Makefile.objs |   2 +
>>>   3 files changed, 351 insertions(+)
>>>   create mode 100644 block/backup-top.h
>>>   create mode 100644 block/backup-top.c
>>
> 
> [..]
> 
>>> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
>>> + BlockDriverState *target,
>>> + HBitmap *copy_bitmap,
>>> + Error **errp)
>>> +{
>>> +    Error *local_err = NULL;
>>> +    BDRVBackupTopState *state;
>>> +    BlockDriverState *top = bdrv_new_open_driver(_backup_top_filter,
>>> + NULL, BDRV_O_RDWR, errp);
>>> +
>>> +    if (!top) {
>>> +    return NULL;
>>> +    }
>>> +
>>> +    top->implicit = true;
>>> +    top->total_sectors = source->total_sectors;
>>> +    top->opaque = state = g_new0(BDRVBackupTopState, 1);
>>> +    state->copy_bitmap = copy_bitmap;
>>> +
>>> +    bdrv_ref(target);
>>> +    state->target = bdrv_attach_child(top, target, "target", _file, 
>>> errp);
>>> +    if (!state->target) {
>>> +    bdrv_unref(target);
>>> +    bdrv_unref(top);
>>> +    return NULL;
>>> +    }
>>> +
>>> +    bdrv_set_aio_context(top, bdrv_get_aio_context(source));
>>> +    bdrv_set_aio_context(target, bdrv_get_aio_context(source));
>>> +
>>> +    bdrv_drained_begin(source);
>>> +
>>> +    bdrv_ref(top);
>>> +    bdrv_append(top, source, _err);
>>> +
>>> +    if (local_err) {
>>> +    bdrv_unref(top);
>>
>> This is done automatically by bdrv_append().
>>
>>> +    }
>>> +
>>> +    bdrv_drained_end(source);
>>> +
>>> +    if (local_err) {
>>> +    bdrv_unref_child(top, state->target);
>>> +    bdrv_unref(top);
>>> +    error_propagate(errp, local_err);
>>> +    return NULL;
>>> +    }
>>> +
>>> +    return top;
>>> +}
>>> +
>>> +void bdrv_backup_top_drop(BlockDriverState *bs)
>>> +{
>>> +    BDRVBackupTopState *s = bs->opaque;
>>> +
>>> +    AioContext *aio_context = bdrv_get_aio_context(bs);
>>> +
>>> +    aio_context_acquire(aio_context);
>>> +
>>> +    bdrv_drained_begin(bs);
>>> +
>>> +    bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, _abort);
>>> +    bdrv_replace_node(bs, backing_bs(bs), _abort);
>>> +    bdrv_set_backing_hd(bs, NULL, _abort);
>>
>> This is done automatically in bdrv_close(), and after bs has been
>> replaced by backing_bs(bs), I don't think new requests should come in,
>> so I don't think this needs to be done here.
> 
> Following movement of backup_top back to job->blk becomes impossible then,
> if we don't share WRITE on source in backup_top_child_perm.
> 
> And I think, this function should drop all relations created by
> bdrv_backup_top_append.
> 
>>
>>> +
>>> +    bdrv_drained_end(bs);
>>> +
>>> +    if (s->target) {
>>> +    bdrv_unref_child(bs, s->target);
>>> +    }
>>
>> And this should be done in a .bdrv_close() implementation, I think.
>>

and therefore this one too. We don't have .bdrv_open, so I'd prefer not
have bdrv_close. We create this child in _top_append, seems logical to
unref it in _top_drop.


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-04-13 Thread Vladimir Sementsov-Ogievskiy
16.01.2019 19:02, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Backup-top filter does copy-before-write operation. It should be
>> inserted above active disk and has a target node for CBW, like the
>> following:
>>
>>  +---+
>>  | Guest |
>>  +---+---+
>>  |r,w
>>  v
>>  +---+---+  target   +---+
>>  | backup_top|-->| target(qcow2) |
>>  +---+---+   CBW +---+---+
>>  |
>> backing |r,w
>>  v
>>  +---+-+
>>  | Active disk |
>>  +-+
>>
>> The driver will be used in backup instead of write-notifiers.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/backup-top.h  |  43 +++
>>   block/backup-top.c  | 306 
>>   block/Makefile.objs |   2 +
>>   3 files changed, 351 insertions(+)
>>   create mode 100644 block/backup-top.h
>>   create mode 100644 block/backup-top.c
> 

[..]

>> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
>> + BlockDriverState *target,
>> + HBitmap *copy_bitmap,
>> + Error **errp)
>> +{
>> +Error *local_err = NULL;
>> +BDRVBackupTopState *state;
>> +BlockDriverState *top = bdrv_new_open_driver(_backup_top_filter,
>> + NULL, BDRV_O_RDWR, errp);
>> +
>> +if (!top) {
>> +return NULL;
>> +}
>> +
>> +top->implicit = true;
>> +top->total_sectors = source->total_sectors;
>> +top->opaque = state = g_new0(BDRVBackupTopState, 1);
>> +state->copy_bitmap = copy_bitmap;
>> +
>> +bdrv_ref(target);
>> +state->target = bdrv_attach_child(top, target, "target", _file, 
>> errp);
>> +if (!state->target) {
>> +bdrv_unref(target);
>> +bdrv_unref(top);
>> +return NULL;
>> +}
>> +
>> +bdrv_set_aio_context(top, bdrv_get_aio_context(source));
>> +bdrv_set_aio_context(target, bdrv_get_aio_context(source));
>> +
>> +bdrv_drained_begin(source);
>> +
>> +bdrv_ref(top);
>> +bdrv_append(top, source, _err);
>> +
>> +if (local_err) {
>> +bdrv_unref(top);
> 
> This is done automatically by bdrv_append().
> 
>> +}
>> +
>> +bdrv_drained_end(source);
>> +
>> +if (local_err) {
>> +bdrv_unref_child(top, state->target);
>> +bdrv_unref(top);
>> +error_propagate(errp, local_err);
>> +return NULL;
>> +}
>> +
>> +return top;
>> +}
>> +
>> +void bdrv_backup_top_drop(BlockDriverState *bs)
>> +{
>> +BDRVBackupTopState *s = bs->opaque;
>> +
>> +AioContext *aio_context = bdrv_get_aio_context(bs);
>> +
>> +aio_context_acquire(aio_context);
>> +
>> +bdrv_drained_begin(bs);
>> +
>> +bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, _abort);
>> +bdrv_replace_node(bs, backing_bs(bs), _abort);
>> +bdrv_set_backing_hd(bs, NULL, _abort);
> 
> This is done automatically in bdrv_close(), and after bs has been
> replaced by backing_bs(bs), I don't think new requests should come in,
> so I don't think this needs to be done here.

Following movement of backup_top back to job->blk becomes impossible then,
if we don't share WRITE on source in backup_top_child_perm.

And I think, this function should drop all relations created by
bdrv_backup_top_append.

> 
>> +
>> +bdrv_drained_end(bs);
>> +
>> +if (s->target) {
>> +bdrv_unref_child(bs, s->target);
>> +}
> 
> And this should be done in a .bdrv_close() implementation, I think.
> 
> Max
> 
>> +bdrv_unref(bs);
>> +
>> +aio_context_release(aio_context);
>> +}
>> +


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-01-23 Thread Vladimir Sementsov-Ogievskiy
18.01.2019 15:05, Max Reitz wrote:
> On 17.01.19 13:13, Vladimir Sementsov-Ogievskiy wrote:
>> 16.01.2019 19:02, Max Reitz wrote:
>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
 Backup-top filter does copy-before-write operation. It should be
 inserted above active disk and has a target node for CBW, like the
 following:

   +---+
   | Guest |
   +---+---+
   |r,w
   v
   +---+---+  target   +---+
   | backup_top|-->| target(qcow2) |
   +---+---+   CBW +---+---+
   |
 backing |r,w
   v
   +---+-+
   | Active disk |
   +-+

 The driver will be used in backup instead of write-notifiers.

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 ---
block/backup-top.h  |  43 +++
block/backup-top.c  | 306 
block/Makefile.objs |   2 +
3 files changed, 351 insertions(+)
create mode 100644 block/backup-top.h
create mode 100644 block/backup-top.c
>>>
>>> Looks good to me overall, comments below:
>>>
>>
>> [..]
>>
>>   >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t 
>> offset,
>>   >> +  uint64_t bytes)
>>
>> [..]
>>
 +
 +/*
 + * Features to be implemented:
 + * F3. parallelize copying loop
 + * F4. detect zeros
>>>
>>> Isn't there detect-zeroes for that?
>>
>> Hmm interesting note. I've placed F4 here, as we do have detecting zeros by 
>> hand in
>> current backup code. Should we drop it from backup, or switch to just 
>> enabling
>> detect-zeros on target?
> 
> Off the top of my head I don't know a reason why we wouldn't just set
> detect-zeroes.  Maybe because there may be other writers to target which
> should not use that setting?  But that scenario just seems wrong to me.
> 
 + * F5. use block_status ?
 + * F6. don't copy clusters which are already cached by COR [see F1]
 + * F7. if target is ram-cache and it is full, there should be a 
 possibility
 + * to drop not necessary data (cached by COR [see F1]) to handle 
 CBW
 + * fast.
>>>
>>> Another idea: Read all dirty data from bs->backing (in parallel, more or
>>> less), and then perform all writes (to bs->backing and s->target) in
>>> parallel as well.
>>>
>>> (So the writes do not have to wait on one another)
>>
>> Yes, we can continue guest write after read part of CBW, but we can't do it 
>> always,
>> as we want to limit RAM usage to store all these in-flight requests.
> 
> But this code here already allocates a buffer to cover all of the area.

Yes, like current backup do on write notifier. Yes, we don't have exact 
RAM-limit, and
possibly it should be fixed. But we have at least guarantee, that we don't have 
more
CBW requests than guest in-flight requests. And we can hope, that in bad case 
with slow
backup target and high enough write-load in guest, our backup will slow down 
guest writes,
and guest will not create more and more parallel requests, until the older ones 
finished.
Hmm, and I think we have limit for parallel guest requests.

But if we continue guest write after old data read into RAM buffer without any 
limit, it
will lead to uncontrolled growth of RAM usage.

> 
>> And I think,
>> in scheme with ram-cache, ram-cache itself should be actually a kind of 
>> storage
>> for in-flight requests. And in this terminology, if ram-cache is full, we 
>> can't
>> proceed with guest write. It's the same as we reached limit on in-flight 
>> requests
>> count.
>>
>> [..]
>>
 +
 +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
 +  int64_t offset, int 
 bytes)
>>>
>>> (Indentation)
>>
>> looks like it was after renaming
>> fleecing_hook
>> to
>> backup_top
>>
>> will fix all
> 
> Ah :-)
> 
>> [..]
>>
 +
 +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
 +{
 +if (!bs->backing) {
 +return 0;
 +}
 +
 +return bdrv_co_flush(bs->backing->bs);
>>>
>>> Why not the target as well?
>>
>> Should we? It may be guest flush, why to flush backup target on it?
> 
> Hm...  Well, the current backup code didn't flush the target, and the
> mirror filter doesn't either.  OK then.
> 
> Max
> 
>> [..]
>>
 +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
 +   const BdrvChildRole *role,
 +   BlockReopenQueue *reopen_queue,
 +   uint64_t perm, uint64_t shared,
 +   uint64_t *nperm, uint64_t *nshared)
>>>
>>> (Indentation)
>>>
 +{
 +

Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-01-18 Thread Max Reitz
On 17.01.19 13:13, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 19:02, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Backup-top filter does copy-before-write operation. It should be
>>> inserted above active disk and has a target node for CBW, like the
>>> following:
>>>
>>>  +---+
>>>  | Guest |
>>>  +---+---+
>>>  |r,w
>>>  v
>>>  +---+---+  target   +---+
>>>  | backup_top|-->| target(qcow2) |
>>>  +---+---+   CBW +---+---+
>>>  |
>>> backing |r,w
>>>  v
>>>  +---+-+
>>>  | Active disk |
>>>  +-+
>>>
>>> The driver will be used in backup instead of write-notifiers.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup-top.h  |  43 +++
>>>   block/backup-top.c  | 306 
>>>   block/Makefile.objs |   2 +
>>>   3 files changed, 351 insertions(+)
>>>   create mode 100644 block/backup-top.h
>>>   create mode 100644 block/backup-top.c
>>
>> Looks good to me overall, comments below:
>>
> 
> [..]
> 
>  >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t 
> offset,
>  >> +  uint64_t bytes)
> 
> [..]
> 
>>> +
>>> +/*
>>> + * Features to be implemented:
>>> + * F3. parallelize copying loop
>>> + * F4. detect zeros
>>
>> Isn't there detect-zeroes for that?
> 
> Hmm interesting note. I've placed F4 here, as we do have detecting zeros by 
> hand in
> current backup code. Should we drop it from backup, or switch to just enabling
> detect-zeros on target?

Off the top of my head I don't know a reason why we wouldn't just set
detect-zeroes.  Maybe because there may be other writers to target which
should not use that setting?  But that scenario just seems wrong to me.

>>> + * F5. use block_status ?
>>> + * F6. don't copy clusters which are already cached by COR [see F1]
>>> + * F7. if target is ram-cache and it is full, there should be a 
>>> possibility
>>> + * to drop not necessary data (cached by COR [see F1]) to handle 
>>> CBW
>>> + * fast.
>>
>> Another idea: Read all dirty data from bs->backing (in parallel, more or
>> less), and then perform all writes (to bs->backing and s->target) in
>> parallel as well.
>>
>> (So the writes do not have to wait on one another)
> 
> Yes, we can continue guest write after read part of CBW, but we can't do it 
> always,
> as we want to limit RAM usage to store all these in-flight requests.

But this code here already allocates a buffer to cover all of the area.

> And I think,
> in scheme with ram-cache, ram-cache itself should be actually a kind of 
> storage
> for in-flight requests. And in this terminology, if ram-cache is full, we 
> can't
> proceed with guest write. It's the same as we reached limit on in-flight 
> requests
> count.
> 
> [..]
> 
>>> +
>>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>>> +  int64_t offset, int 
>>> bytes)
>>
>> (Indentation)
> 
> looks like it was after renaming
> fleecing_hook
> to
> backup_top
> 
> will fix all

Ah :-)

> [..]
> 
>>> +
>>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>>> +{
>>> +if (!bs->backing) {
>>> +return 0;
>>> +}
>>> +
>>> +return bdrv_co_flush(bs->backing->bs);
>>
>> Why not the target as well?
> 
> Should we? It may be guest flush, why to flush backup target on it?

Hm...  Well, the current backup code didn't flush the target, and the
mirror filter doesn't either.  OK then.

Max

> [..]
> 
>>> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>>> +   const BdrvChildRole *role,
>>> +   BlockReopenQueue *reopen_queue,
>>> +   uint64_t perm, uint64_t shared,
>>> +   uint64_t *nperm, uint64_t *nshared)
>>
>> (Indentation)
>>
>>> +{
>>> +bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, 
>>> nperm,
>>> +  nshared);
>>> +
>>> +if (role == _file) {
>>> +/*
>>> + * share write to target, to not interfere guest writes to it's 
>>> disk
>>
>> *interfere with guest writes to its disk
>>
>>> + * which will be in target backing chain
>>> + */
>>> +*nshared = *nshared | BLK_PERM_WRITE;
>>> +*nperm = *nperm | BLK_PERM_WRITE;
>>
>> Why do you need to take the write permission?  This filter does not
>> perform any writes unless it receives writes, right?  (So
>> bdrv_filter_default_perms() should take care of this.)
> 
> Your commit shed a bit of light on permission system for me) Ok, I'll check, 
> if
> that work without PERM_WRITE here.
> 
>>
>>> +} else {
>>> +*nperm = *nperm | 

Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-01-17 Thread Vladimir Sementsov-Ogievskiy
16.01.2019 19:02, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Backup-top filter does copy-before-write operation. It should be
>> inserted above active disk and has a target node for CBW, like the
>> following:
>>
>>  +---+
>>  | Guest |
>>  +---+---+
>>  |r,w
>>  v
>>  +---+---+  target   +---+
>>  | backup_top|-->| target(qcow2) |
>>  +---+---+   CBW +---+---+
>>  |
>> backing |r,w
>>  v
>>  +---+-+
>>  | Active disk |
>>  +-+
>>
>> The driver will be used in backup instead of write-notifiers.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/backup-top.h  |  43 +++
>>   block/backup-top.c  | 306 
>>   block/Makefile.objs |   2 +
>>   3 files changed, 351 insertions(+)
>>   create mode 100644 block/backup-top.h
>>   create mode 100644 block/backup-top.c
> 
> Looks good to me overall, comments below:
> 

[..]

 >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t 
 >> offset,
 >> +  uint64_t bytes)

[..]

>> +
>> +/*
>> + * Features to be implemented:
>> + * F3. parallelize copying loop
>> + * F4. detect zeros
> 
> Isn't there detect-zeroes for that?

Hmm interesting note. I've placed F4 here, as we do have detecting zeros by 
hand in
current backup code. Should we drop it from backup, or switch to just enabling
detect-zeros on target?

> 
>> + * F5. use block_status ?
>> + * F6. don't copy clusters which are already cached by COR [see F1]
>> + * F7. if target is ram-cache and it is full, there should be a 
>> possibility
>> + * to drop not necessary data (cached by COR [see F1]) to handle CBW
>> + * fast.
> 
> Another idea: Read all dirty data from bs->backing (in parallel, more or
> less), and then perform all writes (to bs->backing and s->target) in
> parallel as well.
> 
> (So the writes do not have to wait on one another)

Yes, we can continue guest write after read part of CBW, but we can't do it 
always,
as we want to limit RAM usage to store all these in-flight requests. And I 
think,
in scheme with ram-cache, ram-cache itself should be actually a kind of storage
for in-flight requests. And in this terminology, if ram-cache is full, we can't
proceed with guest write. It's the same as we reached limit on in-flight 
requests
count.

[..]

>> +
>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>> +  int64_t offset, int bytes)
> 
> (Indentation)

looks like it was after renaming
fleecing_hook
to
backup_top

will fix all

[..]

>> +
>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>> +{
>> +if (!bs->backing) {
>> +return 0;
>> +}
>> +
>> +return bdrv_co_flush(bs->backing->bs);
> 
> Why not the target as well?

Should we? It may be guest flush, why to flush backup target on it?

[..]

>> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>> +   const BdrvChildRole *role,
>> +   BlockReopenQueue *reopen_queue,
>> +   uint64_t perm, uint64_t shared,
>> +   uint64_t *nperm, uint64_t *nshared)
> 
> (Indentation)
> 
>> +{
>> +bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, 
>> nperm,
>> +  nshared);
>> +
>> +if (role == _file) {
>> +/*
>> + * share write to target, to not interfere guest writes to it's disk
> 
> *interfere with guest writes to its disk
> 
>> + * which will be in target backing chain
>> + */
>> +*nshared = *nshared | BLK_PERM_WRITE;
>> +*nperm = *nperm | BLK_PERM_WRITE;
> 
> Why do you need to take the write permission?  This filter does not
> perform any writes unless it receives writes, right?  (So
> bdrv_filter_default_perms() should take care of this.)

Your commit shed a bit of light on permission system for me) Ok, I'll check, if
that work without PERM_WRITE here.

> 
>> +} else {
>> +*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
>> +}
>> +}
>> +

Thank you! For other comments: argee or will-try


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-01-16 Thread Max Reitz
On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Backup-top filter does copy-before-write operation. It should be
> inserted above active disk and has a target node for CBW, like the
> following:
> 
> +---+
> | Guest |
> +---+---+
> |r,w
> v
> +---+---+  target   +---+
> | backup_top|-->| target(qcow2) |
> +---+---+   CBW +---+---+
> |
> backing |r,w
> v
> +---+-+
> | Active disk |
> +-+
> 
> The driver will be used in backup instead of write-notifiers.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/backup-top.h  |  43 +++
>  block/backup-top.c  | 306 
>  block/Makefile.objs |   2 +
>  3 files changed, 351 insertions(+)
>  create mode 100644 block/backup-top.h
>  create mode 100644 block/backup-top.c

Looks good to me overall, comments below:

> diff --git a/block/backup-top.h b/block/backup-top.h
> new file mode 100644
> index 00..abe4dda574
> --- /dev/null
> +++ b/block/backup-top.h
> @@ -0,0 +1,43 @@
> +/*
> + * backup-top filter driver
> + *
> + * The driver performs Copy-Before-Write (CBW) operation: it is injected 
> above
> + * some node, and before each write it copies _old_ data to the target node.
> + *
> + * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see .
> + */
> +

Include guards are missing here.

> +#include "qemu/osdep.h"
> +
> +#include "block/block_int.h"
> +
> +typedef struct BDRVBackupTopState {
> +HBitmap *copy_bitmap; /* what should be copied to @target on guest 
> write. */
> +BdrvChild *target;
> +
> +uint64_t bytes_copied;
> +} BDRVBackupTopState;
> +
> +void bdrv_backup_top_drop(BlockDriverState *bs);
> +uint64_t bdrv_backup_top_progress(BlockDriverState *bs);
> +
> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
> + BlockDriverState *target,
> + HBitmap *copy_bitmap,
> + Error **errp);

A comment describing at least parts of the interface would be nice; for
instance, that @target will be forced into the same AioContext as
@source, or that the backup-top node will be an implicit node.

> diff --git a/block/backup-top.c b/block/backup-top.c
> new file mode 100644
> index 00..0e7b3e3142
> --- /dev/null
> +++ b/block/backup-top.c
> @@ -0,0 +1,306 @@
> +/*
> + * backup-top filter driver
> + *
> + * The driver performs Copy-Before-Write (CBW) operation: it is injected 
> above
> + * some node, and before each write it copies _old_ data to the target node.
> + *
> + * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "qemu/cutils.h"
> +#include "qapi/error.h"
> +#include "block/block_int.h"
> +#include "block/qdict.h"
> +
> +#include "block/backup-top.h"
> +
> +static coroutine_fn int backup_top_co_preadv(
> +BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> +QEMUIOVector *qiov, int flags)
> +{
> +/*
> + * Features to be implemented:
> + * F1. COR. save read data to fleecing target for fast access
> + * (to reduce reads). This possibly may be done with use of 
> copy-on-read
> + * filter, but we need an ability to make COR requests optional: for
> + * example, if target is a ram-cache, and if it 

[Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2018-12-29 Thread Vladimir Sementsov-Ogievskiy
Backup-top filter does copy-before-write operation. It should be
inserted above active disk and has a target node for CBW, like the
following:

+---+
| Guest |
+---+---+
|r,w
v
+---+---+  target   +---+
| backup_top|-->| target(qcow2) |
+---+---+   CBW +---+---+
|
backing |r,w
v
+---+-+
| Active disk |
+-+

The driver will be used in backup instead of write-notifiers.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup-top.h  |  43 +++
 block/backup-top.c  | 306 
 block/Makefile.objs |   2 +
 3 files changed, 351 insertions(+)
 create mode 100644 block/backup-top.h
 create mode 100644 block/backup-top.c

diff --git a/block/backup-top.h b/block/backup-top.h
new file mode 100644
index 00..abe4dda574
--- /dev/null
+++ b/block/backup-top.h
@@ -0,0 +1,43 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/block_int.h"
+
+typedef struct BDRVBackupTopState {
+HBitmap *copy_bitmap; /* what should be copied to @target on guest write. 
*/
+BdrvChild *target;
+
+uint64_t bytes_copied;
+} BDRVBackupTopState;
+
+void bdrv_backup_top_drop(BlockDriverState *bs);
+uint64_t bdrv_backup_top_progress(BlockDriverState *bs);
+
+BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
+ BlockDriverState *target,
+ HBitmap *copy_bitmap,
+ Error **errp);
diff --git a/block/backup-top.c b/block/backup-top.c
new file mode 100644
index 00..0e7b3e3142
--- /dev/null
+++ b/block/backup-top.c
@@ -0,0 +1,306 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/qdict.h"
+
+#include "block/backup-top.h"
+
+static coroutine_fn int backup_top_co_preadv(
+BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+QEMUIOVector *qiov, int flags)
+{
+/*
+ * Features to be implemented:
+ * F1. COR. save read data to fleecing target for fast access
+ * (to reduce reads). This possibly may be done with use of 
copy-on-read
+ * filter, but we need an ability to make COR requests optional: for
+ * example, if target is a ram-cache, and if it is full now, we should
+ * skip doing COR request, as it is actually not necessary.
+ *
+ * F2. Feature for guest: read from fleecing target if data is in ram-cache
+ * and is unchanged
+ */
+
+return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+}
+
+static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes)
+{
+int ret = 0;
+BDRVBackupTopState *s = bs->opaque;
+uint64_t gran = 1UL << hbitmap_granularity(s->copy_bitmap);
+uint64_t end = QEMU_ALIGN_UP(offset +