from:"Wen Congyang"

Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-04-06 Thread Wen Congyang

On 04/01/2016 11:20 PM, Max Reitz wrote:
> On 31.03.2016 13:42, Alberto Garcia wrote:
>> On Wed 30 Mar 2016 05:07:15 PM CEST, Max Reitz wrote:
>>>> I also have another (not directly related) question: why not simply
>>>> use the node name when removing children? I understood that the idea
>>>> was that it's possible to have the same node attached twice to the
>>>> same Quorum, but can you actually do that? And what's the use case?
>>>
>>> What I like about using the child role name is that it automatically
>>> prevents you from specifying a node that is not a child of the given
>>> parent.
>>
>> Right, but checking if a node is not a child and returning an error is
>> very simple. And it doesn't require the user to keep track of the node
>> name *and* the child role name.
> 
> Yes. But I think that you need to know parent and child anyway if you
> want to modify (delete) an edge in the graph.
> 
> Also, it may be possible to have multiple parents per node. Actually, it
> is already possible because the BB-BDS relationship is modeled as a
> parent-child relationship. Thus, I'm not sure whether it would be
> sufficient to specify a single node if you want to delete a single edge.
> 
>> Unless I'm forgetting something this would be the first time we expose
>> the child role name in the API, that's why I'm wondering if it's
>> something worth doing.
> 
> Well, the roles are kind of exposed already. It's exactly what you
> specify in -drive or blockdev-add.
> 
>>> Which makes me notice that it might be a good idea to require the user
>>> to specify the child's role when adding a new child. In this version
>>> of this series (where only quorum is supported), the children are just
>>> inserted in numerical order (first free slot is taken first), but
>>> maybe the user wants to insert them in a different order.
>>
>> For the Quorum case it totally makes sense to let the user choose the
>> position of the new child.
>>
>> But for creating a Quorum array in the first place we don't require
>> that, the order is the one that the user provides, and the user does not
>> need to know about the child role names at that point.
> 
> Depends. If you create an empty quorum BDS and then add the children
> using the QAPI command introduced in this series, you are right. But if
> you add children along with creating the quorum BDS (be it via -drive or
> via blockdev-add), one has to specify the child role names.

I think the problem is that the child role name is wrong.

If we always attach the child at the tail, we can do it like this:
the new child's role name is children.XXX, where XXX is larger than the
XXX of every existing child role name.

For example:
Quorum has one child: children.1 (children.0 was removed).
We add a new child; its role name is children.2, not children.0.

If we want to attach the child somewhere other than the tail, for example:
Quorum has two children, children.0 and children.1, and the new child should
go before children.1. In this case, we should rename children.1 to children.2,
and the new child's role name can be children.1. If we allow such usage, we
have to rename the other children's role names when adding/deleting a child,
which means the user must query the role names again after each add/delete.
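The append-at-tail rule above can be sketched in C: parse the numeric suffix of each existing children.N role and hand out one more than the largest, so a freed index such as children.0 is never reused. This is a minimal sketch, not the quorum driver's code; the helpers role_index() and next_child_role() are hypothetical, and it assumes every role name has the form children.N.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROLE_PREFIX "children."

/* Return N for a role named "children.N", or -1 if the name does not match. */
static int role_index(const char *role)
{
    size_t len = strlen(ROLE_PREFIX);
    char *end;
    long n;

    if (strncmp(role, ROLE_PREFIX, len) != 0 || role[len] == '\0') {
        return -1;
    }
    n = strtol(role + len, &end, 10);
    if (*end != '\0' || n < 0) {
        return -1;
    }
    return (int)n;
}

/*
 * Hypothetical helper: write the role name for a child appended at the
 * tail. Its index is one larger than every existing index, so indices of
 * removed children are never handed out again.
 */
static void next_child_role(const char *const *roles, size_t n_roles,
                            char *buf, size_t buf_size)
{
    int max = -1;

    assert(buf != NULL && buf_size > 0);
    for (size_t i = 0; i < n_roles; i++) {
        int idx = role_index(roles[i]);
        if (idx > max) {
            max = idx;
        }
    }
    snprintf(buf, buf_size, ROLE_PREFIX "%d", max + 1);
}
```

With only children.1 attached, next_child_role() yields children.2, matching the example above. The trade-off is that indices grow monotonically, but an existing role name never changes meaning.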

Thanks
Wen Congyang

> 
> Max
>

Re: [Qemu-block] [RFC for-2.7 1/1] block/qapi: Add query-block-node-tree

2016-03-25 Thread Wen Congyang

On 03/25/2016 03:07 AM, Max Reitz wrote:
> This command returns the tree of BlockDriverStates under a given root
> node.
> 
> Every tree node is described by its node name and the connection of a
> parent node to its children additionally contains the role the child
> assumes.
> 
> A node's name can then be used e.g. in conjunction with
> query-named-block-nodes to get more information about the node.

I found another problem:

{'execute': 'query-block-node-tree', 'arguments': {'root-node': 'disk1' } }
{"return": {"children": [{"role": "children.1", "node": {"children": [{"role": 
"file", "node": {}}], "node-name": "test1"}}, {"role": "children.0", "node": 
{"children": [{"role": "file", "node": {}}]}}]}}

s->children[0] is children.0, and s->children[1] is children.1.
But we output them in reverse order. The reason is:

BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
                             BlockDriverState *child_bs,
                             const char *child_name,
                             const BdrvChildRole *child_role)
{
    BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, child_role);
    QLIST_INSERT_HEAD(&parent_bs->children, child, next);
    return child;
}

We insert the new child at the head, not the tail...
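The reversed order can be reproduced with a standalone linked list. This is an illustrative model, not QEMU's QLIST macros: insert_head() behaves like QLIST_INSERT_HEAD, while insert_tail() shows how keeping a pointer to the last next field preserves attach order. The Child and ChildList types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a parent's list of BdrvChild entries. */
typedef struct Child {
    const char *name;
    struct Child *next;
} Child;

typedef struct {
    Child *head;
    Child **tail;   /* points at the last element's 'next' (or at head) */
} ChildList;

static void list_init(ChildList *l)
{
    l->head = NULL;
    l->tail = &l->head;
}

/* Like QLIST_INSERT_HEAD: the newest child ends up first. */
static void insert_head(ChildList *l, Child *c)
{
    c->next = l->head;
    l->head = c;
    if (l->tail == &l->head) {      /* list was empty */
        l->tail = &c->next;
    }
}

/* Append at the tail: children keep the order in which they were attached. */
static void insert_tail(ChildList *l, Child *c)
{
    c->next = NULL;
    *l->tail = c;
    l->tail = &c->next;
}

/* Copy up to max names, in list order, into out[]; return how many. */
static size_t list_names(const ChildList *l, const char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (const Child *c = l->head; c && n < max; c = c->next) {
        out[n++] = c->name;
    }
    return n;
}
```

Attaching children.0 and then children.1 with insert_head() lists children.1 first, which is the order query-block-node-tree printed; insert_tail() keeps children.0 first.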

Thanks
Wen Congyang

> 
> Signed-off-by: Max Reitz <mre...@redhat.com>
> ---
>  block/qapi.c | 43 +++
>  qapi/block-core.json | 46 ++
>  qmp-commands.hx  | 38 ++
>  3 files changed, 127 insertions(+)
> 
> diff --git a/block/qapi.c b/block/qapi.c
> index 6a4869a..a35d32b 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -493,6 +493,49 @@ BlockInfoList *qmp_query_block(Error **errp)
>  return head;
>  }
>  
> +static BlockNodeTreeNode *qmp_query_block_node_tree_by_bs(BlockDriverState *bs)
> +{
> +    BlockNodeTreeNode *bntn;
> +    BlockNodeTreeChildList **p_next;
> +    BdrvChild *child;
> +
> +    bntn = g_new0(BlockNodeTreeNode, 1);
> +
> +    bntn->node_name = g_strdup(bdrv_get_node_name(bs));
> +    bntn->has_node_name = bntn->node_name;
> +
> +    p_next = &bntn->children;
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        BlockNodeTreeChild *bntc;
> +
> +        bntc = g_new(BlockNodeTreeChild, 1);
> +        *bntc = (BlockNodeTreeChild){
> +            .role = g_strdup(child->name),
> +            .node = qmp_query_block_node_tree_by_bs(child->bs),
> +        };
> +
> +        *p_next = g_new0(BlockNodeTreeChildList, 1);
> +        (*p_next)->value = bntc;
> +        p_next = &(*p_next)->next;
> +    }
> +
> +    *p_next = NULL;
> +    return bntn;
> +}
> +
> +BlockNodeTreeNode *qmp_query_block_node_tree(const char *root_node,
> +                                             Error **errp)
> +{
> +    BlockDriverState *bs;
> +
> +    bs = bdrv_lookup_bs(root_node, root_node, errp);
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    return qmp_query_block_node_tree_by_bs(bs);
> +}
> +
>  static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
> bool query_nodes)
>  {
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b1cf77d..754ccd6 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -470,6 +470,52 @@
>  
>  
>  ##
> +# @BlockNodeTreeNode:
> +#
> +# Describes a node in the block node graph.
> +#
> +# @node-name: If present, the node's name.
> +#
> +# @children:  List of the node's children.
> +#
> +# Since: 2.7
> +##
> +{ 'struct': 'BlockNodeTreeNode',
> +  'data': { '*node-name': 'str',
> +            'children': ['BlockNodeTreeChild'] } }
> +
> +##
> +# @BlockNodeTreeChild:
> +#
> +# Describes a child node in the block node graph.
> +#
> +# @role: Role the child assumes for its parent, e.g. "file" or "backing".
> +#
> +# @node: The child node's BlockNodeTreeNode structure.
> +#
> +# Since: 2.7
> +##
> +{ 'struct': 'BlockNodeTreeChild',
> +  'data': { 'role': 'str',
> +            'node': 'BlockNodeTreeNode' } }
> +
> +##
> +# @query-block-node-tree:
> +#
> +# Queries the tree of nodes under a given node in the block graph.
> +#
> +# @root-node: Node name or device name of the tree's root node.
> +#
> +# Returns: The root node's BlockNodeTreeNode structure.
> +#
> +# Since: 2.7
> +##
> +{ 'command': 'query-block-node-tree',
>

Re: [Qemu-block] [RFC for-2.7 1/1] block/qapi: Add query-block-node-tree

2016-03-24 Thread Wen Congyang

On 03/25/2016 03:07 AM, Max Reitz wrote:
> This command returns the tree of BlockDriverStates under a given root
> node.
> 
> Every tree node is described by its node name and the connection of a
> parent node to its children additionally contains the role the child
> assumes.
> 
> A node's name can then be used e.g. in conjunction with
> query-named-block-nodes to get more information about the node.

I tested this patch, and it works.
{'execute': 'query-block-node-tree', 'arguments': {'root-node': 'disk1' } }
{"return": {"children": [{"role": "children.0", "node": {"children": [{"role": 
"file", "node": {"children": [], "node-name": "#block175"}}], "node-name": 
"#block267"}}], "node-name": "#block040"}}

Should we hide node names like "#blockxxx"?
If the bs doesn't have any children, should we output '"children": [],'?

Can we add a new parameter, depth? For example, if I only want to know the
quorum's child names, we could limit the depth, and the output would be much
clearer.
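A depth limit of the kind proposed here could be implemented by passing the current depth down the recursion and refusing to descend past the limit. The sketch assumes hypothetical semantics (a limit of 0 prints only the root, a negative value means unlimited) and a simplified Node type; the RFC's query-block-node-tree takes no such parameter.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHILDREN 4

/* Simplified, hypothetical stand-in for a node in the block graph. */
typedef struct Node {
    const char *name;                 /* node name, may be NULL */
    const char *roles[MAX_CHILDREN];  /* role each child plays for this node */
    struct Node *children[MAX_CHILDREN];
    int n_children;
} Node;

/*
 * Print 'node' and its descendants down to 'max_depth' levels below the
 * root (max_depth < 0 means unlimited). Returns the number of nodes printed.
 */
static int dump_tree(FILE *out, const Node *node, const char *role,
                     int depth, int max_depth)
{
    int printed = 1;

    assert(node != NULL);
    fprintf(out, "%*s%s: %s\n", depth * 2, "", role,
            node->name ? node->name : "(anonymous)");

    if (max_depth >= 0 && depth >= max_depth) {
        return printed;       /* depth limit reached: do not descend */
    }
    for (int i = 0; i < node->n_children; i++) {
        printed += dump_tree(out, node->children[i], node->roles[i],
                             depth + 1, max_depth);
    }
    return printed;
}
```

For a quorum whose two children each have a file child, a limit of 1 prints the root plus the children.0/children.1 entries and stops there, which is the view wanted for checking quorum child names.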

Thanks
Wen Congyang

> 
> Signed-off-by: Max Reitz <mre...@redhat.com>
> ---
>  block/qapi.c | 43 +++
>  qapi/block-core.json | 46 ++
>  qmp-commands.hx  | 38 ++
>  3 files changed, 127 insertions(+)
> 
> diff --git a/block/qapi.c b/block/qapi.c
> index 6a4869a..a35d32b 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -493,6 +493,49 @@ BlockInfoList *qmp_query_block(Error **errp)
>  return head;
>  }
>  
> +static BlockNodeTreeNode *qmp_query_block_node_tree_by_bs(BlockDriverState *bs)
> +{
> +    BlockNodeTreeNode *bntn;
> +    BlockNodeTreeChildList **p_next;
> +    BdrvChild *child;
> +
> +    bntn = g_new0(BlockNodeTreeNode, 1);
> +
> +    bntn->node_name = g_strdup(bdrv_get_node_name(bs));
> +    bntn->has_node_name = bntn->node_name;
> +
> +    p_next = &bntn->children;
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        BlockNodeTreeChild *bntc;
> +
> +        bntc = g_new(BlockNodeTreeChild, 1);
> +        *bntc = (BlockNodeTreeChild){
> +            .role = g_strdup(child->name),
> +            .node = qmp_query_block_node_tree_by_bs(child->bs),
> +        };
> +
> +        *p_next = g_new0(BlockNodeTreeChildList, 1);
> +        (*p_next)->value = bntc;
> +        p_next = &(*p_next)->next;
> +    }
> +
> +    *p_next = NULL;
> +    return bntn;
> +}
> +
> +BlockNodeTreeNode *qmp_query_block_node_tree(const char *root_node,
> +                                             Error **errp)
> +{
> +    BlockDriverState *bs;
> +
> +    bs = bdrv_lookup_bs(root_node, root_node, errp);
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    return qmp_query_block_node_tree_by_bs(bs);
> +}
> +
>  static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
> bool query_nodes)
>  {
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b1cf77d..754ccd6 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -470,6 +470,52 @@
>  
>  
>  ##
> +# @BlockNodeTreeNode:
> +#
> +# Describes a node in the block node graph.
> +#
> +# @node-name: If present, the node's name.
> +#
> +# @children:  List of the node's children.
> +#
> +# Since: 2.7
> +##
> +{ 'struct': 'BlockNodeTreeNode',
> +  'data': { '*node-name': 'str',
> +            'children': ['BlockNodeTreeChild'] } }
> +
> +##
> +# @BlockNodeTreeChild:
> +#
> +# Describes a child node in the block node graph.
> +#
> +# @role: Role the child assumes for its parent, e.g. "file" or "backing".
> +#
> +# @node: The child node's BlockNodeTreeNode structure.
> +#
> +# Since: 2.7
> +##
> +{ 'struct': 'BlockNodeTreeChild',
> +  'data': { 'role': 'str',
> +            'node': 'BlockNodeTreeNode' } }
> +
> +##
> +# @query-block-node-tree:
> +#
> +# Queries the tree of nodes under a given node in the block graph.
> +#
> +# @root-node: Node name or device name of the tree's root node.
> +#
> +# Returns: The root node's BlockNodeTreeNode structure.
> +#
> +# Since: 2.7
> +##
> +{ 'command': 'query-block-node-tree',
> +  'data': { 'root-node': 'str' },
> +  'returns': 'BlockNodeTreeNode' }
> +
> +
> +##
>  # @BlockDeviceTimedStats:
>  #
>  # Statistics of a block device during a given interval of time.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 9e05365..5c404aa

Re: [Qemu-block] [PATCH v16 0/8] Block replication for continuous checkpoints

2016-03-24 Thread Wen Congyang

Ping


On 03/11/2016 06:34 PM, Changlong Xie wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
> 
> You can get the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg02319.html 
> 
> Patch status:
> 1. Acked patches: none 
> 2. Reviewed patches: patch 4
> 3. Updated patches: patch 7, 8
> 
> You can get the patch here:
> https://github.com/Pating/qemu/tree/changlox/block-replication-v16
> 
> You can get the patch with framework here:
> https://github.com/Pating/qemu/tree/changlox/colo_framework_v15
> 
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>are accepted.
> 
> Change Log:
> V16:
> 1. Rebase to the newest code
> 2. Address comments from Stefan & hailiang
> p3: we don't need this patch now
> p4: add "top-id" parameters for secondary
> p6: fix NULL pointer in replication callbacks, remove unnecessary typedefs,
> add doc comments that explain the semantics of Replication
> p7: Refactor AioContext for thread safety, remove unnecessary get_top_bs()
> *Note*: I'm working on replication testcase now, will send out in V17
> V15:
> 1. Rebase to the newest codes
> 2. Fix typos and coding style addresed Eric's comments
> 3. Address Stefan's comments
>1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
>2) Update the message and description for [PATCH 4/9]
>3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
>4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
>5) Use BdrvChild instead of holding on to BlockDriverState * pointers
> 4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771  
> 5. Introduce replication_get_error_all to check replication status
> 6. Remove useless discard interface
> V14:
> 1. Implement auto complete active commit
> 2. Implement active commit block job for replication.c
> 3. Address the comments from Stefan, add replication-specific API and data
>    structure, also remove old block layer APIs
> V13:
> 1. Rebase to the newest code
> 2. Remove redundant macros and semicolon in replication.c
> 3. Fix typos in block-replication.txt
> V12:
> 1. Rebase to the newest code
> 2. Use backing reference to replace 'allow-write-backing-file'
> V11:
> 1. Reopen the backing file when starting block replication if it is not
>    opened in R/W mode
> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>    when opening backing file
> 3. Block the top BDS so there is only one block job for the top BDS and
>    its backing chain.
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>    reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split child add/delete support. These patches are sent in another patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing reference option according to Stefan Hajnoczi's
>    suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed the failover up. The secondary vm can take over very quickly even
>    if there are too many I/O requests.
> V4:
> 1. Introduce a new driver replication to avoid touching nbd and qcow2.
> V3:
> 1. Use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target use the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary qemu (use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
> 
> Changlong Xie (1):
>   Introduce new APIs to do replication operation
> 
> Wen Congyang (7):
>   unblock backup operations in backing file
>   Backup: clear all bitmap when doing block checkpoint
>   Link backup into block core
>   docs: block replication's description
>   auto complete active commit
>   Implement new driver for block replication
>   support replication driver in blockdev-add
> 
>  Makefile.objs  |   1 +
>  block.c|  18 ++
>  block/Makefile.objs|   3 +-
>  block/backup.c |  15 ++
>  block/mirror.c |  13 +-
>  block/replication.c| 6

Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-19 Thread Wen Congyang

On 03/17/2016 06:07 PM, Alberto Garcia wrote:
> On Thu 17 Mar 2016 10:56:09 AM CET, Wen Congyang wrote:
>>> We should have the failure modes documented, and how you'll use it
>>> after failover etc Without that it's really difficult to tell if this
>>> naming is right.
>>
>> For COLO, children.0 is the real disk, children.1 is replication
>> driver.  After failure, children.1 will be removed by the user. If we
>> want to continue do COLO, we need add a new children.1 again.
> 
> What if children.0 fails ?

For COLO, reading from children.1 always fails. If children.0 fails, it
means that reading from the disk fails, and the guest VM will see the I/O error.
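The I/O error described above follows from how a FIFO read pattern behaves: children are tried in array order and the first successful read wins. The model below is a simplified sketch, not quorum.c; it assumes the replication child always fails reads with -EIO, so once children.0 fails there is nothing left to fall back on. FifoChild, fifo_read(), read_ok() and read_eio() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* A child's read callback: returns 0 on success or a negative errno. */
typedef int (*ReadFn)(void *opaque);

typedef struct {
    ReadFn read;
    void *opaque;
} FifoChild;

/*
 * Simplified FIFO read pattern: try each child in array order and return
 * as soon as one succeeds; otherwise return the last child's error.
 */
static int fifo_read(const FifoChild *children, size_t n)
{
    int ret = -EIO;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        ret = children[i].read(children[i].opaque);
        if (ret == 0) {
            return 0;
        }
    }
    return ret;
}

/* Example callbacks: a healthy disk and a child that refuses all reads. */
static int read_ok(void *opaque)  { (void)opaque; return 0; }
static int read_eio(void *opaque) { (void)opaque; return -EIO; }
```

With { real disk, replication } the read succeeds on children.0; if children.0 also fails, fifo_read() returns -EIO, which is what the guest would see.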

Thanks
Wen Congyang

> 
> Berto
> 
> 
> .
>

Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-19 Thread Wen Congyang

On 03/17/2016 07:25 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 03/17/2016 06:07 PM, Alberto Garcia wrote:
>>> On Thu 17 Mar 2016 10:56:09 AM CET, Wen Congyang wrote:
>>>>> We should have the failure modes documented, and how you'll use it
>>>>> after failover etc Without that it's really difficult to tell if this
>>>>> naming is right.
>>>>
>>>> For COLO, children.0 is the real disk, children.1 is replication
>>>> driver.  After failure, children.1 will be removed by the user. If we
>>>> want to continue do COLO, we need add a new children.1 again.
>>>
>>> What if children.0 fails ?
>>
>> For COLO, reading from children.1 always fails. if children.0 fails, it
>> means that reading from the disk fails. The guest vm will see the I/O error.
> 
> How do we get that to cause a failover before the guest detects it?
> If the primary's local disk (children.0) fails and we can fail over
> at that point, then the guest carries on running on the secondary without
> ever knowing about the failure.

COLO is not designed for that case. children.0 can itself be a quorum, so
you can add more than one real disk and get more reliability. Another
option is for the real disk to be external storage with its own
replication solution.

COLO is designed for this case: the host crashes, the guest stays alive
after failover, and the client never notices the event.

Thanks
Wen Congyang

> 
> Dave
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Berto
>>>
>>>
>>> .
>>>
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
>

Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-19 Thread Wen Congyang

On 03/17/2016 05:10 PM, Alberto Garcia wrote:
> On Thu 17 Mar 2016 02:22:40 AM CET, Wen Congyang <we...@cn.fujitsu.com> wrote:
>>>>>> @@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
>>>>>>      bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
>>>>>>                              * block if Quorum is reached.
>>>>>>                              */
>>>>>> +unsigned long *index_bitmap;
>>>>
>>>> Hi Berto
>>>>
>>>> *NOTE*: In the old version, we just used "bs->node_name", but in the
>>>> latest one, as Kevin suggested, we introduced
>>>> "child->child_name" (formatted as "children.xxx"); this is the key reason
>>>> why we need these two functions here.
>>>
>>> I'm sorry I missed this discussion earlier. Your code seems technically
>>> correct but I have several questions:
>>>
>>> - I read that one of the reasons for this change is that "In theory, the
>>>   same node could be attached twice to the same parent in different
>>>   roles.". Is there any example of that? What's the use case?
>>
>> Kevin may know the case.
> 
> Kevin, do you have an example?
> 
>>> - How do you obtain the child name?
>>
>> IIRC, there is no way to obtain it right now. I think we can improve the 'info block' output.
> 
> Okay, but then we should extend that first, otherwise this API cannot be
> used.
> 
>>> - I see that if you have children.0 and children.1 (let's say hd0.qcow2
>>>   and hd1.qcow2), then you remove children.0 and add it again, it will
>>>   keep the 'children.0' name (that's what the bitmap is for if I'm
>>>   understanding it correctly). However the position in the s->children
>>>   array will change because you do memmove() when you remove children.0
>>>   and then add it again to the end of the array.
>>>
>>>   Initial status:
>>>
>>> s->children[0] <--> "children.0" (hd0.qcow2)
>>> s->children[1] <--> "children.1" (hd1.qcow2)
>>>
>>>   children.0 (hd0.qcow2) is removed:
>>>
>>> s->children[0] <--> "children.1" (hd1.qcow2)
>>>
>>>   children.0 (hd0.qcow2) is added again:
>>>
>>> s->children[0] <--> "children.1" (hd1.qcow2)
>>> s->children[1] <--> "children.0" (hd0.qcow2)
>>
>> Yes, it is correct.
>>
>>>
>>>   Is this correct? Is this the intended behavior? Since you are reading
>>>   in FIFO mode, now hd1.qcow2 will always be read first, so if
>>>   children.1 was the secondary disk, it has just become the primary.
>>
>> Yes.
> 
> And don't you need a way to control the order in which the disks must be
> read for COLO?

I think in FIFO mode we should first read the disk that was added earlier.

We don't need a way to control the order now.
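Berto's walk-through can be replayed with a small model of the bookkeeping in this patch: role numbers come from the first free bit of an index bitmap, while the children array is compacted with memmove() on removal and appended to on insertion. QuorumModel, alloc_index(), add_child() and del_child() are hypothetical stand-ins, not the posted get_new_child_index()/remove_child_index(); MAX_KIDS is an arbitrary bound chosen so the bitmap fits in one unsigned long.

```c
#include <assert.h>
#include <string.h>

#define MAX_KIDS 8

typedef struct {
    unsigned long index_bitmap;   /* bit N set => "children.N" is in use */
    int roles[MAX_KIDS];          /* roles[i] = N of s->children[i] */
    int num_children;
} QuorumModel;

/* First free slot wins, as described for the patch under review. */
static int alloc_index(QuorumModel *s)
{
    for (int n = 0; n < MAX_KIDS; n++) {
        if (!(s->index_bitmap & (1UL << n))) {
            s->index_bitmap |= 1UL << n;
            return n;
        }
    }
    return -1;
}

/* New children are always appended at the end of the array. */
static void add_child(QuorumModel *s)
{
    assert(s->num_children < MAX_KIDS);
    s->roles[s->num_children++] = alloc_index(s);
}

/* Remove the child whose role is "children.N" and close the gap. */
static void del_child(QuorumModel *s, int n)
{
    for (int i = 0; i < s->num_children; i++) {
        if (s->roles[i] == n) {
            memmove(&s->roles[i], &s->roles[i + 1],
                    (size_t)(s->num_children - i - 1) * sizeof(s->roles[0]));
            s->num_children--;
            s->index_bitmap &= ~(1UL << n);
            return;
        }
    }
}
```

After add, add, delete children.0, add, the array holds children.1 at index 0 and children.0 at index 1, the same end state Berto describes, so a FIFO reader would now try hd1.qcow2 first.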

Thanks
Wen Congyang

> 
> Berto
> 
> 
> .
>

Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-15 Thread Wen Congyang

On 03/11/2016 08:21 PM, Alberto Garcia wrote:
> On Thu 10 Mar 2016 03:49:40 AM CET, Changlong Xie wrote:
>> @@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
>>      bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
>>                              * block if Quorum is reached.
>>                              */
>> +unsigned long *index_bitmap;
>> +int bsize;
>   [...]
>> +static int get_new_child_index(BDRVQuorumState *s)
>   [...]
>> +static void remove_child_index(BDRVQuorumState *s, int index)
>   [...]
> 
> Sorry if I missed a previous discussion, but why is this necessary?

Hi, Alberto Garcia

Do you have any comments about this patch, or can you give it a Reviewed-by?

Thanks
Wen Congyang

> 
> Berto
> 
> 
> .
>

Re: [Qemu-block] [PATCH v15 7/9] Introduce new APIs to do replication operation

2016-02-19 Thread Wen Congyang

On 02/19/2016 04:41 PM, Hailiang Zhang wrote:
> Hi,
> 
> On 2016/2/15 9:13, Wen Congyang wrote:
>> On 02/15/2016 08:57 AM, Hailiang Zhang wrote:
>>> On 2016/2/5 12:18, Changlong Xie wrote:
>>>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>>>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>>>> Signed-off-by: Changlong Xie <xiecl.f...@cn.fujitsu.com>
>>>> ---
>>>>Makefile.objs|  1 +
>>>>qapi/block-core.json | 13 
>>>>replication.c| 94 
>>>> 
>>>>replication.h| 53 +
>>>>4 files changed, 161 insertions(+)
>>>>create mode 100644 replication.c
>>>>create mode 100644 replication.h
>>>>
>>>> diff --git a/Makefile.objs b/Makefile.objs
>>>> index 06b95c7..a8c74b7 100644
>>>> --- a/Makefile.objs
>>>> +++ b/Makefile.objs
>>>> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>>>>block-obj-$(CONFIG_WIN32) += aio-win32.o
>>>>block-obj-y += block/
>>>>block-obj-y += qemu-io-cmds.o
>>>> +block-obj-y += replication.o
>>>>
>>>>block-obj-m = block/
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 7e9e8fe..12362b8 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2002,6 +2002,19 @@
>>>>'*read-pattern': 'QuorumReadPattern' } }
>>>>
>>>>##
>>>> +# @ReplicationMode
>>>> +#
>>>> +# An enumeration of replication modes.
>>>> +#
>>>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>>>> +#
>>>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>>>> +#
>>>> +# Since: 2.6
>>>> +##
>>>> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
>>>> +
>>>> +##
>>>># @BlockdevOptions
>>>>#
>>>># Options for creating a block device.
>>>> diff --git a/replication.c b/replication.c
>>>> new file mode 100644
>>>> index 000..e8ac2f0
>>>> --- /dev/null
>>>> +++ b/replication.c
>>>> @@ -0,0 +1,94 @@
>>>> +/*
>>>> + * Replication filter
>>>> + *
>>>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>>>> + * Copyright (c) 2016 Intel Corporation
>>>> + * Copyright (c) 2016 FUJITSU LIMITED
>>>> + *
>>>> + * Author:
>>>> + *   Wen Congyang <we...@cn.fujitsu.com>
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> + * See the COPYING file in the top-level directory.
>>>> + */
>>>> +
>>>> +#include "replication.h"
>>>> +
>>>> +static QLIST_HEAD(, ReplicationState) replication_states;
>>>> +
>>>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
>>>> +{
>>>> +    ReplicationState *rs;
>>>> +
>>>> +    rs = g_new0(ReplicationState, 1);
>>>> +    rs->opaque = opaque;
>>>> +    rs->ops = ops;
>>>> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
>>>> +
>>>> +    return rs;
>>>> +}
>>>> +
>>>> +void replication_remove(ReplicationState *rs)
>>>> +{
>>>> +    QLIST_REMOVE(rs, node);
>>>> +    g_free(rs);
>>>> +}
>>>> +
>>>> +/*
>>>> + * The caller of the function *MUST* make sure vm stopped
>>>> + */
>>>> +void replication_start_all(ReplicationMode mode, Error **errp)
>>>> +{
>>>
>>> Is this public API only used for block?
>>> If so, I'd like it to have a 'block_' prefix.
>>
>> No, we hope it can be used for NICs too.
>>
> 
> OK, I see why you designed these APIs. I like the idea of using
> a callback/notifier to report the COLO status to block/NIC.
> 
> But let's do something more graceful.
> For COLO, we can consider it has four states:
> Prepare/start checkpoint(with VM stopped)/finish checkpo

Re: [Qemu-block] [PATCH v15 7/9] Introduce new APIs to do replication operation

2016-02-14 Thread Wen Congyang

On 02/15/2016 08:57 AM, Hailiang Zhang wrote:
> On 2016/2/5 12:18, Changlong Xie wrote:
>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>> Signed-off-by: Changlong Xie <xiecl.f...@cn.fujitsu.com>
>> ---
>>   Makefile.objs|  1 +
>>   qapi/block-core.json | 13 
>>   replication.c| 94 
>> 
>>   replication.h| 53 +
>>   4 files changed, 161 insertions(+)
>>   create mode 100644 replication.c
>>   create mode 100644 replication.h
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 06b95c7..a8c74b7 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>>   block-obj-$(CONFIG_WIN32) += aio-win32.o
>>   block-obj-y += block/
>>   block-obj-y += qemu-io-cmds.o
>> +block-obj-y += replication.o
>>
>>   block-obj-m = block/
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 7e9e8fe..12362b8 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2002,6 +2002,19 @@
>>   '*read-pattern': 'QuorumReadPattern' } }
>>
>>   ##
>> +# @ReplicationMode
>> +#
>> +# An enumeration of replication modes.
>> +#
>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>> +#
>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
>> +
>> +##
>>   # @BlockdevOptions
>>   #
>>   # Options for creating a block device.
>> diff --git a/replication.c b/replication.c
>> new file mode 100644
>> index 000..e8ac2f0
>> --- /dev/null
>> +++ b/replication.c
>> @@ -0,0 +1,94 @@
>> +/*
>> + * Replication filter
>> + *
>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2016 Intel Corporation
>> + * Copyright (c) 2016 FUJITSU LIMITED
>> + *
>> + * Author:
>> + *   Wen Congyang <we...@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "replication.h"
>> +
>> +static QLIST_HEAD(, ReplicationState) replication_states;
>> +
>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
>> +{
>> +    ReplicationState *rs;
>> +
>> +    rs = g_new0(ReplicationState, 1);
>> +    rs->opaque = opaque;
>> +    rs->ops = ops;
>> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
>> +
>> +    return rs;
>> +}
>> +
>> +void replication_remove(ReplicationState *rs)
>> +{
>> +    QLIST_REMOVE(rs, node);
>> +    g_free(rs);
>> +}
>> +
>> +/*
>> + * The caller of the function *MUST* make sure vm stopped
>> + */
>> +void replication_start_all(ReplicationMode mode, Error **errp)
>> +{
> 
> Is this public API only used for block?
> If so, I'd like it to have a 'block_' prefix.

No, we hope it can be used for NICs too.

Thanks
Wen Congyang

> 
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->start) {
>> +            rs->ops->start(rs, mode, errp);
>> +        }
>> +        if (*errp != NULL) {
> 
> This is incorrect: you don't check whether errp is NULL.
> If errp is NULL, dereferencing it accesses memory at address 0x0.
> The same applies to other places in this patch.
> 
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void replication_do_checkpoint_all(Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->checkpoint) {
>> +            rs->ops->checkpoint(rs, errp);
>> +        }
>> +        if (*errp != NULL) {
>> +            return;
> 
>> +        }
>> +    }
>> +}
>> +
>> +void replication_get_error_all(Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +if (rs-&
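Hailiang's objection is that the loop dereferences *errp even though a caller may pass errp == NULL to ignore errors. A common QEMU idiom is to collect each failure in a local Error pointer and propagate it; the sketch below models that idiom with minimal stand-ins so it compiles on its own. The Error struct, error_set_msg() and this error_propagate() are simplified substitutes for QEMU's qapi/error.h (whose real API differs in detail), and start_all() is a hypothetical analogue of replication_start_all().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object. */
typedef struct Error {
    char msg[64];
} Error;

/* Set an error only if the caller asked for one and none is set yet. */
static void error_set_msg(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        Error *err = calloc(1, sizeof(*err));
        if (err) {
            strncpy(err->msg, msg, sizeof(err->msg) - 1);
        }
        *errp = err;
    }
}

/* Hand local_err to the caller, or free it if the caller passed NULL. */
static void error_propagate(Error **dst_errp, Error *local_err)
{
    if (dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    } else {
        free(local_err);
    }
}

typedef void (*StartFn)(int id, Error **errp);

/*
 * Call every start hook, stopping at the first failure. Returns how many
 * hooks completed before the failure (n if all succeeded). Never
 * dereferences errp, so errp == NULL is safe.
 */
static int start_all(StartFn *hooks, int n, Error **errp)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        Error *local_err = NULL;

        if (hooks[i]) {
            hooks[i](i, &local_err);
        }
        if (local_err) {
            error_propagate(errp, local_err);
            return i;
        }
    }
    return n;
}

/* Example hooks for trying start_all(). */
static void ok_hook(int id, Error **errp)  { (void)id; (void)errp; }
static void bad_hook(int id, Error **errp) { (void)id; error_set_msg(errp, "start failed"); }
```

Because the check is on local_err rather than *errp, start_all(hooks, n, NULL) stops at the failing hook without crashing and without leaking the error object.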

Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-02-04 Thread Wen Congyang

On 02/04/2016 05:07 PM, Dr. David Alan Gilbert wrote:
> * Changlong Xie (xiecl.f...@cn.fujitsu.com) wrote:
>> On 02/01/2016 09:18 AM, Wen Congyang wrote:
>>> On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
>>>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>>>>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>>>>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>>>>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>> Hi,
>>>>>>>>   I've got a block error if I kill the secondary.
>>>>>>>>
>>>>>>>> Start both primary & secondary
>>>>>>>> kill -9 secondary qemu
>>>>>>>> x_colo_lost_heartbeat on primary
>>>>>>>>
>>>>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>>>>
>>>>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>>>>> backtrace below.
>>>>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>>>>> code with the block and network proxy merged in; so it could be my
>>>>>>>> merging but I don't think so ?)
>>>>>>>>
>>>>>>>>
>>>>>>>> (gdb) where
>>>>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>>>>> #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>>>>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>>>>> #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
>>>>>>>> #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>>>>> #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>>>>> #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>>>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>>>>> #7  0x7f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>>>>> #8  0x7f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>>>>> #11 0x7f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>>>>
>>>>>>>> (gdb) p s->num_children
>>>>>>>> $1 = 2
>>>>>>>> (gdb) p acb->success_count
>>>>>>>> $2 = 0
>>>>>>>> (gdb) p acb->is_read
>>>>>>>> $5 = false
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>
>>>>>> No problem.
>>>>>>
>>>>>>> What is the value of acb->count?
>>>>>>
>>>>>> (gdb) p acb->count
>>>>>> $1 = 1
>>>>>
>>>>> Note, the count is 1, not 2. The write to children.0 is still in flight. If
>>>>> the write to children.0 succeeds,
>>>>> the guest never sees this error.
>>>>>>> If the secondary host is down, you should remove quorum's children.1.
>>>>>>> Otherwise, you will get an
>>>>>>> I/O error event.
>>>>>>
>>>>>> Is that safe?  If the secondary fails, do you always have time to issue 
>>&

Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-31 Thread Wen Congyang

On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>> Hi,
>>>>>   I've got a block error if I kill the secondary.
>>>>>
>>>>> Start both primary & secondary
>>>>> kill -9 secondary qemu
>>>>> x_colo_lost_heartbeat on primary
>>>>>
>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>
>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>> backtrace below.
>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>> code with the block and network proxy merged in; so it could be my
>>>>> merging but I don't think so ?)
>>>>>
>>>>>
>>>>> (gdb) where
>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, 
>>>>> acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>> at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>> #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>,
>>>>> ret=<optimized out>)
>>>>> at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>> #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at
>>>>> /root/colo/jan-2016/qemu/block/io.c:2122
>>>>> #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at
>>>>> /root/colo/jan-2016/qemu/async.c:64
>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at
>>>>> /root/colo/jan-2016/qemu/async.c:92
>>>>> #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at
>>>>> /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>> #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>,
>>>>> callback=<optimized out>,
>>>>> user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>> #7  0x7f293b84a79a in g_main_context_dispatch () from
>>>>> /lib64/libglib-2.0.so.0
>>>>> #8  0x7f2943af3a00 in glib_pollfds_poll () at
>>>>> /root/colo/jan-2016/qemu/main-loop.c:211
>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at
>>>>> /root/colo/jan-2016/qemu/main-loop.c:256
>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at
>>>>> /root/colo/jan-2016/qemu/main-loop.c:504
>>>>> #11 0x7f29438529ee in main_loop () at
>>>>> /root/colo/jan-2016/qemu/vl.c:1945
>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>
>>>>> (gdb) p s->num_children
>>>>> $1 = 2
>>>>> (gdb) p acb->success_count
>>>>> $2 = 0
>>>>> (gdb) p acb->is_read
>>>>> $5 = false
>>>>
>>>> Sorry for the late reply.
>>>
>>> No problem.
>>>
>>>> What is the value of acb->count?
>>>
>>> (gdb) p acb->count
>>> $1 = 1
>>
>> Note, the count is 1, not 2. The write to children.0 is still in flight. If the write
>> to children.0 succeeds,
>> the guest never sees this error.
>>>> If the secondary host is down, you should remove quorum's children.1.
>>>> Otherwise, you will get an
>>>> I/O error event.
>>>
>>> Is that safe?  If the secondary fails, do you always have time to issue the 
>>> command to
>>> remove the children.1  before the guest sees the error?
>>
>> We write to both children and expect the write to children.0 to
>> succeed. If it does,
>> the guest never sees this error; you just get the I/O error event.
> 
> I think children.0 is the disk, and that should be OK - so only the 
> children.1/replication should
> be failing - so in that case why do I see the error?

I don't know; I will check the code.

> The 'node0' in the backtrace above is the name of the replication, so it does 
> look like the error
> is coming from the replication.

No, the backtrace just reports an I/O error event to the management
application.

> 
>>> Anyway, I tried removing children.1 but it segfaults now, I guess the 
>>> replication is unhappy:
>>>
>>> (qemu) x_block_change colo-disk0 -d children.1
>>> (qemu) x_colo_lost_heartbeat 
>>
>> Hmm, you should not remove the child before failover. I will check how to

Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-29 Thread Wen Congyang

On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   I've got a block error if I kill the secondary.
>>>
>>> Start both primary & secondary
>>> kill -9 secondary qemu
>>> x_colo_lost_heartbeat on primary
>>>
>>> The guest sees a block error and the ext4 root switches to read-only.
>>>
>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>> backtrace below.
>>> (This is based on colo-v2.4-periodic-mode of the framework
>>> code with the block and network proxy merged in; so it could be my
>>> merging but I don't think so ?)
>>>
>>>
>>> (gdb) where
>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, 
>>> acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>> at /root/colo/jan-2016/qemu/block/quorum.c:222
>>> #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>,
>>> ret=<optimized out>)
>>> at /root/colo/jan-2016/qemu/block/quorum.c:315
>>> #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at
>>> /root/colo/jan-2016/qemu/block/io.c:2122
>>> #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at
>>> /root/colo/jan-2016/qemu/async.c:64
>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at
>>> /root/colo/jan-2016/qemu/async.c:92
>>> #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at
>>> /root/colo/jan-2016/qemu/aio-posix.c:305
>>> #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>,
>>> callback=<optimized out>,
>>> user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>> #7  0x7f293b84a79a in g_main_context_dispatch () from
>>> /lib64/libglib-2.0.so.0
>>> #8  0x7f2943af3a00 in glib_pollfds_poll () at
>>> /root/colo/jan-2016/qemu/main-loop.c:211
>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at
>>> /root/colo/jan-2016/qemu/main-loop.c:256
>>> #10 main_loop_wait (nonblocking=<optimized out>) at
>>> /root/colo/jan-2016/qemu/main-loop.c:504
>>> #11 0x7f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>>> at /root/colo/jan-2016/qemu/vl.c:4707
>>>
>>> (gdb) p s->num_children
>>> $1 = 2
>>> (gdb) p acb->success_count
>>> $2 = 0
>>> (gdb) p acb->is_read
>>> $5 = false
>>
>> Sorry for the late reply.
> 
> No problem.
> 
>> What is the value of acb->count?
> 
> (gdb) p acb->count
> $1 = 1

Note, the count is 1, not 2. The write to children.0 is still in flight. If the write to
children.0 succeeds,
the guest never sees this error.

> 
>> If the secondary host is down, you should remove quorum's children.1. Otherwise,
>> you will get an
>> I/O error event.
> 
> Is that safe?  If the secondary fails, do you always have time to issue the 
> command to
> remove the children.1  before the guest sees the error?

We write to both children and expect the write to children.0 to
succeed. If it does,
the guest never sees this error; you just get the I/O error event.

> 
> Anyway, I tried removing children.1 but it segfaults now, I guess the 
> replication is unhappy:
> 
> (qemu) x_block_change colo-disk0 -d children.1
> (qemu) x_colo_lost_heartbeat 

Hmm, you should not remove the child before failover. I will check how to
prevent this in the code.

> 
> 12973 Segmentation fault  (core dumped) 
> ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c 
> -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace 
> events=trace-file -device virtio-rng-pci $block_param $net_param
> 
> #0  0x7f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, 
> failover=true, errp=0x7fff6a5c3420)
> at /root/colo/jan-2016/qemu/block.c:4426
> 
> (gdb) p drv
> $1 = (BlockDriver *) 0x5d2a
> 
>   it looks like the whole of bs is bogus.
> 
> #1  0x7f0a398d87f6 in quorum_stop_replication (bs=<optimized out>,
> failover=<optimized out>,
> errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
> 
> (gdb) p s->replication_index
> $3 = 1
> 
> I guess quorum_del_child needs to stop replication before it removes the 
> child?

Yes, but in the newest version quorum doesn't know about block replication, so
I think
we should add a reference to the bs when starting block replication.

Thanks
Wen Congyang

> (although it would have to be careful not to block on the dead nbd).
> 
> #2  0x7f0a398a8901 in bdrv_stop_replication_all 
> (failover=failover@entry=true, errp=

Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-28 Thread Wen Congyang

On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've got a block error if I kill the secondary.
> 
> Start both primary & secondary
> kill -9 secondary qemu
> x_colo_lost_heartbeat on primary
> 
> The guest sees a block error and the ext4 root switches to read-only.
> 
> I gdb'd the primary with a breakpoint on quorum_report_bad; see
> backtrace below.
> (This is based on colo-v2.4-periodic-mode of the framework
> code with the block and network proxy merged in; so it could be my
> merging but I don't think so ?)
> 
> 
> (gdb) where
> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, 
> acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> at /root/colo/jan-2016/qemu/block/quorum.c:222
> #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>,
> ret=<optimized out>)
> at /root/colo/jan-2016/qemu/block/quorum.c:315
> #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at
> /root/colo/jan-2016/qemu/block/io.c:2122
> #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at
> /root/colo/jan-2016/qemu/async.c:64
> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at
> /root/colo/jan-2016/qemu/async.c:92
> #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at
> /root/colo/jan-2016/qemu/aio-posix.c:305
> #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>,
> callback=<optimized out>,
> user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> #7  0x7f293b84a79a in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #8  0x7f2943af3a00 in glib_pollfds_poll () at
> /root/colo/jan-2016/qemu/main-loop.c:211
> #9  os_host_main_loop_wait (timeout=<optimized out>) at
> /root/colo/jan-2016/qemu/main-loop.c:256
> #10 main_loop_wait (nonblocking=<optimized out>) at
> /root/colo/jan-2016/qemu/main-loop.c:504
> #11 0x7f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
> at /root/colo/jan-2016/qemu/vl.c:4707
> 
> (gdb) p s->num_children
> $1 = 2
> (gdb) p acb->success_count
> $2 = 0
> (gdb) p acb->is_read
> $5 = false

Sorry for the late reply.
What is the value of acb->count?

If the secondary host is down, you should remove quorum's children.1. Otherwise,
you will get an I/O error event.

Thanks
Wen Congyang

> 
> (qemu) info block
> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": 
> {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": 
> "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", 
> "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": 
> false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
> Cache mode:   writeback, direct
> 
> Dave
> 
> * Changlong Xie (xiecl.f...@cn.fujitsu.com) wrote:
>> Block replication is a very important feature which is used for
>> continuous checkpoints (for example, COLO).
>>
>> You can get the detailed information about block replication from here:
>> http://wiki.qemu.org/Features/BlockReplication
>>
>> Usage:
>> Please refer to docs/block-replication.txt
>>
>> This patch series is based on the following patch series:
>> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html
>>
>> You can get the patch here:
>> https://github.com/Pating/qemu/tree/changlox/block-replication-v13
>>
>> You can get the patch with framework here:
>> https://github.com/Pating/qemu/tree/changlox/colo_framework_v12
>>
>> TODO:
>> 1. Continuous block replication. It will be started after basic functions
>>are accepted.
>>
>> Change Log:
>> V13:
>> 1. Rebase to the newest codes
>> 2. Remove redundant macros and semicolons in replication.c
>> 3. Fix typos in block-replication.txt
>> V12:
>> 1. Rebase to the newest codes
>> 2. Use backing reference to replace 'allow-write-backing-file'
>> V11:
>> 1. Reopen the backing file when starting block replication if it is not
>>opened in R/W mode
>> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>>when opening backing file
>> 3. Block the top BDS so there is only one block job for the top BDS and
>>its backing chain.
>> V10:
>> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>>reference.
>> 2. Address the comments from Eric Blake
>> V9:
>> 1. Update the error messages
>> 2. Rebase to the newest qemu
>>

Re: [Qemu-block] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Wen Congyang

On 01/26/2016 02:59 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   I've been looking at what's needed to add a new secondary after
>>> a primary failed; from the block side it doesn't look as hard
>>> as I'd expected, perhaps you can tell me if I'm missing something!
>>>
>>> The normal primary setup is:
>>>
>>>quorum
>>>   Real disk
>>>   nbd client
>>
>> quorum
>>    real disk
>>    replication
>>       nbd client
>>
>>>
>>> The normal secondary setup is:
>>>replication
>>>   active-disk
>>>   hidden-disk
>>>   Real-disk
>>
>> IIRC, we can do it like this:
>> quorum
>>    replication
>>       active-disk
>>       hidden-disk
>>       real-disk
> 
> Yes.
> 
>>> With a couple of minor code hacks; I changed the secondary to be:
>>>
>>>    quorum
>>>       replication
>>>          active-disk
>>>          hidden-disk
>>>          Real-disk
>>>       dummy-disk
>>
>> after failover,
>> quorum
>>    replication (old, mode is secondary)
>>       active-disk
>>       hidden-disk*
>>       real-disk*
>>    replication (new, mode is primary)
>>       nbd-client
> 
> Do you need to keep the old secondary-replication?
> Does that just pass straight through?

Yes, the old secondary replication keeps working after failover.
For example, if we don't start COLO again after failover, it does nothing and just passes requests through.

> 
>> In the newest version, we active-commit active-disk into real-disk.
>> So it will be:
>> quorum
>>    replication (old, mode is secondary)
>>       active-disk (it is the real disk now)
>>    replication (new, mode is primary)
>>       nbd-client
> 
> How does that active-commit work?  I didn't think you
> could change the real disk until you had the full checkpoint,
> since you don't know whether the primary or secondaries
> changes need to be written?

I start the active commit when doing failover. After failover,
the primary's changes since the last checkpoint should be dropped (how do we cancel
the in-progress write ops?).

> 
>>> and then after the primary fails, I start a new secondary
>>> on another host and then on the old secondary do:
>>>
>>>   nbd_server_stop
>>>   stop
>>>   x_block_change top-quorum -d children.0 # deletes use of real 
>>> disk, leaves dummy
>>>   drive_del active-disk0
>>>   x_block_change top-quorum -a node-real-disk
>>>   x_block_change top-quorum -d children.1 # Seems to have deleted 
>>> the dummy?!, the disk is now child 0
>>>   drive_add buddy 
>>> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>>>   x_block_change top-quorum -a nbd-client
>>>   c
>>>   migrate_set_capability x-colo on
>>>   migrate -d -b tcp:ibpair:
>>>
>>> and I think that means what was the secondary, has the same disk
>>> structure as a normal primary.
>>> That's not quite happy yet, and I've not figured out why - but the
>>> order/structure of the block devices looks right?
>>>
>>> Notes:
>>>a) The dummy serves two purposes, 1) it works around the segfault
>>>   I reported in the other mail, 2) when I delete the real disk in the
>>>   first x_block_change it means the quorum still has 1 disk so doesn't
>>>   get upset.
>>
>> I don't understand purpose 2.
> 
> quorum won't allow you to delete all its members ('The number of children
> cannot be lower than the vote threshold 1')
> and it's very tricky getting the order correct with add/delete; for example
> I tried:
> 
> drive_add buddy 
> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> # gets children.1
> x_block_change top-quorum -a nbd-client
> # deletes the secondary replication
> x_block_change top-quorum -d children.0
> drive_del active-disk0

active-disk0 contains data, so you should not delete it.
If we do an active commit after failover, active-disk0 becomes the real disk.

> # ends up as children.0 but in the 2nd slot
> x_block_change top-quorum -a node-real-disk
> 
> info block shows me:
> top-quorum (#block615): json:{"children": [
> {"driver":

Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-24 Thread Wen Congyang

On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I can trigger a segfault if I wire in the block replication together with
> a quorum instance; it only triggers with both of them present but,
> it looks like the problem is a disagreement about the number of quorum
> members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> that is posted in the colo-framework set that I think includes this set
> (from https://github.com/coloft/qemu.git).
> 
> To trigger:
> ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> 
> (qemu) drive_add 0 
> if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> (qemu) drive_add 1 
> if=none,id=active-disk0,throttling.bps-total=7000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> (qemu) drive_add 2 
> if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> 
> *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': 
> free(): invalid pointer: 0x55a8fdf0 ***
> === Backtrace: =
> /lib64/libc.so.6(+0x7cfe1)[0x7110ffe1]
> /lib64/libglib-2.0.so.0(g_free+0xf)[0x71ecc36f]
> /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> Program received signal SIGABRT, Aborted.
> 0x710c85f7 in raise () from /lib64/libc.so.6
> (gdb) where
> #0  0x710c85f7 in raise () from /lib64/libc.so.6
> #1  0x710c9ce8 in abort () from /lib64/libc.so.6
> #2  0x71108317 in __libc_message () from /lib64/libc.so.6
> #3  0x7110ffe1 in _int_free () from /lib64/libc.so.6
> #4  0x71ecc36f in g_free () from /lib64/libglib-2.0.so.0
> #5  0x559dfdd7 in qemu_iovec_destroy (qiov=0x57815410) at 
> /root/colo/jan-2016/qemu/util/iov.c:378
> #6  0x55989cce in quorum_aio_finalize (acb=0x57815350) at 
> /root/colo/jan-2016/qemu/block/quorum.c:171
> 171   qemu_iovec_destroy(&acb->qcrs[i].qiov);
> (gdb) list
> 166   
> 167   if (acb->is_read) {
> 168   /* on the quorum case acb->child_iter == s->num_children - 1 */
> 169   for (i = 0; i <= acb->child_iter; i++) {
> 170   qemu_vfree(acb->qcrs[i].buf);
> 171   qemu_iovec_destroy(&acb->qcrs[i].qiov);
> 172   }
> 173   }
> 174   
> 175   g_free(acb->qcrs);
> (gdb) p acb->child_iter
> $1 = 1
> (gdb) p i
> $3 = 1

Thanks for testing. Can you give me the following information:
1. the value of acb->ret
2. s->num_children

I think it is a bug in quorum, and acb->ret is < 0.

Thanks
Wen Congyang

> 
> #7  0x5598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
> at /root/colo/jan-2016/qemu/block/quorum.c:302
> #8  0x559990ee in bdrv_co_complete (acb=0x57815410) at 
> /root/colo/jan-2016/qemu/block/io.c:2122
> .
> 
> So I guess acb->child_iter is wrong, since we only have one child on that 
> quorum?
> and we're trying to do a destroy on the second child.
> 
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
>

Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-24 Thread Wen Congyang

On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I can trigger a segfault if I wire in the block replication together with
> a quorum instance; it only triggers with both of them present but,
> it looks like the problem is a disagreement about the number of quorum
> members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> that is posted in the colo-framework set that I think includes this set
> (from https://github.com/coloft/qemu.git).
> 
> To trigger:
> ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> 
> (qemu) drive_add 0 
> if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> (qemu) drive_add 1 
> if=none,id=active-disk0,throttling.bps-total=7000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> (qemu) drive_add 2 
> if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> 
> *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': 
> free(): invalid pointer: 0x55a8fdf0 ***
> === Backtrace: =
> /lib64/libc.so.6(+0x7cfe1)[0x7110ffe1]
> /lib64/libglib-2.0.so.0(g_free+0xf)[0x71ecc36f]
> /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> Program received signal SIGABRT, Aborted.
> 0x710c85f7 in raise () from /lib64/libc.so.6
> (gdb) where
> #0  0x710c85f7 in raise () from /lib64/libc.so.6
> #1  0x710c9ce8 in abort () from /lib64/libc.so.6
> #2  0x71108317 in __libc_message () from /lib64/libc.so.6
> #3  0x7110ffe1 in _int_free () from /lib64/libc.so.6
> #4  0x71ecc36f in g_free () from /lib64/libglib-2.0.so.0
> #5  0x559dfdd7 in qemu_iovec_destroy (qiov=0x57815410) at 
> /root/colo/jan-2016/qemu/util/iov.c:378
> #6  0x55989cce in quorum_aio_finalize (acb=0x57815350) at 
> /root/colo/jan-2016/qemu/block/quorum.c:171
> 171   qemu_iovec_destroy(&acb->qcrs[i].qiov);
> (gdb) list
> 166   
> 167   if (acb->is_read) {
> 168   /* on the quorum case acb->child_iter == s->num_children - 1 */
> 169   for (i = 0; i <= acb->child_iter; i++) {
> 170   qemu_vfree(acb->qcrs[i].buf);
> 171   qemu_iovec_destroy(&acb->qcrs[i].qiov);
> 172   }
> 173   }
> 174   
> 175   g_free(acb->qcrs);
> (gdb) p acb->child_iter
> $1 = 1
> (gdb) p i
> $3 = 1
> 
> #7  0x5598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
> at /root/colo/jan-2016/qemu/block/quorum.c:302
> #8  0x559990ee in bdrv_co_complete (acb=0x57815410) at 
> /root/colo/jan-2016/qemu/block/io.c:2122
> .
> 
> So I guess acb->child_iter is wrong, since we only have one child on that 
> quorum?
> and we're trying to do a destroy on the second child.

Can you try the following patch:
From 3f2c5ec288cd9a36afb392b4bba24029f3e9345a Mon Sep 17 00:00:00 2001
From: Wen Congyang <we...@cn.fujitsu.com>
Date: Mon, 25 Jan 2016 09:18:09 +0800
Subject: [PATCH] quorum: fix segfault when read fails in fifo mode

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block/quorum.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index a5ae4b8..0965277 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -295,6 +295,9 @@ static void quorum_aio_cb(void *opaque, int ret)
             quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
         }
         acb->vote_ret = ret;
+        if (ret < 0) {
+            acb->child_iter--;
+        }
         quorum_aio_finalize(acb);
         return;
     }
-- 
2.5.0



> 
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
>

Re: [Qemu-block] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-24 Thread Wen Congyang

On 01/23/2016 04:02 AM, Dr. David Alan Gilbert wrote:
> * Alberto Garcia (be...@igalia.com) wrote:
>> On Thu 21 Jan 2016 05:58:42 PM CET, Eric Blake <ebl...@redhat.com> wrote:
>>>>>> In general, what do you do to make sure that the data in a new Quorum
>>>>>> child is consistent with that of the rest of the array?
>>>>>
>>>>> Quorum can already have more than one child when it starts, and we don't do a
>>>>> similar check then. So I don't think we should do such a check here.
>>>>
>>>> Yes, but when you start a VM you can verify in advance that all
>>>> members of the Quorum have the same data. If you do that on a running
>>>> VM how can you know if the new disk is consistent with the others?
>>>
>>> User error if it is not.  Just the same as it is user error if you
>>> request a shallow drive-mirror but the destination is not the same
>>> contents as the backing file.  I don't think qemu has to protect us
>>> from user error in this case.
>>
>> But the backing file is read-only so the user can guarantee that the
>> destination has the same data before the shallow mirror. How do you do
>> that in this case?
> 
> I think in the colo case they're relying on doing a block migrate
> to synchronise the remote disk prior to switching into colo mode.

Yes, we can do a block migration to sync the disk. After the migration finishes,
we stop block migration before starting COLO.

Thanks
Wen Congyang

> 
> Dave
> 
>> Berto
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
>

Re: [Qemu-block] COLO: how to flip a secondary to a primary?

2016-01-24 Thread Wen Congyang

On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've been looking at what's needed to add a new secondary after
> a primary failed; from the block side it doesn't look as hard
> as I'd expected, perhaps you can tell me if I'm missing something!
> 
> The normal primary setup is:
> 
>quorum
>   Real disk
>   nbd client

quorum
   real disk
   replication
      nbd client

> 
> The normal secondary setup is:
>replication
>   active-disk
>   hidden-disk
>   Real-disk

IIRC, we can do it like this:
quorum
   replication
      active-disk
      hidden-disk
      real-disk

> 
> With a couple of minor code hacks; I changed the secondary to be:
> 
>    quorum
>       replication
>          active-disk
>          hidden-disk
>          Real-disk
>       dummy-disk

after failover,
quorum
   replication (old, mode is secondary)
      active-disk
      hidden-disk*
      real-disk*
   replication (new, mode is primary)
      nbd-client

In the newest version, we active-commit active-disk into real-disk.
So it will be:
quorum
   replication (old, mode is secondary)
      active-disk (it is the real disk now)
   replication (new, mode is primary)
      nbd-client

> 
> and then after the primary fails, I start a new secondary
> on another host and then on the old secondary do:
> 
>   nbd_server_stop
>   stop
>   x_block_change top-quorum -d children.0 # deletes use of real disk, 
> leaves dummy
>   drive_del active-disk0
>   x_block_change top-quorum -a node-real-disk
>   x_block_change top-quorum -d children.1 # Seems to have deleted the 
> dummy?!, the disk is now child 0
>   drive_add buddy 
> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>   x_block_change top-quorum -a nbd-client
>   c
>   migrate_set_capability x-colo on
>   migrate -d -b tcp:ibpair:
> 
> and I think that means what was the secondary, has the same disk
> structure as a normal primary.
> That's not quite happy yet, and I've not figured out why - but the
> order/structure of the block devices looks right?
> 
> Notes:
>a) The dummy serves two purposes, 1) it works around the segfault
>   I reported in the other mail, 2) when I delete the real disk in the
>   first x_block_change it means the quorum still has 1 disk so doesn't
>   get upset.

I don't understand purpose 2.

>b) I had to remove the restriction in quorum_start_replication
>   on which mode it would run in. 

IIRC, this check will be removed.

>c) I'm not really sure everything knows it's in secondary mode yet, and
>   I'm not convinced whether the replication is doing the right thing.
>d) The migrate -d -b   eventually fails on the destination, not worked out 
> why
>   yet.

Can you give me the error message?

>e) Adding/deleting children on quorum is hard having to use the 
> children.0/1
>   notation when you've added children using node names - it's worrying
>   which number is which; is there a way to give them a name?

No. I think we can improve the 'info block' output.

>f) I've not thought about the colo-proxy that much yet - I guess that
>   existing connections need to keep their sequence number offset but
>   new connections made by what is now the primary don't need to do anything
>   special.

Hailiang or Zhijian can answer this question.

Thanks
Wen Congyang

> 
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
>

Re: [Qemu-block] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-20 Thread Wen Congyang

On 01/20/2016 11:43 PM, Alberto Garcia wrote:
> On Fri 25 Dec 2015 10:22:55 AM CET, Changlong Xie wrote:
>> @@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  ret = -EINVAL;
>>  goto exit;
>>  }
>> -if (s->num_children < 2) {
>> +if (s->num_children < 1) {
>>  error_setg(&local_err,
>> -   "Number of provided children must be greater than 1");
>> +   "Number of provided children must be 1 or more");
>>  ret = -EINVAL;
>>  goto exit;
>>  }
> 
> I have a question: if you have a Quorum with just one member and you add
> a new one, how do you know if it has the same data as the existing one?
> 
> In general, what do you do to make sure that the data in a new Quorum
> child is consistent with that of the rest of the array?

Quorum can already have more than one child when it starts, and we don't do a
similar check then. So I don't think we should do such a check here.

Thanks
Wen Congyang

> 
> Berto
> 
> 
> .
>

Re: [Qemu-block] [PATCH v9 0/3] qapi: child add/delete support

2016-01-17 Thread Wen Congyang

Ping...

On 12/25/2015 05:22 PM, Changlong Xie wrote:
> If a quorum child is broken, we can use a mirror job to replace it.
> But sometimes the user only needs to remove the broken child and
> add it back later when the problem is fixed.
> 
> ChangLog:
> v9:
> 1. Rebase to the newest codes
> 2. Remove redundant codes in quorum_add_child() and quorum_del_child()
> 3. Fix typos in qmp-commands.hx
> v8:
> 1. Rebase to the newest codes
> 2. Address the comments from Eric Blake
> v7:
> 1. Remove the qmp command x-blockdev-change's parameter operation according
>to Kevin's comments.
> 2. Remove the hmp command.
> v6:
> 1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
>and x-blockdev-child-delete
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. drop nbd driver's implementation. We can use human-monitor-command
>to do it.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>created by the QMP command blockdev-add.
> 2. The driver NBD can support filename, path, host:port now.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> Wen Congyang (3):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
> 
>  block.c   |  58 --
>  block/quorum.c| 122 +-
>  blockdev.c|  54 
>  include/block/block.h |   9 
>  include/block/block_int.h |   5 ++
>  qapi/block-core.json  |  23 +
>  qmp-commands.hx   |  47 ++
>  7 files changed, 312 insertions(+), 6 deletions(-)
>
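The add/remove workflow this cover letter describes (drop a broken child, then add it back once fixed) can be pictured with a small model. This is not the code in block/quorum.c. `QuorumSketch`, `quorum_sketch_add_child()` and `quorum_sketch_del_child()` are hypothetical names, and children are plain strings standing in for BlockDriverState pointers. The rule that deletion is refused when it would leave fewer children than the vote threshold is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CHILDREN 8 /* hypothetical capacity */

typedef struct {
    const char *children[MAX_CHILDREN]; /* stand-ins for child nodes */
    int num_children;
    int threshold;                      /* votes needed for a read */
} QuorumSketch;

/* Append a child at the end; fails only if the array is full. */
static bool quorum_sketch_add_child(QuorumSketch *s, const char *child)
{
    if (s->num_children == MAX_CHILDREN) {
        return false;
    }
    s->children[s->num_children++] = child;
    return true;
}

/* Remove a child by name, refusing to drop below the vote threshold. */
static bool quorum_sketch_del_child(QuorumSketch *s, const char *child)
{
    if (s->num_children <= s->threshold) {
        return false;
    }
    for (int i = 0; i < s->num_children; i++) {
        if (strcmp(s->children[i], child) == 0) {
            /* Shift the remaining children down to keep the array dense. */
            memmove(&s->children[i], &s->children[i + 1],
                    (size_t)(s->num_children - i - 1) * sizeof(s->children[0]));
            s->num_children--;
            return true;
        }
    }
    return false; /* not a child of this quorum */
}
```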

Re: [Qemu-block] [Patch v12 resend 08/10] Implement new driver for block replication

2016-01-03 Thread Wen Congyang

On 12/23/2015 05:47 PM, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2015 at 01:37:25PM +0800, Wen Congyang wrote:
>> +/*
>> + * Only write to active disk if the sectors have
>> + * already been allocated in active disk/hidden disk.
>> + */
>> +qemu_iovec_init(_qiov, qiov->niov);
>> +while (remaining_sectors > 0) {
>> +ret = bdrv_is_allocated_above(top, base, sector_num,
>> +  remaining_sectors, );
> 
> There is a race condition here since multiple I/O requests can be in
> flight at the same time.   If two requests touch the same cluster
> between top->base then the result of these checks could be unreliable.

I don't think so. By the time we get here, the primary QEMU is gone and
failover is done. We only write to the active disk if the sectors were
already allocated in the active or hidden disk before failover. So if two
requests touch the same cluster, it is OK, because the return value of
bdrv_is_allocated_above() does not change.
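Wen's argument depends on the allocation status being frozen after failover. The quoted loop decides, for each run of sectors, whether to write to the top (active/hidden) image or to the base (secondary disk). The sketch below models that decision with a boolean allocation map. `alloc_run()` is a hypothetical stand-in for bdrv_is_allocated_above(), and the actual writes are replaced by recording the chosen target for each sector. Because the map is read-only here, concurrent callers would compute the same routing, which is the property the reply relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for bdrv_is_allocated_above(): returns whether
 * `sector` is allocated in the top layers and stores in *pnum how many
 * consecutive sectors (at most `nb`) share that status. */
static bool alloc_run(const bool *alloc_map, long sector, int nb, int *pnum)
{
    bool status = alloc_map[sector];
    int n = 1;

    while (n < nb && alloc_map[sector + n] == status) {
        n++;
    }
    *pnum = n;
    return status;
}

/* Split a write of `nb` sectors into runs. targets[i] records whether
 * sector (sector_num + i) would go to the top image (true) or to the
 * base/secondary disk (false). Returns the number of runs. */
static int route_write(const bool *alloc_map, long sector_num, int nb,
                       bool *targets)
{
    int runs = 0;
    int done = 0;

    while (done < nb) {
        int n;
        bool to_top = alloc_run(alloc_map, sector_num + done, nb - done, &n);

        for (int i = 0; i < n; i++) {
            targets[done + i] = to_top;
        }
        done += n;
        runs++;
    }
    return runs;
}
```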

> 
> The simple but slow solution is to use a CoMutex to serialize requests.
> 
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +
>> +qemu_iovec_reset(_qiov);
>> +qemu_iovec_concat(_qiov, qiov, bytes_done, n * 512);
>> +
>> +target = ret ? top : base;
>> +ret = bdrv_co_writev(target, sector_num, n, _qiov);
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +
>> +remaining_sectors -= n;
>> +sector_num += n;
>> +bytes_done += n * BDRV_SECTOR_SIZE;
>> +}
> 
> I think this can be replaced with an active commit block job that copies
> data down from the hidden/active disk to the secondary disk.  It is okay
> to keep writing to the secondary disk while the block job is running and
> then switch over to the secondary disk once it completes.

Yes, active commit is another option. IIRC, I didn't use it because the
mirror job had a problem. That is fixed now (see
bdrv_drained_begin()/bdrv_drained_end() in the mirror job).
We will use a mirror job in the next version.
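Stefan's suggestion works because an active commit or mirror job can keep running while the guest writes. Clusters are copied from the top image down to the target, and any cluster the guest writes in the meantime is marked dirty and copied again. Once no dirty clusters remain, the job can complete and switch over. The sketch below is a heavily simplified, single-threaded model of that idea, not QEMU's mirror job. `MirrorSketch`, `guest_write()` and `mirror_iteration()` are hypothetical names, and the "disks" are small integer arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NB_CLUSTERS 8

typedef struct {
    int source[NB_CLUSTERS]; /* top (active/hidden) image contents */
    int target[NB_CLUSTERS]; /* secondary disk contents */
    bool dirty[NB_CLUSTERS]; /* clusters that still need copying */
} MirrorSketch;

static void mirror_init(MirrorSketch *m)
{
    memset(m, 0, sizeof(*m));
}

/* A guest write lands in the source and marks the cluster dirty. */
static void guest_write(MirrorSketch *m, int cluster, int value)
{
    m->source[cluster] = value;
    m->dirty[cluster] = true;
}

/* Copy every dirty cluster once and return how many were copied.
 * A return value of 0 means source and target are in sync. */
static int mirror_iteration(MirrorSketch *m)
{
    int copied = 0;

    for (int i = 0; i < NB_CLUSTERS; i++) {
        if (m->dirty[i]) {
            m->dirty[i] = false;
            m->target[i] = m->source[i];
            copied++;
        }
    }
    return copied;
}
```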

> 
>> +
>> +return 0;
>> +}
>> +
>> +static coroutine_fn int replication_co_discard(BlockDriverState *bs,
>> +   int64_t sector_num,
>> +   int nb_sectors)
>> +{
>> +BDRVReplicationState *s = bs->opaque;
>> +int ret;
>> +
>> +ret = replication_get_io_status(s);
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +
>> +if (ret == 1) {
>> +/* It is secondary qemu and we are after failover */
>> +ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
> 
> What if the clusters are still allocated in the hidden/active disk?
> 

What does discard do? Drop the data that is allocated in the disk?
If so, I misunderstood it. I will fix it in the next version.
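Stefan's question can be seen with a toy model of the secondary's image chain: the active disk sits on top of the hidden disk, which sits on top of the secondary disk. A read returns the data from the topmost layer that has the sector allocated. Discarding only in the secondary disk therefore leaves any copy in the upper layers visible. The array model and the names `chain_read()` and `chain_discard()` are hypothetical, and real discard handling in QEMU is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_LAYERS  3 /* 0 = active disk, 1 = hidden disk, 2 = secondary disk */
#define NB_SECTORS 4

typedef struct {
    bool allocated[NB_LAYERS][NB_SECTORS];
    int data[NB_LAYERS][NB_SECTORS];
} ChainSketch;

/* Return the data of the topmost layer that has `sector` allocated,
 * or 0 if no layer does (unallocated sectors read as zeroes here). */
static int chain_read(const ChainSketch *c, int sector)
{
    for (int layer = 0; layer < NB_LAYERS; layer++) {
        if (c->allocated[layer][sector]) {
            return c->data[layer][sector];
        }
    }
    return 0;
}

/* Discard `sector` in layers first..NB_LAYERS-1. first = 2 models
 * discarding only in the secondary disk; first = 0 covers every layer. */
static void chain_discard(ChainSketch *c, int sector, int first)
{
    for (int layer = first; layer < NB_LAYERS; layer++) {
        c->allocated[layer][sector] = false;
        c->data[layer][sector] = 0;
    }
}
```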

Thanks
Wen Congyang

Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description

2016-01-03 Thread Wen Congyang

On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
>> +== Failure Handling ==
>> +There are 6 internal errors when block replication is running:
>> +1. I/O error on primary disk
>> +2. Forwarding primary write requests failed
>> +3. Backup failed
>> +4. I/O error on secondary disk
>> +5. I/O error on active disk
>> +6. Making active disk or hidden disk empty failed
>> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
>> +4 and 6, we just report block replication's error to FT/HA manager (which
>> +decides when to do a new checkpoint, when to do failover).
>> +There is no internal error when doing failover.
> 
> Not sure this is true.
> 
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication".  Flushing the disk
> buffer can result in I/O errors.  This means that failover operations
> are not guaranteed to succeed.

We don't use a mirror job now, but we may use one in the next version.
Is there any way to detect an I/O error while the mirror job is running?
By querying the job's status?

> 
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk.  It means that
> right after failover there is another failure and the system may not be
> able to continue.

Block replication is not designed for such a case. For example, we don't
fail over when the primary disk fails. In that case, we just report the error
to the disk layer (case 1 in the Failure Handling section above).
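The Failure Handling rule quoted above maps each of the six internal errors to one of two destinations. Here is a minimal sketch of that mapping. The enum and function names are hypothetical and are not part of the patch series.

```c
#include <assert.h>

/* The six internal errors listed in the Failure Handling section. */
typedef enum {
    REPL_ERR_PRIMARY_DISK_IO = 1, /* 1. I/O error on primary disk */
    REPL_ERR_FORWARD_WRITE,       /* 2. forwarding primary writes failed */
    REPL_ERR_BACKUP,              /* 3. backup failed */
    REPL_ERR_SECONDARY_DISK_IO,   /* 4. I/O error on secondary disk */
    REPL_ERR_ACTIVE_DISK_IO,      /* 5. I/O error on active disk */
    REPL_ERR_EMPTY_DISK,          /* 6. emptying active/hidden disk failed */
} ReplErrorSketch;

typedef enum {
    REPORT_TO_DISK_LAYER, /* error is returned to the disk layer */
    REPORT_TO_FT_MANAGER, /* FT/HA manager decides checkpoint/failover */
} ReplReportTarget;

static ReplReportTarget repl_error_target(ReplErrorSketch err)
{
    switch (err) {
    case REPL_ERR_PRIMARY_DISK_IO:
    case REPL_ERR_ACTIVE_DISK_IO:
        return REPORT_TO_DISK_LAYER; /* cases 1 and 5 */
    default:
        return REPORT_TO_FT_MANAGER; /* cases 2, 3, 4 and 6 */
    }
}
```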

Sorry for the late reply. Your mail was sent on 2015-12-23, but I received
it on 2016-01-04.

> 
> So this really only matters in the case where there is a new Secondary
> ready after failover.  In that case the user might expect failover to
> continue to the new Secondary (Host 3):
> 
>[X][X]
>   Host 1 <-> Host 2 <-> Host 3
>

Re: [Qemu-block] [Patch v12 resend 00/10] Block replication for continuous checkpoints

2016-01-03 Thread Wen Congyang

On 12/23/2015 06:04 PM, Stefan Hajnoczi wrote:
> On Thu, Dec 17, 2015 at 02:22:14PM +0800, Wen Congyang wrote:
>> Stefan:Ping...
>>
>> What is the status of this feature? I have been working on it for about
>> a year, but it is still not merged...
> 
> The code still has TODOs.  What is the plan for supporting replication
> after failover?  This feature seems critical because anyone who wants FT
> won't be able to use this code unless it supports FT after the first
> failure.

We have implemented it on top of an old version of QEMU. To keep the logic
simple, we are not posting those patches now. We will post them after this
feature is merged into QEMU.

> 
> ---
> 
> Adding new block layer APIs that are replication-specific is not clean.
> Only the replication block driver cares about the start/stop/checkpoint
> interface.
> 
> It is cleaner to have a separate API and data structure for block
> replication.
> 
> The replication code should define its own BlockReplicationOps struct
> and allow objects to register themselves.  Then it's no longer necessary
> to modify the core block layer to forward start/stop/checkpoint calls.
> 
> Something like:
> 
> typedef struct BlockReplicationOps BlockReplicationOps;
> typedef struct BlockReplicationState {
>     const BlockReplicationOps *ops;
>     QLIST_ENTRY(BlockReplicationState) list;
> } BlockReplicationState;
> 
> struct BlockReplicationOps {
>     void (*start)(BlockReplicationState *brs, Error **errp);
>     void (*stop)(BlockReplicationState *brs, Error **errp);
>     void (*checkpoint)(BlockReplicationState *brs, Error **errp);
> };
> 
> static QLIST_HEAD(, BlockReplicationState) block_replication_states;
> 
> void block_replication_add(BlockReplicationState *brs);
> void block_replication_remove(BlockReplicationState *brs);
> 
> The replication block driver would add/remove itself.  The quorum block
> driver probably doesn't need to be modified (I think in your current
> patches you modify it just to forward the start/stop/checkpoint calls to
> a particular quorum child).

Yes, that is the main purpose. We also do a check in the quorum driver:
we don't allow more than one child to support block replication.
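Stefan's registration idea can be made concrete with a self-contained sketch. It follows the shape of his BlockReplicationState/BlockReplicationOps outline, with three departures: a hand-rolled singly linked list replaces QEMU's QLIST macros, an int return code replaces Error **, and a hypothetical `block_replication_checkpoint_all()` helper is added. The `counting_ops` driver is also only illustrative. None of these names should be read as the merged code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BlockReplicationState BlockReplicationState;

typedef struct BlockReplicationOps {
    int (*start)(BlockReplicationState *brs);
    int (*stop)(BlockReplicationState *brs);
    int (*checkpoint)(BlockReplicationState *brs);
} BlockReplicationOps;

struct BlockReplicationState {
    const BlockReplicationOps *ops;
    void *opaque;               /* driver-private state */
    BlockReplicationState *next; /* registry link */
};

static BlockReplicationState *block_replication_states;

static void block_replication_add(BlockReplicationState *brs)
{
    brs->next = block_replication_states;
    block_replication_states = brs;
}

static void block_replication_remove(BlockReplicationState *brs)
{
    BlockReplicationState **p = &block_replication_states;

    while (*p) {
        if (*p == brs) {
            *p = brs->next;
            brs->next = NULL;
            return;
        }
        p = &(*p)->next;
    }
}

/* Checkpoint every registered object; stop at the first failure. */
static int block_replication_checkpoint_all(void)
{
    for (BlockReplicationState *brs = block_replication_states; brs;
         brs = brs->next) {
        int ret = brs->ops->checkpoint(brs);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example driver: counts checkpoints in the int pointed to by opaque. */
static int counting_checkpoint(BlockReplicationState *brs)
{
    (*(int *)brs->opaque)++;
    return 0;
}

static int noop(BlockReplicationState *brs)
{
    (void)brs;
    return 0;
}

static const BlockReplicationOps counting_ops = {
    .start = noop,
    .stop = noop,
    .checkpoint = counting_checkpoint,
};
```

Only objects that register themselves take part, so the core block layer does not need to forward start/stop/checkpoint calls.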

Thanks
Wen Congyang

> 
> Stefan
>

Re: [Qemu-block] [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error

2015-12-22 Thread Wen Congyang

On 12/19/2015 06:02 PM, Markus Armbruster wrote:
> Copying qemu-block because this seems related to generalising block jobs
> to background jobs.
> 
> zhanghailiang <zhang.zhanghaili...@huawei.com> writes:
> 
>> If some errors happen during VM's COLO FT stage, it's important to notify
>> the users of this event. Together with 'colo_lost_heartbeat', users can
>> intervene in COLO's failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <arm...@redhat.com>
>> Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> ---
>>  docs/qmp-events.txt | 17 +
>>  migration/colo.c| 11 +++
>>  qapi-schema.json| 16 
>>  qapi/event.json | 17 +
>>  4 files changed, 61 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..19f68fc 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>  event.
>>  
>> +COLO_EXIT
>> +-
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
> 
> How would the event's recipient distinguish between "due to error" and
> "at the user's request"?
> 
>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason": the exit reason, internal error or external request (json-string)
>> + - "error": error message (json-string, optional)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>> +
> 
> Pardon my ignorance again...  Does "VM finishes COLO mode" mean you have
> some kind of COLO background job, and it just finished for whatever
> reason?
> 
> If yes, this COLO job could be an instance of the general background job
> concept we're trying to grow from the existing block job concept.
> 
> I'm not asking you to rebase your work onto the background job
> infrastructure, not least for the simple reason that it doesn't exist,
> yet.  But I think it would be fruitful to compare your COLO job
> management QMP interface with the one we have for block jobs.  Not only
> may that avoid unnecessary inconsistency, it could also help shape the
> general background job interface.

COLO is not a block job. If live migration is a background job, COLO
is also a background job.
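Markus's earlier question, how a recipient tells an error exit from a requested one, is answered in the proposed event by the "reason" member. In the patch, the event is emitted through the QAPI-generated qapi_event_send_colo_exit(). The sketch below formats the event's data member by hand purely for illustration. The enum names, the "error" reason string, and `colo_exit_event_json()` are assumptions, and the code does not escape JSON strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { COLO_SKETCH_PRIMARY, COLO_SKETCH_SECONDARY } ColoModeSketch;
typedef enum { COLO_SKETCH_REQUEST, COLO_SKETCH_ERROR } ColoReasonSketch;

/* Format the "data" member of a COLO_EXIT event into buf. The optional
 * `error` message is included only when non-NULL; it is not escaped.
 * Returns the snprintf() result. */
static int colo_exit_event_json(char *buf, size_t size, ColoModeSketch mode,
                                ColoReasonSketch reason, const char *error)
{
    const char *mode_str =
        mode == COLO_SKETCH_PRIMARY ? "primary" : "secondary";
    const char *reason_str =
        reason == COLO_SKETCH_REQUEST ? "request" : "error";

    if (error) {
        return snprintf(buf, size,
                        "{\"mode\": \"%s\", \"reason\": \"%s\", "
                        "\"error\": \"%s\"}",
                        mode_str, reason_str, error);
    }
    return snprintf(buf, size, "{\"mode\": \"%s\", \"reason\": \"%s\"}",
                    mode_str, reason_str);
}
```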

> 
> Quick overview of the block job QMP interface:
> 
> * Commands to create a job: block-commit, block-stream, drive-mirror,
>   drive-backup.
> 
> * Get information on jobs: query-block-jobs
> 
> * Pause a job: block-job-pause
> 
> * Resume a job: block-job-resume
> 
> * Cancel a job: block-job-cancel
> 
> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
> 
> * Block job error event: BLOCK_JOB_ERROR
> 
> * Block job synchronous completion: event BLOCK_JOB_READY and command
>   block-job-complete

What is the background job infrastructure? Do you mean implementing all of
the above interfaces for each background job?
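One possible reading of Markus's suggestion is sketched below. No such infrastructure existed at the time of this thread, and `BgJob`, `BgJobDriver`, the state names and the example job type are all invented for illustration. In this reading, a generic layer implements pause, resume and cancel once. Each job type (mirror, COLO, ...) supplies only its own work step.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    BG_JOB_RUNNING, BG_JOB_PAUSED, BG_JOB_CANCELLED, BG_JOB_DONE
} BgJobState;

typedef struct BgJob BgJob;

/* Per-job-type hooks: only the work itself is job specific. */
typedef struct {
    const char *type;
    bool (*run_step)(BgJob *job); /* returns true when finished */
} BgJobDriver;

struct BgJob {
    const BgJobDriver *drv;
    BgJobState state;
    int progress;
};

/* Generic operations, implemented once for every job type. */
static void bg_job_pause(BgJob *job)
{
    if (job->state == BG_JOB_RUNNING) {
        job->state = BG_JOB_PAUSED;
    }
}

static void bg_job_resume(BgJob *job)
{
    if (job->state == BG_JOB_PAUSED) {
        job->state = BG_JOB_RUNNING;
    }
}

static void bg_job_cancel(BgJob *job)
{
    if (job->state == BG_JOB_RUNNING || job->state == BG_JOB_PAUSED) {
        job->state = BG_JOB_CANCELLED;
    }
}

/* Advance the job by one step, but only while it is running. */
static void bg_job_iterate(BgJob *job)
{
    if (job->state == BG_JOB_RUNNING && job->drv->run_step(job)) {
        job->state = BG_JOB_DONE;
    }
}

/* Example job type: finishes after three steps. */
static bool three_step_run(BgJob *job)
{
    return ++job->progress >= 3;
}

static const BgJobDriver three_step_driver = { "example", three_step_run };
```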

Thanks
Wen Congyang

> 
>>  DEVICE_DELETED
>>  --
>>  
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d1dd4e1..d06c14f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>>  #include "qemu/error-report.h"
>>  #include "qemu/sockets.h"
>>  #include "migration/failover.h"
>> +#include "qapi-event.h"
>>  
>>  /* colo buffer */
>>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>  out:
>>  if (ret < 0) {
>>  error_report("%s: %s", __func__, strerror(-ret));
>> +qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>> +

Re: [Qemu-block] [Patch v12 resend 00/10] Block replication for continuous checkpoints

2015-12-16 Thread Wen Congyang

Stefan:Ping...

What is the status of this feature? I have been working on it for about a
year, but it is still not merged...

On 12/02/2015 01:31 PM, Wen Congyang wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints (for example, COLO).
> 
> You can get the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
> 2. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg06043.html
> 
> You can get the patch here:
> https://github.com/coloft/qemu/tree/wency/block-replication-v12
> 
> You can get the patch with framework here:
> https://github.com/coloft/qemu/tree/wency/colo_framework_v11.2
> 
> TODO:
> 1. Continuous block replication. Work on it will start after the basic
>    functions are accepted.
> 
> Change Log:
> V12:
> 1. Rebase onto the latest code
> 2. Use backing reference to replace 'allow-write-backing-file'
> V11:
> 1. Reopen the backing file when starting block replication if it is not
>    opened in R/W mode
> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>when opening backing file
> 3. Block the top BDS so there is only one block job for the top BDS and
>its backing chain.
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split out child add/delete support. Those patches are sent in a separate patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing reference option according to Stefan Hajnoczi's suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed up failover. The secondary VM can take over very quickly even
>    if there are many I/O requests in flight.
> V4:
> 1. Introduce a new replication driver to avoid touching nbd and qcow2.
> V3:
> 1. Use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target use the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary QEMU (use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
> 
> Wen Congyang (10):
>   unblock backup operations in backing file
>   Store parent BDS in BdrvChild
>   Backup: clear all bitmap when doing block checkpoint
>   Allow creating backup jobs when opening BDS
>   docs: block replication's description
>   Add new block driver interfaces to control block replication
>   quorum: implement block driver interfaces for block replication
>   Implement new driver for block replication
>   support replication driver in blockdev-add
>   Add a new API to start/stop replication, do checkpoint to all BDSes
> 
>  block.c| 145 
>  block/Makefile.objs|   3 +-
>  block/backup.c |  14 ++
>  block/quorum.c |  78 +++
>  block/replication.c| 549 +
>  blockjob.c |  11 +
>  docs/block-replication.txt | 227 +++
>  include/block/block.h  |   9 +
>  include/block/block_int.h  |  15 ++
>  include/block/blockjob.h   |  12 +
>  qapi/block-core.json   |  34 ++-
>  11 files changed, 1093 insertions(+), 4 deletions(-)
>  create mode 100644 block/replication.c
>  create mode 100644 docs/block-replication.txt
>

Re: [Qemu-block] [Patch v8 0/3] qapi: child add/delete support

2015-12-09 Thread Wen Congyang

Kevin: ping

On 11/27/2015 02:06 PM, Wen Congyang wrote:
> If a quorum child is broken, we can use a mirror job to replace it.
> But sometimes the user only needs to remove the broken child and
> add it back later, once the problem is fixed.
> 
> It is based on Kevin's patch related to child names:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
> 
> ChangeLog:
> v8:
> 1. Rebase onto the latest code
> 2. Address the comments from Eric Blake
> v7:
> 1. Remove the 'operation' parameter of the QMP command x-blockdev-change,
>    according to Kevin's comments.
> 2. Remove the hmp command.
> v6:
> 1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
>and x-blockdev-child-delete
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. Drop the NBD driver's implementation. human-monitor-command can
>    be used to do it instead.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>created by the QMP command blockdev-add.
> 2. The NBD driver now supports filename, path, and host:port.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> Wen Congyang (3):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
> 
>  block.c   |  58 --
>  block/quorum.c| 124 +-
>  blockdev.c|  54 
>  include/block/block.h |   9 
>  include/block/block_int.h |   5 ++
>  qapi/block-core.json  |  23 +
>  qmp-commands.hx   |  47 ++
>  7 files changed, 314 insertions(+), 6 deletions(-)
>

Re: [Qemu-block] [Patch v12 00/10] Block replication for continuous checkpoints

2015-12-01 Thread Wen Congyang

On 12/01/2015 06:40 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> Block replication is a very important feature which is used for
>> continuous checkpoints(for example: COLO).
>>
>> You can get the detailed information about block replication from here:
>> http://wiki.qemu.org/Features/BlockReplication
>>
>> Usage:
>> Please refer to docs/block-replication.txt
>>
>> This patch series is based on the following patch series:
>> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
>> 2. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg06043.html
>>
>> You can get the patch here:
>> https://github.com/coloft/qemu/tree/wency/block-replication-v12
>>
>> You can get the patch with framework here:
>> https://github.com/coloft/qemu/tree/wency/colo_framework_v11.2
> 
> Neither of these links work for me, and I see that  only messages 0..7 in the
> series hit the list.

I forgot to push it to GitHub...
I also received only messages 0..7, and I don't know what went wrong...

I will push it to GitHub and resend them.

Thanks
Wen Congyang

> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-block] [Qemu-devel] [Patch v12 00/10] Block replication for continuous checkpoints

2015-12-01 Thread Wen Congyang

On 12/01/2015 07:58 PM, Hailiang Zhang wrote:
> On 2015/12/1 18:40, Dr. David Alan Gilbert wrote:
>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>>> Block replication is a very important feature which is used for
>>> continuous checkpoints(for example: COLO).
>>>
>>> You can get the detailed information about block replication from here:
>>> http://wiki.qemu.org/Features/BlockReplication
>>>
>>> Usage:
>>> Please refer to docs/block-replication.txt
>>>
>>> This patch series is based on the following patch series:
>>> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
>>> 2. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg06043.html
>>>
>>> You can get the patch here:
>>> https://github.com/coloft/qemu/tree/wency/block-replication-v12
>>>
>>> You can get the patch with framework here:
>>> https://github.com/coloft/qemu/tree/wency/colo_framework_v11.2
>>
>> Neither of these links work for me, and I see that  only messages 0..7 in the
>> series hit the list.
>>
> 
> Hi Dave,
> 
> You can refer to https://github.com/coloft/qemu/tree/colo-v2.2-periodic-mode.
> The block replication part in this link is also the newest version.

No, I removed one patch, and the usage has changed.

Thanks
Wen Congyang

> 
> Congyang has deleted this confusing branch; we will pay attention to this
> in the next version.
> 
> Thanks,
> Hailiang
> 
>> -- 
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-block] [Qemu-devel] [Patch v12 00/10] Block replication for continuous checkpoints

2015-12-01 Thread Wen Congyang

On 12/02/2015 09:00 AM, Wen Congyang wrote:
> On 12/01/2015 06:40 PM, Dr. David Alan Gilbert wrote:
>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>>> Block replication is a very important feature which is used for
>>> continuous checkpoints(for example: COLO).
>>>
>>> You can get the detailed information about block replication from here:
>>> http://wiki.qemu.org/Features/BlockReplication
>>>
>>> Usage:
>>> Please refer to docs/block-replication.txt
>>>
>>> This patch series is based on the following patch series:
>>> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
>>> 2. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg06043.html
>>>
>>> You can get the patch here:
>>> https://github.com/coloft/qemu/tree/wency/block-replication-v12
>>>
>>> You can get the patch with framework here:
>>> https://github.com/coloft/qemu/tree/wency/colo_framework_v11.2
>>
>> Neither of these links work for me, and I see that  only messages 0..7 in the
>> series hit the list.
> 
> I forgot to push it to github...
> And I also received the messages 0..7, and I don't know what's wrong...

The reason is a bug in git send-email:
http://permalink.gmane.org/gmane.comp.version-control.git/274569

Thanks
Wen Congyang

> 
> I will push it to github, and resend them.
> 
> Thanks
> Wen Congyang
> 
>>
>> Dave
>>
>> --
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-block] [Patch v12 resend 06/10] Add new block driver interfaces to control block replication

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Cc: Luiz Capitulino <lcapitul...@redhat.com>
Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block.c   | 43 +++
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block-core.json  | 13 +
 4 files changed, 75 insertions(+)
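The three block.c helpers added by this patch share one dispatch rule. They call the driver's hook if it has one, otherwise recurse into bs->file, and otherwise fail. The self-contained model below shows that rule. `NodeSketch` and its fields are simplified, hypothetical stand-ins for BlockDriverState, the BlockDriver hook and bs->file->bs, and an int return replaces Error **.

```c
#include <assert.h>
#include <stddef.h>

typedef struct NodeSketch NodeSketch;

struct NodeSketch {
    const char *name;
    /* Driver hook; NULL when the driver does not implement it. */
    int (*do_checkpoint)(NodeSketch *bs);
    NodeSketch *file; /* stand-in for bs->file->bs */
};

/* Same shape as bdrv_do_checkpoint(): hook, else bs->file, else error. */
static int node_do_checkpoint(NodeSketch *bs)
{
    if (bs->do_checkpoint) {
        return bs->do_checkpoint(bs);
    } else if (bs->file) {
        return node_do_checkpoint(bs->file);
    }
    return -1; /* "doesn't support block checkpoint" */
}

static int checkpoints_done;

/* Example hook for a node that does implement the operation. */
static int count_checkpoint(NodeSketch *bs)
{
    (void)bs;
    checkpoints_done++;
    return 0;
}
```

Under this rule, only bs->file is followed; a node whose replication-capable child is attached some other way (for example, as a backing file) reports that the operation is unsupported.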

diff --git a/block.c b/block.c
index 0a0468f..213bee8 100644
--- a/block.c
+++ b/block.c
@@ -4390,3 +4390,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
 parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file->bs, mode, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support starting block"
+   " replication", bs->filename);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file->bs, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support block checkpoint",
+   bs->filename);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file->bs, failover, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support stopping block"
+   " replication", bs->filename);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1d3b9c6..cd39d50 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -648,4 +648,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
 Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1f56046..a6aba8b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -307,6 +307,20 @@ struct BlockDriver {
 void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
Error **errp);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop Disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index feb8da2..2c6bd3f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1925,6 +1925,19 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.5
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
2.5.0
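
All three wrappers added to block.c in this patch follow one dispatch rule: call the driver's callback if it has one, otherwise recurse into `bs->file`, otherwise fail with an error naming the node. A minimal standalone sketch of that rule, assuming a hypothetical `Node` type that stands in for BlockDriverState (it is not QEMU's API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for BlockDriverState; not QEMU's real type. */
typedef struct Node Node;
struct Node {
    const char *name;          /* plays the role of bs->filename */
    int (*start)(Node *n);     /* optional driver callback, may be NULL */
    Node *file;                /* plays the role of bs->file->bs, may be NULL */
};

/*
 * Same dispatch order as bdrv_start_replication():
 * 1. use the node's own callback if it has one,
 * 2. otherwise delegate to the underlying file node,
 * 3. otherwise report that the node does not support the operation.
 * Returns the callback's result, or -1 with a message in err.
 */
static int start_replication(Node *n, char *err, size_t errlen)
{
    if (n->start) {
        return n->start(n);
    } else if (n->file) {
        return start_replication(n->file, err, errlen);
    }
    snprintf(err, errlen, "The BDS %s doesn't support starting block"
             " replication", n->name);
    return -1;
}

/* Example driver callback that always succeeds. */
static int start_ok(Node *n)
{
    (void)n;
    return 0;
}
```

As in the patch, when the chain bottoms out the error names the last node reached, not the node the caller passed in.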

[Qemu-block] [Patch v12 resend 07/10] quorum: implement block driver interfaces for block replication

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Alberto Garcia <be...@igalia.com>
---
 block/quorum.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index b7df14b..6fa54f3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -85,6 +85,8 @@ typedef struct BDRVQuorumState {
 int bsize;
 
 QuorumReadPattern read_pattern;
+
+int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -949,6 +951,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 s->bsize = s->num_children;
 
 g_free(opened);
+s->replication_index = -1;
 goto exit;
 
 close_exit:
@@ -1148,6 +1151,77 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int count = 0, i, index;
+Error *local_err = NULL;
+
+/*
+ * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != REPLICATION_MODE_PRIMARY) {
+error_setg(errp, "The replication mode for quorum should be 'primary'");
+return;
+}
+
+if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+error_setg(errp, "Block replication needs read pattern 'fifo'");
+return;
+}
+
+for (i = 0; i < s->num_children; i++) {
+bdrv_start_replication(s->children[i]->bs, mode, &local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+} else {
+count++;
+index = i;
+}
+}
+
+if (count == 0) {
+error_setg(errp, "No child supports block replication");
+} else if (count > 1) {
+for (i = 0; i < s->num_children; i++) {
+bdrv_stop_replication(s->children[i]->bs, false, NULL);
+}
+error_setg(errp, "Too many children support block replication");
+} else {
+s->replication_index = index;
+}
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_do_checkpoint(s->children[s->replication_index]->bs, errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_stop_replication(s->children[s->replication_index]->bs, failover,
+  errp);
+s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1174,6 +1248,10 @@ static BlockDriver bdrv_quorum = {
 
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+.bdrv_start_replication = quorum_start_replication,
+.bdrv_do_checkpoint = quorum_do_checkpoint,
+.bdrv_stop_replication  = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.5.0
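
quorum_start_replication() asks every child to start replication and keeps the result only if exactly one child accepted. A standalone sketch of just that selection rule, assuming each child's outcome is already known as a boolean (a hypothetical simplification; the patch calls bdrv_start_replication() per child and, when more than one succeeds, calls bdrv_stop_replication() on all of them):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns the index of the single child that supports replication,
 * -1 if no child does, or -2 if more than one does.
 */
static int pick_replication_child(const bool *supported, int num_children)
{
    int count = 0, index = -1;

    for (int i = 0; i < num_children; i++) {
        if (supported[i]) {
            count++;
            index = i;
        }
    }

    if (count == 0) {
        return -1;   /* "No child supports block replication" */
    } else if (count > 1) {
        return -2;   /* "Too many children support block replication" */
    }
    return index;    /* would be stored in s->replication_index */
}
```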

[Qemu-block] [Patch v12 resend 04/10] Allow creating backup jobs when opening BDS

2015-12-01 Thread Wen Congyang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.5.0

[Qemu-block] [Patch v12 resend 01/10] unblock backup operations in backing file

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index bfc2be8..eaf479a 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *    The target bs is newly opened, and the source is the top BDS.
+ * 2. blockdev backup
+ *    Both the source and the target are top BDSes.
+ * 3. internal backup (used for block replication)
+ *    Both the source and the target are backing files.
+ *
+ * In cases 1 and 2, the backing file is neither the source nor
+ * the target.
+ * In case 3, we will block the top BDS, so there is only one block
+ * job for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
 }
-- 
2.5.0

[Qemu-block] [Patch v12 resend 03/10] Backup: clear all bitmap when doing block checkpoint

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/backup.c   | 14 ++
 blockjob.c   | 11 +++
 include/block/blockjob.h | 12 
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 3b39119..1ca102d 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -253,11 +253,25 @@ static void backup_abort(BlockJob *job)
 }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "The backup job only supports block checkpoint in"
+   " sync=none mode");
+return;
+}
+
+hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 .commit = backup_commit,
 .abort  = backup_abort,
 };
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..0c8edfe 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -533,3 +533,14 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
 QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
 block_job_txn_ref(txn);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job->driver->do_checkpoint) {
+error_setg(errp, "The job %s doesn't support block checkpoint",
+   BlockJobType_lookup[job->driver->job_type]);
+return;
+}
+
+job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..abdba7c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJobDriver {
  * never both.
  */
 void (*abort)(BlockJob *job);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -443,4 +446,13 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.5.0
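
Why clearing the bitmap is a checkpoint: assuming the sync=none backup job works as copy-before-write, the bitmap records which clusters have already had their old contents copied to the target since the last checkpoint. Resetting it with hbitmap_reset_all() makes the next guest write to each cluster copy the old data again. A toy sketch of that behaviour with a plain bool array in place of an HBitmap (all names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_CLUSTERS 8   /* hypothetical disk size in clusters */

typedef struct {
    bool copied[NUM_CLUSTERS];   /* stand-in for backup_job->bitmap */
    int copies;                  /* how many cluster copies were made */
} ToyBackupJob;

/* Copy-before-write: copy a cluster's old data only on its first write. */
static void toy_before_write(ToyBackupJob *job, int cluster)
{
    if (!job->copied[cluster]) {
        job->copies++;           /* the real job would copy data here */
        job->copied[cluster] = true;
    }
}

/* Counterpart of backup_do_checkpoint(): forget which clusters were copied. */
static void toy_do_checkpoint(ToyBackupJob *job)
{
    memset(job->copied, 0, sizeof(job->copied));
}
```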

[Qemu-block] [Patch v12 resend 08/10] Implement new driver for block replication

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 549 
 2 files changed, 550 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fa05f37..94c1d03 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..c46c916
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,549 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 Intel Corporation
+ * Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <we...@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+BlockDriverState *active_disk;
+BlockDriverState *hidden_disk;
+BlockDriverState *secondary_disk;
+BlockDriverState *top_bs;
+Error *blocker;
+int orig_hidden_flags;
+int orig_secondary_flags;
+int error;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_DONE, /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE"mode"
+static QemuOptsList replication_runtime_opts = {
+.name = "replication",
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+
+ret = -EINVAL;
+opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(&local_err, "Missing the option mode");
+goto fail;
+}
+
+if (!strcmp(mode, "primary")) {
+s->mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, "secondary")) {
+s->mode = REPLICATION_MODE_SECONDARY;
+} else {
+error_setg(&local_err,
+   "The option mode's value should be primary or secondary");
+goto fail;
+}
+
+ret = 0;
+
+fail:
+qemu_opts_del(opts);
+/* propagate error */
+if (local_err) {
+error_propagate(errp, local_err);
+}
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs->opaque;
+
+if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(bs, false, NULL);
+}
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->file->bs);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+switch (s->replication_state) {
+case BLOCK_REPLICATION_NONE:
+return -EIO;
+case BLOCK_REPLICATION_RUNNING:
+return 0;
+case BLOCK_REPLICATION_DONE:
+return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+default:
+abort();
+}
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+return ret;
+}
+
+if (ret < 0) {
+s->error = ret;
+ret = 0;
+}
+
+return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *q

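Read together, replication_get_io_status() and replication_return_value() above mean that I/O is refused until replication starts, and that on the primary side a failed request is recorded in `s->error` and reported to the caller as success (presumably so that a failing secondary does not become a guest I/O error). The sketch below copies the two helpers onto a simplified, hypothetical state struct so the behaviour can be exercised; what the secondary does with the status value 1 after failover is not visible in this truncated excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { MODE_PRIMARY, MODE_SECONDARY } Mode;
enum { REPL_NONE, REPL_RUNNING, REPL_DONE };

/* Simplified, hypothetical stand-in for BDRVReplicationState. */
typedef struct {
    Mode mode;
    int replication_state;
    int error;       /* last error swallowed on the primary side */
} ReplState;

static int get_io_status(const ReplState *s)
{
    switch (s->replication_state) {
    case REPL_NONE:
        return -EIO;                     /* replication not started */
    case REPL_RUNNING:
        return 0;
    case REPL_DONE:
        return s->mode == MODE_PRIMARY ? -EIO : 1;
    default:
        abort();
    }
}

static int return_value(ReplState *s, int ret)
{
    if (s->mode == MODE_SECONDARY) {
        return ret;                      /* secondary reports errors as-is */
    }
    if (ret < 0) {
        s->error = ret;                  /* primary: remember the error... */
        ret = 0;                         /* ...but report success */
    }
    return ret;
}
```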
[Qemu-block] [Patch v12 resend 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c   | 83 +++
 include/block/block.h |  4 +++
 2 files changed, 87 insertions(+)

diff --git a/block.c b/block.c
index 213bee8..09ee7f1 100644
--- a/block.c
+++ b/block.c
@@ -4433,3 +4433,86 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
" replication", bs->filename);
 }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+BlockDriverState *bs = NULL, *temp = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_start_replication(bs, mode, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+}
+
+return;
+
+fail:
+while ((temp = bdrv_next(temp)) && bs != temp) {
+bdrv_stop_replication(temp, false, NULL);
+}
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_do_checkpoint(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_stop_replication(bs, failover, &local_err);
+if (!errp) {
+/*
+ * The caller doesn't care about the result; it just
+ * wants to stop replication on all block devices.
+ */
+continue;
+}
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index cd39d50..39d246c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -653,4 +653,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
2.5.0
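
bdrv_start_replication_all() skips nodes that have parents, are read-only or have no medium, and on the first failure stops replication again on the nodes visited earlier. A standalone sketch of that start-then-roll-back pattern over a plain array, assuming a hypothetical `Dev` record in place of the bdrv_next() walk. One deliberate difference: the patch calls bdrv_stop_replication() on every node before the failing one, while this sketch only rolls back nodes it actually started.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a BDS as seen by the *_all() helpers. */
typedef struct {
    bool is_top;       /* QLIST_EMPTY(&bs->parents) */
    bool writable;     /* !bdrv_is_read_only(bs) */
    bool inserted;     /* bdrv_is_inserted(bs) */
    bool fail_start;   /* simulate bdrv_start_replication() failing */
    bool running;      /* replication currently started */
} Dev;

static bool eligible(const Dev *d)
{
    return d->is_top && d->writable && d->inserted;
}

/* Start replication on every eligible device; undo on the first failure. */
static int start_all(Dev *devs, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (!eligible(&devs[i])) {
            continue;
        }
        if (devs[i].fail_start) {
            goto fail;
        }
        devs[i].running = true;
    }
    return 0;

fail:
    /* Roll back only the devices started before the failing one. */
    for (int j = 0; j < i; j++) {
        if (eligible(&devs[j])) {
            devs[j].running = false;
        }
    }
    return -1;
}
```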

[Qemu-block] [Patch v12 resend 09/10] support replication driver in blockdev-add

2015-12-01 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 qapi/block-core.json | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2c6bd3f..acc9f8d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -219,7 +219,7 @@
 #   'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'
 #   2.2: 'archipelago' added, 'cow' dropped
 #   2.3: 'host_floppy' deprecated
-#   2.5: 'host_floppy' dropped
+#   2.5: 'host_floppy' dropped, 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1492,6 +1492,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.5
 #
 # Since: 2.0
 ##
@@ -1499,8 +1500,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
 'http', 'https', 'null-aio', 'null-co', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-'vmdk', 'vpc', 'vvfat' ] }
+'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -1938,6 +1939,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.5
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -1974,6 +1988,7 @@
   'quorum': 'BlockdevOptionsQuorum',
   'raw':'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+  'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
   'tftp':   'BlockdevOptionsFile',
-- 
2.5.0

[Qemu-block] [Patch v8 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-11-26 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c   |   8 ++--
 block/quorum.c| 124 +-
 include/block/block.h |   4 ++
 3 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 255a36e..bfc2be8 100644
--- a/block.c
+++ b/block.c
@@ -1196,10 +1196,10 @@ static int bdrv_fill_options(QDict **options, const char *filename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 2810e37..b7df14b 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -23,6 +23,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -80,6 +81,8 @@ typedef struct BDRVQuorumState {
 bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
 * block if Quorum is reached.
 */
+unsigned long *index_bitmap;
+int bsize;
 
 QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(&local_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -926,6 +929,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 /* allocate the children array */
 s->children = g_new0(BdrvChild *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -941,6 +945,8 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 
 opened[i] = true;
 }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;
 
 g_free(opened);
 goto exit;
@@ -997,6 +1003,117 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
 }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+if ((s->bsize % BITS_PER_LONG) == 0) {
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);
+}
+
+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index;
+long new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, s->index_bitmap);
+if (index < s->bsize - 1) {
+/*
+ * The last bit is always set, and we don't clear
+ * the last bit.
+ */
+return;
+}
+
+last_index = find_last_bit(s->index_bitmap, s->bsize);
+if (BITS_TO_LONGS(last_index + 1) == BITS_TO_LONGS(s->bsize)) {
+s->bsize = last_index + 1;
+return;
+}
+
+new_len = BITS_TO_LONGS(last_index + 1) * sizeof(unsigned long);
+s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+s->bsize = last_index + 1;
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+BdrvChild *child;
+char indexstr[32];
+int index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+int ret;
+
+index = get_new_child_index(s);
+ret = snprintf(indexstr, 32, "children.%d", index);
+if (ret < 0 || ret >= 32) {
+error_setg(errp, "cannot generate child name");
+return;
+}
+
+bdrv_drain(bs);
+
+assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+error_setg(errp, "Too many children");
+return;
+}
+s->children = g_

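get_new_child_index() reuses the lowest free `children.N` index and only grows `bsize` when every index below it is taken; remove_child_index() clears a bit and, if it was the highest one, shrinks `bsize` down to the new last set bit. A fixed-capacity sketch of that bookkeeping, assuming hypothetical names and a plain `unsigned long` array instead of QEMU's bitmap helpers (no reallocation; the shrink uses a backward scan instead of find_last_bit()). Unlike the patch's get_new_child_index(), which only returns the index, this sketch marks it as used immediately.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_CHILDREN 128                   /* hypothetical fixed capacity */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

typedef struct {
    unsigned long bits[(MAX_CHILDREN + BITS_PER_WORD - 1) / BITS_PER_WORD];
    int bsize;                             /* one past the highest used index */
} ChildIndex;

static bool test_idx(const ChildIndex *ci, int i)
{
    return ci->bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD));
}

static void set_idx(ChildIndex *ci, int i, bool on)
{
    unsigned long mask = 1UL << (i % BITS_PER_WORD);

    if (on) {
        ci->bits[i / BITS_PER_WORD] |= mask;
    } else {
        ci->bits[i / BITS_PER_WORD] &= ~mask;
    }
}

/* Lowest free index below bsize, else grow bsize by one (-1 if full). */
static int new_child_index(ChildIndex *ci)
{
    for (int i = 0; i < ci->bsize; i++) {
        if (!test_idx(ci, i)) {
            set_idx(ci, i, true);
            return i;
        }
    }
    if (ci->bsize == MAX_CHILDREN) {
        return -1;
    }
    set_idx(ci, ci->bsize, true);
    return ci->bsize++;
}

/* Free an index; if it was the highest, shrink bsize past trailing zeros. */
static void remove_child_index(ChildIndex *ci, int index)
{
    assert(index < ci->bsize);
    set_idx(ci, index, false);
    while (ci->bsize > 0 && !test_idx(ci, ci->bsize - 1)) {
        ci->bsize--;
    }
}
```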
[Qemu-block] [Patch v8 0/3] qapi: child add/delete support

2015-11-26 Thread Wen Congyang

If a quorum child is broken, we can use a mirror job to replace it.
But sometimes the user only needs to remove the broken child, and
add it back later when the problem is fixed.

It is based on Kevin's child-name-related patch:
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html

ChangeLog:
v8:
1. Rebase to the newest codes
2. Address the comments from Eric Blake
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according
   to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c   |  58 --
 block/quorum.c| 124 +-
 blockdev.c|  54 
 include/block/block.h |   9 
 include/block/block_int.h |   5 ++
 qapi/block-core.json  |  23 +
 qmp-commands.hx   |  47 ++
 7 files changed, 314 insertions(+), 6 deletions(-)

-- 
2.5.0

[Qemu-block] [Patch v8 1/3] Add new block driver interface to add/delete a BDS's child

2015-11-26 Thread Wen Congyang

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
Reviewed-by: Alberto Garcia <be...@igalia.com>
---
 block.c   | 50 +++
 include/block/block.h |  5 +
 include/block/block_int.h |  5 +
 3 files changed, 60 insertions(+)

diff --git a/block.c b/block.c
index 60ff84f..255a36e 100644
--- a/block.c
+++ b/block.c
@@ -4321,3 +4321,53 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
+
+/*
+ * Hot add/remove a BDS's child, so that the user can take a child offline
+ * when it is broken and take a new child online.
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+error_setg(errp, "The node %s doesn't support adding a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+if (!QLIST_EMPTY(&child_bs->parents)) {
+error_setg(errp, "The node %s already has a parent",
+   child_bs->node_name);
+return;
+}
+
+parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+BdrvChild *child;
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+error_setg(errp, "The node %s doesn't support removing a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (child->bs == child_bs) {
+break;
+}
+}
+
+if (!child) {
+error_setg(errp, "The node %s is not a child of %s",
+   bdrv_get_device_or_node_name(child_bs),
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index d9b380c..06d3369 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -639,4 +639,9 @@ void bdrv_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6d7bd3b..ea20d12 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -302,6 +302,11 @@ struct BlockDriver {
  */
 void (*bdrv_drain)(BlockDriverState *bs);
 
+void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.5.0
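
Before delegating to the driver, bdrv_del_child() walks `parent_bs->children` to confirm that `child_bs` is actually attached to this parent. A minimal sketch of that lookup over a hand-rolled singly linked list, assuming a hypothetical `Edge` type in place of BdrvChild and QLIST:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BdrvChild linked into parent_bs->children. */
typedef struct Edge Edge;
struct Edge {
    const void *bs;   /* the child node this edge points to */
    Edge *next;
};

/* Return the parent's edge that points at child_bs, or NULL if none does. */
static Edge *find_child_edge(Edge *children, const void *child_bs)
{
    for (Edge *c = children; c; c = c->next) {
        if (c->bs == child_bs) {
            return c;
        }
    }
    return NULL;   /* caller reports "The node %s is not a child of %s" */
}
```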

[Qemu-block] [Patch v12 05/10] docs: block replication's description

2015-11-26 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 docs/block-replication.txt | 227 +
 1 file changed, 227 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..c7bad0e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,227 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied to FT/HA (Fault-tolerance/High Assurance) scenarios,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
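+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                     |
+                  |                                    (4)
+                  |                                     V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+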
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content (it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||

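The workflow steps give the Disk buffer two different write rules: the COW copy made before a Primary write (step 2) never replaces a sector the buffer already holds, while a Secondary write (step 4) always does, and the buffer is dropped at the next checkpoint. A toy in-memory model of those rules, with hypothetical names and one `int` per sector; the read path (buffer first, then Secondary disk) is an assumption, and this is not how the replication driver lays data out on the hidden and active disks.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTORS 4   /* hypothetical tiny disk */

typedef struct {
    int disk[SECTORS];        /* Secondary disk contents */
    int buf[SECTORS];         /* Disk buffer contents */
    bool buffered[SECTORS];   /* whether buf[i] holds data */
} SecondarySide;

/* Steps (2)+(3): COW the old sector into the buffer unless already buffered. */
static void primary_write(SecondarySide *s, int sector, int data)
{
    if (!s->buffered[sector]) {
        s->buf[sector] = s->disk[sector];
        s->buffered[sector] = true;
    }
    s->disk[sector] = data;
}

/* Step (4): a Secondary write always overwrites the buffered sector. */
static void secondary_write(SecondarySide *s, int sector, int data)
{
    s->buf[sector] = data;
    s->buffered[sector] = true;
}

/* Assumed Secondary VM read path: buffer first, then Secondary disk. */
static int secondary_read(const SecondarySide *s, int sector)
{
    return s->buffered[sector] ? s->buf[sector] : s->disk[sector];
}

/* Checkpoint: drop the buffer so the Secondary sees the Primary's data. */
static void checkpoint(SecondarySide *s)
{
    for (int i = 0; i < SECTORS; i++) {
        s->buffered[i] = false;
    }
}
```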
[Qemu-block] [Patch v12 03/10] Backup: clear all bitmap when doing block checkpoint

2015-11-26 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/backup.c   | 14 ++
 blockjob.c   | 11 +++
 include/block/blockjob.h | 12 
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 3b39119..1ca102d 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -253,11 +253,25 @@ static void backup_abort(BlockJob *job)
 }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "The backup job only supports block checkpoint in"
+   " sync=none mode");
+return;
+}
+
+hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 .commit = backup_commit,
 .abort  = backup_abort,
 };
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..0c8edfe 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -533,3 +533,14 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
 QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
 block_job_txn_ref(txn);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job->driver->do_checkpoint) {
+error_setg(errp, "The job %s doesn't support block checkpoint",
+   BlockJobType_lookup[job->driver->job_type]);
+return;
+}
+
+job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..abdba7c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJobDriver {
  * never both.
  */
 void (*abort)(BlockJob *job);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -443,4 +446,13 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.5.0
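The new `block_job_do_checkpoint()` uses the usual optional-callback pattern: a driver that leaves `do_checkpoint` NULL gets an error, and the backup driver accepts a checkpoint only in sync=none mode, where it clears its bitmap. A self-contained sketch of the same control flow follows. `Job`, `JobDriver` and the message-buffer `Error` are simplified stand-ins invented for this example (QEMU uses `BlockJob`, `BlockJobDriver`, `Error **errp` and `error_setg()`), and an integer counter stands in for the HBitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object: just a message buffer. */
typedef struct { char msg[128]; } Error;

typedef struct Job Job;

typedef struct {
    const char *job_type;
    /* Optional: NULL means "checkpoint not supported". */
    void (*do_checkpoint)(Job *job, Error *err);
} JobDriver;

struct Job {
    const JobDriver *driver;
    int sync_none;       /* stand-in for sync_mode == MIRROR_SYNC_MODE_NONE */
    int bitmap_bits_set; /* stand-in for the bitmap of copied clusters */
};

static void backup_like_checkpoint(Job *job, Error *err)
{
    if (!job->sync_none) {
        snprintf(err->msg, sizeof(err->msg),
                 "The backup job only supports block checkpoint in sync=none mode");
        return;
    }
    job->bitmap_bits_set = 0; /* models hbitmap_reset_all() */
}

/* Generic entry point: reject drivers without the optional callback. */
static void job_do_checkpoint(Job *job, Error *err)
{
    if (!job->driver->do_checkpoint) {
        snprintf(err->msg, sizeof(err->msg),
                 "The job %s doesn't support block checkpoint",
                 job->driver->job_type);
        return;
    }
    job->driver->do_checkpoint(job, err);
}

static const JobDriver backup_driver = { "backup", backup_like_checkpoint };
static const JobDriver stream_driver = { "stream", NULL };
```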

[Qemu-block] [Patch v12 00/10] Block replication for continuous checkpoints

2015-11-26 Thread Wen Congyang

Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

Detailed information about block replication is available here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
2. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg06043.html

You can get the patch here:
https://github.com/coloft/qemu/tree/wency/block-replication-v12

You can get the patch with framework here:
https://github.com/coloft/qemu/tree/wency/colo_framework_v11.2

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Change Log:
V12:
1. Rebase to the newest code
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's
   suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver, replication, to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary QEMU (use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
  unblock backup operations in backing file
  Store parent BDS in BdrvChild
  Backup: clear all bitmap when doing block checkpoint
  Allow creating backup jobs when opening BDS
  docs: block replication's description
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  support replication driver in blockdev-add
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c| 145 
 block/Makefile.objs|   3 +-
 block/backup.c |  14 ++
 block/quorum.c |  78 +++
 block/replication.c| 549 +
 blockjob.c |  11 +
 docs/block-replication.txt | 227 +++
 include/block/block.h  |   9 +
 include/block/block_int.h  |  15 ++
 include/block/blockjob.h   |  12 +
 qapi/block-core.json   |  34 ++-
 11 files changed, 1093 insertions(+), 4 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.5.0

[Qemu-block] [Patch v12 01/10] unblock backup operations in backing file

2015-11-26 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index bfc2be8..eaf479a 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *    The target bs is newly opened, and the source is the top BDS.
+ * 2. blockdev backup
+ *    Both the source and the target are top BDSes.
+ * 3. internal backup (used for block replication)
+ *    Both the source and the target are backing files.
+ *
+ * In cases 1 and 2, the backing file is neither the source nor
+ * the target.
+ * In case 3, we will block the top BDS, so there is only one block
+ * job for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
 }
-- 
2.5.0
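The unblocking works per operation type: the hunk assumes the backing file already has its operations blocked by `bs->backing_blocker` (done earlier in `bdrv_set_backing_hd()`, outside the lines shown), and each `bdrv_op_unblock()` call lifts that blocker for one type only. The sketch below models this with a blocker count per operation type. It is a simplification (QEMU keeps a list of blocker reasons per type and removes the matching one), and the `OpType` values, `Node` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of operation types. */
typedef enum {
    OP_COMMIT_TARGET,
    OP_BACKUP_SOURCE,
    OP_BACKUP_TARGET,
    OP_RESIZE,
    OP_MAX
} OpType;

typedef struct {
    int blockers[OP_MAX]; /* number of active blockers per op type */
} Node;

static void op_block_all(Node *n)
{
    for (int i = 0; i < OP_MAX; i++) {
        n->blockers[i]++;
    }
}

static void op_unblock(Node *n, OpType op)
{
    if (n->blockers[op] > 0) {
        n->blockers[op]--;
    }
}

static bool op_is_blocked(const Node *n, OpType op)
{
    return n->blockers[op] > 0;
}

/* Block everything on the new backing file, then lift the blocker for
 * the operations that must stay possible (commit target, backup). */
static void set_backing(Node *backing)
{
    op_block_all(backing);
    op_unblock(backing, OP_COMMIT_TARGET);
    op_unblock(backing, OP_BACKUP_SOURCE);
    op_unblock(backing, OP_BACKUP_TARGET);
}
```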

[Qemu-block] [Patch v12 02/10] Store parent BDS in BdrvChild

2015-11-26 Thread Wen Congyang

We need to access the parent BDS to get the root BDS.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block.c   | 1 +
 include/block/block_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block.c b/block.c
index eaf479a..0a0468f 100644
--- a/block.c
+++ b/block.c
@@ -1204,6 +1204,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
 .bs = child_bs,
+.parent = parent_bs,
 .name   = g_strdup(child_name),
 .role   = child_role,
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ea20d12..1f56046 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -357,6 +357,7 @@ extern const BdrvChildRole child_format;
 
 struct BdrvChild {
 BlockDriverState *bs;
+BlockDriverState *parent;
 char *name;
 const BdrvChildRole *role;
 QLIST_ENTRY(BdrvChild) next;
-- 
2.5.0
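Storing `parent` in each `BdrvChild` edge makes an upward walk possible: starting from a node, follow the edge it hangs from to its parent until no parent is left. The sketch below shows that walk on the replication chain (active-disk, hidden-disk, secondary disk). It is only an illustration: the `parent_edge` back-pointer from a node to the edge it hangs from is a hypothetical field (this patch adds only `BdrvChild.parent`), and because a node can in general have several parents, the walk assumes a single chain as in this use case.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Edge in the graph, like BdrvChild after this patch: it knows both ends. */
typedef struct Edge {
    Node *bs;          /* child node */
    Node *parent;      /* parent node (the new field) */
    const char *name;
} Edge;

struct Node {
    const char *node_name;
    Edge *parent_edge; /* hypothetical back-pointer to our parent edge */
};

static void attach(Edge *e, Node *parent, Node *child, const char *name)
{
    e->bs = child;
    e->parent = parent;
    e->name = name;
    child->parent_edge = e;
}

/* Follow parent pointers until a node without a parent is reached. */
static Node *find_root(Node *bs)
{
    while (bs->parent_edge && bs->parent_edge->parent) {
        bs = bs->parent_edge->parent;
    }
    return bs;
}
```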

[Qemu-block] [Patch v12 04/10] Allow creating backup jobs when opening BDS

2015-11-26 Thread Wen Congyang

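When opening a BDS, we need to create backup jobs for image-fleecing.
Build backup.o as part of block-obj-y instead of common-obj-y so that
block drivers can create them.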

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.5.0

[Qemu-block] [Patch v8 3/3] qmp: add monitor command to add/remove a child

2015-11-26 Thread Wen Congyang

The new QMP command is named x-blockdev-change. For now it only supports
adding and removing quorum children; it does not support all kinds of
children, all operations, or all block drivers, so it is experimental.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 blockdev.c   | 54 
 qapi/block-core.json | 23 ++
 qmp-commands.hx  | 47 +
 3 files changed, 124 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 2b076fb..7d8a2b4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3836,6 +3836,60 @@ out:
 aio_context_release(aio_context);
 }
 
+static BlockDriverState *bdrv_find_child(BlockDriverState *parent_bs,
+ const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child->bs;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *child_bs = NULL, *new_bs = NULL;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node should be specified");
+}
+return;
+}
+
+if (has_child) {
+child_bs = bdrv_find_child(parent_bs, child);
+if (!child_bs) {
+error_setg(errp, "Node '%s' doesn't have child %s",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, child_bs, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a07b13f..feb8da2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2400,3 +2400,26 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a block driver state. Currently only
+# the Quorum driver implements this feature to add or remove its child.
+# This is useful to fix a broken quorum child.
+#
+# @parent: the id or name of the node that will be changed.
+#
+# @child: #optional the name of the child that will be deleted.
+#
+# @node: #optional the name of the node that will be added.
+#
+# Note: this command is experimental, and its API is not stable.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9d8b42f..9b49d51 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4285,6 +4285,53 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+-----------------
+
+Dynamically reconfigure the block driver state graph. It can be used to
+add, remove, insert, or replace a block driver state. Currently only
+the Quorum driver implements this feature to add and remove its child.
+This is useful to fix a broken quorum child.
+
+Arguments:
+- "parent": the id or node name of the node that will be changed (json-string)
+- "child": the child name which will be deleted (json-string, optional)
+- "node": the new node-name which will be added (json-string, optional)
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of operations, all kinds of children, nor all block
+drivers.
+
+Example:
+
+Add a new node to a quorum
+-> { "execute": "blockdev-add",
+"arguments": { "options": { "driver": "raw",
+"node-name": "new_node",
+"id": "test_new_node",
+"file": { "driver": "file",
+
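The argument handling in `qmp_x_blockdev_change()` reduces to "exactly one of `child` and `node` must be given": `child` names a child to delete, `node` names a node to add. A minimal sketch of just that decision, with a hypothetical `ChangeAction` enum and `classify_change()` helper, and error strings adapted from the patch in place of `Error **errp`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    CHANGE_DEL_CHILD,   /* only 'child' given: remove that child */
    CHANGE_ADD_NODE,    /* only 'node' given: add that node */
    CHANGE_ERR_BOTH,    /* both given: conflicting parameters */
    CHANGE_ERR_NEITHER, /* neither given */
} ChangeAction;

static ChangeAction classify_change(bool has_child, bool has_node,
                                    const char **errmsg)
{
    *errmsg = NULL;
    if (has_child == has_node) {
        *errmsg = has_child ? "The parameters child and node are in conflict"
                            : "Either child or node should be specified";
        return has_child ? CHANGE_ERR_BOTH : CHANGE_ERR_NEITHER;
    }
    return has_child ? CHANGE_DEL_CHILD : CHANGE_ADD_NODE;
}
```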

[Qemu-block] [Patch v12 07/10] quorum: implement block driver interfaces for block replication

2015-11-26 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Alberto Garcia <be...@igalia.com>
---
 block/quorum.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index b7df14b..6fa54f3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -85,6 +85,8 @@ typedef struct BDRVQuorumState {
 int bsize;
 
 QuorumReadPattern read_pattern;
+
+int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -949,6 +951,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 s->bsize = s->num_children;
 
 g_free(opened);
+s->replication_index = -1;
 goto exit;
 
 close_exit:
@@ -1148,6 +1151,77 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int count = 0, i, index;
+Error *local_err = NULL;
+
+/*
+ * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != REPLICATION_MODE_PRIMARY) {
+error_setg(errp, "The replication mode for quorum should be 'primary'");
+return;
+}
+
+if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+error_setg(errp, "Block replication needs read pattern 'fifo'");
+return;
+}
+
+for (i = 0; i < s->num_children; i++) {
+bdrv_start_replication(s->children[i]->bs, mode, &local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+} else {
+count++;
+index = i;
+}
+}
+
+if (count == 0) {
+error_setg(errp, "No child supports block replication");
+} else if (count > 1) {
+for (i = 0; i < s->num_children; i++) {
+bdrv_stop_replication(s->children[i]->bs, false, NULL);
+}
+error_setg(errp, "Too many children support block replication");
+} else {
+s->replication_index = index;
+}
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_do_checkpoint(s->children[s->replication_index]->bs, errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_stop_replication(s->children[s->replication_index]->bs, failover,
+  errp);
+s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1174,6 +1248,10 @@ static BlockDriver bdrv_quorum = {
 
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+.bdrv_start_replication = quorum_start_replication,
+.bdrv_do_checkpoint = quorum_do_checkpoint,
+.bdrv_stop_replication  = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.5.0
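`quorum_start_replication()` starts replication on every child and accepts the result only if exactly one child succeeded: with none it fails, and with more than one it stops replication on all children and fails. The selection rule can be isolated as a pure function. In this sketch the hypothetical `pick_replication_child()` takes an array of per-child success flags in place of the `bdrv_start_replication()` calls and returns distinct negative codes for the two error cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the index of the single child that supports replication,
 * -1 if none does, or -2 if more than one does (in which case the
 * caller must stop replication on all children again, as quorum does). */
static int pick_replication_child(const bool *supported, int num_children)
{
    int count = 0, index = -1;

    for (int i = 0; i < num_children; i++) {
        if (supported[i]) {
            count++;
            index = i;
        }
    }

    if (count == 0) {
        return -1;
    }
    return count > 1 ? -2 : index;
}
```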

[Qemu-block] [Patch v12 06/10] Add new block driver interfaces to control block replication

2015-11-26 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Cc: Luiz Capitulino <lcapitul...@redhat.com>
Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block.c   | 43 +++
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block-core.json  | 13 +
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index 0a0468f..213bee8 100644
--- a/block.c
+++ b/block.c
@@ -4390,3 +4390,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
 parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file->bs, mode, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support starting block"
+   " replication", bs->filename);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file->bs, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support block checkpoint",
+   bs->filename);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file->bs, failover, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support stopping block"
+   " replication", bs->filename);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1d3b9c6..cd39d50 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -648,4 +648,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
 Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1f56046..a6aba8b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -307,6 +307,20 @@ struct BlockDriver {
 void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
Error **errp);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop the Disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index feb8da2..2c6bd3f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1925,6 +1925,19 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.5
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
2.5.0
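All three `bdrv_*_replication()` helpers share one dispatch rule: call the driver's callback if it has one, otherwise recurse into `bs->file`, and fail only if neither exists. This lets a node whose driver has no replication support sit above one that has it. A self-contained sketch of the rule for the checkpoint case, using hypothetical `Node`/`Driver` types and an int return code in place of `Error **errp`:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

typedef struct {
    const char *name;
    /* Optional callback; returns 0 on success. */
    int (*do_checkpoint)(Node *bs);
} Driver;

struct Node {
    const Driver *drv;
    Node *file;       /* stand-in for bs->file->bs */
    int checkpoints;  /* how often this node handled a checkpoint */
};

static int replication_checkpoint(Node *bs)
{
    bs->checkpoints++;
    return 0;
}

/* Same shape as bdrv_do_checkpoint(): driver callback, else bs->file,
 * else error (-1 here instead of error_setg()). */
static int node_do_checkpoint(Node *bs)
{
    if (bs->drv && bs->drv->do_checkpoint) {
        return bs->drv->do_checkpoint(bs);
    } else if (bs->file) {
        return node_do_checkpoint(bs->file);
    }
    return -1;
}

static const Driver replication_drv = { "replication", replication_checkpoint };
static const Driver raw_drv = { "raw", NULL };
```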

Re: [Qemu-block] [Qemu-devel] [PATCH v2 14/21] blockdev: Set 'format' indicates non-empty drive

2015-11-24 Thread Wen Congyang

On 11/23/2015 11:59 PM, Kevin Wolf wrote:
> Creating an empty drive while specifying 'format' doesn't make sense.
> The specified format driver would simply be ignored.
> 
> Make a set 'format' option an indication that a non-empty drive should
> be created. This makes 'format' consistent with 'driver' and allows
> using it with a block driver that doesn't need any other options (like
> null-co/null-aio).

After this patch, make check fails:
GTESTER check-qtest-x86_64
blkdebug: Suspended request 'A'
blkdebug: Resuming request 'A'
qemu-system-x86_64: -drive id=drive0,if=ide,format=raw,index=0,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02S30363735a5ceb1b5e3967086fe0faf60
qemu-system-x86_64: -drive id=drive2,if=ide,format=raw,index=2,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02S900c9c9fe15d4866eeedf04b9c3e15eb
qemu-system-x86_64: -drive id=drive2,if=ide,format=raw,index=2,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02S7b91f40b4832e7a8dc58090ab2316d3e
qemu-system-x86_64: -drive id=drive2,if=ide,format=raw,index=2,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02S554ea2416720809b0b192625a0a03d4e
qemu-system-x86_64: -drive id=drive2,if=none,format=raw,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02Sc0a31724f0aee48215f32c39093bfc70
qemu-system-x86_64: -drive id=drive2,if=none,format=raw,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02S090ae3723fe809a4ee5a077b620fed12
qemu-system-x86_64: -drive id=drive2,if=none,format=raw,media=cdrom: Can't use 'raw' as a block driver for the protocol level
Broken pipe
GTester: last random seed: R02Sf378037d36c71e1666ee8547dbf276f3

Thanks
Wen Congyang

> 
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  blockdev.c| 5 +
>  tests/qemu-iotests/iotests.py | 2 +-
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 313841b..afaeef9 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -490,7 +490,6 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
>  QDict *interval_dict = NULL;
>  QList *interval_list = NULL;
>  const char *id;
> -bool has_driver_specific_opts;
>  BlockdevDetectZeroesOptions detect_zeroes =
>  BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
>  const char *throttling_group = NULL;
> @@ -514,8 +513,6 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
>  qdict_del(bs_opts, "id");
>  }
>  
> -has_driver_specific_opts = !!qdict_size(bs_opts);
> -
>  /* extract parameters */
>  snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
>  
> @@ -578,7 +575,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
>  }
>  
>  /* init */
> -if ((!file || !*file) && !has_driver_specific_opts) {
> +if ((!file || !*file) && !qdict_size(bs_opts)) {
>  BlockBackendRootState *blk_rs;
>  
>  blk = blk_new(qemu_opts_id(opts), errp);
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index ff5905f..f36add8 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -143,12 +143,12 @@ class VM(object):
>  def add_drive(self, path, opts='', interface='virtio'):
>  '''Add a virtio-blk drive to the VM'''
>  options = ['if=%s' % interface,
> -   'format=%s' % imgfmt,
> 'cache=%s' % cachemode,
> 'id=drive%d' % self._num_drives]
>  
>  if path is not None:
>  options.append('file=%s' % path)
> +options.append('format=%s' % imgfmt)
>  
>  if opts:
>  options.append(opts)
> 



Re: [Qemu-block] [Qemu-devel] [PATCH for 2.6 1/3] backup: Use Bitmap to replace "s->bitmap"

2015-11-23 Thread Wen Congyang

On 11/23/2015 05:19 PM, Fam Zheng wrote:
> On Mon, 11/23 17:01, Wen Congyang wrote:
>> On 11/20/2015 05:59 PM, Fam Zheng wrote:
>>> "s->bitmap" tracks done sectors, we only check bit states without using any
>>> iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
>>> more memory efficient.
>>>
>>> Meanwhile, rename it to done_bitmap, to reflect the intention.
>>>
>>> Signed-off-by: Fam Zheng <f...@redhat.com>
>>> ---
>>>  block/backup.c | 11 ++-
>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 3b39119..d408f98 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -22,6 +22,7 @@
>>>  #include "qapi/qmp/qerror.h"
>>>  #include "qemu/ratelimit.h"
>>>  #include "sysemu/block-backend.h"
>>> +#include "qemu/bitmap.h"
>>>  
>>>  #define BACKUP_CLUSTER_BITS 16
>>>  #define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
>>> @@ -47,7 +48,7 @@ typedef struct BackupBlockJob {
>>>  BlockdevOnError on_target_error;
>>>  CoRwlock flush_rwlock;
>>>  uint64_t sectors_read;
>>> -HBitmap *bitmap;
>>> +unsigned long *done_bitmap;
>>>  QLIST_HEAD(, CowRequest) inflight_reqs;
>>>  } BackupBlockJob;
>>>  
>>> @@ -113,7 +114,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>>>  cow_request_begin(&cow_request, job, start, end);
>>>  
>>>  for (; start < end; start++) {
>>> -if (hbitmap_get(job->bitmap, start)) {
>>> +if (test_bit(start, job->done_bitmap)) {
>>>  trace_backup_do_cow_skip(job, start);
>>>  continue; /* already copied */
>>>      }
>>> @@ -164,7 +165,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>>>  goto out;
>>>  }
>>>  
>>> -hbitmap_set(job->bitmap, start, 1);
>>> +bitmap_set(job->done_bitmap, start, 1);
>>
>> You can use set_bit() here.
> 
> Why? I think bitmap_set is a better match with bitmap_new below.

set_bit() is quicker than bitmap_set() if you only set one bit.

Thanks
Wen Congyang

> 
> Fam
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>  
>>>  /* Publish progress, guest I/O counts as progress too.  Note that the
>>>   * offset field is an opaque progress value, it is not a disk offset.
>>> @@ -394,7 +395,7 @@ static void coroutine_fn backup_run(void *opaque)
>>>  start = 0;
>>>  end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
>>>  
>>> -job->bitmap = hbitmap_alloc(end, 0);
>>> +job->done_bitmap = bitmap_new(end);
>>>  
>>>  bdrv_set_enable_write_cache(target, true);
>>>  if (target->blk) {
>>> @@ -475,7 +476,7 @@ static void coroutine_fn backup_run(void *opaque)
>>>  /* wait until pending backup_do_cow() calls have completed */
>>>  qemu_co_rwlock_wrlock(&job->flush_rwlock);
>>>  qemu_co_rwlock_unlock(&job->flush_rwlock);
>>> -hbitmap_free(job->bitmap);
>>> +g_free(job->done_bitmap);
>>>  
>>>  if (target->blk) {
>>>  blk_iostatus_disable(target->blk);
>>>
>>
> .
>
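For a single bit, `bitmap_set(map, start, 1)` and `set_bit(start, map)` produce the same result; the disagreement is only about speed versus readability, and no numbers were measured in the thread. A simplified sketch of the two operations on an `unsigned long` array shows the equivalence. The `my_*` functions and macros are re-implementations written for this example, not the code behind qemu/bitmap.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)
#define BIT_MASK(nr)  (1UL << ((nr) % BITS_PER_LONG))

/* Single-bit set: one read-modify-write of one word. */
static void my_set_bit(long nr, unsigned long *map)
{
    map[BIT_WORD(nr)] |= BIT_MASK(nr);
}

/* Range set: loops over the range; for len == 1 it ends up doing the
 * same single-word update, plus the loop and call overhead. */
static void my_bitmap_set(unsigned long *map, long start, long len)
{
    for (long i = start; i < start + len; i++) {
        my_set_bit(i, map);
    }
}

static bool my_test_bit(long nr, const unsigned long *map)
{
    return (map[BIT_WORD(nr)] & BIT_MASK(nr)) != 0;
}

/* Zero-initialised bitmap large enough for nbits bits. */
static unsigned long *my_bitmap_new(long nbits)
{
    long words = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
    return calloc(words ? words : 1, sizeof(unsigned long));
}
```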

Re: [Qemu-block] [Qemu-devel] [PATCH for 2.6 1/3] backup: Use Bitmap to replace "s->bitmap"

2015-11-23 Thread Wen Congyang

On 11/20/2015 05:59 PM, Fam Zheng wrote:
> "s->bitmap" tracks done sectors, we only check bit states without using any
> iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
> more memory efficient.
> 
> Meanwhile, rename it to done_bitmap, to reflect the intention.
> 
> Signed-off-by: Fam Zheng <f...@redhat.com>
> ---
>  block/backup.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 3b39119..d408f98 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -22,6 +22,7 @@
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/ratelimit.h"
>  #include "sysemu/block-backend.h"
> +#include "qemu/bitmap.h"
>  
>  #define BACKUP_CLUSTER_BITS 16
>  #define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
> @@ -47,7 +48,7 @@ typedef struct BackupBlockJob {
>  BlockdevOnError on_target_error;
>  CoRwlock flush_rwlock;
>  uint64_t sectors_read;
> -HBitmap *bitmap;
> +unsigned long *done_bitmap;
>  QLIST_HEAD(, CowRequest) inflight_reqs;
>  } BackupBlockJob;
>  
> @@ -113,7 +114,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>  cow_request_begin(&cow_request, job, start, end);
>  
>  for (; start < end; start++) {
> -if (hbitmap_get(job->bitmap, start)) {
> +if (test_bit(start, job->done_bitmap)) {
>  trace_backup_do_cow_skip(job, start);
>  continue; /* already copied */
>  }
> @@ -164,7 +165,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>  goto out;
>  }
>  
> -hbitmap_set(job->bitmap, start, 1);
> +bitmap_set(job->done_bitmap, start, 1);

You can use set_bit() here.

Thanks
Wen Congyang

>  
>  /* Publish progress, guest I/O counts as progress too.  Note that the
>   * offset field is an opaque progress value, it is not a disk offset.
> @@ -394,7 +395,7 @@ static void coroutine_fn backup_run(void *opaque)
>  start = 0;
>  end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
>  
> -job->bitmap = hbitmap_alloc(end, 0);
> +job->done_bitmap = bitmap_new(end);
>  
>  bdrv_set_enable_write_cache(target, true);
>  if (target->blk) {
> @@ -475,7 +476,7 @@ static void coroutine_fn backup_run(void *opaque)
>  /* wait until pending backup_do_cow() calls have completed */
>  qemu_co_rwlock_wrlock(&job->flush_rwlock);
>  qemu_co_rwlock_unlock(&job->flush_rwlock);
> -hbitmap_free(job->bitmap);
> +g_free(job->done_bitmap);
>  
>  if (target->blk) {
>  blk_iostatus_disable(target->blk);
>

Re: [Qemu-block] [Qemu-devel] [PATCH for 2.6 1/3] backup: Use Bitmap to replace "s->bitmap"

2015-11-23 Thread Wen Congyang

On 11/23/2015 05:55 PM, Fam Zheng wrote:
> On Mon, 11/23 17:24, Wen Congyang wrote:
>> On 11/23/2015 05:19 PM, Fam Zheng wrote:
>>> On Mon, 11/23 17:01, Wen Congyang wrote:
>>>> On 11/20/2015 05:59 PM, Fam Zheng wrote:
>>>>> "s->bitmap" tracks done sectors, we only check bit states without using any
>>>>> iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
>>>>> more memory efficient.
>>>>>
>>>>> Meanwhile, rename it to done_bitmap, to reflect the intention.
>>>>>
>>>>> Signed-off-by: Fam Zheng <f...@redhat.com>
>>>>> ---
>>>>>  block/backup.c | 11 ++-
>>>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index 3b39119..d408f98 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -22,6 +22,7 @@
>>>>>  #include "qapi/qmp/qerror.h"
>>>>>  #include "qemu/ratelimit.h"
>>>>>  #include "sysemu/block-backend.h"
>>>>> +#include "qemu/bitmap.h"
>>>>>  
>>>>>  #define BACKUP_CLUSTER_BITS 16
>>>>>  #define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
>>>>> @@ -47,7 +48,7 @@ typedef struct BackupBlockJob {
>>>>>  BlockdevOnError on_target_error;
>>>>>  CoRwlock flush_rwlock;
>>>>>  uint64_t sectors_read;
>>>>> -HBitmap *bitmap;
>>>>> +unsigned long *done_bitmap;
>>>>>  QLIST_HEAD(, CowRequest) inflight_reqs;
>>>>>  } BackupBlockJob;
>>>>>  
>>>>> @@ -113,7 +114,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>>>>>  cow_request_begin(&cow_request, job, start, end);
>>>>>  
>>>>>  for (; start < end; start++) {
>>>>> -if (hbitmap_get(job->bitmap, start)) {
>>>>> +if (test_bit(start, job->done_bitmap)) {
>>>>>  trace_backup_do_cow_skip(job, start);
>>>>>  continue; /* already copied */
>>>>>  }
>>>>> @@ -164,7 +165,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>>>>>  goto out;
>>>>>  }
>>>>>  
>>>>> -hbitmap_set(job->bitmap, start, 1);
>>>>> +bitmap_set(job->done_bitmap, start, 1);
>>>>
>>>> You can use set_bit() here.
>>>
>>> Why? I think bitmap_set is a better match with bitmap_new below.
>>
>> set_bit() is quicker than bitmap_set() if you only set one bit.
>>
> 
> How much quicker is it? This doesn't sound convincing enough for me to lose the
> readability.

I didn't test it. It is just a suggestion.
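To make the trade-off concrete, here is a minimal sketch of what each call has to do on a plain `unsigned long` array. The helpers `sketch_set_bit()`, `sketch_bitmap_set()` and `sketch_test_bit()` are hypothetical and only approximate the semantics of `set_bit()`, `bitmap_set()` and `test_bit()` from `qemu/bitmap.h`; they are not QEMU's implementations, and no timing claim is implied.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Single bit: one word index, one mask, one OR. */
static void sketch_set_bit(size_t nr, unsigned long *map)
{
    map[nr / SK_BITS_PER_LONG] |= 1UL << (nr % SK_BITS_PER_LONG);
}

/*
 * Range [start, start + nr): has to handle a partial first word, whole
 * words and a partial last word, so it does extra bookkeeping even
 * when nr == 1.
 */
static void sketch_bitmap_set(unsigned long *map, size_t start, size_t nr)
{
    while (nr > 0) {
        size_t word = start / SK_BITS_PER_LONG;
        size_t bit = start % SK_BITS_PER_LONG;
        size_t span = SK_BITS_PER_LONG - bit;   /* bits left in this word */
        unsigned long mask;

        if (span > nr) {
            span = nr;
        }
        mask = (span == SK_BITS_PER_LONG) ? ~0UL
                                          : ((1UL << span) - 1) << bit;
        map[word] |= mask;
        start += span;
        nr -= span;
    }
}

static int sketch_test_bit(size_t nr, const unsigned long *map)
{
    return (map[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL;
}

/* Usage: for a single bit both helpers produce the same bitmap. */
static void sketch_demo(void)
{
    unsigned long a[4] = {0}, b[4] = {0};

    sketch_set_bit(70, a);
    sketch_bitmap_set(b, 70, 1);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

For one bit the range version still computes a span and a mask on every call; whether that difference is measurable in backup_do_cow() would need benchmarking, which is the point Fam raises.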

Thanks
Wen Congyang

> 
> Fam
> .
>

Re: [Qemu-block] [Qemu-devel] [PATCH v2 06/21] block: Exclude nested options only for children in append_open_options()

2015-11-23 Thread Wen Congyang

On 11/23/2015 11:59 PM, Kevin Wolf wrote:
> Some drivers have nested options (e.g. blkdebug rule arrays), which
> don't belong to a child node and shouldn't be removed. Don't remove all
> options with "." in their name, but check for the complete prefixes of
> actually existing child nodes.
> 
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  block.c   | 19 +++
>  include/block/block_int.h |  1 +
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 23d9e10..02125e2 100644
> --- a/block.c
> +++ b/block.c
> @@ -1101,11 +1101,13 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
>  
>  static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
>  BlockDriverState *child_bs,
> +const char *child_name,
>  const BdrvChildRole *child_role)
>  {
>  BdrvChild *child = g_new(BdrvChild, 1);
>  *child = (BdrvChild) {
>  .bs = child_bs,
> +.name   = child_name,

The child_name may be allocated on the caller's stack, in which case the
pointer stored in BdrvChild dangles once the caller returns. For example,
in quorum_open():

    for (i = 0; i < s->num_children; i++) {
        char indexstr[32];
        ret = snprintf(indexstr, 32, "children.%d", i);
        assert(ret < 32);

        s->children[i] = bdrv_open_child(NULL, options, indexstr, bs,
                                         &child_format, false, &local_err);
        if (local_err) {
            ret = -EINVAL;
            goto close_exit;
        }

        opened[i] = true;
    }
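One common way to avoid keeping a pointer into the caller's stack frame is for the attach function to store its own copy of the name and free it on detach. The following is a minimal sketch using plain `malloc()` in place of GLib's `g_strdup()`; `SketchChild`, `sketch_attach_child()`, `sketch_detach_child()` and `open_indexed_child()` are hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchChild {
    char *name;   /* owned copy, freed by sketch_detach_child() */
} SketchChild;

static char *sketch_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

static SketchChild *sketch_attach_child(const char *child_name)
{
    SketchChild *child = malloc(sizeof(*child));

    if (!child) {
        return NULL;
    }
    /* Copy the name: the caller may pass a stack buffer like indexstr. */
    child->name = sketch_strdup(child_name);
    if (!child->name) {
        free(child);
        return NULL;
    }
    return child;
}

static void sketch_detach_child(SketchChild *child)
{
    free(child->name);
    free(child);
}

/* Mimics the quorum_open() loop body: indexstr dies when this returns. */
static SketchChild *open_indexed_child(int i)
{
    char indexstr[32];
    int ret = snprintf(indexstr, sizeof(indexstr), "children.%d", i);

    assert(ret > 0 && ret < (int)sizeof(indexstr));
    return sketch_attach_child(indexstr);
}
```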

Thanks
Wen Congyang

>  .role   = child_role,
>  };
>  
> @@ -1165,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
>  bs->backing = NULL;
>  goto out;
>  }
> -bs->backing = bdrv_attach_child(bs, backing_hd, &child_backing);
> +bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing);
>  bs->open_flags &= ~BDRV_O_NO_BACKING;
>  pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_hd->filename);
>  pstrcpy(bs->backing_format, sizeof(bs->backing_format),
> @@ -1322,7 +1324,7 @@ BdrvChild *bdrv_open_child(const char *filename,
>  goto done;
>  }
>  
> -c = bdrv_attach_child(parent, bs, child_role);
> +c = bdrv_attach_child(parent, bs, bdref_key, child_role);
>  
>  done:
>  qdict_del(options, bdref_key);
> @@ -3952,13 +3954,22 @@ static bool append_open_options(QDict *d, BlockDriverState *bs)
>  {
>  const QDictEntry *entry;
>  QemuOptDesc *desc;
> +BdrvChild *child;
>  bool found_any = false;
> +const char *p;
>  
>  for (entry = qdict_first(bs->options); entry;
>   entry = qdict_next(bs->options, entry))
>  {
> -/* Only take options for this level */
> -if (strchr(qdict_entry_key(entry), '.')) {
> +/* Exclude options for children */
> +QLIST_FOREACH(child, &bs->children, next) {
> +if (strstart(qdict_entry_key(entry), child->name, &p)
> +&& (!*p || *p == '.'))
> +{
> +break;
> +}
> +}
> +if (child) {
>  continue;
>  }
>  
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 77dc165..b2325aa 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -351,6 +351,7 @@ extern const BdrvChildRole child_format;
>  
>  struct BdrvChild {
>  BlockDriverState *bs;
> +const char *name;
>  const BdrvChildRole *role;
>  QLIST_ENTRY(BdrvChild) next;
>  QLIST_ENTRY(BdrvChild) next_parent;
> 
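The complete-prefix test that the patch adds to append_open_options() (`strstart()` followed by a check for `'\0'` or `'.'`) can be sketched in isolation. This is a hypothetical stand-alone helper written with `strncmp()`, not the code from the patch; it shows why the check keeps `children.1` from matching `children.10`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does the option key belong to the child called
 * 'name'? It must be exactly "name" or start with "name.", so a
 * driver's own nested options (which merely contain a '.') are kept.
 */
static bool key_belongs_to_child(const char *key, const char *name)
{
    size_t len;

    assert(key != NULL && name != NULL);
    len = strlen(name);
    if (strncmp(key, name, len) != 0) {
        return false;
    }
    /* Complete prefix only: "children.1" must not match "children.10". */
    return key[len] == '\0' || key[len] == '.';
}
```

With the old `strchr(key, '.')` test, any key containing a dot was skipped, whether or not a child with that prefix existed.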


-- 
This message has been scanned for viruses and
dangerous content by FCNIC, and is
believed to be clean.

Re: [Qemu-block] [Patch v7 3/3] qmp: add monitor command to add/remove a child

2015-11-23 Thread Wen Congyang

On 11/24/2015 12:30 AM, Eric Blake wrote:
> On 11/22/2015 11:23 PM, Wen Congyang wrote:
>> The new QMP command name is x-blockdev-change. It justs for adding/removing
> 
> s/It justs/It's just/
> 
>> quorum's child now, and don't support all kinds of children, all kinds of
> 
> s/don't/doesn't/
> 
>> operations, nor all block drivers. So it is experimental now.
>>
>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>> ---
> 
>> +++ b/qapi/block-core.json
>> @@ -2400,3 +2400,28 @@
>>  ##
>>  { 'command': 'block-set-write-threshold',
>>'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
>> +
>> +##
>> +# @x-blockdev-change
>> +#
>> +# Dynamically reconfigure the block driver state graph. It can be used
>> +# to add, remove, insert or replace a block driver state. Currently only
>> +# the Quorum driver implements this feature to add or remove its child.
>> +# This is useful to fix a broken quorum child.
>> +#
>> +# @operation: the change operation. It can be add, delete.
> 
> Documented but not present below.  Are you missing a parameter, or
> should this line be deleted?

This line should be deleted.

> 
>> +#
>> +# @parent: the id or name of the node that will be changed.
>> +#
>> +# @child: #optional the name of the child that will be deleted.
>> +#
>> +# @node: #optional the name of the node will be added.
>> +#
>> +# Note: this command is experimental, and its API is not stable.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'command': 'x-blockdev-change',
>> +  'data' : { 'parent': 'str',
>> + '*child': 'str',
>> + '*node': 'str' } }
> 
>> +++ b/qmp-commands.hx
>> @@ -4285,6 +4285,53 @@ Example:
>>  EQMP
>>  
>>  {
>> +.name   = "x-blockdev-change",
>> +.args_type  = "parent:B,child:B?,node:B?",
>> +.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
>> +},
>> +
>> +SQMP
>> +x-blockdev-change
>> +
> 
> Make the --- divider as long as the text it is underlining.

OK

> 
>> +
>> +Dynamic reconfigure the block driver state graph. It can be used to
> 
> s/Dynamic/Dynamically/
> 
>> +add, remove, insert, replace a block driver state. Currently only
> 
> s/replace/or replace/
> 
> Isn't 'add' and 'insert' the same thing?

'insert' means that a filter driver is inserted between A and B (A is B's parent).

> 
>> +the Quorum driver implements this feature to add and remove its child.
>> +This is useful to fix a broken quorum child.
>> +
>> +Arguments:
>> +- "parent": the id or node name of which node will be changed
>> +- "child": the child name which will be delete
> 
> s/delete/deleted/; mention that it is optional (if nothing is going to
> be deleted)
> 
>> +- "node": the new node-name which will be added
> 
> mention that it is optional (if nothing is going to be added)

OK

> 
>> +
>> +Note: this command is experimental, and not a stable API. It doesn't
>> +support all kinds of operations, all kindes of children, nor all block
> 
> s/kindes/kinds/
> 
>> +drivers.
>> +
>> +Example:
>> +
>> +Add a new quorum's node
> 
> s/quorum's node/node to a quorum/

All comments will be addressed.

Thanks
Wen Congyang

> 
>> +-> { "execute": "blockdev-add",
>> +"arguments": { "options": { "driver": "raw",
>> +"node-name": "new_node",
>> +"id": "test_new_node",
>> +"file": { "driver": "file",
>> +  "filename": "test.raw" } } } }
>> +<- { "return": {} }
>> +-> { "execute": "x-blockdev-change",
>> +"arguments": { "parent": "disk1",
>> +   "node": "new_node" } }
>> +<- { "return": {} }
>> +
>> +Delete a quorum's node
>> +-> { "execute": "x-blockdev-change",
>> +"arguments": { "parent": "disk1",
>> +   "child": "children.2" } }
>> +<- { "return": {} }
>> +
>> +EQMP
>> +
>> +{
>>  .name   = "query-named-block-nodes",
>>  .args_type  = "",
>>  .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
>>
> 



[Qemu-block] [Patch v7 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-11-22 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c   |   8 ++--
 block/quorum.c| 124 +-
 include/block/block.h |   4 ++
 3 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index ba5806e..fc067ee 100644
--- a/block.c
+++ b/block.c
@@ -1085,10 +1085,10 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index b9ba028..1938546 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -23,6 +23,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -80,6 +81,8 @@ typedef struct BDRVQuorumState {
 bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
 * block if Quorum is reached.
 */
+unsigned long *index_bitmap;
+int bsize;
 
 QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(&local_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -926,6 +929,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 /* allocate the children array */
 s->children = g_new0(BdrvChild *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -941,6 +945,8 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 
 opened[i] = true;
 }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;
 
 g_free(opened);
 goto exit;
@@ -997,6 +1003,117 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
 }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+if ((s->bsize % BITS_PER_LONG) == 0) {
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);
+}
+
+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index;
+long new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, s->index_bitmap);
+if (index < s->bsize - 1) {
+/*
+ * The last bit is always set, and we don't clear
+ * the last bit.
+ */
+return;
+}
+
+last_index = find_last_bit(s->index_bitmap, s->bsize);
+if (BITS_TO_LONGS(last_index + 1) == BITS_TO_LONGS(s->bsize)) {
+s->bsize = last_index + 1;
+return;
+}
+
+new_len = BITS_TO_LONGS(last_index + 1) * sizeof(unsigned long);
+s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+s->bsize = last_index + 1;
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+BdrvChild *child;
+char indexstr[32];
+int index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+int ret;
+
+index = get_new_child_index(s);
+ret = snprintf(indexstr, 32, "children.%d", index);
+if (ret < 0 || ret >= 32) {
+error_setg(errp, "cannot generate child name");
+return;
+}
+
+bdrv_drain(bs);
+
+assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+error_setg(errp, "Too many children");
+return;
+}
+s->children = g_
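The index bookkeeping in get_new_child_index() and remove_child_index() (first free slot reused, `bsize` trimmed back when the highest index is removed) can be sketched in a self-contained form. This is an illustrative approximation, not the patch's code: all names are hypothetical, it scans bits one by one instead of calling `find_next_zero_bit()`/`find_last_bit()`, it marks the slot as used inside the allocator, and it never shrinks the allocation.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_BITS_TO_LONGS(n) (((n) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

typedef struct SketchIndexMap {
    unsigned long *bits;   /* one bit per child index, 1 = in use */
    size_t bsize;          /* one past the highest index handed out */
} SketchIndexMap;

static int sk_test(const SketchIndexMap *m, size_t i)
{
    return (m->bits[i / SK_BITS_PER_LONG] >> (i % SK_BITS_PER_LONG)) & 1UL;
}

static void sk_assign(SketchIndexMap *m, size_t i, int in_use)
{
    unsigned long mask = 1UL << (i % SK_BITS_PER_LONG);

    if (in_use) {
        m->bits[i / SK_BITS_PER_LONG] |= mask;
    } else {
        m->bits[i / SK_BITS_PER_LONG] &= ~mask;
    }
}

/* Hand out the lowest free index; returns -1 if memory runs out. */
static long sketch_get_index(SketchIndexMap *m)
{
    size_t i;

    for (i = 0; i < m->bsize; i++) {
        if (!sk_test(m, i)) {
            sk_assign(m, i, 1);
            return (long)i;
        }
    }
    if (m->bsize % SK_BITS_PER_LONG == 0) {
        /* Make sure the word that will hold index bsize exists. */
        size_t nlongs = SK_BITS_TO_LONGS(m->bsize + 1);
        unsigned long *p = realloc(m->bits, nlongs * sizeof(*p));

        if (!p) {
            return -1;
        }
        p[nlongs - 1] = 0;   /* that word starts with every slot free */
        m->bits = p;
    }
    sk_assign(m, m->bsize, 1);
    return (long)m->bsize++;
}

/* Release an index; if it was the highest, trim bsize to the last used bit. */
static void sketch_put_index(SketchIndexMap *m, size_t index)
{
    assert(index < m->bsize);
    sk_assign(m, index, 0);
    while (m->bsize > 0 && !sk_test(m, m->bsize - 1)) {
        m->bsize--;
    }
}
```

Releasing an index in the middle leaves `bsize` unchanged, just as remove_child_index() returns early when `index < s->bsize - 1`.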

[Qemu-block] [Patch v7 0/3] qapi: child add/delete support

2015-11-22 Thread Wen Congyang

If quorum's child is broken, we can use a mirror job to replace it.
But sometimes the user only needs to remove the broken child, and
add it back later when the problem is fixed.

It is based on Kevin's child name related patch:
http://repo.or.cz/qemu/kevin.git/commitdiff/b8f3aba84160564576a5a068398f20eca13768af

ChangeLog:
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according
   to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c   |  58 --
 block/quorum.c| 124 +-
 blockdev.c|  54 
 include/block/block.h |   9 
 include/block/block_int.h |   5 ++
 qapi/block-core.json  |  25 ++
 qmp-commands.hx   |  47 ++
 7 files changed, 316 insertions(+), 6 deletions(-)

-- 
2.5.0

[Qemu-block] [Patch v7 3/3] qmp: add monitor command to add/remove a child

2015-11-22 Thread Wen Congyang

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 blockdev.c   | 54 
 qapi/block-core.json | 25 
 qmp-commands.hx  | 47 +
 3 files changed, 126 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 313841b..7736d84 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3837,6 +3837,60 @@ out:
 aio_context_release(aio_context);
 }
 
+static BlockDriverState *bdrv_find_child(BlockDriverState *parent_bs,
+ const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child->bs;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *child_bs = NULL, *new_bs = NULL;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node should be specified");
+}
+return;
+}
+
+if (has_child) {
+child_bs = bdrv_find_child(parent_bs, child);
+if (!child_bs) {
+error_setg(errp, "Node '%s' doesn't have child %s",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, child_bs, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a07b13f..22fc2ee 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2400,3 +2400,28 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a block driver state. Currently only
+# the Quorum driver implements this feature to add or remove its child.
+# This is useful to fix a broken quorum child.
+#
+# @operation: the change operation. It can be add, delete.
+#
+# @parent: the id or name of the node that will be changed.
+#
+# @child: #optional the name of the child that will be deleted.
+#
+# @node: #optional the name of the node will be added.
+#
+# Note: this command is experimental, and its API is not stable.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9d8b42f..c65d693 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4285,6 +4285,53 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+
+
+Dynamically reconfigure the block driver state graph. It can be used to
+add, remove, insert, or replace a block driver state. Currently only
+the Quorum driver implements this feature to add and remove its child.
+This is useful to fix a broken quorum child.
+
+Arguments:
+- "parent": the id or node name of which node will be changed
+- "child": the child name which will be deleted
+- "node": the new node-name which will be added
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of operations, all kinds of children, nor all block
+drivers.
+
+Example:
+
+Add a new node to a quorum
+-> { "execute": "blockdev-add",
+"arguments": { "options": { "driver": "raw",
+"node-name": "new_node",
+"id": "test_new_node",
+"file": { "driver": "file",
+  &q

[Qemu-block] [PATCH for-2.5] block-migration: limit the memory usage

2015-11-20 Thread Wen Congyang

If we set the migration speed to a very large value, block migration will try to
read all data into memory, because
(block_mig_state.submitted + block_mig_state.read_done) * BLOCK_SIZE
will overflow, and so it will always be less than the rate limit.

There is no need to read so much data into memory when the rate limit is very
large, so limiting the memory usage fixes the overflow problem.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 migration/block.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 310e2b3..656f38f 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -36,6 +36,8 @@
 
 #define MAX_IS_ALLOCATED_SEARCH 65536
 
+#define MAX_INFLIGHT_IO 512
+
 //#define DEBUG_BLK_MIGRATION
 
 #ifdef DEBUG_BLK_MIGRATION
@@ -665,7 +667,10 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
 blk_mig_lock();
 while ((block_mig_state.submitted +
 block_mig_state.read_done) * BLOCK_SIZE <
-   qemu_file_get_rate_limit(f)) {
+   qemu_file_get_rate_limit(f) &&
+   (block_mig_state.submitted +
+block_mig_state.read_done) <
+   MAX_INFLIGHT_IO) {
 blk_mig_unlock();
 if (block_mig_state.bulk_completed == 0) {
 /* first finish the bulk phase */
-- 
2.5.0
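The arithmetic behind this patch can be checked directly. Assuming `BLOCK_SIZE` is 1 MiB and the two counters are `int` (both are assumptions about migration/block.c, not shown in the patch), 2048 pending blocks already amount to 2^31 bytes, which exceeds `INT_MAX`, so the `int` product can no longer be represented and in practice wraps to a value below the rate limit. The hypothetical helper `may_submit_more()` below mirrors the patched loop condition but widens the product to 64 bits before comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE (1 << 20)   /* assumed: 1 MiB per block */
#define SKETCH_MAX_INFLIGHT_IO 512    /* cap taken from the patch */

/*
 * Mirrors the patched while-condition in block_save_iterate():
 * keep submitting while the pending bytes are under the rate limit
 * AND fewer than SKETCH_MAX_INFLIGHT_IO blocks are pending. The byte
 * count is computed in int64_t so it cannot wrap here.
 */
static bool may_submit_more(int submitted, int read_done, int64_t rate_limit)
{
    int64_t inflight = (int64_t)submitted + read_done;

    assert(submitted >= 0 && read_done >= 0);
    return inflight * SKETCH_BLOCK_SIZE < rate_limit &&
           inflight < SKETCH_MAX_INFLIGHT_IO;
}
```

The patch itself keeps the original `int` expression and relies on the cap instead: under the same assumptions, while the pending count stays around 512 the product stays around 2^29 bytes, well below `INT_MAX`.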

Re: [Qemu-block] [PATCH v6 0/4] qapi: child add/delete support

2015-11-13 Thread Wen Congyang

On 11/13/2015 05:28 PM, Stefan Hajnoczi wrote:
> On Fri, Oct 30, 2015 at 02:11:30PM +0800, Wen Congyang wrote:
>> Ping...
> 
> Tips for faster code review:
> 
> It helps to mention the specific person you are expecting review from
> when the CC list is long.  For example, "Kevin: ping".

Do you mean that when I ping a patch, I need to write 'xxx: ping' if the CC
list is long?

> 
> Keeping the CC list short can result in faster code review than a long
> CC list because it's obvious who needs to reply.

Thanks. I always put them in the TO list, and the CC list contains the
people who are interested in this patch.

Wen Congyang

> 
> Hope this helps.
> 
> Stefan
>

Re: [Qemu-block] [Qemu-devel] [PATCH v6 3/4] qmp: add monitor command to add/remove a child

2015-11-13 Thread Wen Congyang

On 11/10/2015 09:40 AM, Wen Congyang wrote:
> On 11/10/2015 12:04 AM, Kevin Wolf wrote:
>> Am 16.10.2015 um 10:57 hat Wen Congyang geschrieben:
>>> +##
>>> +# @ChangeOperation:
>>> +#
>>> +# An enumeration of block device change operation.
>>> +#
>>> +# @add: Add a new block driver state to a existed block driver state.
>>> +#
>>> +# @delete: Delete a block driver state's child.
>>> +#
>>> +# Since: 2.5
>>> +##
>>> +{ 'enum': 'ChangeOperation',
>>> +  'data': [ 'add', 'delete' ] }
>>
>> What's the advantage of this enum compared to separate QMP commands? The
>> way it is specified here, ChangeOperation is already implicit by whether
>> or not child and node are given.
>>
>>> +##
>>> +# @x-blockdev-change
>>> +#
>>> +# Dynamic reconfigure the block driver state graph. It can be used to
>>> +# add, remove, insert, replace a block driver state. Currently only
>>> +# the Quorum driver implements this feature to add and remove its child.
>>> +# This is useful to fix a broken quorum child.
>>> +#
>>> +# @operation: the chanage operation. It can be add, delete.
>>> +#
>>> +# @parent: the id or node name of which node will be changed.
>>> +#
>>> +# @child: the child node-name which will be deleted.
>>
>> #optional
>>
>> Must be present for operation = delete, must not be present otherwise.
>>
>>> +# @node: the new node-name which will be added.
>>
>> #optional
>>
>> Must be present for operation = add, must not be present otherwise.
>>
>>> +#
>>> +# Note: this command is experimental, and not a stable API.
>>> +#
>>> +# Since: 2.5
>>> +##
>>> +{ 'command': 'x-blockdev-change',
>>> +  'data' : { 'operation': 'ChangeOperation',
>>> + 'parent': 'str',
>>> + '*child': 'str',
>>> + '*node': 'str' } }
>>
>> Let me suggest this alternative:
>>
>> { 'command': 'x-blockdev-change',
>>   'data' : { 'parent': 'str',
>>  'child': 'str',
>>  '*node': 'str' } }
>>
>> child doesn't describe a node name then, but a child name (adds a
>> dependency on my patches which add a name to BdrvChild, though).
> 
> Where is the patch? I can't find it.
> 
>> Depending on whether node is given and whether the child already exists,
>> this may add, remove or replace a child.
> 
> If the user wants to insert a filter driver between parent and child, we
> also need three parameters: parent, child, node. That is why I added the
> parameter operation.

Hi Kevin, I am still waiting for your reply...

Thanks
Wen Congyang

> 
> Thanks
> Wen Congyang
> 
>>
>> Kevin
>> .
>>
> 
> 
> .
>

Re: [Qemu-block] [PATCH v11 00/12] Block replication for continuous checkpoints

2015-11-12 Thread Wen Congyang

ping...

On 11/03/2015 06:58 PM, Wen Congyang wrote:
> You can find detailed information about block replication here:
> http://wiki.qemu.org/Features/BlockReplication
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg03860.html
> 2. http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg06124.html
> 
> You can get the patch here:
> https://github.com/coloft/qemu/tree/wency/block-replication-v11
> 
> The newest framework will be sent later.
> 
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>    are accepted.
> 
> Change Log:
> V11:
> 1. Reopen the backing file when starting block replication if it is not
>    opened in R/W mode
> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>    when opening backing file
> 3. Block the top BDS so there is only one block job for the top BDS and
>    its backing chain.
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>    reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split child add/delete support. These patches are sent in another patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing reference option according to Stefan Hajnoczi's
>    suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed the failover up. The secondary vm can take over very quickly even
>    if there are too many I/O requests.
> V4:
> 1. Introduce a new driver replication to avoid touching nbd and qcow2.
> V3:
> 1. Use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target use the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary qemu (use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
> 
> Wen Congyang (12):
>   unblock backup operations in backing file
>   Store parent BDS in BdrvChild
>   allow writing to the backing file
>   Backup: clear all bitmap when doing block checkpoint
>   Allow creating backup jobs when opening BDS
>   block: make bdrv_put_ref_bh_schedule() as a public API
>   docs: block replication's description
>   Add new block driver interfaces to control block replication
>   quorum: implement block driver interfaces for block replication
>   Implement new driver for block replication
>   support replication driver in blockdev-add
>   Add a new API to start/stop replication, do checkpoint to all BDSes
> 
>  block.c| 211 -
>  block/Makefile.objs|   3 +-
>  block/backup.c |  14 ++
>  block/quorum.c |  78 +++
>  block/replication.c| 550 +
>  blockdev.c |  37 +--
>  blockjob.c |  11 +
>  docs/block-replication.txt | 251 +
>  include/block/block.h  |  10 +
>  include/block/block_int.h  |  15 ++
>  include/block/blockjob.h   |  12 +
>  qapi/block-core.json   |  34 ++-
>  12 files changed, 1190 insertions(+), 36 deletions(-)
>  create mode 100644 block/replication.c
>  create mode 100644 docs/block-replication.txt
>

Re: [Qemu-block] [PATCH v6 4/4] hmp: add monitor command to add/remove a child

2015-11-10 Thread Wen Congyang

On 11/09/2015 10:54 PM, Alberto Garcia wrote:
> On Fri 16 Oct 2015 10:57:46 AM CEST, Wen Congyang wrote:
> 
>> +.name   = "blockdev_change",
>> +.args_type  = "op:s,parent:B,child:B?,node:?",
>> +.params = "operation parent [child] [node]",
>   [...]
>> +/*
>> + * FIXME: we must specify the parameter child, otherwise,
>> + * we can't specify the parameter node.
>> + */
>> +if (op == CHANGE_OPERATION_ADD) {
>> +has_child = false;
>> +}
> 
> So if you want to perform the 'add' operation you must pass both 'child'
> and 'node' but the former will be discarded.
> 
> I don't think you really need to do this for the HMP interface, but it's
> anyway one more good reason to merge 'child' and 'node'.

Do you mean there is no need to implement the HMP interface?

Thanks
Wen Congyang

> 
> Berto
> .
>

Re: [Qemu-block] [PATCH v6 3/4] qmp: add monitor command to add/remove a child

2015-11-09 Thread Wen Congyang

On 11/10/2015 12:04 AM, Kevin Wolf wrote:
> Am 16.10.2015 um 10:57 hat Wen Congyang geschrieben:
>> +##
>> +# @ChangeOperation:
>> +#
>> +# An enumeration of block device change operation.
>> +#
>> +# @add: Add a new block driver state to a existed block driver state.
>> +#
>> +# @delete: Delete a block driver state's child.
>> +#
>> +# Since: 2.5
>> +##
>> +{ 'enum': 'ChangeOperation',
>> +  'data': [ 'add', 'delete' ] }
> 
> What's the advantage of this enum compared to separate QMP commands? The
> way it is specified here, ChangeOperation is already implicit by whether
> or not child and node are given.
> 
>> +##
>> +# @x-blockdev-change
>> +#
>> +# Dynamic reconfigure the block driver state graph. It can be used to
>> +# add, remove, insert, replace a block driver state. Currently only
>> +# the Quorum driver implements this feature to add and remove its child.
>> +# This is useful to fix a broken quorum child.
>> +#
>> +# @operation: the chanage operation. It can be add, delete.
>> +#
>> +# @parent: the id or node name of which node will be changed.
>> +#
>> +# @child: the child node-name which will be deleted.
> 
> #optional
> 
> Must be present for operation = delete, must not be present otherwise.
> 
>> +# @node: the new node-name which will be added.
> 
> #optional
> 
> Must be present for operation = add, must not be present otherwise.
> 
>> +#
>> +# Note: this command is experimental, and not a stable API.
>> +#
>> +# Since: 2.5
>> +##
>> +{ 'command': 'x-blockdev-change',
>> +  'data' : { 'operation': 'ChangeOperation',
>> + 'parent': 'str',
>> + '*child': 'str',
>> + '*node': 'str' } }
> 
> Let me suggest this alternative:
> 
> { 'command': 'x-blockdev-change',
>   'data' : { 'parent': 'str',
>  'child': 'str',
>  '*node': 'str' } }
> 
> child doesn't describe a node name then, but a child name (adds a
> dependency on my patches which add a name to BdrvChild, though).

Where is the patch? I can't find it.

> Depending on whether node is given and whether the child already exists,
> this may add, remove or replace a child.
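Kevin's proposal derives the operation from two facts: whether @node is given and whether the named child already exists. A minimal sketch of that decision table follows; the enum and function names are hypothetical, and treating the "no node, no such child" case as an error is an assumption, since the mail does not say.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_CHANGE_INVALID,  /* no node given and no such child */
    SKETCH_CHANGE_ADD,      /* node given, child name not yet used */
    SKETCH_CHANGE_REMOVE,   /* no node given, child exists */
    SKETCH_CHANGE_REPLACE,  /* node given, child exists */
} SketchChangeOp;

static SketchChangeOp infer_change_op(bool child_exists, bool has_node)
{
    if (has_node) {
        return child_exists ? SKETCH_CHANGE_REPLACE : SKETCH_CHANGE_ADD;
    }
    return child_exists ? SKETCH_CHANGE_REMOVE : SKETCH_CHANGE_INVALID;
}

/* Usage: the four combinations and what each would mean. */
static void sketch_check_table(void)
{
    assert(infer_change_op(false, true) == SKETCH_CHANGE_ADD);
    assert(infer_change_op(true, false) == SKETCH_CHANGE_REMOVE);
    assert(infer_change_op(true, true) == SKETCH_CHANGE_REPLACE);
    assert(infer_change_op(false, false) == SKETCH_CHANGE_INVALID);
}
```

An existing child plus a given node maps to "replace" here; inserting a filter between parent and child would need the same three inputs, which is the ambiguity raised in the reply.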

If the user wants to insert a filter driver between parent and child, we
also need three parameters: parent, child, node. That is why I added the
parameter operation.

Thanks
Wen Congyang

> 
> Kevin
> .
>

Re: [Qemu-block] [PATCH v6 3/4] qmp: add monitor command to add/remove a child

2015-11-05 Thread Wen Congyang

On 11/05/2015 09:49 PM, Alberto Garcia wrote:
> On Fri 16 Oct 2015 10:57:45 AM CEST, Wen Congyang <we...@cn.fujitsu.com> wrote:
> 
>> The new QMP command name is x-blockdev-change. It justs for
>> adding/removing quorum's child now, and don't support all kinds of
>> children, all kinds of operations, nor all block drivers. So it is
>> experimental now.
> 
> I might have missed some discussion, why were the -add and -delete 

This monitor command can be used to implement add, delete, insert, remove,
replace... Currently, I only implement the add and delete operations.

> 
>> +# @x-blockdev-change
>> +#
>> +# Dynamic reconfigure the block driver state graph. It can be used to
>> +# add, remove, insert, replace a block driver state. Currently only
>> +# the Quorum driver implements this feature to add and remove its child.
>> +# This is useful to fix a broken quorum child.
>> +#
>> +# @operation: the chanage operation. It can be add, delete.
>> +#
>> +# @parent: the id or node name of which node will be changed.
>> +#
>> +# @child: the child node-name which will be deleted.
>> +#
>> +# @node: the new node-name which will be added.
>> +#
>> +# Note: this command is experimental, and not a stable API.
>> +#
>> +# Since: 2.5
>> +##
>> +{ 'command': 'x-blockdev-change',
>> +  'data' : { 'operation': 'ChangeOperation',
>> + 'parent': 'str',
>> + '*child': 'str',
>> + '*node': 'str' } }
> 
> Do you really need two separate 'child' and 'node' parameters? If the
> operation is 'add' you can only use 'node', if it is 'delete, you can
> only use 'child'. It seems to me that you can simply have one 'node'
> parameter and use it for both ...

parent and child already exist in the BDS graph, and node is a new node.
In the future, we may need to implement an insert operation, and that
operation needs all three BDSes.

Thanks
Wen Congyang

> 
> Berto
> .
>

[Qemu-block] [PATCH v11 11/12] support replication driver in blockdev-add

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 qapi/block-core.json | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0539dfa..acc85ba 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -219,7 +219,7 @@
 #   'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'
 #   2.2: 'archipelago' added, 'cow' dropped
 #   2.3: 'host_floppy' deprecated
-#   2.5: 'host_floppy' dropped
+#   2.5: 'host_floppy' dropped, 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1375,6 +1375,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.5
 #
 # Since: 2.0
 ##
@@ -1382,8 +1383,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
 'http', 'https', 'null-aio', 'null-co', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-'vmdk', 'vpc', 'vvfat' ] }
+'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -1810,6 +1811,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.5
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -1846,6 +1860,7 @@
   'quorum': 'BlockdevOptionsQuorum',
   'raw':'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+  'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
   'tftp':   'BlockdevOptionsFile',
-- 
2.4.3

[Qemu-block] [PATCH v11 05/12] Allow creating backup jobs when opening BDS

2015-11-03 Thread Wen Congyang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.4.3

[Qemu-block] [PATCH v11 10/12] Implement new driver for block replication

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 550 
 2 files changed, 551 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fa05f37..94c1d03 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..ca870d2
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,550 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 Intel Corporation
+ * Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <we...@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+BlockDriverState *active_disk;
+BlockDriverState *hidden_disk;
+BlockDriverState *secondary_disk;
+BlockDriverState *top_bs;
+Error *blocker;
+int orig_hidden_flags;
+int orig_secondary_flags;
+int error;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_DONE, /* block replication is done (failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE"mode"
+static QemuOptsList replication_runtime_opts = {
+.name = "replication",
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+
+ret = -EINVAL;
+opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(&local_err, "Missing the option mode");
+goto fail;
+}
+
+if (!strcmp(mode, "primary")) {
+s->mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, "secondary")) {
+s->mode = REPLICATION_MODE_SECONDARY;
+} else {
+error_setg(&local_err,
+   "The option mode's value should be primary or secondary");
+goto fail;
+}
+
+ret = 0;
+
+fail:
+qemu_opts_del(opts);
+/* propagate error */
+if (local_err) {
+error_propagate(errp, local_err);
+}
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs->opaque;
+
+if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(bs, false, NULL);
+}
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->file->bs);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+switch (s->replication_state) {
+case BLOCK_REPLICATION_NONE:
+return -EIO;
+case BLOCK_REPLICATION_RUNNING:
+return 0;
+case BLOCK_REPLICATION_DONE:
+return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+default:
+abort();
+}
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+return ret;
+}
+
+if (ret < 0) {
+s->error = ret;
+ret = 0;
+}
+
+return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *q
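
The archived message is cut off here. The two helpers just before the cut,
replication_get_io_status() and replication_return_value(), encode the
driver's error policy; below they are copied into a standalone sketch with
local stand-in types (Mode, ReplState) instead of QEMU's. How the truncated
read/write paths consume these values is not visible in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Local stand-ins for ReplicationMode and the driver's state enum. */
typedef enum { MODE_PRIMARY, MODE_SECONDARY } Mode;
enum { REPL_NONE, REPL_RUNNING, REPL_DONE };

typedef struct {
    Mode mode;
    int state;
    int error;   /* last error swallowed in primary mode */
} ReplState;

/* Same decision table as replication_get_io_status():
 * -EIO when replication is not running (or a primary after failover),
 * 0 while running, 1 only for a secondary whose failover is done. */
static int io_status(const ReplState *s)
{
    switch (s->state) {
    case REPL_NONE:
        return -EIO;
    case REPL_RUNNING:
        return 0;
    case REPL_DONE:
        return s->mode == MODE_PRIMARY ? -EIO : 1;
    default:
        abort();
    }
}

/* Same policy as replication_return_value(): the secondary reports errors
 * as-is; the primary records the error and reports success. */
static int return_value(ReplState *s, int ret)
{
    if (s->mode == MODE_SECONDARY) {
        return ret;
    }
    if (ret < 0) {
        s->error = ret;
        ret = 0;
    }
    return ret;
}
```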

[Qemu-block] [PATCH v11 04/12] Backup: clear all bitmap when doing block checkpoint

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/backup.c   | 14 ++
 blockjob.c   | 11 +++
 include/block/blockjob.h | 12 
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index ec01db8..4232962 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -221,11 +221,25 @@ static void backup_iostatus_reset(BlockJob *job)
 }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "The backup job only supports block checkpoint in"
+   " sync=none mode");
+return;
+}
+
+hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index c02fe59..0bd2656 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -406,3 +406,14 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
 qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job->driver->do_checkpoint) {
+error_setg(errp, "The job %s doesn't support block checkpoint",
+   BlockJobType_lookup[job->driver->job_type]);
+return;
+}
+
+job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 289b13f..ae9e01c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
  * manually.
  */
 void (*complete)(BlockJob *job, Error **errp);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -364,4 +367,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
   BlockJobDeferToMainLoopFn *fn,
   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.4.3
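
The patch adds an optional do_checkpoint hook to BlockJobDriver and a
wrapper that fails cleanly when a job type lacks it. A simplified sketch
of that pattern, with stand-in Job/JobDriver types and a plain error
buffer instead of QEMU's Error API. My reading (an assumption, not stated
in the patch) is that in sync=none mode the bitmap records clusters
already copied to the target, so clearing it makes the next guest write
to each cluster copy the old data again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Job Job;

/* Stand-in for BlockJobDriver with one optional callback. */
typedef struct {
    const char *job_type;
    void (*do_checkpoint)(Job *job, char *err, size_t errlen); /* may be NULL */
} JobDriver;

/* Stand-in for BackupBlockJob; the bitmap replaces the HBitmap. */
struct Job {
    const JobDriver *driver;
    int sync_none;              /* 1 if running with sync=none */
    unsigned char bitmap[8];    /* copied-cluster bitmap */
};

/* Like backup_do_checkpoint(): only valid in sync=none mode. */
static void backup_checkpoint(Job *job, char *err, size_t errlen)
{
    if (!job->sync_none) {
        snprintf(err, errlen, "The backup job only supports block "
                 "checkpoint in sync=none mode");
        return;
    }
    memset(job->bitmap, 0, sizeof(job->bitmap)); /* like hbitmap_reset_all() */
}

/* Like block_job_do_checkpoint(): reject job types without the hook. */
static void job_do_checkpoint(Job *job, char *err, size_t errlen)
{
    if (!job->driver->do_checkpoint) {
        snprintf(err, errlen, "The job %s doesn't support block checkpoint",
                 job->driver->job_type);
        return;
    }
    job->driver->do_checkpoint(job, err, errlen);
}
```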

[Qemu-block] [PATCH v11 06/12] block: make bdrv_put_ref_bh_schedule() as a public API

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block.c   | 25 +
 blockdev.c| 37 ++---
 include/block/block.h |  1 +
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index 32ed776..9a1c20e 100644
--- a/block.c
+++ b/block.c
@@ -3508,6 +3508,31 @@ void bdrv_unref(BlockDriverState *bs)
 }
 }
 
+typedef struct {
+QEMUBH *bh;
+BlockDriverState *bs;
+} BDRVPutRefBH;
+
+static void bdrv_put_ref_bh(void *opaque)
+{
+BDRVPutRefBH *s = opaque;
+
+bdrv_unref(s->bs);
+qemu_bh_delete(s->bh);
+g_free(s);
+}
+
+/* Release a BDS reference in a BH */
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
+{
+BDRVPutRefBH *s;
+
+s = g_new(BDRVPutRefBH, 1);
+s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
+s->bs = bs;
+qemu_bh_schedule(s->bh);
+}
+
 struct BdrvOpBlocker {
 Error *reason;
 QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/blockdev.c b/blockdev.c
index bd13669..9d0b3ea 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -278,37 +278,6 @@ static void bdrv_format_print(void *opaque, const char *name)
 error_printf(" %s", name);
 }
 
-typedef struct {
-QEMUBH *bh;
-BlockDriverState *bs;
-} BDRVPutRefBH;
-
-static void bdrv_put_ref_bh(void *opaque)
-{
-BDRVPutRefBH *s = opaque;
-
-bdrv_unref(s->bs);
-qemu_bh_delete(s->bh);
-g_free(s);
-}
-
-/*
- * Release a BDS reference in a BH
- *
- * It is not safe to use bdrv_unref() from a callback function when the callers
- * still need the BlockDriverState.  In such cases we schedule a BH to release
- * the reference.
- */
-static void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
-{
-BDRVPutRefBH *s;
-
-s = g_new(BDRVPutRefBH, 1);
-s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
-s->bs = bs;
-qemu_bh_schedule(s->bh);
-}
-
 static int parse_block_error_action(const char *buf, bool is_read, Error **errp)
 {
 if (!strcmp(buf, "ignore")) {
@@ -2557,6 +2526,12 @@ static void block_job_cb(void *opaque, int ret)
 block_job_event_completed(bs->job, msg);
 }
 
+
+/*
+ * It is not safe to use bdrv_unref() from a callback function when the
+ * callers still need the BlockDriverState. In such cases we schedule
+ * a BH to release the reference.
+ */
 bdrv_put_ref_bh_schedule(bs);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 601a5de..cccda1d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -507,6 +507,7 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  BlockDriverState *child_bs,
  const BdrvChildRole *child_role);
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3
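
The comment moved by this patch explains why the reference is released in
a bottom half: a callback may not drop a reference its callers still use.
A toy sketch of the same deferral, with a hand-rolled FIFO of callbacks
standing in for QEMUBH and a reference-counted Obj standing in for
BlockDriverState (all names here are hypothetical).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object; 'freed' is set instead of freeing memory so
 * the state can be inspected afterwards. */
typedef struct { int refcnt; int freed; } Obj;

/* One queued callback, standing in for a scheduled bottom half. */
typedef struct Deferred {
    void (*fn)(void *opaque);
    void *opaque;
    struct Deferred *next;
} Deferred;

static Deferred *bh_head, *bh_tail;

static void obj_unref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->freed = 1;
    }
}

static void put_ref_cb(void *opaque)
{
    obj_unref(opaque);
}

/* Like bdrv_put_ref_bh_schedule(): queue the unref instead of doing it. */
static void put_ref_schedule(Obj *o)
{
    Deferred *d = malloc(sizeof(*d));
    assert(d);
    d->fn = put_ref_cb;
    d->opaque = o;
    d->next = NULL;
    if (bh_tail) {
        bh_tail->next = d;
    } else {
        bh_head = d;
    }
    bh_tail = d;
}

/* Run by the main loop after the current callback has returned. */
static void run_pending(void)
{
    while (bh_head) {
        Deferred *d = bh_head;
        bh_head = d->next;
        if (!bh_head) {
            bh_tail = NULL;
        }
        d->fn(d->opaque);
        free(d);
    }
}
```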

[Qemu-block] [PATCH v11 03/12] allow writing to the backing file

2015-11-03 Thread Wen Congyang

For block replication, we have the following backing chain:
secondary disk <-- hidden disk <-- active disk
The secondary disk is a top BDS (it is attached through a backing
reference), so it can be opened in read-write mode. But the hidden disk
is read-only, and we need to write to it (the backup job writes data to it).

TODO: support opening backing file in read-write mode if the BDS is
created by QMP command blockdev-add.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 52a91ad..32ed776 100644
--- a/block.c
+++ b/block.c
@@ -737,6 +737,15 @@ static const BdrvChildRole child_backing = {
 .inherit_flags = bdrv_backing_flags,
 };
 
+static int bdrv_backing_rw_flags(int flags)
+{
+return bdrv_backing_flags(flags) | BDRV_O_RDWR;
+}
+
+static const BdrvChildRole child_backing_rw = {
+.inherit_flags = bdrv_backing_rw_flags,
+};
+
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
 int open_flags = flags | BDRV_O_CACHE_WB;
@@ -1184,6 +1193,20 @@ out:
 bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE"allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+.name = "backing_file",
+.head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+.desc = {
+{
+.name = ALLOW_WRITE_BACKING_FILE,
+.type = QEMU_OPT_BOOL,
+.help = "allow writes to backing file",
+},
+{ /* end of list */ }
+},
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1198,6 +1221,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 int ret = 0;
 BlockDriverState *backing_hd;
 Error *local_err = NULL;
+QemuOpts *opts = NULL;
+bool child_rw = false;
+const BdrvChildRole *child_role = NULL;
 
 if (bs->backing != NULL) {
 QDECREF(options);
@@ -1210,6 +1236,18 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 }
 
 bs->open_flags &= ~BDRV_O_NO_BACKING;
+
+opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+ret = -EINVAL;
+error_propagate(errp, local_err);
+QDECREF(options);
+goto free_exit;
+}
+child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+child_role = child_rw ? &child_backing_rw : &child_backing;
+
 if (qdict_haskey(options, "file.filename")) {
 backing_filename[0] = '\0';
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1242,7 +1280,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 assert(bs->backing == NULL);
 ret = bdrv_open_inherit(&backing_hd,
 *backing_filename ? backing_filename : NULL,
-NULL, options, 0, bs, &child_backing, &local_err);
+NULL, options, 0, bs, child_role, &local_err);
 if (ret < 0) {
 bdrv_unref(backing_hd);
 backing_hd = NULL;
@@ -1259,6 +1297,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 bdrv_unref(backing_hd);
 
 free_exit:
+qemu_opts_del(opts);
 g_free(backing_filename);
 return ret;
 }
-- 
2.4.3
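
The new child_backing_rw role inherits flags exactly like child_backing
and then adds BDRV_O_RDWR. A toy sketch of that role-based inheritance;
the O_* values are hypothetical, and the set of flags cleared by
backing_flags() only approximates QEMU's bdrv_backing_flags().

```c
#include <assert.h>

/* Hypothetical flag values; QEMU's real BDRV_O_* constants differ. */
enum {
    O_RDWR         = 1 << 0,
    O_SNAPSHOT     = 1 << 1,
    O_COPY_ON_READ = 1 << 2,
    O_CACHE_WB     = 1 << 3,
};

/* Stand-in for BdrvChildRole: how a child derives flags from its parent. */
typedef struct {
    int (*inherit_flags)(int parent_flags);
} ChildRole;

/* Default backing role: read-only, no top-layer-only options. */
static int backing_flags(int flags)
{
    return flags & ~(O_RDWR | O_SNAPSHOT | O_COPY_ON_READ);
}

/* Role added by the patch: same inheritance, but read-write. */
static int backing_rw_flags(int flags)
{
    return backing_flags(flags) | O_RDWR;
}

static const ChildRole child_backing    = { backing_flags };
static const ChildRole child_backing_rw = { backing_rw_flags };

/* Choose the role as bdrv_open_backing_file() does after the patch,
 * based on the allow-write-backing-file option. */
static const ChildRole *pick_role(int allow_write_backing_file)
{
    return allow_write_backing_file ? &child_backing_rw : &child_backing;
}
```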

[Qemu-block] [PATCH v11 00/12] Block replication is a very important feature which is used for

2015-11-03 Thread Wen Congyang

You can find detailed information about block replication here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg03860.html
2. http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg06124.html

You can get the patch here:
https://github.com/coloft/qemu/tree/wency/block-replication-v11

The newest framework will be sent later.

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Change Log:
V11:
1. Reopen the backing file when starting block replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's
   suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed up failover. The secondary VM can take over quickly even
   if there are many I/O requests.
V4:
1. Introduce a new driver, replication, to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary QEMU (use image fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (12):
  unblock backup operations in backing file
  Store parent BDS in BdrvChild
  allow writing to the backing file
  Backup: clear all bitmap when doing block checkpoint
  Allow creating backup jobs when opening BDS
  block: make bdrv_put_ref_bh_schedule() as a public API
  docs: block replication's description
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  support replication driver in blockdev-add
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c| 211 -
 block/Makefile.objs|   3 +-
 block/backup.c |  14 ++
 block/quorum.c |  78 +++
 block/replication.c| 550 +
 blockdev.c |  37 +--
 blockjob.c |  11 +
 docs/block-replication.txt | 251 +
 include/block/block.h  |  10 +
 include/block/block_int.h  |  15 ++
 include/block/blockjob.h   |  12 +
 qapi/block-core.json   |  34 ++-
 12 files changed, 1190 insertions(+), 36 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.4.3

[Qemu-block] [PATCH v11 01/12] unblock backup operations in backing file

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index 9a2ab68..1d6c115 100644
--- a/block.c
+++ b/block.c
@@ -1161,6 +1161,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *    The target bs is newly opened, and the source is a top BDS.
+ * 2. blockdev backup
+ *    Both the source and the target are top BDSes.
+ * 3. internal backup (used for block replication)
+ *    Both the source and the target are backing files.
+ *
+ * In cases 1 and 2, the backing file is neither the source nor
+ * the target.
+ * In case 3, we will block the top BDS, so there is only one block
+ * job for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
 }
-- 
2.4.3
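
bdrv_set_backing_hd() blocks every operation on a new backing file with
bs->backing_blocker and then unblocks selected ones; this patch adds the
two backup op types to that list. A simplified sketch of the op-blocker
model: one blocker slot per op type is an assumption for brevity (QEMU
keeps a list of blocker reasons per type), and only a few op types are
modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Small subset of BlockOpType. */
typedef enum {
    OP_COMMIT_TARGET,
    OP_BACKUP_SOURCE,
    OP_BACKUP_TARGET,
    OP_STREAM,
    OP_MAX
} OpType;

/* Simplification: one blocker reason per op type. */
typedef struct {
    const void *blocker[OP_MAX];
} Node;

static void op_block_all(Node *n, const void *reason)
{
    for (int op = 0; op < OP_MAX; op++) {
        n->blocker[op] = reason;
    }
}

/* Only the owner of a block (same reason) can lift it. */
static void op_unblock(Node *n, OpType op, const void *reason)
{
    if (n->blocker[op] == reason) {
        n->blocker[op] = NULL;
    }
}

static bool op_is_blocked(const Node *n, OpType op)
{
    return n->blocker[op] != NULL;
}

/* What happens to a new backing file after this patch. */
static void attach_backing(Node *backing, const void *backing_blocker)
{
    op_block_all(backing, backing_blocker);
    op_unblock(backing, OP_COMMIT_TARGET, backing_blocker);
    op_unblock(backing, OP_BACKUP_SOURCE, backing_blocker);
    op_unblock(backing, OP_BACKUP_TARGET, backing_blocker);
}
```

Operations such as stream stay blocked on the backing file, which is the
protection the review thread later in this archive is concerned about.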

[Qemu-block] [PATCH v11 12/12] Add a new API to start/stop replication, do checkpoint to all BDSes

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c   | 83 +++
 include/block/block.h |  4 +++
 2 files changed, 87 insertions(+)

diff --git a/block.c b/block.c
index 04b928c..517fa4b 100644
--- a/block.c
+++ b/block.c
@@ -4208,3 +4208,86 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
" replication", bs->filename);
 }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+BlockDriverState *bs = NULL, *temp = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_start_replication(bs, mode, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+}
+
+return;
+
+fail:
+while ((temp = bdrv_next(temp)) && bs != temp) {
+bdrv_stop_replication(temp, false, NULL);
+}
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_do_checkpoint(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_stop_replication(bs, failover, &local_err);
+if (!errp) {
+/*
+ * The caller doesn't care about the result; it just
+ * wants to stop replication on all block devices.
+ */
+continue;
+}
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 288e14e..8427969 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -643,4 +643,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
2.4.3
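
bdrv_start_replication_all() is all-or-nothing: if starting fails on one
top-level BDS, replication is stopped again on the BDSes visited before
it. A sketch of that rollback over an array of hypothetical Dev records
standing in for the bdrv_next() walk. One difference is deliberate: the
sketch rolls back only devices it actually started, whereas the patch
calls bdrv_stop_replication() on every BDS before the failing one and
ignores errors by passing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BDS as seen by the _all() helpers. */
typedef struct {
    bool is_top;      /* QLIST_EMPTY(&bs->parents) */
    bool writable;    /* !read_only && inserted */
    bool fail_start;  /* simulate bdrv_start_replication() failing */
    bool running;
} Dev;

static bool eligible(const Dev *d)
{
    return d->is_top && d->writable;
}

/* Returns 0 if every eligible device started, -1 after rolling back. */
static int start_all(Dev *devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!eligible(&devs[i])) {
            continue;
        }
        if (devs[i].fail_start) {
            goto fail;
        }
        devs[i].running = true;
    }
    return 0;

fail:
    /* Stop only the devices before the failing one. */
    for (size_t j = 0; j < i; j++) {
        if (eligible(&devs[j])) {
            devs[j].running = false;
        }
    }
    return -1;
}
```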

[Qemu-block] [PATCH v11 08/12] Add new block driver interfaces to control block replication

2015-11-03 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Cc: Luiz Capitulino <lcapitul...@redhat.com>
Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block.c   | 43 +++
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block-core.json  | 13 +
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index 9a1c20e..04b928c 100644
--- a/block.c
+++ b/block.c
@@ -4165,3 +4165,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
 parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file->bs, mode, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support starting block"
+   " replication", bs->filename);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file->bs, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support block checkpoint",
+   bs->filename);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file->bs, failover, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support stopping block"
+   " replication", bs->filename);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index cccda1d..288e14e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -638,4 +638,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
 Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3285739..eec2591 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
 void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
Error **errp);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop the disk buffer when doing a checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush the disk buffer into the secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop the disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 86b62e4..0539dfa 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1797,6 +1797,19 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.5
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
2.4.3
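
Each of the three new bdrv_*() entry points uses the driver's hook when
present, otherwise forwards to bs->file, otherwise reports an error. A
sketch of that dispatch with hypothetical Node/Driver types; returning -1
stands in for error_setg().

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical driver with one optional hook. */
typedef struct {
    const char *name;
    int (*do_checkpoint)(Node *n);   /* NULL if unsupported */
} Driver;

struct Node {
    const Driver *drv;
    Node *file;          /* like bs->file->bs */
    int checkpoints;     /* counts hook invocations */
};

static int replication_checkpoint(Node *n)
{
    n->checkpoints++;
    return 0;
}

/* Same shape as bdrv_do_checkpoint(). */
static int node_do_checkpoint(Node *n)
{
    if (n->drv && n->drv->do_checkpoint) {
        return n->drv->do_checkpoint(n);
    } else if (n->file) {
        return node_do_checkpoint(n->file);
    }
    return -1;
}
```

Forwarding through bs->file lets a format driver on top (raw, for
example) reach a replication filter below it without implementing the
hooks itself.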

Re: [Qemu-block] [PATCH v6 0/4] qapi: child add/delete support

2015-10-30 Thread Wen Congyang

Ping...

On 10/16/2015 04:57 PM, Wen Congyang wrote:
> If quorum's child is broken, we can use mirror job to replace it.
> But sometimes, the user only needs to remove the broken child, and
> add it later when the problem is fixed.
> 
> It is based on the Kevin's bdrv_swap() related patch:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02152.html
> 
> ChangeLog:
> v6:
> 1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
>    and x-blockdev-child-delete
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. drop nbd driver's implementation. We can use human-monitor-command
>    to do it.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>    created by the QMP command blockdev-add.
> 2. The driver NBD can support filename, path, host:port now.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>    bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> 
> Wen Congyang (4):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
>   hmp: add monitor command to add/remove a child
> 
>  block.c   | 56 --
>  block/quorum.c| 59 ++--
>  blockdev.c| 76 +++
>  hmp-commands.hx   | 17 +++
>  hmp.c | 38 
>  hmp.h |  1 +
>  include/block/block.h |  8 +
>  include/block/block_int.h |  5 
>  qapi/block-core.json  | 40 +
>  qmp-commands.hx   | 50 +++
>  10 files changed, 345 insertions(+), 5 deletions(-)
>

Re: [Qemu-block] [PATCH v10 01/10] allow writing to the backing file

2015-10-29 Thread Wen Congyang

On 10/10/2015 03:07 AM, Eric Blake wrote:
> On 09/25/2015 12:17 AM, Wen Congyang wrote:
>> For block replication, we have such backing chain:
>> secondary disk <-- hidden disk <-- active disk
>> secondary disk is top BDS(use bacing reference), so it can be opened in
> 
> s/BDS(use bacing/BDS (use backing/
> 
>> read-write mode. But hidden disk is read only, and we need to write to
>> hidden disk(backup job will write data to it).
> 
> s/disk(/disk (/
> 
>>
>> TODO: support opening backing file in read-write mode if the BDS is
>> created by QMP command blockdev-add.
>>
>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>> ---
>>  block.c | 41 -
>>  1 file changed, 40 insertions(+), 1 deletion(-)
>>
> 
> I really don't like this patch.  We are able to automatically (re-)open
> backing files for write during block-commit, without having to expose a
> knob to the user then, so exposing a knob to the user here feels wrong.

I tried to reopen the backing files for write when starting block replication.
I tested it, and it doesn't work.
Here is my usage:
command line:
-drive if=none,id=colo-disk1,file.filename=/data/images/kvm/suse/temp.img,driver=raw
-drive if=virtio,id=active-disk1,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing.file.filename=/data/images/kvm/suse/suse11_3.img,file.backing.backing.driver=raw,file.backing.backing.node-name=sdisk

{'execute': 'blockdev-remove-medium', 'arguments': {'device': 'colo-disk1'} }
{'execute': 'blockdev-insert-medium', 'arguments': {'device': 'colo-disk1', 'node-name': 'sdisk'} }
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.3.1', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk1', 'writable': true } }

All QMP commands succeed, but the disk exported by the NBD server is
read-only even though I specify 'writable': true in the QMP command.
The reason is that the BDS is read-only.

> 
>> +#define ALLOW_WRITE_BACKING_FILE"allow-write-backing-file"
>> +static QemuOptsList backing_file_opts = {
>> +.name = "backing_file",
>> +.head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
>> +.desc = {
>> +{
>> +.name = ALLOW_WRITE_BACKING_FILE,
>> +.type = QEMU_OPT_BOOL,
>> +.help = "allow writes to backing file",
>> +},
> 
> And even if we DO need this knob (which I doubt), you need corresponding
> documentation of the knob in qapi/block-core.json, since we are trying
> to keep the command line and QMP in sync when it comes to adding new
> options.
> 

Yes, I know, but I don't know how to do it. Should I update BlockdevOptions?

Thanks
Wen Congyang
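
The failure described above comes down to the export being no more
writable than the node it exports, whatever 'writable' requests. A tiny
sketch of that rule with a hypothetical Node type; the helper name is
made up and only illustrates the reasoning in the reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node: only records how it was opened. */
typedef struct {
    bool read_only;
} Node;

/* An export is writable only if it was requested writable AND the
 * underlying node was opened read-write. */
static bool export_is_writable(const Node *n, bool requested_writable)
{
    return requested_writable && !n->read_only;
}
```

With the backing node opened read-only, 'writable': true has no effect,
which is why the thread turns to reopening or opening the backing file
read-write.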

Re: [Qemu-block] [Qemu-devel] Dynamic reconfiguration

2015-10-26 Thread Wen Congyang

On 10/26/2015 03:24 PM, Markus Armbruster wrote:
> Wen Congyang <we...@cn.fujitsu.com> writes:
> 
>> On 10/21/2015 04:27 PM, Markus Armbruster wrote:
> [...]
>>> Can we phrase the operation differently?  Instead of "insert between A
>>> and B (silently replacing everything that is now between A and B)",
>>> say one of
>>>
>>> 1a. Replace node A by A <- blkdebug
>>>
>>> 1b. Replace node B by blkdebug <- B
>>>
>>> 2a. Replace edge A <- B by <- blkdebug <-
>>> Impossible in the current state, because there is no such edge.
>>
>> What does 'edge' mean?
> 
> It's graph terminology: the BB and BDS serve as the graph's nodes
> (a.k.a. vertices), and the pointers connecting them serve as the graph's
> edges.

Thanks

Wen Congyang

> 
> [...]
> .
>
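
In that terminology, Markus' option 2a ("replace edge A <- B by
<- blkdebug <-") splices a filter node into an existing edge, which is
also what a future insert operation taking parent, child and node would
do. A minimal sketch, assuming each node has a single child pointer and
treating A as the parent; which end is the parent in Markus' notation is
an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with one outgoing edge (parent -> child pointer). */
typedef struct Node {
    const char *name;
    struct Node *child;
} Node;

/* Replace the edge parent -> child by parent -> filter -> child.
 * Returns -1 if parent and child are not directly connected, the
 * "no such edge" case in the quoted discussion. */
static int insert_on_edge(Node *parent, Node *child, Node *filter)
{
    if (parent->child != child) {
        return -1;
    }
    filter->child = child;
    parent->child = filter;
    return 0;
}
```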

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-26 Thread Wen Congyang

On 10/16/2015 07:37 PM, Stefan Hajnoczi wrote:
> On Fri, Oct 16, 2015 at 10:22:05AM +0800, Wen Congyang wrote:
>> On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
>>> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>>>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>>>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>>>> +/* start backup job now */
>>>>>>>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>>>> +                s->active_disk->backing_blocker);
>>>>>>>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>>>> +                s->hidden_disk->backing_blocker);
>>>>>>>
>>>>>>> Why is it safe to unblock these operations?
>>>>>>>
>>>>>>> Why do they have to be blocked for non-replication users?
>>>>>>
>>>>>> hidden_disk and secondary disk are opened as backing file, so it is
>>>>>> blocked for non-replication users.
>>>>>> What can I do if I don't unblock it and want to do backup?
>>>>>
>>>>> CCing Jeff Cody, block jobs maintainer
>>>>>
>>>>> You need to explain why it is safe remove this protection.  We can't
>>>>> merge code that may be unsafe.
>>>>>
>>>>> I think we can investigate further by asking: when does QEMU code assume
>>>>> the backing file is read-only?
>>>>
>>>> The backing file is opened in read-only mode. I want to reopen it in read-write
>>>> mode here in the next version(So the patch 1 will be dropped)
>>>>
>>>>>
>>>>> I haven't checked but these cases come to mind:
>>>>>
>>>>> Operations that move data between BDS in the backing chain (e.g. commit
>>>>> and stream block jobs) will lose or overwrite data if the backing file
>>>>> is being written to by another coroutine.
>>>>>
>>>>> We need to prevent users from running these operations at the same time.
>>>>
>>>> Yes, but qemu doesn't provide such API.
>>>
>>> This series can't be merged unless it is safe.
>>>
>>> Have you looked at op blockers and thought about how to prevent unsafe
>>> operations?
>>
>> What about this solution:
>> 1. unblock it in bdrv_set_backing_hd()
>> 2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
>>    prevent unsafe operations
> 
> Come to think of it, currently QEMU only supports 1 block job per BDS.
> 
> This means that as long as COLO has a backup job running, no other block
> jobs can interfere.
> 
> There still might be a risk with monitor commands like 'commit'.

What about this?
diff --git a/block.c b/block.c
index e9f40dc..b181d67 100644
--- a/block.c
+++ b/block.c
@@ -1162,6 +1162,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+    /*
+     * We do backup in 3 ways:
+     * 1. drive backup
+     *    The target bs is newly opened, and the source is the top BDS.
+     * 2. blockdev backup
+     *    Both the source and the target are top BDSes.
+     * 3. internal backup (used for block replication)
+     *    Both the source and the target are backing files.
+     *
+     * In cases 1 and 2, the backing file is neither the source nor
+     * the target.
+     * In case 3, we will block the top BDS, so there is only one block
+     * job for the top BDS and its backing chain.
+     */
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }



> 
> Stefan
> .
>
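The proposal above relies on QEMU's op blockers. An operation on a node is refused while at least one "reason" is registered against it for that operation type. The following is a simplified model of that bookkeeping. `OpNode`, `op_block()`, `op_unblock()`, `attach_backing()` and the small `OpType` enum are hypothetical stand-ins, loosely modelled on the `bdrv_op_*` functions rather than copied from them. The model shows the effect of the patch: a newly attached backing file is first blocked for every operation with the parent's blocker, and then only the commit target and the two backup roles are unblocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of operation types, loosely modelled on BlockOpType. */
typedef enum {
    OP_BACKUP_SOURCE,
    OP_BACKUP_TARGET,
    OP_COMMIT_TARGET,
    OP_STREAM,
    OP_TYPE_MAX
} OpType;

#define MAX_REASONS 8

/* Per-node list of blocking reasons for each operation type. */
typedef struct {
    const void *reasons[OP_TYPE_MAX][MAX_REASONS];
    int nr_reasons[OP_TYPE_MAX];
} OpNode;

static void op_block(OpNode *n, OpType op, const void *reason)
{
    assert(n->nr_reasons[op] < MAX_REASONS);
    n->reasons[op][n->nr_reasons[op]++] = reason;
}

/* Remove every registration of 'reason' for 'op' (no-op if absent). */
static void op_unblock(OpNode *n, OpType op, const void *reason)
{
    int i = 0;

    while (i < n->nr_reasons[op]) {
        if (n->reasons[op][i] == reason) {
            /* Overwrite with the last entry and re-check index i. */
            n->nr_reasons[op]--;
            n->reasons[op][i] = n->reasons[op][n->nr_reasons[op]];
        } else {
            i++;
        }
    }
}

static bool op_is_blocked(const OpNode *n, OpType op)
{
    return n->nr_reasons[op] > 0;
}

/* Attach a backing file: block everything, then lift selected blocks. */
static void attach_backing(OpNode *backing, const void *backing_blocker)
{
    for (int op = 0; op < OP_TYPE_MAX; op++) {
        op_block(backing, (OpType)op, backing_blocker);
    }
    op_unblock(backing, OP_COMMIT_TARGET, backing_blocker);
    op_unblock(backing, OP_BACKUP_SOURCE, backing_blocker);
    op_unblock(backing, OP_BACKUP_TARGET, backing_blocker);
}
```

The model only records who asked for which operation to be refused. The safety argument in the patch comment (only one block job for the top BDS and its backing chain) still has to hold separately.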

[Qemu-block] [PATCH v6 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-10-16 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 block.c   |  6 +++---
 block/quorum.c| 59 +--
 include/block/block.h |  3 +++
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index bcba22f..d96d2cc 100644
--- a/block.c
+++ b/block.c
@@ -1079,9 +1079,9 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
     return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-                                    BlockDriverState *child_bs,
-                                    const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+                             BlockDriverState *child_bs,
+                             const BdrvChildRole *child_role)
 {
     BdrvChild *child = g_new(BdrvChild, 1);
     *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index c4cda32..a9e499c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -875,9 +875,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto exit;
     }
-    if (s->num_children < 2) {
+    if (s->num_children < 1) {
         error_setg(&local_err,
-                   "Number of provided children must be greater than 1");
+                   "Number of provided children must be 1 or more");
         ret = -EINVAL;
         goto exit;
     }
@@ -997,6 +997,58 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
     }
 }
 
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    BdrvChild *child;
+
+    bdrv_drain(bs);
+
+    assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+    if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+        error_setg(errp, "Too many children");
+        return;
+    }
+    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
+
+    bdrv_ref(child_bs);
+    child = bdrv_attach_child(bs, child_bs, &child_format);
+    s->children[s->num_children++] = child;
+}
+
+static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    BdrvChild *child;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        if (s->children[i]->bs == child_bs) {
+            break;
+        }
+    }
+
+    /* we have checked it in bdrv_del_child() */
+    assert(i < s->num_children);
+    child = s->children[i];
+
+    if (s->num_children <= s->threshold) {
+        error_setg(errp,
+            "The number of children cannot be lower than the vote threshold %d",
+            s->threshold);
+        return;
+    }
+
+    bdrv_drain(bs);
+    /* We can safely remove this child now */
+    memmove(&s->children[i], &s->children[i + 1],
+            (s->num_children - i - 1) * sizeof(void *));
+    s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+    bdrv_unref_child(bs, child);
+}
+
 static void quorum_refresh_filename(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
@@ -1052,6 +1104,9 @@ static BlockDriver bdrv_quorum = {
     .bdrv_detach_aio_context            = quorum_detach_aio_context,
     .bdrv_attach_aio_context            = quorum_attach_aio_context,
 
+    .bdrv_add_child                     = quorum_add_child,
+    .bdrv_del_child                     = quorum_del_child,
+
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
 };
diff --git a/include/block/block.h b/include/block/block.h
index ef84c87..f5bfb6b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -516,6 +516,9 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_ref(BlockDriverState *bs);
 void bdrv_unref(BlockDriverState *bs);
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+                             BlockDriverState *child_bs,
+                             const BdrvChildRole *child_role);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3
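At their core, `quorum_add_child()` and `quorum_del_child()` do plain dynamic-array bookkeeping plus a vote-threshold check. The sketch below reproduces just that logic in standalone C. It assumes children can be represented by name strings and uses `realloc()` in place of GLib's `g_renew()`. `QuorumSketch` and both helper names are hypothetical. Like the patch, it refuses a deletion when `num_children <= threshold`, so at least `threshold` children always remain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BDRVQuorumState: children are just names. */
typedef struct {
    const char **children;
    int num_children;
    int threshold;      /* minimum number of children that must remain */
} QuorumSketch;

/* Append a child at the end; returns 0 on success, -1 if realloc fails. */
static int quorum_sketch_add(QuorumSketch *q, const char *child)
{
    const char **p = realloc(q->children,
                             (size_t)(q->num_children + 1) * sizeof(*p));
    if (!p) {
        return -1;
    }
    q->children = p;
    q->children[q->num_children++] = child;
    return 0;
}

/*
 * Remove the child at index i, shifting later children down.  Like the
 * patch, refuse when num_children <= threshold, so that at least
 * 'threshold' children remain afterwards.  The array is not shrunk here.
 */
static int quorum_sketch_del(QuorumSketch *q, int i)
{
    assert(i >= 0 && i < q->num_children);
    if (q->num_children <= q->threshold) {
        return -1;
    }
    memmove(&q->children[i], &q->children[i + 1],
            (size_t)(q->num_children - i - 1) * sizeof(q->children[0]));
    q->num_children--;
    return 0;
}
```

`memmove()` rather than `memcpy()` is used because the source and destination ranges overlap.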

[Qemu-block] [PATCH v6 4/4] hmp: add monitor command to add/remove a child

2015-10-16 Thread Wen Congyang

The new command is blockdev_change. It does the same
thing as the QMP command x-blockdev-change.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Cc: Luiz Capitulino <lcapitul...@redhat.com>
---
 hmp-commands.hx | 17 +
 hmp.c   | 38 ++
 hmp.h   |  1 +
 3 files changed, 56 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3a4ae39..57475cc 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -193,6 +193,23 @@ actions (drive options rerror, werror).
 ETEXI
 
     {
+        .name       = "blockdev_change",
+        .args_type  = "op:s,parent:B,child:B?,node:?",
+        .params     = "operation parent [child] [node]",
+        .help       = "Dynamically reconfigure the block driver state graph",
+        .mhandler.cmd = hmp_blockdev_change,
+    },
+
+STEXI
+@item blockdev_change @var{operation} @var{parent} [@var{child}] [@var{node}]
+@findex blockdev_change
+Dynamically reconfigure the block driver state graph. It can be used to
+add, remove, insert, or replace a block driver state. Currently only
+the Quorum driver implements this feature, to add and remove its children.
+This is useful to fix a broken quorum child.
+ETEXI
+
+    {
         .name       = "change",
         .args_type  = "device:B,target:F,arg:s?",
         .params     = "device filename [format]",
diff --git a/hmp.c b/hmp.c
index 5048eee..fc58ae2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2346,3 +2346,41 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict)
 
     qapi_free_RockerOfDpaGroupList(list);
 }
+
+void hmp_blockdev_change(Monitor *mon, const QDict *qdict)
+{
+    const char *operation = qdict_get_str(qdict, "op");
+    const char *parent = qdict_get_str(qdict, "parent");
+    const char *child = qdict_get_try_str(qdict, "child");
+    const char *node = qdict_get_try_str(qdict, "node");
+    ChangeOperation op = CHANGE_OPERATION_ADD;
+    Error *local_err = NULL;
+    bool has_child = !!child;
+    bool has_node = !!node;
+
+    while (ChangeOperation_lookup[op] != NULL) {
+        if (strcmp(ChangeOperation_lookup[op], operation) == 0) {
+            break;
+        }
+        op++;
+    }
+
+    if (ChangeOperation_lookup[op] == NULL) {
+        error_setg(&local_err, "Invalid parameter '%s'", "operation");
+        goto out;
+    }
+
+    /*
+     * FIXME: we must specify the parameter child, otherwise,
+     * we can't specify the parameter node.
+     */
+    if (op == CHANGE_OPERATION_ADD) {
+        has_child = false;
+    }
+
+    qmp_x_blockdev_change(op, parent, has_child, child,
+                          has_node, node, &local_err);
+
+out:
+    hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 81656c3..80a1faf 100644
--- a/hmp.h
+++ b/hmp.h
@@ -130,5 +130,6 @@ void hmp_rocker(Monitor *mon, const QDict *qdict);
 void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
+void hmp_blockdev_change(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
2.4.3
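`hmp_blockdev_change()` converts the `op` string into a `ChangeOperation` value by scanning the NULL-terminated `ChangeOperation_lookup` table that QAPI generates. A standalone sketch of the same scan follows. The enum, the table and `change_op_parse()` here are hand-written, hypothetical stand-ins for the generated code. QEMU's qapi/util.c also appears to offer a `qapi_enum_parse()` helper for this kind of lookup, which may be worth considering instead of an open-coded loop.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the QAPI-generated ChangeOperation enum. */
typedef enum {
    CHANGE_OP_ADD,
    CHANGE_OP_DELETE,
    CHANGE_OP_MAX
} ChangeOp;

/* NULL-terminated name table indexed by ChangeOp. */
static const char *const change_op_lookup[] = {
    [CHANGE_OP_ADD] = "add",
    [CHANGE_OP_DELETE] = "delete",
    [CHANGE_OP_MAX] = NULL,
};

/* Returns the enum value for 'name', or -1 if the name is unknown. */
static int change_op_parse(const char *name)
{
    for (int i = 0; change_op_lookup[i] != NULL; i++) {
        if (strcmp(change_op_lookup[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Returning -1 for unknown names leaves the decision about the error message to the caller. This mirrors how the HMP handler reports "Invalid parameter 'operation'".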

[Qemu-block] [PATCH v6 0/4] qapi: child add/delete support

2015-10-16 Thread Wen Congyang

If a quorum child is broken, we can use a mirror job to replace it.
But sometimes the user only needs to remove the broken child, and
add it later when the problem is fixed.

It is based on Kevin's bdrv_swap() related patch:
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02152.html

ChangeLog:
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json


Wen Congyang (4):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child
  hmp: add monitor command to add/remove a child

 block.c   | 56 --
 block/quorum.c| 59 ++--
 blockdev.c| 76 +++
 hmp-commands.hx   | 17 +++
 hmp.c | 38 
 hmp.h |  1 +
 include/block/block.h |  8 +
 include/block/block_int.h |  5 
 qapi/block-core.json  | 40 +
 qmp-commands.hx   | 50 +++
 10 files changed, 345 insertions(+), 5 deletions(-)

-- 
2.4.3

[Qemu-block] [PATCH v6 3/4] qmp: add monitor command to add/remove a child

2015-10-16 Thread Wen Congyang

The new QMP command is x-blockdev-change. For now it only supports adding
and removing quorum children; it does not support all kinds of children,
all kinds of operations, or all block drivers, so it is experimental.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
---
 blockdev.c   | 76 
 qapi/block-core.json | 40 +++
 qmp-commands.hx  | 50 ++
 3 files changed, 166 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 6c8cce4..72efe5d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3086,6 +3086,82 @@ fail:
     qmp_output_visitor_cleanup(ov);
 }
 
+void qmp_x_blockdev_change(ChangeOperation op, const char *parent,
+                           bool has_child, const char *child,
+                           bool has_new_node, const char *new_node,
+                           Error **errp)
+{
+    BlockDriverState *parent_bs, *child_bs, *new_bs;
+    Error *local_err = NULL;
+
+    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
+    if (!parent_bs) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    switch(op) {
+    case CHANGE_OPERATION_ADD:
+        if (has_child) {
+            error_setg(errp, "The operation %s doesn't support the parameter child",
+                       ChangeOperation_lookup[op]);
+            return;
+        }
+        if (!has_new_node) {
+            error_setg(errp, "The operation %s needs the parameter new_node",
+                       ChangeOperation_lookup[op]);
+            return;
+        }
+        break;
+    case CHANGE_OPERATION_DELETE:
+        if (has_new_node) {
+            error_setg(errp, "The operation %s doesn't support the parameter node",
+                       ChangeOperation_lookup[op]);
+            return;
+        }
+        if (!has_child) {
+            error_setg(errp, "The operation %s needs the parameter child",
+                       ChangeOperation_lookup[op]);
+            return;
+        }
+    default:
+        break;
+    }
+
+    if (has_child) {
+        child_bs = bdrv_find_node(child);
+        if (!child_bs) {
+            error_setg(errp, "Node '%s' not found", child);
+            return;
+        }
+    }
+
+    if (has_new_node) {
+        new_bs = bdrv_find_node(new_node);
+        if (!new_bs) {
+            error_setg(errp, "Node '%s' not found", new_node);
+            return;
+        }
+    }
+
+    switch(op) {
+    case CHANGE_OPERATION_ADD:
+        bdrv_add_child(parent_bs, new_bs, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+        }
+        break;
+    case CHANGE_OPERATION_DELETE:
+        bdrv_del_child(parent_bs, child_bs, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+        }
+        break;
+    default:
+        break;
+    }
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index bb2189e..361588f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2114,3 +2114,43 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @ChangeOperation:
+#
+# An enumeration of block device change operations.
+#
+# @add: Add a new block driver state to an existing block driver state.
+#
+# @delete: Delete a block driver state's child.
+#
+# Since: 2.5
+##
+{ 'enum': 'ChangeOperation',
+  'data': [ 'add', 'delete' ] }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used to
+# add, remove, insert, or replace a block driver state. Currently only
+# the Quorum driver implements this feature, to add and remove its children.
+# This is useful to fix a broken quorum child.
+#
+# @operation: the change operation: add or delete.
+#
+# @parent: the id or node name of the node that will be changed.
+#
+# @child: the child node-name which will be deleted.
+#
+# @node: the new node-name which will be added.
+#
+# Note: this command is experimental, and not a stable API.
+#
+# Since: 2.5
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'operation': 'ChangeOperation',
+             'parent': 'str',
+             '*child': 'str',
+             '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d2ba800..ede7b71 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3921,6 +3921,56 @@ Example (2):
 EQMP
 
     {
+        .name       = "x-blockdev-change",
+        .args_type  = "operation:s,parent:B,child:B?,node:B?",
+        .mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+    },
+
+SQMP
+x-blockdev-change
+
+
+Dynamic reconfigure the block driver

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-15 Thread Wen Congyang

On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>> +/* start backup job now */
>>>>>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>> +                s->active_disk->backing_blocker);
>>>>>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>> +                s->hidden_disk->backing_blocker);
>>>>>
>>>>> Why is it safe to unblock these operations?
>>>>>
>>>>> Why do they have to be blocked for non-replication users?
>>>>
>>>> hidden_disk and secondary disk are opened as backing file, so it is
>>>> blocked for non-replication users.
>>>> What can I do if I don't unblock it and want to do backup?
>>>
>>> CCing Jeff Cody, block jobs maintainer
>>>
>>> You need to explain why it is safe remove this protection.  We can't
>>> merge code that may be unsafe.
>>>
>>> I think we can investigate further by asking: when does QEMU code assume
>>> the backing file is read-only?
>>
>> The backing file is opened in read-only mode. I want to reopen it in read-write
>> mode here in the next version(So the patch 1 will be dropped)
>>
>>>
>>> I haven't checked but these cases come to mind:
>>>
>>> Operations that move data between BDS in the backing chain (e.g. commit
>>> and stream block jobs) will lose or overwrite data if the backing file
>>> is being written to by another coroutine.
>>>
>>> We need to prevent users from running these operations at the same time.
>>
>> Yes, but qemu doesn't provide such API.
> 
> This series can't be merged unless it is safe.
> 
> Have you looked at op blockers and thought about how to prevent unsafe
> operations?

Hmm, do only block jobs write to the backing file? If so, op blockers can
prevent unsafe operations.

Thanks
Wen Congyang

> 
> Stefan
> .
>

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-15 Thread Wen Congyang

On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>> +/* start backup job now */
>>>>>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>> +                s->active_disk->backing_blocker);
>>>>>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>> +                s->hidden_disk->backing_blocker);
>>>>>
>>>>> Why is it safe to unblock these operations?
>>>>>
>>>>> Why do they have to be blocked for non-replication users?
>>>>
>>>> hidden_disk and secondary disk are opened as backing file, so it is
>>>> blocked for non-replication users.
>>>> What can I do if I don't unblock it and want to do backup?
>>>
>>> CCing Jeff Cody, block jobs maintainer
>>>
>>> You need to explain why it is safe remove this protection.  We can't
>>> merge code that may be unsafe.
>>>
>>> I think we can investigate further by asking: when does QEMU code assume
>>> the backing file is read-only?
>>
>> The backing file is opened in read-only mode. I want to reopen it in read-write
>> mode here in the next version(So the patch 1 will be dropped)
>>
>>>
>>> I haven't checked but these cases come to mind:
>>>
>>> Operations that move data between BDS in the backing chain (e.g. commit
>>> and stream block jobs) will lose or overwrite data if the backing file
>>> is being written to by another coroutine.
>>>
>>> We need to prevent users from running these operations at the same time.
>>
>> Yes, but qemu doesn't provide such API.
> 
> This series can't be merged unless it is safe.
> 
> Have you looked at op blockers and thought about how to prevent unsafe
> operations?

What about this solution:
1. unblock it in bdrv_set_backing_hd()
2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
   prevent unsafe operations

Thanks
Wen Congyang

> 
> Stefan
> .
>

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-14 Thread Wen Congyang

On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>> +/* start backup job now */
>>>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>> +                s->active_disk->backing_blocker);
>>>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>> +                s->hidden_disk->backing_blocker);
>>>
>>> Why is it safe to unblock these operations?
>>>
>>> Why do they have to be blocked for non-replication users?
>>
>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
>> non-replication users.
>> What can I do if I don't unblock it and want to do backup?
> 
> CCing Jeff Cody, block jobs maintainer
> 
> You need to explain why it is safe remove this protection.  We can't
> merge code that may be unsafe.
> 
> I think we can investigate further by asking: when does QEMU code assume
> the backing file is read-only?

The backing file is opened in read-only mode. I want to reopen it in read-write
mode here in the next version (so patch 1 will be dropped).

> 
> I haven't checked but these cases come to mind:
> 
> Operations that move data between BDS in the backing chain (e.g. commit
> and stream block jobs) will lose or overwrite data if the backing file
> is being written to by another coroutine.
> 
> We need to prevent users from running these operations at the same time.

Yes, but QEMU doesn't provide such an API.

> 
> Also, accessing bs->backing_blocker is a layering violation.  No one
> outside block.c:bdrv_set_backing_hd() is supposed to access this field.

I agree with it.

Thanks
Wen Congyang

> 
> Let's figure out the safety concerns first and then the
> bs->backing_blocker access will probably be eliminated as part of the
> solution.
> 
> Stefan
> .
>

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-13 Thread Wen Congyang

On 10/13/2015 12:31 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +static void replication_start(BlockDriverState *bs, ReplicationMode mode,
>> +                              Error **errp)
>> +{
>> +    BDRVReplicationState *s = bs->opaque;
>> +    int64_t active_length, hidden_length, disk_length;
>> +    AioContext *aio_context;
>> +    Error *local_err = NULL;
>> +
>> +    if (s->replication_state != BLOCK_REPLICATION_NONE) {
>> +        error_setg(errp, "Block replication is running or done");
>> +        return;
>> +    }
>> +
>> +    if (s->mode != mode) {
>> +        error_setg(errp, "The parameter mode's value is invalid, needs %d,"
>> +                   " but receives %d", s->mode, mode);
>> +        return;
>> +    }
>> +
>> +    switch (s->mode) {
>> +    case REPLICATION_MODE_PRIMARY:
>> +        break;
>> +    case REPLICATION_MODE_SECONDARY:
>> +        s->active_disk = bs->file;
>> +        if (!bs->file->backing_hd) {
>> +            error_setg(errp, "Active disk doesn't have backing file");
>> +            return;
>> +        }
>> +
>> +        s->hidden_disk = s->active_disk->backing_hd;
>> +        if (!s->hidden_disk->backing_hd) {
>> +            error_setg(errp, "Hidden disk doesn't have backing file");
>> +            return;
>> +        }
>> +
>> +        s->secondary_disk = s->hidden_disk->backing_hd;
>> +        if (!s->secondary_disk->blk) {
>> +            error_setg(errp, "The secondary disk doesn't have block backend");
>> +            return;
>> +        }
> ...
>> +aio_context = bdrv_get_aio_context(bs);
>> +aio_context_acquire(aio_context);
>> +bdrv_set_aio_context(s->secondary_disk, aio_context);
> 
> Why is this bdrv_set_aio_context() call necessary?
> 
> Child BDS nodes are in the same AioContext as their parents.  Other
> block jobs need something like this because they operate on a second BDS
> which is not bs' backing file chain.  I think you have a different
> situation here so it's not needed.

I think you are right. I will check it and remove it.

Thanks
Wen Congyang

> .
>
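The secondary-side checks in `replication_start()` walk a fixed three-level backing chain. It starts at the active disk (`bs->file`), goes to its backing file (the hidden disk), and then to that file's backing file (the secondary disk). The secondary disk must also be attached to a block backend. Because all three nodes sit in one backing chain, this is the situation Stefan describes, where children already share the parent's AioContext. Below is a minimal standalone sketch of the walk. It uses a hypothetical `Disk` type in place of `BlockDriverState`, and `resolve_replication_chain()` is an illustrative name. Instead of setting an `Error`, it returns the patch's error strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct Disk {
    struct Disk *backing;   /* backing file, or NULL */
    bool has_blk;           /* attached to a block backend? */
} Disk;

typedef struct {
    Disk *active;
    Disk *hidden;
    Disk *secondary;
} ReplDisks;

/*
 * Resolve active -> hidden -> secondary the way the secondary branch of
 * replication_start() does.  Returns NULL on success, otherwise the
 * patch's error message for the first missing link.
 */
static const char *resolve_replication_chain(Disk *file, ReplDisks *out)
{
    out->active = file;
    if (!out->active->backing) {
        return "Active disk doesn't have backing file";
    }
    out->hidden = out->active->backing;
    if (!out->hidden->backing) {
        return "Hidden disk doesn't have backing file";
    }
    out->secondary = out->hidden->backing;
    if (!out->secondary->has_blk) {
        return "The secondary disk doesn't have block backend";
    }
    return NULL;
}
```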

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-13 Thread Wen Congyang

On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +static void backup_job_completed(void *opaque, int ret)
>> +{
>> +    BDRVReplicationState *s = opaque;
>> +
>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
>> +        /* The backup job is cancelled unexpectedly */
>> +        s->error = -EIO;
>> +    }
>> +
>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>> +                  s->active_disk->backing_blocker);
>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>> +                  s->hidden_disk->backing_blocker);
>> +
>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
> 
> Why is bdrv_put_ref_bh_schedule() necessary?

It is copied from block_job_cb(). According to the comments in
bdrv_put_ref_bh_schedule():
/*
 * Release a BDS reference in a BH
 *
 * It is not safe to use bdrv_unref() from a callback function when the callers
 * still need the BlockDriverState.  In such cases we schedule a BH to release
 * the reference.
 */

If the comment is right, I think it is necessary to call
bdrv_put_ref_bh_schedule() here, because the job is created on the BDS
s->secondary_disk: backup_job_completed() is called from
block_job_completed(), and s->secondary_disk is still used in
block_job_release().

Thanks
Wen Congyang

> .
>
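The comment quoted above describes a general pattern. Sometimes a callback's caller may still use an object after the callback returns. In that case the callback must not directly drop what might be the last reference. Instead, the release is deferred to a bottom half that runs later from the event loop. The sketch below models this under stated assumptions. `Obj` is a hypothetical refcounted type, and a fixed-size queue stands in for QEMU's bottom halves. `obj_unref_deferred()` and `run_deferred()` are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object; 'freed' stands in for actual freeing. */
typedef struct {
    int refcnt;
    bool freed;
} Obj;

static void obj_unref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->freed = true;
    }
}

/* Fixed-size queue standing in for bottom halves scheduled on the loop. */
#define MAX_DEFERRED 16
static Obj *deferred[MAX_DEFERRED];
static int nr_deferred;

/* Safe to call from a callback: the object stays alive until later. */
static void obj_unref_deferred(Obj *o)
{
    assert(nr_deferred < MAX_DEFERRED);
    deferred[nr_deferred++] = o;
}

/* Runs later, after the code that invoked the callback has returned. */
static void run_deferred(void)
{
    for (int i = 0; i < nr_deferred; i++) {
        obj_unref(deferred[i]);
    }
    nr_deferred = 0;
}
```

Between `obj_unref_deferred()` and `run_deferred()` the object still holds its reference. This is the window the caller of the completion callback relies on.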

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-13 Thread Wen Congyang

On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +/* start backup job now */
>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>> +                s->active_disk->backing_blocker);
>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>> +                s->hidden_disk->backing_blocker);
> 
> Why is it safe to unblock these operations?
> 
> Why do they have to be blocked for non-replication users?

hidden_disk and secondary_disk are opened as backing files, so these operations
are blocked for non-replication users.
What can I do if I don't unblock it and want to do backup?

Thanks
Wen Congyang

> 
> Stefan
> .
>

Re: [Qemu-block] [Qemu-devel] [PATCH v10 08/10] Implement new driver for block replication

2015-10-13 Thread Wen Congyang

On 10/13/2015 05:41 PM, Fam Zheng wrote:
> On Tue, 10/13 16:59, Wen Congyang wrote:
>> On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>> +static void backup_job_completed(void *opaque, int ret)
>>>> +{
>>>> +    BDRVReplicationState *s = opaque;
>>>> +
>>>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
>>>> +        /* The backup job is cancelled unexpectedly */
>>>> +        s->error = -EIO;
>>>> +    }
>>>> +
>>>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>> +                  s->active_disk->backing_blocker);
>>>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>> +                  s->hidden_disk->backing_blocker);
>>>> +
>>>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
>>>
>>> Why is bdrv_put_ref_bh_schedule() necessary?
>>
>> It is copied from block_job_cb(). According to the comments in
>> bdrv_put_ref_bh_schedule():
>> /*
>>  * Release a BDS reference in a BH
>>  *
>>  * It is not safe to use bdrv_unref() from a callback function when the callers
>>  * still need the BlockDriverState.  In such cases we schedule a BH to release
>>  * the reference.
>>  */
>>
>> If the comment is right, I think it is necessary to call
>> bdrv_put_ref_bh_schedule() here.
>> Because the job is created on the BDS s->secondary disk, backup_job_completed() is
>> called in block_job_completed(), and we will still use s->secondary_disk in
>> block_job_release().
> 
> Where is the matching bdrv_ref called?

It is in block_job_create()

source: we call bdrv_ref() in block_job_create(), and the user should unref it.
target: the user calls bdrv_ref(), and we will unref it in the job.

I don't know why it was designed like this...

Thanks
Wen Congyang

> 
> Fam
> .
>

Re: [Qemu-block] [Qemu-devel] [PATCH v10 08/10] Implement new driver for block replication

2015-10-13 Thread Wen Congyang

On 10/13/2015 06:12 PM, Fam Zheng wrote:
> On Tue, 10/13 17:46, Wen Congyang wrote:
>> On 10/13/2015 05:41 PM, Fam Zheng wrote:
>>> On Tue, 10/13 16:59, Wen Congyang wrote:
>>>> On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>> +static void backup_job_completed(void *opaque, int ret)
>>>>>> +{
>>>>>> +    BDRVReplicationState *s = opaque;
>>>>>> +
>>>>>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
>>>>>> +        /* The backup job is cancelled unexpectedly */
>>>>>> +        s->error = -EIO;
>>>>>> +    }
>>>>>> +
>>>>>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>> +                  s->active_disk->backing_blocker);
>>>>>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>> +                  s->hidden_disk->backing_blocker);
>>>>>> +
>>>>>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
>>>>>
>>>>> Why is bdrv_put_ref_bh_schedule() necessary?
>>>>
>>>> It is copied from block_job_cb(). According to the comments in
>>>> bdrv_put_ref_bh_schedule():
>>>> /*
>>>>  * Release a BDS reference in a BH
>>>>  *
>>>>  * It is not safe to use bdrv_unref() from a callback function when the callers
>>>>  * still need the BlockDriverState.  In such cases we schedule a BH to release
>>>>  * the reference.
>>>>  */
>>>>
>>>> If the comment is right, I think it is necessary to call
>>>> bdrv_put_ref_bh_schedule() here.
>>>> Because the job is created on the BDS s->secondary disk, backup_job_completed() is
>>>> called in block_job_completed(), and we will still use s->secondary_disk
>>>> in block_job_release().
>>>
>>> Where is the matching bdrv_ref called?
>>
>> It is in block_job_create()
>>
>> source: we call in bdrv_ref() in block_job_create(), and the user should unref it.
>> target: the user call bdrv_ref(), and we will unref it in the job
>>
>> I don't know why we design it like this...
>>
> 
> Maybe it's better to unref it in block_job_release. Then we can simply drop
> bdrv_put_ref_bh_schedule.

I agree with it.

Thanks
Wen Congyang

> 
> Fam
> .
>
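Fam's suggestion amounts to making reference ownership symmetric. The job takes its reference on the BDS when it is created and drops it when it is released. No completion callback then has to release a reference on the job's behalf, and no deferred unref is needed. A minimal sketch follows. It uses hypothetical `Bds`/`Job` types and helper names, not QEMU's `block_job_create()`/`block_job_release()`.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int refcnt;
} Bds;

typedef struct {
    Bds *bs;
} Job;

static void bds_ref(Bds *bs)
{
    bs->refcnt++;
}

static void bds_unref(Bds *bs)
{
    assert(bs->refcnt > 0);
    bs->refcnt--;
}

/* The job owns one reference for its whole lifetime... */
static void job_create(Job *job, Bds *bs)
{
    bds_ref(bs);
    job->bs = bs;
}

/* ...and drops it itself when it is released, after its last use. */
static void job_release(Job *job)
{
    Bds *bs = job->bs;

    job->bs = NULL;
    bds_unref(bs);
}
```

Because the reference is taken and dropped by the same owner, the unref naturally happens after the job's last access to the BDS.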

Re: [Qemu-block] [PATCH v5 0/4] qapi: child add/delete support

2015-10-07 Thread Wen Congyang

Ping...

On 09/22/2015 03:44 PM, Wen Congyang wrote:
> If quorum's child is broken, we can use mirror job to replace it.
> But sometimes, the user only need to remove the broken child, and
> add it later when the problem is fixed.
> 
> It is based on the following patch:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
> 
> ChangLog:
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. drop nbd driver's implementation. We can use human-monitor-command
>    to do it.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>    created by the QMP command blockdev-add.
> 2. The driver NBD can support filename, path, host:port now.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>    bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> Wen Congyang (4):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
>   hmp: add monitor command to add/remove a child
> 
>  block.c   | 56 ++--
>  block/quorum.c| 72 +--
>  blockdev.c| 48 +++
>  hmp-commands.hx   | 28 ++
>  hmp.c | 20 +
>  hmp.h |  2 ++
>  include/block/block.h |  8 ++
>  include/block/block_int.h |  5 
>  qapi/block-core.json  | 34 ++
>  qmp-commands.hx   | 61 +++
>  10 files changed, 329 insertions(+), 5 deletions(-)
>

Re: [Qemu-block] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child

2015-10-07 Thread Wen Congyang

On 10/08/2015 03:00 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> In some cases, we want to take a quorum child offline, and take
>> another child online.
> 
> Hi,
>   Have you checked the output of 'info block' after adding/deleting a child?
> I'm using one of your older worlds (from a few months ago) and I found I had
> to add a 
> 
> bdrv_refresh_filename(bs);
> 
> to get the output of 'info block' to show the new child.
> I don't see it in this version.

Max sent a patch series to drop BDS.filename, so I don't call
bdrv_refresh_filename() here. If the BDS is not the top BDS, 'info block'
still shows the wrong child.

Thanks
Wen Congyang

> 
> Dave
> 
> 
>>
>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>> Reviewed-by: Eric Blake <ebl...@redhat.com>
>> ---
>>  block.c                   | 50 +++
>>  include/block/block.h     |  5 +
>>  include/block/block_int.h |  5 +
>>  3 files changed, 60 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index e815d73..1b25e43 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>>  {
>>  return &bs->stats;
>>  }
>> +
>> +/*
>> + * Hot add/remove a BDS's child. So the user can take a child offline when
>> + * it is broken and take a new child online
>> + */
>> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +Error **errp)
>> +{
>> +
>> +if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
>> +error_setg(errp, "The BDS %s doesn't support adding a child",
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +if (!QLIST_EMPTY(&child_bs->parents)) {
>> +error_setg(errp, "The BDS %s already has parent",
>> +   child_bs->node_name);
>> +return;
>> +}
>> +
>> +parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
>> +}
>> +
>> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +Error **errp)
>> +{
>> +BdrvChild *child;
>> +
>> +if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
>> +error_setg(errp, "The BDS %s doesn't support removing a child",
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +QLIST_FOREACH(child, &parent_bs->children, next) {
>> +if (child->bs == child_bs) {
>> +break;
>> +}
>> +}
>> +
>> +if (!child) {
>> +error_setg(errp, "BDS %s is not a child of %s",
>> +   bdrv_get_device_or_node_name(child_bs),
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
>> +}
>> diff --git a/include/block/block.h b/include/block/block.h
>> index ef67353..665c56f 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>>  
>>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>>  
>> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
>> +Error **errp);
>> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
>> +Error **errp);
>> +
>>  #endif
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 2f2c47b..64cbc55 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -288,6 +288,11 @@ struct BlockDriver {
>>   */
>>  int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>>  
>> +void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
>> +   Error **errp);
>> +void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
>> +   Error **errp);
>> +
>>  QLIST_ENTRY(BlockDriver) list;
>>  };
>>  
>> -- 
>> 2.4.3
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> .
>
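
bdrv_add_child() follows a three-step pattern in the generic layer. It rejects a parent whose driver lacks the callback, rejects a child that already has a parent, and otherwise dispatches to the driver. The sketch below reproduces that pattern outside QEMU. Node, Driver, node_add_child and toy_add_child are hypothetical simplifications. QEMU's Error ** reporting is replaced with a caller-supplied message buffer. The messages use the "Node ... a parent" wording that Alberto Garcia suggests in his review of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for BlockDriver and
 * BlockDriverState. */
typedef struct Node Node;

typedef struct Driver {
    /* Optional callback: NULL means the driver cannot add children. */
    int (*add_child)(Node *parent, Node *child, char *err, size_t errlen);
} Driver;

struct Node {
    const char *name;
    const Driver *drv;
    int num_parents;
};

/* Generic layer: validate the request, then dispatch to the driver. */
static int node_add_child(Node *parent, Node *child, char *err, size_t errlen)
{
    assert(parent && child);

    if (!parent->drv || !parent->drv->add_child) {
        snprintf(err, errlen, "Node %s doesn't support adding a child",
                 parent->name);
        return -1;
    }
    if (child->num_parents > 0) {
        snprintf(err, errlen, "Node %s already has a parent", child->name);
        return -1;
    }
    return parent->drv->add_child(parent, child, err, errlen);
}

/* Example driver callback that only records the new edge. */
static int toy_add_child(Node *parent, Node *child, char *err, size_t errlen)
{
    (void)parent;
    (void)err;
    (void)errlen;
    child->num_parents++;
    return 0;
}
```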

Re: [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-10-07 Thread Wen Congyang

On 10/07/2015 10:12 PM, Alberto Garcia wrote:
> On Tue 22 Sep 2015 09:44:20 AM CEST, Wen Congyang wrote:
> 
>> +++ b/block/quorum.c
>> @@ -66,6 +66,9 @@ typedef struct QuorumVotes {
>>  typedef struct BDRVQuorumState {
>>  BlockDriverState **bs; /* children BlockDriverStates */
>>  int num_children;  /* children count */
>> +int max_children;  /* The maximum children count, we need to reallocate
>> +* bs if num_children grows larger than maximum.
>> +*/
>>  int threshold; /* if less than threshold children reads gave the
>>  * same result a quorum error occurs.
>>  */
> 
> As you announce in the cover letter of this series, your code depends on
> the parents list patch written by Kevin here:
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
> 
> As you might be aware, and as part of the same series by Kevin,
> BDRVQuorumState will no longer hold a list of BlockDriverState but a
> list of BdrvChild instead:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04571.html

I noticed that, but I only use one patch from Kevin now. I will fix it in
the next version.

> 
>> +static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
>> + Error **errp)
>> +{
>> +BDRVQuorumState *s = bs->opaque;
>> +
>> +bdrv_drain(bs);
>> +
>> +if (s->num_children == s->max_children) {
>> +if (s->max_children >= INT_MAX) {
>> +error_setg(errp, "Too many children");
>> +return;
>> +}
> 
> max_children can never be greater than INT_MAX. Use == instead.
> 
>> +s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
>> +s->bs[s->num_children] = NULL;
> 
> No need to set the pointer to NULL here, and you are anyway setting the
> pointer to the new child a few lines afterwards.

Yes, I will remove it in the next version.

> 
>> +s->max_children++;
>> +}
>> +
>> +bdrv_ref(child_bs);
>> +bdrv_attach_child(bs, child_bs, &child_format);
>> +s->bs[s->num_children++] = child_bs;
>> +}
>> +
>> +static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
>> + Error **errp)
>> +{
>> +BDRVQuorumState *s = bs->opaque;
>> +BdrvChild *child;
>> +int i;
>> +
>> +for (i = 0; i < s->num_children; i++) {
>> +if (s->bs[i] == child_bs) {
>> +break;
>> +}
>> +}
>> +
>> +QLIST_FOREACH(child, &bs->children, next) {
>> +if (child->bs == child_bs) {
>> +break;
>> +}
>> +}
>> +
>> +/* we have checked it in bdrv_del_child() */
>> +assert(i < s->num_children && child);
>> +
>> +if (s->num_children <= s->threshold) {
>> +error_setg(errp,
>> +"The number of children cannot be lower than the vote threshold %d",
>> +s->threshold);
>> +return;
>> +}
>> +
>> +bdrv_drain(bs);
>> +/* We can safely remove this child now */
>> +memmove(&s->bs[i], &s->bs[i + 1],
>> +(s->num_children - i - 1) * sizeof(void *));
>> +s->num_children--;
>> +s->bs[s->num_children] = NULL;
> 
> Same here, no one will check or use s->bs[s->num_children] so there's no
> need to make it NULL.
> 
> Apart from the issue of using only part of Kevin's series, the rest are
> minor things.

I will fix it in the next version.

> 
> Thanks and sorry for the late review!

Thanks for your review

Wen Congyang

> 
> Berto
> .
>
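
Alberto makes two remarks. First, max_children can be compared against INT_MAX with ==. Second, slots that are overwritten right away or never read again need no NULLing. The self-contained sketch below shows both in a grow-by-one array with memmove-based removal. Quorum, quorum_add and quorum_del are hypothetical. Children are plain void pointers, and realloc() stands in for GLib's g_renew().

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplification of BDRVQuorumState: only the bookkeeping
 * discussed in this review is kept. */
typedef struct Quorum {
    void **children;
    int num_children;
    int max_children;
    int threshold;
} Quorum;

/* Returns 0 on success, -1 if the array cannot grow. */
static int quorum_add(Quorum *q, void *child)
{
    assert(q->num_children <= q->max_children);

    if (q->num_children == q->max_children) {
        /* max_children is an int, so it can never exceed INT_MAX: == suffices. */
        if (q->max_children == INT_MAX) {
            return -1;
        }
        void **grown = realloc(q->children,
                               (size_t)(q->max_children + 1) * sizeof(*grown));
        if (!grown) {
            return -1;
        }
        q->children = grown;
        q->max_children++;
        /* No need to NULL the new slot: it is written right below. */
    }
    q->children[q->num_children++] = child;
    return 0;
}

/* Returns 0 on success, -1 if not found or if removal is refused
 * because num_children is already at or below the threshold. */
static int quorum_del(Quorum *q, void *child)
{
    int i;

    for (i = 0; i < q->num_children; i++) {
        if (q->children[i] == child) {
            break;
        }
    }
    if (i == q->num_children || q->num_children <= q->threshold) {
        return -1;
    }
    /* Close the gap; the stale tail slot is never read, so it is not cleared. */
    memmove(&q->children[i], &q->children[i + 1],
            (size_t)(q->num_children - i - 1) * sizeof(*q->children));
    q->num_children--;
    return 0;
}
```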

Re: [Qemu-block] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child

2015-10-07 Thread Wen Congyang

On 10/08/2015 02:33 AM, Max Reitz wrote:
> On 22.09.2015 09:44, Wen Congyang wrote:
>> In some cases, we want to take a quorum child offline, and take
>> another child online.
>>
>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>> Reviewed-by: Eric Blake <ebl...@redhat.com>
>> ---
>>  block.c                   | 50 +++
>>  include/block/block.h     |  5 +
>>  include/block/block_int.h |  5 +
>>  3 files changed, 60 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index e815d73..1b25e43 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>>  {
>>  return &bs->stats;
>>  }
>> +
>> +/*
>> + * Hot add/remove a BDS's child. So the user can take a child offline when
>> + * it is broken and take a new child online
>> + */
>> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +Error **errp)
>> +{
>> +
>> +if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
>> +error_setg(errp, "The BDS %s doesn't support adding a child",
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +if (!QLIST_EMPTY(&child_bs->parents)) {
>> +error_setg(errp, "The BDS %s already has parent",
>> +   child_bs->node_name);
>> +return;
>> +}
>> +
>> +parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
>> +}
>> +
>> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +Error **errp)
>> +{
>> +BdrvChild *child;
>> +
>> +if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
>> +error_setg(errp, "The BDS %s doesn't support removing a child",
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +QLIST_FOREACH(child, &parent_bs->children, next) {
>> +if (child->bs == child_bs) {
>> +break;
>> +}
>> +}
>> +
>> +if (!child) {
>> +error_setg(errp, "BDS %s is not a child of %s",
>> +   bdrv_get_device_or_node_name(child_bs),
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
> 
> How about we make this (BlockDriver.bdrv_del_child()) take a BdrvChild
> instead of a BDS? We could even make bdrv_del_child() as a whole take a
> BdrvChild parameter, but I don't suppose that would help much.

bdrv_add_child() takes a BDS, so I use BDS here.

Thanks
Wen Congyang

> 
> Max
> 
>> +}
>> diff --git a/include/block/block.h b/include/block/block.h
>> index ef67353..665c56f 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>>  
>>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>>  
>> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
>> +Error **errp);
>> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
>> +Error **errp);
>> +
>>  #endif
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 2f2c47b..64cbc55 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -288,6 +288,11 @@ struct BlockDriver {
>>   */
>>  int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>>  
>> +void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
>> +   Error **errp);
>> +void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
>> +   Error **errp);
>> +
>>  QLIST_ENTRY(BlockDriver) list;
>>  };
>>  
>>
> 
>
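
Max suggests passing the edge (BdrvChild) to the driver instead of the child node. Under that design the generic layer looks the edge up once, and the driver unlinks it directly. The sketch below illustrates this. Edge, Node, edge_insert_head, node_del_child and toy_del_child are hypothetical. The pprev field imitates the pointer-to-previous-link that lets QEMU's QLIST remove an element in O(1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: an Edge plays the role of BdrvChild. */
typedef struct Node Node;

typedef struct Edge {
    Node *child;
    struct Edge *next;
    struct Edge **pprev;   /* address of the pointer that points at this edge */
} Edge;

struct Node {
    const char *name;
    Edge *children;
    /* Driver hook receives the edge that was already looked up. */
    int (*del_child)(Node *parent, Edge *edge);
};

static void edge_insert_head(Node *parent, Edge *e)
{
    e->next = parent->children;
    if (e->next) {
        e->next->pprev = &e->next;
    }
    parent->children = e;
    e->pprev = &parent->children;
}

/* Generic layer: find the edge once, then pass it to the driver. */
static int node_del_child(Node *parent, Node *child)
{
    Edge *e;

    if (!parent->del_child) {
        return -1;                  /* driver cannot remove children */
    }
    for (e = parent->children; e; e = e->next) {
        if (e->child == child) {
            break;
        }
    }
    if (!e) {
        return -1;                  /* not a child of this parent */
    }
    return parent->del_child(parent, e);
}

/* Example driver hook: unlink the edge in O(1), no second search. */
static int toy_del_child(Node *parent, Edge *edge)
{
    (void)parent;
    assert(*edge->pprev == edge);
    *edge->pprev = edge->next;
    if (edge->next) {
        edge->next->pprev = edge->pprev;
    }
    return 0;
}
```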

Re: [Qemu-block] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child

2015-10-07 Thread Wen Congyang

On 10/07/2015 09:35 PM, Alberto Garcia wrote:
> On Tue 22 Sep 2015 09:44:19 AM CEST, Wen Congyang <we...@cn.fujitsu.com> wrote:
>> In some cases, we want to take a quorum child offline, and take
>> another child online.
>>
>> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Signed-off-by: Gonglei <arei.gong...@huawei.com>
>> Reviewed-by: Eric Blake <ebl...@redhat.com>
> 
>> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +Error **errp)
>> +{
>> +
>> +if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
>> +error_setg(errp, "The BDS %s doesn't support adding a child",
>> +   bdrv_get_device_or_node_name(parent_bs));
>> +return;
>> +}
>> +
>> +if (!QLIST_EMPTY(&child_bs->parents)) {
>> +error_setg(errp, "The BDS %s already has parent",
>> +   child_bs->node_name);
> 
> I think there's one 'a' missing:
> 
>   "The BDS %s already has a parent".
> 
> I also don't think we should use "BDS" in error messages, that's an
> acronym for the name of a C data type, not something that the user is
> supposed to know about.
> 
> I suggest using 'Node' instead.
> 
> Otherwise the patch looks good to me, thanks!

OK, I will fix it in the next version.

Thanks
Wen Congyang

> 
> Berto
> .
>

[Qemu-block] [PATCH v10 04/10] block: make bdrv_put_ref_bh_schedule() as a public API

2015-09-25 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block.c               | 25 +
 blockdev.c            | 37 ++---
 include/block/block.h |  1 +
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index 328c52f..f9a985c 100644
--- a/block.c
+++ b/block.c
@@ -3597,6 +3597,31 @@ void bdrv_unref(BlockDriverState *bs)
 }
 }
 
+typedef struct {
+QEMUBH *bh;
+BlockDriverState *bs;
+} BDRVPutRefBH;
+
+static void bdrv_put_ref_bh(void *opaque)
+{
+BDRVPutRefBH *s = opaque;
+
+bdrv_unref(s->bs);
+qemu_bh_delete(s->bh);
+g_free(s);
+}
+
+/* Release a BDS reference in a BH */
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
+{
+BDRVPutRefBH *s;
+
+s = g_new(BDRVPutRefBH, 1);
+s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
+s->bs = bs;
+qemu_bh_schedule(s->bh);
+}
+
 struct BdrvOpBlocker {
 Error *reason;
 QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/blockdev.c b/blockdev.c
index 3289cc3..11bc992 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -278,37 +278,6 @@ static void bdrv_format_print(void *opaque, const char *name)
 error_printf(" %s", name);
 }
 
-typedef struct {
-QEMUBH *bh;
-BlockDriverState *bs;
-} BDRVPutRefBH;
-
-static void bdrv_put_ref_bh(void *opaque)
-{
-BDRVPutRefBH *s = opaque;
-
-bdrv_unref(s->bs);
-qemu_bh_delete(s->bh);
-g_free(s);
-}
-
-/*
- * Release a BDS reference in a BH
- *
- * It is not safe to use bdrv_unref() from a callback function when the callers
- * still need the BlockDriverState.  In such cases we schedule a BH to release
- * the reference.
- */
-static void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
-{
-BDRVPutRefBH *s;
-
-s = g_new(BDRVPutRefBH, 1);
-s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
-s->bs = bs;
-qemu_bh_schedule(s->bh);
-}
-
 static int parse_block_error_action(const char *buf, bool is_read, Error **errp)
 {
 if (!strcmp(buf, "ignore")) {
@@ -2534,6 +2503,12 @@ static void block_job_cb(void *opaque, int ret)
 block_job_event_completed(bs->job, msg);
 }
 
+
+/*
+ * It is not safe to use bdrv_unref() from a callback function when the
+ * callers still need the BlockDriverState. In such cases we schedule
+ * a BH to release the reference.
+ */
 bdrv_put_ref_bh_schedule(bs);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index e4be19f..5154388 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -505,6 +505,7 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  BlockDriverState *child_bs,
  const BdrvChildRole *child_role);
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3
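
A bottom half is used so that the unref runs only after the current callback has returned, since the caller may still be using the BlockDriverState. The toy sketch below imitates that with an explicit pending list drained by run_deferred(). Node, node_put_ref_deferred and run_deferred are hypothetical. QEMU's real qemu_bh_new()/qemu_bh_schedule() are driven by the main loop instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted node. */
typedef struct Node {
    int refcnt;
} Node;

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    n->refcnt--;              /* a real implementation frees at zero */
}

typedef struct Deferred {
    Node *node;
    struct Deferred *next;
} Deferred;

/* Pending releases; this sketch runs them in LIFO order, which is
 * fine for independent unrefs. */
static Deferred *pending;

/* Analogue of bdrv_put_ref_bh_schedule(): remember the node, unref later. */
static int node_put_ref_deferred(Node *n)
{
    Deferred *d = malloc(sizeof(*d));
    if (!d) {
        return -1;
    }
    d->node = n;
    d->next = pending;
    pending = d;
    return 0;
}

/* Analogue of one main-loop iteration running the scheduled BHs. */
static void run_deferred(void)
{
    while (pending) {
        Deferred *d = pending;
        pending = d->next;
        node_unref(d->node);
        free(d);
    }
}
```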

[Qemu-block] [PATCH v10 03/10] Allow creating backup jobs when opening BDS

2015-09-25 Thread Wen Congyang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Reviewed-by: Jeff Cody <jc...@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.4.3

[Qemu-block] [PATCH v10 06/10] Add new block driver interfaces to control block replication

2015-09-25 Thread Wen Congyang

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Gonglei <arei.gong...@huawei.com>
Cc: Luiz Capitulino <lcapitul...@redhat.com>
Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block.c                   | 43 +++
 include/block/block.h     |  5 +
 include/block/block_int.h | 14 ++
 qapi/block-core.json      | 13 +
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index f9a985c..5cb916b 100644
--- a/block.c
+++ b/block.c
@@ -4253,3 +4253,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
 parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file, mode, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support starting block"
+   " replication", bs->filename);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support block checkpoint",
+   bs->filename);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file, failover, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support stopping block"
+   " replication", bs->filename);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 5154388..40ef59f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -611,4 +611,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
 Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 636d0c9..ee4b8fa 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
 void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
Error **errp);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop Disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 000ae47..d5a177b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1797,6 +1797,19 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.5
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
2.4.3
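
bdrv_start_replication(), bdrv_do_checkpoint() and bdrv_stop_replication() all follow the same rule. Each uses the driver's hook if it has one, otherwise recurses into bs->file, and fails only when neither exists. The sketch below shows that delegation for the start case. It is reduced, with hypothetical Node, Driver, Mode and toy_start types and a message buffer in place of Error **.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for BlockDriver/BlockDriverState. */
typedef enum { MODE_PRIMARY, MODE_SECONDARY } Mode;

typedef struct Node Node;

typedef struct Driver {
    int (*start_replication)(Node *n, Mode mode);   /* optional hook */
} Driver;

struct Node {
    const char *filename;
    const Driver *drv;
    Node *file;        /* underlying node, or NULL */
    int started;       /* set by the toy hook below */
};

/* Walk down the ->file chain until some driver implements the hook. */
static int node_start_replication(Node *n, Mode mode, char *err, size_t errlen)
{
    assert(n != NULL);

    if (n->drv && n->drv->start_replication) {
        return n->drv->start_replication(n, mode);
    } else if (n->file) {
        return node_start_replication(n->file, mode, err, errlen);
    }
    snprintf(err, errlen, "%s doesn't support starting block replication",
             n->filename);
    return -1;
}

/* Example hook: just mark the node as replicating. */
static int toy_start(Node *n, Mode mode)
{
    (void)mode;
    n->started = 1;
    return 0;
}
```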
