Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-02-15 Thread Changlong Xie

On 02/09/2016 01:06 AM, Alberto Garcia wrote:

On Fri 22 Jan 2016 09:02:10 PM CET, "Dr. David Alan Gilbert" 
 wrote:


In general, what do you do to make sure that the data in a new
Quorum child is consistent with that of the rest of the array?


Quorum can have more than one child when it starts. But we don't
do the similar check. So I don't think we should do such check
here.


Yes, but when you start a VM you can verify in advance that all
members of the Quorum have the same data. If you do that on a
running VM how can you know if the new disk is consistent with the
others?


User error if it is not.  Just the same as it is user error if you
request a shallow drive-mirror but the destination is not the same
contents as the backing file.  I don't think qemu has to protect us
from user error in this case.


But the backing file is read-only so the user can guarantee that the
destination has the same data before the shallow mirror. How do you
do that in this case?


I think in the colo case they're relying on doing a block migrate to
synchronise the remote disk prior to switching into colo mode.


Yes but this is a general API that can be used independently from
COLO. I'd say if we want to allow that we should at least place a big
warning in the documentation.



Ok, that's fair enough. Will add in next version.

Thanks
-Xie

Berto


.







Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-02-08 Thread Alberto Garcia
On Fri 22 Jan 2016 09:02:10 PM CET, "Dr. David Alan Gilbert" 
 wrote:

>>  In general, what do you do to make sure that the data in a new
>>  Quorum child is consistent with that of the rest of the array?
>> >>>
>> >>> Quorum can have more than one child when it starts. But we don't
>> >>> do the similar check. So I don't think we should do such check
>> >>> here.
>> >> 
>> >> Yes, but when you start a VM you can verify in advance that all
>> >> members of the Quorum have the same data. If you do that on a
>> >> running VM how can you know if the new disk is consistent with the
>> >> others?
>> >
>> > User error if it is not.  Just the same as it is user error if you
>> > request a shallow drive-mirror but the destination is not the same
>> > contents as the backing file.  I don't think qemu has to protect us
>> > from user error in this case.
>> 
>> But the backing file is read-only so the user can guarantee that the
>> destination has the same data before the shallow mirror. How do you
>> do that in this case?
>
> I think in the colo case they're relying on doing a block migrate to
> synchronise the remote disk prior to switching into colo mode.

Yes but this is a general API that can be used independently from
COLO. I'd say if we want to allow that we should at least place a big
warning in the documentation.

Berto



Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-24 Thread Wen Congyang
On 01/23/2016 04:02 AM, Dr. David Alan Gilbert wrote:
> * Alberto Garcia (be...@igalia.com) wrote:
>> On Thu 21 Jan 2016 05:58:42 PM CET, Eric Blake  wrote:
>> In general, what do you do to make sure that the data in a new Quorum
>> child is consistent with that of the rest of the array?
>
> Quorum can have more than one child when it starts. But we don't do
> the similar check. So I don't think we should do such check here.

 Yes, but when you start a VM you can verify in advance that all
 members of the Quorum have the same data. If you do that on a running
 VM how can you know if the new disk is consistent with the others?
>>>
>>> User error if it is not.  Just the same as it is user error if you
>>> request a shallow drive-mirror but the destination is not the same
>>> contents as the backing file.  I don't think qemu has to protect us
>>> from user error in this case.
>>
>> But the backing file is read-only so the user can guarantee that the
>> destination has the same data before the shallow mirror. How do you do
>> that in this case?
> 
> I think in the colo case they're relying on doing a block migrate
> to synchronise the remote disk prior to switching into colo mode.

Yes, we can do a block migration to sync the disk. After the migration finished,
we stop block migration before starting colo.

Thanks
Wen Congyang

> 
> Dave
> 
>> Berto
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
> 






Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-22 Thread Dr. David Alan Gilbert
* Alberto Garcia (be...@igalia.com) wrote:
> On Thu 21 Jan 2016 05:58:42 PM CET, Eric Blake  wrote:
>  In general, what do you do to make sure that the data in a new Quorum
>  child is consistent with that of the rest of the array?
> >>>
> >>> Quorum can have more than one child when it starts. But we don't do
> >>> the similar check. So I don't think we should do such check here.
> >> 
> >> Yes, but when you start a VM you can verify in advance that all
> >> members of the Quorum have the same data. If you do that on a running
> >> VM how can you know if the new disk is consistent with the others?
> >
> > User error if it is not.  Just the same as it is user error if you
> > request a shallow drive-mirror but the destination is not the same
> > contents as the backing file.  I don't think qemu has to protect us
> > from user error in this case.
> 
> But the backing file is read-only so the user can guarantee that the
> destination has the same data before the shallow mirror. How do you do
> that in this case?

I think in the colo case they're relying on doing a block migrate
to synchronise the remote disk prior to switching into colo mode.

Dave

> Berto
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-22 Thread Alberto Garcia
On Thu 21 Jan 2016 05:58:42 PM CET, Eric Blake  wrote:
 In general, what do you do to make sure that the data in a new Quorum
 child is consistent with that of the rest of the array?
>>>
>>> Quorum can have more than one child when it starts. But we don't do
>>> the similar check. So I don't think we should do such check here.
>> 
>> Yes, but when you start a VM you can verify in advance that all
>> members of the Quorum have the same data. If you do that on a running
>> VM how can you know if the new disk is consistent with the others?
>
> User error if it is not.  Just the same as it is user error if you
> request a shallow drive-mirror but the destination is not the same
> contents as the backing file.  I don't think qemu has to protect us
> from user error in this case.

But the backing file is read-only so the user can guarantee that the
destination has the same data before the shallow mirror. How do you do
that in this case?

Berto



Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-21 Thread Alberto Garcia
On Thu 21 Jan 2016 02:54:10 AM CET, Wen Congyang  wrote:

>>> @@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
>>> *options, int flags,
>>>  ret = -EINVAL;
>>>  goto exit;
>>>  }
>>> -if (s->num_children < 2) {
>>> +if (s->num_children < 1) {
>>>  error_setg(_err,
>>> -   "Number of provided children must be greater than 1");
>>> +   "Number of provided children must be 1 or more");
>>>  ret = -EINVAL;
>>>  goto exit;
>>>  }
>> 
>> I have a question: if you have a Quorum with just one member and you
>> add a new one, how do you know if it has the same data as the
>> existing one?
>> 
>> In general, what do you do to make sure that the data in a new Quorum
>> child is consistent with that of the rest of the array?
>
> Quorum can have more than one child when it starts. But we don't do
> the similar check. So I don't think we should do such check here.

Yes, but when you start a VM you can verify in advance that all members
of the Quorum have the same data. If you do that on a running VM how can
you know if the new disk is consistent with the others?

Berto



Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-21 Thread Eric Blake
On 01/21/2016 06:05 AM, Alberto Garcia wrote:
> On Thu 21 Jan 2016 02:54:10 AM CET, Wen Congyang  wrote:
> 
 @@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
 *options, int flags,
  ret = -EINVAL;
  goto exit;
  }
 -if (s->num_children < 2) {
 +if (s->num_children < 1) {
  error_setg(_err,
 -   "Number of provided children must be greater than 1");
 +   "Number of provided children must be 1 or more");
  ret = -EINVAL;
  goto exit;
  }
>>>
>>> I have a question: if you have a Quorum with just one member and you
>>> add a new one, how do you know if it has the same data as the
>>> existing one?
>>>
>>> In general, what do you do to make sure that the data in a new Quorum
>>> child is consistent with that of the rest of the array?
>>
>> Quorum can have more than one child when it starts. But we don't do
>> the similar check. So I don't think we should do such check here.
> 
> Yes, but when you start a VM you can verify in advance that all members
> of the Quorum have the same data. If you do that on a running VM how can
> you know if the new disk is consistent with the others?

User error if it is not.  Just the same as it is user error if you
request a shallow drive-mirror but the destination is not the same
contents as the backing file.  I don't think qemu has to protect us from
user error in this case.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-20 Thread Wen Congyang
On 01/20/2016 11:43 PM, Alberto Garcia wrote:
> On Fri 25 Dec 2015 10:22:55 AM CET, Changlong Xie wrote:
>> @@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  ret = -EINVAL;
>>  goto exit;
>>  }
>> -if (s->num_children < 2) {
>> +if (s->num_children < 1) {
>>  error_setg(_err,
>> -   "Number of provided children must be greater than 1");
>> +   "Number of provided children must be 1 or more");
>>  ret = -EINVAL;
>>  goto exit;
>>  }
> 
> I have a question: if you have a Quorum with just one member and you add
> a new one, how do you know if it has the same data as the existing one?
> 
> In general, what do you do to make sure that the data in a new Quorum
> child is consistent with that of the rest of the array?

Quorum can have more than one child when it starts. But we don't do the
similar check. So I don't think we should do such check here.

Thanks
Wen Congyang

> 
> Berto
> 
> 
> .
> 






Re: [Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-01-20 Thread Alberto Garcia
On Fri 25 Dec 2015 10:22:55 AM CET, Changlong Xie wrote:
> @@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  ret = -EINVAL;
>  goto exit;
>  }
> -if (s->num_children < 2) {
> +if (s->num_children < 1) {
>  error_setg(_err,
> -   "Number of provided children must be greater than 1");
> +   "Number of provided children must be 1 or more");
>  ret = -EINVAL;
>  goto exit;
>  }

I have a question: if you have a Quorum with just one member and you add
a new one, how do you know if it has the same data as the existing one?

In general, what do you do to make sure that the data in a new Quorum
child is consistent with that of the rest of the array?

Berto



[Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   |   8 ++--
 block/quorum.c| 122 +-
 include/block/block.h |   4 ++
 3 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index a347008..b9e99da 100644
--- a/block.c
+++ b/block.c
@@ -1196,10 +1196,10 @@ static int bdrv_fill_options(QDict **options, const 
char *filename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 6793f12..e73418c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -23,6 +23,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -80,6 +81,8 @@ typedef struct BDRVQuorumState {
 bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
 * block if Quorum is reached.
 */
+unsigned long *index_bitmap;
+int bsize;
 
 QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -926,6 +929,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 /* allocate the children array */
 s->children = g_new0(BdrvChild *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -941,6 +945,8 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 opened[i] = true;
 }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;
 
 g_free(opened);
 goto exit;
@@ -997,6 +1003,115 @@ static void quorum_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+if ((s->bsize % BITS_PER_LONG) == 0) {
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);
+}
+
+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index;
+long new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, s->index_bitmap);
+if (index < s->bsize - 1) {
+/*
+ * The last bit is always set, and we don't clear
+ * the last bit.
+ */
+return;
+}
+
+last_index = find_last_bit(s->index_bitmap, s->bsize);
+s->bsize = last_index + 1;
+if (BITS_TO_LONGS(last_index + 1) == BITS_TO_LONGS(s->bsize)) {
+return;
+}
+
+new_len = BITS_TO_LONGS(last_index + 1) * sizeof(unsigned long);
+s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+BdrvChild *child;
+char indexstr[32];
+int index, ret;
+
+index = get_new_child_index(s);
+ret = snprintf(indexstr, 32, "children.%d", index);
+if (ret < 0 || ret >= 32) {
+error_setg(errp, "cannot generate child name");
+return;
+}
+
+bdrv_drain(bs);
+
+assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+error_setg(errp, "Too many children");
+return;
+}
+s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
+
+bdrv_ref(child_bs);
+child = bdrv_attach_child(bs, child_bs, indexstr, _format);
+s->children[s->num_children++] = child;
+set_bit(index, s->index_bitmap);
+}
+
+static