Re: [Qemu-block] [Qemu-devel] [PATCH RFC for-2.3 1/1] block: New command line option --no-format-probing

2015-03-25 Thread Markus Armbruster
Paolo Bonzini pbonz...@redhat.com writes:

 On 24/03/2015 17:49, Markus Armbruster wrote:
 But what about migration from newer to older QEMU?  Libvirt even
 supports QEMU versions where the only way to specify disks is -hda
 XYZ, so it is _impossible_ to honor the format=raw specifier.
 
 If you migrate to a QEMU that doesn't understand the new option, libvirt
 simply won't set it.  You lose the protection against libvirt bugs it
 provides.  Guest won't notice.
 
 If you somehow manage to find a QEMU old enough to make libvirt use
 format-incapable interfaces, you'll be using insecure format probing on
 the destination.  My patch makes no difference.  Good luck migrating to
 such an old QEMU.

 (Didn't mean live migration---sorry, could have simply said switch).

 Based on my reading of the code, libvirt will actually ignore the
 allow_disk_format_probing setting, and not do anything about the format
 when driving such an old QEMU.  By contrast, if you specify a format and
 libvirt invokes an old qemu-nbd without --format, libvirt fails hard.
 That's already CVE worthy, isn't it?

 So I think an option like this is premature.  libvirt should _first of
 all_ ensure that it completely abides by the allow_disk_format_probing
 setting, including refusing to drive old QEMUs when format probing is
 disabled.  Once libvirt is consistent within itself, we can talk about
 what help QEMU can provide.

If we had a no format probing option in qemu-nbd (not in my patch, but
that's a flaw in the patch, not the idea), then libvirt developers
would've put it to use right away, and then their insecure usage
would've failed hard, making it immediately obvious.

My point is: we don't have to perfect libvirt before we can provide it a
safe failure mode.  The safe failure mode will actually assist with
perfecting libvirt, by making the failures immediately fatal.

 Perfect is not the enemy of good here.  Good is the enemy of secure.

 As to near misses don't count: for me, what counts is actual users
 telling me about their difficulties using QEMU securely.  Secure usage
 shouldn't be hard.

 The right answer for them is probably use libvirt or use Boxes
 depending on the actual usecase.  Invoking QEMU manually is almost never
 the right answer for random untrusted images download from the Internet.

I agree secure usage is currently is too hard for casual command line
use.

Unfortunately, it has proven too hard even for sophisticated
programmatic users like libvirt.  I refuse to blame libvirt for that.
Secure usage really shouldn't be that hard.

  Also, I suspect any advice to QEMU users about adding
 --no-format-probing would be quickly forgotten.

When we discussed security issues with probing last year, my first line
of defense was the default should be secure.  Overwhelming opposition
from the backward compatibility camp forced me to retreat from that line
to my second line of defense secure shouldn't be hard.  That line
needs to be held at all costs :)

Always specify --no-format-probing (or however else we label the knob)
is workable advice for programmatic users like libvirt.  I agree it's
hardly helpful advice for human users.  For them, the advice needs to be
something like add the following stanza to ~/.qemu.conf, then forget
about it.

Same thing, different user interface.

~/.qemu.conf isn't currently used, but it could be.

 That said, if _humans_ have interest in secure use of QEMU, that's a
 much better and more interesting use case than libvirt's, because
 libvirt is itself providing a secure management layer.

Debating which use case is better and more interesting doesn't seem to
be wortwhile.  I think both are interesting enough.

 We have other security options, for example seccomp and FIPS mode.
 Format probing definitely falls in this category.  Let's add first of
 all a -security grouping, where -security [all=]on enables all of them
 but it's also possible to control the suboptions individually.  Then we
 can add format probing to this category.  The same options can be added
 to the utilities.

Putting it in a security option group is certainly fine with me.  I
picked misc only because I couldn't find a better fit, and I could see
other readconfig-incapable options where I couldn't find a better fit,
either.

 Let's iron out the kinks and do it for 2.4.  It's a very useful feature
 indeed.

 But it's something we do for _users_, not for libvirt.

For whom you want it to get done isn't as important to me as getting it
done :)

 But I still
 maintain that for libvirt this is basically security theater, and the
 priorities are others.

To be honest, I find your use of security theater offensive.

I'd readily accept the moniker if the feature would provide libvirt
developers an excuse not to fix their bugs, or reduce the incentives to
fix their bugs.

But the feature in fact does the opposite: it makes such bugs blatantly
obvious and 

[Qemu-block] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication

2015-03-25 Thread Wen Congyang
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index bd7fa9c..3af5ad4 100644
--- a/block.c
+++ b/block.c
@@ -6368,6 +6368,12 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
 {
 BlockDriver *drv = bs-drv;
+Error *local_err = NULL;
+
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, local_err)) {
+error_free(local_err);
+return;
+}
 
 if (drv  drv-bdrv_start_replication) {
 drv-bdrv_start_replication(bs, mode, errp);
@@ -6381,6 +6387,12 @@ void bdrv_start_replication(BlockDriverState *bs, 
COLOMode mode, Error **errp)
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs-drv;
+Error *local_err = NULL;
+
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, local_err)) {
+error_free(local_err);
+return;
+}
 
 if (drv  drv-bdrv_do_checkpoint) {
 drv-bdrv_do_checkpoint(bs, errp);
@@ -6394,6 +6406,12 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error 
**errp)
 void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs-drv;
+Error *local_err = NULL;
+
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, local_err)) {
+error_free(local_err);
+return;
+}
 
 if (drv  drv-bdrv_stop_replication) {
 drv-bdrv_stop_replication(bs, errp);
-- 
2.1.0




[Qemu-block] [RFC PATCH COLO v2 13/13] Don't allow a disk use backing reference target

2015-03-25 Thread Wen Congyang
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index 3af5ad4..548cafa 100644
--- a/block.c
+++ b/block.c
@@ -1399,6 +1399,14 @@ static int 
bdrv_open_backing_reference_file(BlockDriverState *bs,
 }
 
 backing_hd = blk_bs(backing_blk);
+/* Don't allow a disk use backing reference target */
+ret = blk_attach_dev(backing_hd-blk, bs);
+if (ret  0) {
+error_setg(errp, backing_hd %s is used by the other device model,
+   backing_name);
+goto free_exit;
+}
+
 /* Backing reference itself? */
 if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
 error_setg(errp, Backing reference itself);
@@ -2077,6 +2085,7 @@ void bdrv_close(BlockDriverState *bs)
 if (backing_hd-backing_hd-job) {
 block_job_cancel(backing_hd-backing_hd-job);
 }
+blk_detach_dev(backing_hd-backing_hd-blk, bs);
 bdrv_set_backing_hd(backing_hd, NULL);
 bdrv_unref(backing_hd-backing_hd);
 }
-- 
2.1.0




[Qemu-block] [RFC PATCH COLO v2 09/13] block: Parse backing_reference option to reference existing BDS

2015-03-25 Thread Wen Congyang
Usage:
-drive file=xxx,id=Y, \
-drive 
file=,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*

It will create such backing chain:
   {virtio-blk dev 'Y'}  
{virtio-blk dev 'X'}
 |  
|
 |  
|
 v  
v

[base] - [mid] - ( Y )  - (hidden target) 
--- ( X )

 v  ^
 v  ^
 v  ^
 v  ^
  drive-backup sync=none 

X's backing file is hidden-disk, and hidden-disk's backing file is Y.
Disk Y may be opened or reopened in read-write mode, so A block backup
job is automatically created: source is Y and target is hidden-disk.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c| 145 -
 include/block/block.h  |   1 +
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/051 |  13 
 tests/qemu-iotests/051.out |  13 
 5 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index b4d629e..bd7fa9c 100644
--- a/block.c
+++ b/block.c
@@ -1351,6 +1351,113 @@ free_exit:
 return ret;
 }
 
+static void backing_reference_completed(void *opaque, int ret)
+{
+BlockDriverState *hidden_disk = opaque;
+
+assert(!hidden_disk-backing_reference);
+}
+
+static int bdrv_open_backing_reference_file(BlockDriverState *bs,
+QDict *options, Error **errp)
+{
+const char *backing_name;
+QDict *hidden_disk_options = NULL;
+BlockDriverState *backing_hd, *hidden_disk;
+BlockBackend *backing_blk;
+Error *local_err = NULL;
+int ret = 0;
+
+backing_name = qdict_get_try_str(options, drive_id);
+if (!backing_name) {
+error_setg(errp, Backing reference needs option drive_id);
+ret = -EINVAL;
+goto free_exit;
+}
+qdict_del(options, drive_id);
+
+qdict_extract_subqdict(options, hidden_disk_options, hidden-disk.);
+if (!qdict_size(hidden_disk_options)) {
+error_setg(errp, Backing reference needs option hidden-disk.*);
+ret = -EINVAL;
+goto free_exit;
+}
+
+if (qdict_size(options)) {
+const QDictEntry *entry = qdict_first(options);
+error_setg(errp, Backing reference used by '%s' doesn't support 
+   the option '%s', bdrv_get_device_name(bs), entry-key);
+ret = -EINVAL;
+goto free_exit;
+}
+
+backing_blk = blk_by_name(backing_name);
+if (!backing_blk) {
+error_set(errp, QERR_DEVICE_NOT_FOUND, backing_name);
+ret = -ENOENT;
+goto free_exit;
+}
+
+backing_hd = blk_bs(backing_blk);
+/* Backing reference itself? */
+if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
+error_setg(errp, Backing reference itself);
+ret = -EINVAL;
+goto free_exit;
+}
+
+if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+   errp)) {
+ret = -EBUSY;
+goto free_exit;
+}
+
+/* hidden-disk is bs's backing file */
+ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
+hidden_disk_options = NULL;
+if (ret  0) {
+goto free_exit;
+}
+
+hidden_disk = bs-backing_hd;
+if (!hidden_disk-drv || !hidden_disk-drv-supports_backing) {
+ret = -EINVAL;
+error_setg(errp, Hidden disk's driver doesn't support backing files);
+goto free_exit;
+}
+
+bdrv_set_backing_hd(hidden_disk, backing_hd);
+bdrv_ref(backing_hd);
+
+/*
+ * backing hd may be opened or reopened in read-write mode, so we
+ * should backup backing hd to hidden disk
+ */
+bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs-backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+hidden_disk-backing_blocker);
+
+bdrv_ref(hidden_disk);
+backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE,
+ BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ backing_reference_completed, hidden_disk, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+bdrv_unref(hidden_disk);
+/* FIXME, use which errno? */
+ret = -EIO;
+goto free_exit;
+}
+
+bs-backing_reference = true;
+
+free_exit:
+QDECREF(hidden_disk_options);
+QDECREF(options);
+ 

Re: [Qemu-block] [PATCH] nbd: Fix up comment after commit e140177

2015-03-25 Thread Paolo Bonzini


On 25/03/2015 09:18, Markus Armbruster wrote:
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
  blockdev-nbd.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/blockdev-nbd.c b/blockdev-nbd.c
 index b29e456..85cda4c 100644
 --- a/blockdev-nbd.c
 +++ b/blockdev-nbd.c
 @@ -47,8 +47,9 @@ void qmp_nbd_server_start(SocketAddress *addr, Error **errp)
  }
  }
  
 -/* Hook into the BlockDriverState notifiers to close the export when
 - * the file is closed.
 +/*
 + * Hook into the BlockBackend notifiers to close the export when the
 + * backend is closed.
   */
  typedef struct NBDCloseNotifier {
  Notifier n;
 

Applied, thanks.

Paolo



[Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-25 Thread Wen Congyang
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 docs/block-replication.txt | 147 +
 1 file changed, 147 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..874ed8e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,147 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The block replication is used for continuous checkpoints. It is designed
+for COLO that Secondary VM is running. It can also be applied for FT/HA
+scene that Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoint. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
++--+++
+|Primary Write Requests||Secondary Write Requests|
++--+++
+  |   |
+  |  (4)
+  |   V
+  |  /-\
+  |  Copy and Forward| |
+  |-(1)--+   | Disk Buffer |
+  |  |   | |
+  | (3)  \-/
+  | speculative  ^
+  |write through(2)
+  |  |   |
+  V  V   |
+   +--+   ++
+   | Primary Disk |   | Secondary Disk |
+   +--+   ++
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement COLO block replication from many basic
+blocks that are already in QEMU.
+
+ virtio-blk   ||
+ ^||.--
+ |||| Secondary
+1 Quorum  ||'--
+ /  \ ||
+/\||
+   Primary  2 NBD  ---  2 NBD
+ disk   client|| server
 virtio-blk
+  ||^  
  ^
+. |||  
  |
+Primary | ||  Secondary disk - hidden-disk 4 
- active-disk 3
+' |||  backing^   backing
+  ||| |
+  ||| |
+  ||'-'
+  ||   drive-backup sync=none
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The 

[Qemu-block] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint

2015-03-25 Thread Wen Congyang
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
Cc: Jeff Cody jc...@redhat.com
---
 block/backup.c| 12 
 include/block/block_int.h |  1 +
 include/qemu/hbitmap.h|  8 
 util/hbitmap.c| 19 +++
 4 files changed, 40 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 1c535b1..4e9d535 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -435,3 +435,15 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 job-common.co = qemu_coroutine_create(backup_run);
 qemu_coroutine_enter(job-common.co, job);
 }
+
+void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (job-driver != backup_job_driver) {
+error_setg(errp, It is not backup job);
+return;
+}
+
+hbitmap_reset_all(backup_job-bitmap);
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 624945d..45d547b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -628,6 +628,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
   BlockdevOnError on_target_error,
   BlockCompletionFunc *cb, void *opaque,
   Error **errp);
+void backup_do_checkpoint(BlockJob *job, Error **errp);
 
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 550d7ce..95a55e4 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -109,6 +109,14 @@ void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t 
count);
 void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
 
 /**
+ * hbitmap_reset_all:
+ * @hb: HBitmap to operate on.
+ *
+ * Reset all bits in an HBitmap.
+ */
+void hbitmap_reset_all(HBitmap *hb);
+
+/**
  * hbitmap_get:
  * @hb: HBitmap to operate on.
  * @item: Bit to query (0-based).
diff --git a/util/hbitmap.c b/util/hbitmap.c
index ab13971..4111ca5 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -353,6 +353,25 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t 
count)
 hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
 }
 
+void hbitmap_reset_all(HBitmap *hb)
+{
+#if 0
+hbitmap_reset(hb, 0, hb-size  hb-granularity);
+#else
+uint64_t size = hb-size;
+unsigned int i;
+
+/* Same as hbitmap_alloc() except memset() */
+for (i = HBITMAP_LEVELS; i--  0; ) {
+size = MAX((size + BITS_PER_LONG - 1)  BITS_PER_LEVEL, 1);
+memset(hb-levels[i], 0, size * sizeof(unsigned long));
+}
+
+assert(size == 1);
+hb-levels[0][0] |= 1UL  (BITS_PER_LONG - 1);
+#endif
+}
+
 bool hbitmap_get(const HBitmap *hb, uint64_t item)
 {
 /* Compute position and bit in the last layer.  */
-- 
2.1.0




[Qemu-block] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later

2015-03-25 Thread Wen Congyang
The secondary qemu starts later than the primary qemu, so we
cannot connect to nbd server in bdrv_open().

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/nbd.c | 122 +---
 1 file changed, 108 insertions(+), 14 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2176186..3faf865 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -44,6 +44,8 @@
 typedef struct BDRVNBDState {
 NbdClientSession client;
 QemuOpts *socket_opts;
+char *export;
+bool connected;
 } BDRVNBDState;
 
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -254,50 +256,95 @@ static int nbd_establish_connection(BlockDriverState *bs, 
Error **errp)
 return sock;
 }
 
-static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
-Error **errp)
+static int nbd_connect_server(BlockDriverState *bs, Error **errp)
 {
 BDRVNBDState *s = bs-opaque;
-char *export = NULL;
 int result, sock;
-Error *local_err = NULL;
-
-/* Pop the config into our state object. Exit if invalid. */
-nbd_config(s, options, export, local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return -EINVAL;
-}
 
 /* establish TCP connection, return error if it fails
  * TODO: Configurable retry-until-timeout behaviour.
  */
 sock = nbd_establish_connection(bs, errp);
 if (sock  0) {
-g_free(export);
+g_free(s-export);
 return sock;
 }
 
 /* NBD handshake */
-result = nbd_client_init(bs, sock, export, errp);
-g_free(export);
+result = nbd_client_init(bs, sock, s-export, errp);
+g_free(s-export);
+s-export = NULL;
+if (!result) {
+s-connected = true;
+}
+
 return result;
 }
 
+static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
+{
+BDRVNBDState *s = bs-opaque;
+Error *local_err = NULL;
+
+/* Pop the config into our state object. Exit if invalid. */
+nbd_config(s, options, s-export, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+return nbd_connect_server(bs, errp);
+}
+
+static int nbd_open_colo(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+BDRVNBDState *s = bs-opaque;
+Error *local_err = NULL;
+
+/* Pop the config into our state object. Exit if invalid. */
+nbd_config(s, options, s-export, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+return 0;
+}
+
 static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+return -EIO;
+}
+
 return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
 }
 
 static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
  int nb_sectors, QEMUIOVector *qiov)
 {
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+return -EIO;
+}
+
 return nbd_client_co_writev(bs, sector_num, nb_sectors, qiov);
 }
 
 static int nbd_co_flush(BlockDriverState *bs)
 {
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+return -EIO;
+}
+
 return nbd_client_co_flush(bs);
 }
 
@@ -310,6 +357,12 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error 
**errp)
 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors)
 {
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+return -EIO;
+}
+
 return nbd_client_co_discard(bs, sector_num, nb_sectors);
 }
 
@@ -319,23 +372,44 @@ static void nbd_close(BlockDriverState *bs)
 
 qemu_opts_del(s-socket_opts);
 nbd_client_close(bs);
+s-connected = false;
 }
 
 static int64_t nbd_getlength(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs-opaque;
 
+if (!s-connected) {
+/*
+ * We cannot return -ENOTCONN, otherwise refresh_total_sectors()
+ * will fail, and we cannot open nbd client.
+ */
+return 0;
+}
+
 return s-client.size;
 }
 
 static void nbd_detach_aio_context(BlockDriverState *bs)
 {
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+return;
+}
+
 nbd_client_detach_aio_context(bs);
 }
 
 static void nbd_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
 {
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+return;
+}
+
 nbd_client_attach_aio_context(bs, new_context);
 }
 
@@ -438,11 +512,31 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_refresh_filename  = nbd_refresh_filename,
 };
 
+static 

[Qemu-block] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication

2015-03-25 Thread Wen Congyang
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
Cc: Luiz Capitulino lcapitul...@redhat.com
Cc: Michael Roth mdr...@linux.vnet.ibm.com
---
 block.c   | 39 +++
 include/block/block.h |  4 
 include/block/block_int.h | 11 +++
 qapi/block.json   | 16 
 4 files changed, 70 insertions(+)

diff --git a/block.c b/block.c
index 0fe97de..0ff5cf8 100644
--- a/block.c
+++ b/block.c
@@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
 return bs-stats;
 }
+
+void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_start_replication) {
+drv-bdrv_start_replication(bs, mode, errp);
+} else if (bs-file) {
+bdrv_start_replication(bs-file, mode, errp);
+} else {
+error_set(errp, QERR_UNSUPPORTED);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_do_checkpoint) {
+drv-bdrv_do_checkpoint(bs, errp);
+} else if (bs-file) {
+bdrv_do_checkpoint(bs-file, errp);
+} else {
+error_set(errp, QERR_UNSUPPORTED);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_stop_replication) {
+drv-bdrv_stop_replication(bs, errp);
+} else if (bs-file) {
+bdrv_stop_replication(bs-file, errp);
+} else {
+error_set(errp, QERR_UNSUPPORTED);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4c57d63..68f3b1a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dccb092..08dd8ba 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -290,6 +290,17 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+
+void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index e313465..e640566 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @COLOMode
+#
+# An enumeration of COLO mode.
+#
+# @unprotected: COLO is not started or after failover
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'COLOMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.1.0




[Qemu-block] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints

2015-03-25 Thread Wen Congyang
Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/wencongyang/qemu-colo/commits/block-replication-v2

Changs Log:
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (13):
  docs: block replication's description
  quorum: allow ignoring child errors
  NBD client: connect to nbd server later
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  NBD client: implement block driver interfaces for block replication
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Parse backing_reference option to reference existing BDS
  Backup: clear all bitmap when doing block checkpoint
  qcow2: support colo
  skip nbd_target when starting block replication
  Don't allow a disk use backing reference target

 block.c| 242 +++-
 block/Makefile.objs|   2 +-
 block/backup.c |  12 ++
 block/nbd.c| 171 +++--
 block/qcow2.c  | 447 -
 block/qcow2.h  |   6 +
 block/quorum.c | 143 ++-
 docs/block-replication.txt | 147 +++
 include/block/block.h  |   5 +
 include/block/block_int.h  |  13 ++
 include/qemu/hbitmap.h |   8 +
 qapi/block.json|  16 ++
 tests/qemu-iotests/051 |  13 ++
 tests/qemu-iotests/051.out |  13 ++
 util/hbitmap.c |  19 ++
 15 files changed, 1230 insertions(+), 27 deletions(-)
 create mode 100644 docs/block-replication.txt

-- 
2.1.0




[Qemu-block] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication

2015-03-25 Thread Wen Congyang
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/nbd.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 3faf865..753b322 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
 bs-full_open_options = opts;
 }
 
+static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
+  Error **errp)
+{
+BDRVNBDState *s = bs-opaque;
+
+/*
+ * TODO: support COLO_SECONDARY_MODE if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != COLO_MODE_PRIMARY) {
+error_set(errp, QERR_INVALID_PARAMETER, mode);
+return;
+}
+
+if (s-connected) {
+error_setg(errp, The connection is established);
+return;
+}
+
+/* TODO: NBD client should be one child of quorum, how to verify it? */
+nbd_connect_server(bs, errp);
+}
+
+static void nbd_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+error_setg(errp, The connection is not established);
+return;
+}
+}
+
+static void nbd_stop_replication(BlockDriverState *bs, Error **errp)
+{
+BDRVNBDState *s = bs-opaque;
+
+if (!s-connected) {
+error_setg(errp, The connection is not established);
+return;
+}
+
+nbd_client_close(bs);
+s-connected = false;
+}
+
 static BlockDriver bdrv_nbd = {
 .format_name= nbd,
 .protocol_name  = nbd,
@@ -527,6 +573,9 @@ static BlockDriver bdrv_nbd_colo = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_start_replication = nbd_start_replication,
+.bdrv_do_checkpoint = nbd_do_checkpoint,
+.bdrv_stop_replication  = nbd_stop_replication,
 
 .has_variable_length= true,
 };
-- 
2.1.0




Re: [Qemu-block] [PATCH v2 0/2] AHCI: avoid mapping stale guest memory

2015-03-25 Thread Stefan Hajnoczi
On Fri, Mar 13, 2015 at 05:50:52PM -0400, John Snow wrote:
 Currently, the AHCI device tries to re-map guest memory every time
 the low or high address registers are written to, whether or not the
 AHCI device is currently active. If the other register has stale
 information in it, this may lead to runtime failures.
 
 Reconfigure the AHCI device to ignore writes to these registers while
 the device is active, and otherwise postpone the dma memory map until
 the device becomes active.
 
 If the mappings should for whatever reason fail, do not activate the
 bits that tell the user the device has been started successfully.
 
 v2:
  - ahci_map_[clb|fis]_address now returns true on success
  - PORT_CMD_LIST_ON and PORT_CMD_FIS_ON only turn on if the map succeeds
  - Fix compiler warning due to changing context.
 
 John Snow (2):
   AHCI: Do not (re)map FB/CLB buffers while not running
   AHCI: Protect cmd register
 
  hw/ide/ahci.c | 76 
 +--
  hw/ide/ahci.h |  2 ++
  2 files changed, 60 insertions(+), 18 deletions(-)
 
 -- 
 1.9.3
 

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpOSR3qQ8GIt.pgp
Description: PGP signature


Re: [Qemu-block] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication

2015-03-25 Thread Paolo Bonzini


On 25/03/2015 10:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Cc: Luiz Capitulino lcapitul...@redhat.com
 Cc: Michael Roth mdr...@linux.vnet.ibm.com
 ---
  block.c   | 39 +++
  include/block/block.h |  4 
  include/block/block_int.h | 11 +++
  qapi/block.json   | 16 
  4 files changed, 70 insertions(+)
 
 diff --git a/block.c b/block.c
 index 0fe97de..0ff5cf8 100644
 --- a/block.c
 +++ b/block.c
 @@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
  {
  return bs-stats;
  }
 +
 +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error 
 **errp)
 +{
 +BlockDriver *drv = bs-drv;
 +
 +if (drv  drv-bdrv_start_replication) {
 +drv-bdrv_start_replication(bs, mode, errp);
 +} else if (bs-file) {
 +bdrv_start_replication(bs-file, mode, errp);
 +} else {
 +error_set(errp, QERR_UNSUPPORTED);
 +}
 +}
 +
 +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 +{
 +BlockDriver *drv = bs-drv;
 +
 +if (drv  drv-bdrv_do_checkpoint) {
 +drv-bdrv_do_checkpoint(bs, errp);
 +} else if (bs-file) {
 +bdrv_do_checkpoint(bs-file, errp);
 +} else {
 +error_set(errp, QERR_UNSUPPORTED);
 +}
 +}
 +
 +void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
 +{
 +BlockDriver *drv = bs-drv;
 +
 +if (drv  drv-bdrv_stop_replication) {
 +drv-bdrv_stop_replication(bs, errp);
 +} else if (bs-file) {
 +bdrv_stop_replication(bs-file, errp);
 +} else {
 +error_set(errp, QERR_UNSUPPORTED);
 +}
 +}
 diff --git a/include/block/block.h b/include/block/block.h
 index 4c57d63..68f3b1a 100644
 --- a/include/block/block.h
 +++ b/include/block/block.h
 @@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
  
  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
  
 +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error 
 **errp);
 +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 +void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
 +
  #endif
 diff --git a/include/block/block_int.h b/include/block/block_int.h
 index dccb092..08dd8ba 100644
 --- a/include/block/block_int.h
 +++ b/include/block/block_int.h
 @@ -290,6 +290,17 @@ struct BlockDriver {
   */
  int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
  
 +
 +void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
 +   Error **errp);
 +/* Drop Disk buffer when doing checkpoint. */
 +void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
 +/*
 + * After failover, we should flush Disk buffer into secondary disk
 + * and stop block replication.
 + */
 +void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
 +
  QLIST_ENTRY(BlockDriver) list;
  };
  
 diff --git a/qapi/block.json b/qapi/block.json
 index e313465..e640566 100644
 --- a/qapi/block.json
 +++ b/qapi/block.json
 @@ -40,6 +40,22 @@
'data': ['auto', 'none', 'lba', 'large', 'rechs']}
  
  ##
 +# @COLOMode
 +#
 +# An enumeration of COLO mode.
 +#
 +# @unprotected: COLO is not started or after failover
 +#
 +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
 +#
 +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
 +#
 +# Since: 2.4
 +##
 +{ 'enum' : 'COLOMode',
 +  'data' : ['unprotected', 'primary', 'secondary']}

Perhaps replace COLOMode with ReplicationMode or FaultToleranceMode?

 +##
  # @BlockdevSnapshotInternal
  #
  # @device: the name of the device to generate the snapshot from
 

Apart from this,

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-block] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints

2015-03-25 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
 Block replication is a very important feature which is used for
 continuous checkpoints(for example: COLO).
 
 Usage:
 Please refer to docs/block-replication.txt
 
 You can get the patch here:
 https://github.com/wencongyang/qemu-colo/commits/block-replication-v2

Do you have a branch with this code ontop of the main colo work?

Dave

 
 Changs Log:
 V2:
 1. Redesign the secondary qemu(use image-fleecing)
 2. Use Error objects to return error message
 3. Address the comments from Max Reitz and Eric Blake
 
 Wen Congyang (13):
   docs: block replication's description
   quorum: allow ignoring child errors
   NBD client: connect to nbd server later
   Add new block driver interfaces to control block replication
   quorum: implement block driver interfaces for block replication
   NBD client: implement block driver interfaces for block replication
   allow writing to the backing file
   Allow creating backup jobs when opening BDS
   block: Parse backing_reference option to reference existing BDS
   Backup: clear all bitmap when doing block checkpoint
   qcow2: support colo
   skip nbd_target when starting block replication
   Don't allow a disk use backing reference target
 
  block.c| 242 +++-
  block/Makefile.objs|   2 +-
  block/backup.c |  12 ++
  block/nbd.c| 171 +++--
  block/qcow2.c  | 447 
 -
  block/qcow2.h  |   6 +
  block/quorum.c | 143 ++-
  docs/block-replication.txt | 147 +++
  include/block/block.h  |   5 +
  include/block/block_int.h  |  13 ++
  include/qemu/hbitmap.h |   8 +
  qapi/block.json|  16 ++
  tests/qemu-iotests/051 |  13 ++
  tests/qemu-iotests/051.out |  13 ++
  util/hbitmap.c |  19 ++
  15 files changed, 1230 insertions(+), 27 deletions(-)
  create mode 100644 docs/block-replication.txt
 
 -- 
 2.1.0
 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints

2015-03-25 Thread Paolo Bonzini


On 25/03/2015 10:36, Wen Congyang wrote:
 Block replication is a very important feature which is used for
 continuous checkpoints(for example: COLO).
 
 Usage:
 Please refer to docs/block-replication.txt
 
 You can get the patch here:
 https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
 
 Changs Log:
 V2:
 1. Redesign the secondary qemu(use image-fleecing)
 2. Use Error objects to return error message
 3. Address the comments from Max Reitz and Eric Blake
 
 Wen Congyang (13):
   docs: block replication's description
   quorum: allow ignoring child errors
   NBD client: connect to nbd server later
   Add new block driver interfaces to control block replication
   quorum: implement block driver interfaces for block replication
   NBD client: implement block driver interfaces for block replication
   allow writing to the backing file
   Allow creating backup jobs when opening BDS
   block: Parse backing_reference option to reference existing BDS
   Backup: clear all bitmap when doing block checkpoint
   qcow2: support colo
   skip nbd_target when starting block replication
   Don't allow a disk use backing reference target
 
  block.c| 242 +++-
  block/Makefile.objs|   2 +-
  block/backup.c |  12 ++
  block/nbd.c| 171 +++--
  block/qcow2.c  | 447 
 -
  block/qcow2.h  |   6 +
  block/quorum.c | 143 ++-
  docs/block-replication.txt | 147 +++
  include/block/block.h  |   5 +
  include/block/block_int.h  |  13 ++
  include/qemu/hbitmap.h |   8 +
  qapi/block.json|  16 ++
  tests/qemu-iotests/051 |  13 ++
  tests/qemu-iotests/051.out |  13 ++
  util/hbitmap.c |  19 ++
  15 files changed, 1230 insertions(+), 27 deletions(-)
  create mode 100644 docs/block-replication.txt
 

Looks nice!

I've reviewed the patches where I'm competent.

Paolo



Re: [Qemu-block] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint

2015-03-25 Thread Paolo Bonzini


On 25/03/2015 10:36, Wen Congyang wrote:
 
 +void backup_do_checkpoint(BlockJob *job, Error **errp)
 +{
 +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
 +
 +if (job-driver != backup_job_driver) {
 +error_setg(errp, It is not backup job);
 +return;
 +}
 +
 +hbitmap_reset_all(backup_job-bitmap);
 +}

Please add instead a block_job_do_checkpoint API, and a do_checkpoint
function pointer to BlockJobDriver.

 +{
 +#if 0
 +hbitmap_reset(hb, 0, hb-size  hb-granularity);
 +#else
 +uint64_t size = hb-size;
 +unsigned int i;
 +
 +/* Same as hbitmap_alloc() except memset() */
 +for (i = HBITMAP_LEVELS; i--  0; ) {

This can be --i = 1...

 +size = MAX((size + BITS_PER_LONG - 1)  BITS_PER_LEVEL, 1);
 +memset(hb-levels[i], 0, size * sizeof(unsigned long));
 +}
 +
 +assert(size == 1);
 +hb-levels[0][0] |= 1UL  (BITS_PER_LONG - 1);

... if you use = instead of |= here.

 +#endif
 +}

Please pick one implementation (no #if), and also add a testcase to
tests/test-hbitmap.c.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/6] ahci: rerror/werror=stop resume tests

2015-03-25 Thread John Snow


On 03/10/2015 04:14 PM, John Snow wrote:

This series is based on:
[Qemu-devel] [PATCH 0/2] ahci: test varying sector offsets

There appear to be some upstream issues for iotests 051 and 061,
but this series does not appear to alter the existing bad behavior
of those tests.

This patchset brings us up to feature parity with the ide-test that
was checked in for the rerror/werror migration fixes series.

With the expanded functionality of libqos, we test error injection
and error recovery for the AHCI device.

v1 got bounced due to a prerequisite failing a test during a pull req,
so v2 is nearly unchanged:

v2:
  - Rebased to master
  - Fixed an include issue for patch 5.

John Snow (6):
   qtest/ahci: Add simple flush test
   qtest/ahci: Allow override of default CLI options
   libqtest: add qmp_eventwait
   libqtest: add qmp_async
   libqos: add blkdebug_prepare_script
   qtest/ahci: add flush retry test

  tests/ahci-test.c| 143 ---
  tests/ide-test.c |  34 +--
  tests/libqos/libqos-pc.c |   5 ++
  tests/libqos/libqos-pc.h |   1 +
  tests/libqos/libqos.c|  22 
  tests/libqos/libqos.h|   1 +
  tests/libqtest.c |  46 ++-
  tests/libqtest.h |  47 
  8 files changed, 245 insertions(+), 54 deletions(-)



After fixing the series this depends upon after it was booted from a 2.3 
pullreq for glib issues, I am staging this for 2.4.


Thanks!



Re: [Qemu-block] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint

2015-03-25 Thread Wen Congyang
On 03/25/2015 08:55 PM, Paolo Bonzini wrote:
 
 
 On 25/03/2015 10:36, Wen Congyang wrote:

 +void backup_do_checkpoint(BlockJob *job, Error **errp)
 +{
 +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
 +
 +if (job-driver != backup_job_driver) {
 +error_setg(errp, It is not backup job);
 +return;
 +}
 +
 +hbitmap_reset_all(backup_job-bitmap);
 +}
 
 Please add instead a block_job_do_checkpoint API, and a do_checkpoint
 function pointer to BlockJobDriver.
 
 +{
 +#if 0
 +hbitmap_reset(hb, 0, hb-size  hb-granularity);
 +#else
 +uint64_t size = hb-size;
 +unsigned int i;
 +
 +/* Same as hbitmap_alloc() except memset() */
 +for (i = HBITMAP_LEVELS; i--  0; ) {
 
 This can be --i = 1...
 
 +size = MAX((size + BITS_PER_LONG - 1)  BITS_PER_LEVEL, 1);
 +memset(hb-levels[i], 0, size * sizeof(unsigned long));
 +}
 +
 +assert(size == 1);
 +hb-levels[0][0] |= 1UL  (BITS_PER_LONG - 1);
 
 ... if you use = instead of |= here.
 
 +#endif
 +}
 
 Please pick one implementation (no #if), and also add a testcase to
 tests/test-hbitmap.c.

OK

Thanks
Wen Congyang

 
 Paolo
 .
 




Re: [Qemu-block] [PATCH] migration: flush the bdrv before stopping VM

2015-03-25 Thread Juan Quintela
Li, Liang Z liang.z...@intel.com wrote:
   Right now, we don't have an interface to detect that cases and got
   back to the iterative stage.
 
  How about go back to the iterative stage when detect that the
  pending_size is larger Than max_size, like this:
 
  +/* do flush here is aimed to shorten the VM downtime,
  + * bdrv_flush_all is a time consuming operation
  + * when the guest has done some file writing */
  +bdrv_flush_all();
  +pending_size = qemu_savevm_state_pending(s-file, 
  max_size);
  +if (pending_size  pending_size = max_size) {
  +qemu_mutex_unlock_iothread();
  +continue;
  +}
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
if (ret = 0) {
qemu_file_set_rate_limit(s-file, INT64_MAX);
 
  and this is quite simple.
 
  Yes, but it is too simple. If you hold all the locks during
  bdrv_flush_all(), your VM will effectively stop as soon as it performs
  the next I/O access, so you don't win much. And you still don't have a
  timeout for cases where the flush takes really long.
 
 This is probably better than what we had now (basically we are meassuring
 after bdrv_flush_all how much the amount of dirty memory has changed,
 and return to iterative stage if it took too much.  A timeout would be better
 anyways.  And an interface te start the synchronization sooner
 asynchronously would be also good.
 
 Notice that my understanding is that any proper fix for this is 2.4 material.

 Then, how to deal with this issue in 2.3, leave it here? or make an
 incomplete fix like I do above?

I think it is better to leave it here for 2.3. With a patch like this
one, we improve in one load and we got worse in a different load (depens
a lot in the ratio of dirtying memory vs disk).  I have no data which
load is more common, so I prefer to be conservative so late in the
cycle.  What do you think?

Later, Juan.



 Liang

 Thanks, Juan.