Re: [Qemu-block] [PATCH 2/8] block: allow block jobs in any arbitrary node

2015-04-22 Thread Max Reitz

On 16.04.2015 17:12, Alberto Garcia wrote:

Currently, block jobs can only be owned by root nodes. This patch
allows block jobs to be in any arbitrary node, by making the following
changes:

- Block jobs can now be identified by the node name of their
   BlockDriverState in addition to the device name. Since both device
   and node names live in the same namespace there's no ambiguity.

- The device parameter used by all commands that operate on block
   jobs can also be a node name now.

- The node name is used as a fallback to fill in the BlockJobInfo
   structure and all BLOCK_JOB_* events if there is no device name for
   that job.

Signed-off-by: Alberto Garcia be...@igalia.com
---
  block/mirror.c|  5 +++--
  blockdev.c| 16 
  blockjob.c| 18 ++
  docs/qmp/qmp-events.txt   |  8 
  include/qapi/qmp/qerror.h |  3 ---
  qapi/block-core.json  | 20 ++--
  6 files changed, 35 insertions(+), 35 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH 8/8] qemu-iotests: test streaming to an intermediate layer

2015-04-22 Thread Max Reitz

On 16.04.2015 17:12, Alberto Garcia wrote:

This adds test_stream_intermediate(), similar to test_stream() but
streams to the intermediate image instead.

Signed-off-by: Alberto Garcia be...@igalia.com
---
  tests/qemu-iotests/030 | 18 +-
  tests/qemu-iotests/030.out |  4 ++--
  2 files changed, 19 insertions(+), 3 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v6 01/21] docs: incremental backup documentation

2015-04-22 Thread Eric Blake
On 04/17/2015 05:49 PM, John Snow wrote:
 Signed-off-by: John Snow js...@redhat.com
 ---
  docs/bitmaps.md | 352 
 
  1 file changed, 352 insertions(+)
  create mode 100644 docs/bitmaps.md
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/8] block: keep a list of block jobs

2015-04-22 Thread Eric Blake
On 04/16/2015 09:12 AM, Alberto Garcia wrote:
 The current way to obtain the list of existing block jobs is to
 iterate over all root nodes and check which ones own a job.
 
 Since we want to be able to support block jobs in other nodes as well,
 this patch keeps a list of jobs that is updated everytime one is

s/everytime/every time/

 created or destroyed.
 
 This also updates qmp_query_block_jobs() to use this new list.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 Reviewed-by: Max Reitz mre...@redhat.com
 ---
  blockdev.c   | 19 ---
  blockjob.c   | 13 +
  include/block/blockjob.h | 14 ++
  3 files changed, 35 insertions(+), 11 deletions(-)
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v3 10/10] iotests: 124 - transactional failure test

2015-04-22 Thread John Snow
Use a transaction to request an incremental backup across two drives.
Coerce one of the jobs to fail, and then re-run the transaction.

Verify that no bitmap data was lost due to the partial transaction
failure.

Signed-off-by: John Snow js...@redhat.com
---
 tests/qemu-iotests/124 | 120 -
 tests/qemu-iotests/124.out |   4 +-
 2 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index 2d50594..772edd4 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -139,9 +139,12 @@ class TestIncrementalBackup(iotests.QMPTestCase):
 def do_qmp_backup(self, error='Input/output error', **kwargs):
 res = self.vm.qmp('drive-backup', **kwargs)
 self.assert_qmp(res, 'return', {})
+return self.wait_qmp_backup(kwargs['device'], error)
 
+
+def wait_qmp_backup(self, device, error='Input/output error'):
 event = self.vm.event_wait(name=BLOCK_JOB_COMPLETED,
-   match={'data': {'device': 
kwargs['device']}})
+   match={'data': {'device': device}})
 self.assertIsNotNone(event)
 
 try:
@@ -375,6 +378,121 @@ class TestIncrementalBackup(iotests.QMPTestCase):
 self.check_backups()
 
 
+def test_transaction_failure(self):
+'''Test: Verify backups made from a transaction that partially fails.
+
+Add a second drive with its own unique pattern, and add a bitmap to 
each
+drive. Use blkdebug to interfere with the backup on just one drive and
+attempt to create a coherent incremental backup across both drives.
+
+verify a failure in one but not both, then delete the failed stubs and
+re-run the same transaction.
+
+verify that both incrementals are created successfully.
+'''
+
+# Create a second drive, with pattern:
+drive1 = self.add_node('drive1')
+self.img_create(drive1['file'], drive1['fmt'])
+io_write_patterns(drive1['file'], (('0x14', 0, 512),
+   ('0x5d', '1M', '32k'),
+   ('0xcd', '32M', '124k')))
+
+# Create a blkdebug interface to this img as 'drive1'
+result = self.vm.qmp('blockdev-add', options={
+'id': drive1['id'],
+'driver': drive1['fmt'],
+'file': {
+'driver': 'blkdebug',
+'image': {
+'driver': 'file',
+'filename': drive1['file']
+},
+'set-state': [{
+'event': 'flush_to_disk',
+'state': 1,
+'new_state': 2
+}],
+'inject-error': [{
+'event': 'read_aio',
+'errno': 5,
+'state': 2,
+'immediately': False,
+'once': True
+}],
+}
+})
+self.assert_qmp(result, 'return', {})
+
+# Create bitmaps and full backups for both drives
+drive0 = self.drives[0]
+dr0bm0 = self.add_bitmap('bitmap0', drive0)
+dr1bm0 = self.add_bitmap('bitmap0', drive1)
+self.create_anchor_backup(drive0)
+self.create_anchor_backup(drive1)
+self.assert_no_active_block_jobs()
+self.assertFalse(self.vm.get_qmp_events(wait=False))
+
+# Emulate some writes
+self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+  ('0xfe', '16M', '256k'),
+  ('0x64', '32736k', '64k')))
+self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
+  ('0xef', '16M', '256k'),
+  ('0x46', '32736k', '64k')))
+
+# Create incremental backup targets
+target0 = self.prepare_backup(dr0bm0)
+target1 = self.prepare_backup(dr1bm0)
+
+# Ask for a new incremental backup per-each drive,
+# expecting drive1's backup to fail:
+transaction = [
+transaction_drive_backup(drive0['id'], target0, 
sync='dirty-bitmap',
+ format=drive0['fmt'], mode='existing',
+ bitmap=dr0bm0.name),
+transaction_drive_backup(drive1['id'], target1, 
sync='dirty-bitmap',
+ format=drive1['fmt'], mode='existing',
+ bitmap=dr1bm0.name),
+]
+result = self.vm.qmp('transaction', actions=transaction)
+self.assert_qmp(result, 'return', {})
+
+# Observe that drive0's backup completes, but drive1's does not.
+# Consume drive1's error and ensure all pending actions are completed.
+self.assertTrue(self.wait_qmp_backup(drive0['id']))
+

[Qemu-block] [PATCH v3 05/10] block: add transactional callbacks feature

2015-04-22 Thread John Snow
The goal here is to add a new method to transactions that allows
developers to specify a callback that will get invoked only once
all jobs spawned by a transaction are completed, allowing developers
the chance to perform actions conditionally pending complete success,
partial failure, or complete failure.

In order to register the new callback to be invoked, a user must request
a callback pointer and closure by calling new_action_cb_wrapper, which
creates a wrapper around an opaque pointer and callback that would have
originally been passed to e.g. backup_start().

The function will return a function pointer and a new opaque pointer to
be passed instead. The transaction system will effectively intercept the
original callbacks and perform book-keeping on the transaction after it
has delivered the original enveloped callback.

This means that Transaction Action callback methods will be called after
all callbacks triggered by all Actions in the Transactional group have
been received.

This feature has no knowledge of any jobs spawned by Actions that do not
inform the system via new_action_cb_wrapper().

For an example of how to use the feature, please skip ahead to:
'block: drive_backup transaction callback support' which serves as an example
for how to hook up a post-transaction callback to the Drive Backup action.


Note 1: Defining a callback method alone is not sufficient to have the new
method invoked. You must call new_action_cb_wrapper() AND ensure the
callback it returns is the one used as the callback for the job
launched by the action.

Note 2: You can use this feature for any system that registers completions of
an asynchronous task via a callback of the form
(void *opaque, int ret), not just block job callbacks.

Signed-off-by: John Snow js...@redhat.com
---
 blockdev.c | 183 +++--
 1 file changed, 179 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 2ab63ed..31ccb1b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1240,6 +1240,8 @@ typedef struct BlkActionState BlkActionState;
  * @abort: Abort the changes on fail, can be NULL.
  * @clean: Clean up resources after all transaction actions have called
  * commit() or abort(). Can be NULL.
+ * @cb: Executed after all jobs launched by actions in the transaction finish,
+ *  but only if requested by new_action_cb_wrapper() prior to clean().
  *
  * Only prepare() may fail. In a single transaction, only one of commit() or
  * abort() will be called. clean() will always be called if it is present.
@@ -1250,6 +1252,7 @@ typedef struct BlkActionOps {
 void (*commit)(BlkActionState *common);
 void (*abort)(BlkActionState *common);
 void (*clean)(BlkActionState *common);
+void (*cb)(BlkActionState *common);
 } BlkActionOps;
 
 /**
@@ -1258,19 +1261,46 @@ typedef struct BlkActionOps {
  * by a transaction group.
  *
  * @jobs: A reference count that tracks how many jobs still need to complete.
+ * @status: A cumulative return code for all actions that have reported
+ *  a return code via callback in the transaction.
  * @actions: A list of all Actions in the Transaction.
+ *   However, once the transaction has completed, it will be only a 
list
+ *   of transactions that have registered a post-transaction callback.
  */
 typedef struct BlkTransactionState {
 int jobs;
+int status;
 QTAILQ_HEAD(actions, BlkActionState) actions;
 } BlkTransactionState;
 
+typedef void (CallbackFn)(void *opaque, int ret);
+
+/**
+ * BlkActionCallbackData:
+ * Necessary state for intercepting and
+ * re-delivering a callback triggered by an Action.
+ *
+ * @opaque: The data to be given to the encapsulated callback when
+ *  a job launched by an Action completes.
+ * @ret: The status code that was delivered to the encapsulated callback.
+ * @callback: The encapsulated callback to invoke upon completion of
+ *the Job launched by the Action.
+ */
+typedef struct BlkActionCallbackData {
+void *opaque;
+int ret;
+CallbackFn *callback;
+} BlkActionCallbackData;
+
 /**
  * BlkActionState:
  * Describes one Action's state within a Transaction.
  *
  * @action: QAPI-defined enum identifying which Action to perform.
  * @ops: Table of ActionOps this Action can perform.
+ * @transaction: A pointer back to the Transaction this Action belongs to.
+ * @cb_data: Information on this Action's encapsulated callback, if any.
+ * @refcount: reference count, allowing access to this state beyond clean().
  * @entry: List membership for all Actions in this Transaction.
  *
  * This structure must be arranged as first member in a subclassed type,
@@ -1280,6 +1310,9 @@ typedef struct BlkTransactionState {
 struct BlkActionState {
 TransactionAction *action;
 const BlkActionOps *ops;
+BlkTransactionState *transaction;
+BlkActionCallbackData *cb_data;
+int refcount;
 

[Qemu-block] [PATCH v3 06/10] block: add refcount to Job object

2015-04-22 Thread John Snow
If we want to get at the job after the life of the job,
we'll need a refcount for this object.

This may occur for example if we wish to inspect the actions
taken by a particular job after a transactional group of jobs
runs, and further actions are required.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 blockjob.c   | 18 --
 include/block/blockjob.h | 21 +
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index ba2255d..d620082 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -35,6 +35,19 @@
 #include qemu/timer.h
 #include qapi-event.h
 
+void block_job_incref(BlockJob *job)
+{
+job-refcount++;
+}
+
+void block_job_decref(BlockJob *job)
+{
+job-refcount--;
+if (job-refcount == 0) {
+g_free(job);
+}
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
@@ -57,6 +70,7 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 job-cb= cb;
 job-opaque= opaque;
 job-busy  = true;
+job-refcount  = 1;
 bs-job = job;
 
 /* Only set speed when necessary to avoid NotSupported error */
@@ -68,7 +82,7 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 bs-job = NULL;
 bdrv_op_unblock_all(bs, job-blocker);
 error_free(job-blocker);
-g_free(job);
+block_job_decref(job);
 error_propagate(errp, local_err);
 return NULL;
 }
@@ -85,7 +99,7 @@ void block_job_completed(BlockJob *job, int ret)
 bs-job = NULL;
 bdrv_op_unblock_all(bs, job-blocker);
 error_free(job-blocker);
-g_free(job);
+block_job_decref(job);
 }
 
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index b6d4ebb..dcc0596 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -116,6 +116,9 @@ struct BlockJob {
 
 /** The opaque value that is passed to the completion function.  */
 void *opaque;
+
+/** A reference count, allowing for post-job actions in e.g. transactions 
*/
+int refcount;
 };
 
 /**
@@ -141,6 +144,24 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
void *opaque, Error **errp);
 
 /**
+ * block_job_incref:
+ * @job: The job to pick up a handle to
+ *
+ * Increment the refcount on @job, to be able to use it asynchronously
+ * from the job it is being used for. Put down the reference when done
+ * with @block_job_unref.
+ */
+void block_job_incref(BlockJob *job);
+
+/**
+ * block_job_decref:
+ * @job: The job to unreference and delete.
+ *
+ * Decrement the refcount on @job and delete it if there are no more 
references.
+ */
+void block_job_decref(BlockJob *job);
+
+/**
  * block_job_sleep_ns:
  * @job: The job that calls the function.
  * @clock: The clock to sleep on.
-- 
2.1.0




[Qemu-block] [PATCH v3 02/10] iotests: add transactional incremental backup test

2015-04-22 Thread John Snow
Test simple usage cases for using transactions to create
and synchronize incremental backups.

Signed-off-by: John Snow js...@redhat.com
---
 tests/qemu-iotests/124 | 54 ++
 tests/qemu-iotests/124.out |  4 ++--
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index 3ee78cd..2d50594 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -36,6 +36,23 @@ def try_remove(img):
 pass
 
 
+def transaction_action(action, **kwargs):
+return {
+'type': action,
+'data': kwargs
+}
+
+
+def transaction_bitmap_clear(node, name, **kwargs):
+return transaction_action('block-dirty-bitmap-clear',
+  node=node, name=name, **kwargs)
+
+
+def transaction_drive_backup(device, target, **kwargs):
+return transaction_action('drive-backup', device=device, target=target,
+  **kwargs)
+
+
 class Bitmap:
 def __init__(self, name, drive):
 self.name = name
@@ -264,6 +281,43 @@ class TestIncrementalBackup(iotests.QMPTestCase):
 return self.do_incremental_simple(granularity=131072)
 
 
+def test_incremental_transaction(self):
+'''Test: Verify backups made from transactionally created bitmaps.
+
+Create a bitmap before VM execution begins, then create a second
+bitmap AFTER writes have already occurred. Use transactions to create
+a full backup and synchronize both bitmaps to this backup.
+Create an incremental backup through both bitmaps and verify that
+both backups match the current drive0 image.
+'''
+
+drive0 = self.drives[0]
+bitmap0 = self.add_bitmap('bitmap0', drive0)
+self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+  ('0xfe', '16M', '256k'),
+  ('0x64', '32736k', '64k')))
+bitmap1 = self.add_bitmap('bitmap1', drive0)
+
+result = self.vm.qmp('transaction', actions=[
+transaction_bitmap_clear(bitmap0.drive['id'], bitmap0.name),
+transaction_bitmap_clear(bitmap1.drive['id'], bitmap1.name),
+transaction_drive_backup(drive0['id'], drive0['backup'],
+ sync='full', format=drive0['fmt'])
+])
+self.assert_qmp(result, 'return', {})
+self.wait_until_completed(drive0['id'])
+self.files.append(drive0['backup'])
+
+self.hmp_io_writes(drive0['id'], (('0x9a', 0, 512),
+  ('0x55', '8M', '352k'),
+  ('0x78', '15872k', '1M')))
+# Both bitmaps should be correctly in sync.
+self.create_incremental(bitmap0)
+self.create_incremental(bitmap1)
+self.vm.shutdown()
+self.check_backups()
+
+
 def test_incremental_failure(self):
 '''Test: Verify backups made after a failure are correct.
 
diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
index 2f7d390..594c16f 100644
--- a/tests/qemu-iotests/124.out
+++ b/tests/qemu-iotests/124.out
@@ -1,5 +1,5 @@
-...
+
 --
-Ran 7 tests
+Ran 8 tests
 
 OK
-- 
2.1.0




[Qemu-block] [PATCH v3 07/10] block: add delayed bitmap successor cleanup

2015-04-22 Thread John Snow
Allow bitmap successors to carry reference counts.

We can in a later patch use this ability to clean up the dirty bitmap
according to both the individual job's success and the success of all
jobs in the transaction group.

The code for cleaning up a bitmap is also moved from backup_run to
backup_complete.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 block.c   | 65 ++-
 block/backup.c| 20 ++--
 include/block/block.h | 10 
 3 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index b29aafe..0e7308c 100644
--- a/block.c
+++ b/block.c
@@ -51,6 +51,12 @@
 #include windows.h
 #endif
 
+typedef enum BitmapSuccessorAction {
+SUCCESSOR_ACTION_UNDEFINED = 0,
+SUCCESSOR_ACTION_ABDICATE,
+SUCCESSOR_ACTION_RECLAIM
+} BitmapSuccessorAction;
+
 /**
  * A BdrvDirtyBitmap can be in three possible states:
  * (1) successor is NULL and disabled is false: full r/w mode
@@ -65,6 +71,8 @@ struct BdrvDirtyBitmap {
 char *name; /* Optional non-empty unique ID */
 int64_t size;   /* Size of the bitmap (Number of sectors) */
 bool disabled;  /* Bitmap is read-only */
+int successor_refcount; /* Number of active handles to the successor */
+BitmapSuccessorAction act;  /* Action to take on successor upon release */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
@@ -5540,6 +5548,7 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState 
*bs,
 
 /* Install the successor and freeze the parent */
 bitmap-successor = child;
+bitmap-successor_refcount = 1;
 return 0;
 }
 
@@ -5547,9 +5556,9 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState 
*bs,
  * For a bitmap with a successor, yield our name to the successor,
  * delete the old bitmap, and return a handle to the new bitmap.
  */
-BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
-BdrvDirtyBitmap *bitmap,
-Error **errp)
+static BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap,
+   Error **errp)
 {
 char *name;
 BdrvDirtyBitmap *successor = bitmap-successor;
@@ -5574,9 +5583,9 @@ BdrvDirtyBitmap 
*bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
  * we may wish to re-join the parent and child/successor.
  * The merged parent will be un-frozen, but not explicitly re-enabled.
  */
-BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
-   BdrvDirtyBitmap *parent,
-   Error **errp)
+static BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
+  BdrvDirtyBitmap *parent,
+  Error **errp)
 {
 BdrvDirtyBitmap *successor = parent-successor;
 
@@ -5595,6 +5604,50 @@ BdrvDirtyBitmap 
*bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 return parent;
 }
 
+static BdrvDirtyBitmap *bdrv_free_bitmap_successor(BlockDriverState *bs,
+   BdrvDirtyBitmap *parent)
+{
+assert(!parent-successor_refcount);
+
+switch (parent-act) {
+case SUCCESSOR_ACTION_RECLAIM:
+return bdrv_reclaim_dirty_bitmap(bs, parent, NULL);
+case SUCCESSOR_ACTION_ABDICATE:
+return bdrv_dirty_bitmap_abdicate(bs, parent, NULL);
+case SUCCESSOR_ACTION_UNDEFINED:
+default:
+g_assert_not_reached();
+}
+}
+
+BdrvDirtyBitmap *bdrv_frozen_bitmap_decref(BlockDriverState *bs,
+   BdrvDirtyBitmap *parent,
+   int ret)
+{
+assert(bdrv_dirty_bitmap_frozen(parent));
+assert(parent-successor);
+
+if (ret) {
+parent-act = SUCCESSOR_ACTION_RECLAIM;
+} else if (parent-act != SUCCESSOR_ACTION_RECLAIM) {
+parent-act = SUCCESSOR_ACTION_ABDICATE;
+}
+
+parent-successor_refcount--;
+if (parent-successor_refcount == 0) {
+return bdrv_free_bitmap_successor(bs, parent);
+}
+return parent;
+}
+
+void bdrv_dirty_bitmap_incref(BdrvDirtyBitmap *parent)
+{
+assert(bdrv_dirty_bitmap_frozen(parent));
+assert(parent-successor);
+
+parent-successor_refcount++;
+}
+
 /**
  * Truncates _all_ bitmaps attached to a BDS.
  */
diff --git a/block/backup.c b/block/backup.c
index a297df6..62f8d2b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -240,6 +240,12 @@ static void backup_complete(BlockJob *job, void *opaque)
 
 bdrv_unref(s-target);
 
+if (s-sync_bitmap) {
+BdrvDirtyBitmap *bm;
+bm = bdrv_frozen_bitmap_decref(job-bs, s-sync_bitmap, data-ret);
+assert(bm);
+}
+
 

Re: [Qemu-block] [PATCH v6 07/21] hbitmap: add hbitmap_merge

2015-04-22 Thread Eric Blake
On 04/17/2015 05:49 PM, John Snow wrote:
 We add a bitmap merge operation to assist in error cases
 where we wish to combine two bitmaps together.
 
 This is algorithmically O(bits) provided HBITMAP_LEVELS remains
 constant. For a full bitmap on a 64bit machine:
 sum(bits/64^k, k, 0, HBITMAP_LEVELS) ~= 1.01587 * bits
 
 We may be able to improve running speed for particularly sparse
 bitmaps by using iterators, but the running time for dense maps
 will be worse.
 
 We present the simpler solution first, and we can refine it later
 if needed.
 
 Signed-off-by: John Snow js...@redhat.com
 ---
  include/qemu/hbitmap.h | 13 +
  util/hbitmap.c | 33 +
  2 files changed, 46 insertions(+)

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v6 10/21] qmp: Add support of dirty-bitmap sync mode for drive-backup

2015-04-22 Thread Eric Blake
On 04/17/2015 05:49 PM, John Snow wrote:
 For dirty-bitmap sync mode, the block job will iterate through the
 given dirty bitmap to decide if a sector needs backup (backup all the
 dirty clusters and skip clean ones), just as allocation conditions of
 top sync mode.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 Signed-off-by: John Snow js...@redhat.com
 ---

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 5/8] block: Add QMP support for streaming to an intermediate layer

2015-04-22 Thread Eric Blake
On 04/16/2015 09:12 AM, Alberto Garcia wrote:
 This patch makes the 'device' paramater of the 'block-stream' command

s/paramater/parameter/

 accept a node name as well as a device name.
 
 In addition to that, operation blockers will be checked in all
 intermediate nodes between the top and the base node.
 
 Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
 no longer returns DeviceNotFound, iotest 030 is updated to expect
 GenericError instead.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  blockdev.c | 20 ++--
  qapi/block-core.json   | 10 +++---
  tests/qemu-iotests/030 |  2 +-
  3 files changed, 18 insertions(+), 14 deletions(-)
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-22 Thread Paolo Bonzini


On 22/04/2015 11:31, Kevin Wolf wrote:
 Actually I liked the foo+colo names.

 These are just internal details of the implementations and the
 primary/secondary disks actually can be any format.

 Stefan, what was your worry with the +colo block drivers?
 
 I haven't read the patches yet, so I may be misunderstanding, but
 wouldn't a separate filter driver be more appropriate than modifying
 qcow2 with logic that has nothing to do with the image format?

Possibly; on the other hand, why multiply the size of the test matrix
with options that no one will use and that will bitrot?

Paolo



Re: [Qemu-block] [PATCH] qmp: fill in the image field in BlockDeviceInfo

2015-04-22 Thread Stefan Hajnoczi
On Fri, Apr 17, 2015 at 02:52:43PM +0300, Alberto Garcia wrote:
 The image field in BlockDeviceInfo is supposed to contain an ImageInfo
 object. However that is being filled in by bdrv_query_info(), not by
 bdrv_block_device_info(), which is where BlockDeviceInfo is actually
 created.
 
 Anyone calling bdrv_block_device_info() directly will get a null image
 field. As a consequence of this, the HMP command 'info block -n -v'
 crashes QEMU.
 
 This patch moves the code that fills in that field from
 bdrv_query_info() to bdrv_block_device_info().
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  block.c   |  9 +++--
  block/qapi.c  | 46 +-
  blockdev.c|  2 +-
  include/block/block.h |  2 +-
  include/block/qapi.h  |  2 +-
  5 files changed, 35 insertions(+), 26 deletions(-)

Reverted the QEMU 2.3 fix and replaced it with this commit.

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan


pgpRI7MMF4bxs.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 06:20:39PM +0300, Alberto Garcia wrote:
 Large disk images need large L2 caches in order to maximize their
 I/O performance. However setting a correct size for the cache is not
 necessarily easy since apart from the image size, it also depends
 on other factors like its usage patterns or whether it's part of a
 backing chain.
 
 In order to be able to set a very large cache size to cover the
 worst-case scenarios and yet prevent an unnecessary waste of memory,
 this patch modifies the qcow2 cache algorithm so the memory for each
 entry is allocated only when it's actually needed.
 
 This also improves the scenarios with smaller images: the current
 default L2 cache size can map a whole 8GB disk, so those smaller than
 that are allocating cache memory that can never be used.

What measurable improvement does this patch make?

Buffers allocated upfront are not touched, so they don't actually
consume physical memory.

Stefan


pgpGJehESII7n.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 05:28:01PM +0200, Paolo Bonzini wrote:
 On 21/04/2015 03:25, Wen Congyang wrote:
   Please do not introduce name+colo block drivers.  This approach is
   invasive and makes block replication specific to only a few block
   drivers, e.g. NBD or qcow2.
  NBD is used to connect to secondary qemu, so it must be used. But the 
  primary
  qemu uses quorum, so the primary disk can be any format.
  The secondary disk is nbd target, and it can also be any format. The cache
  disk(active disk/hidden disk) is an empty disk, and it is created before run
  COLO. The cache disk format is qcow2 now. In theory, it can be ant format 
  which
  supports backing file. But the driver should be updated to support colo 
  mode.
  
   A cleaner approach is a QMP command or -drive options that work for any
   BlockDriverState.
  
  OK, I will add a new drive option to avoid use name+colo.
 
 Actually I liked the foo+colo names.
 
 These are just internal details of the implementations and the
 primary/secondary disks actually can be any format.
 
 Stefan, what was your worry with the +colo block drivers?

Why does NBD need to know about COLO?  It should be possible to use
iSCSI or other protocols too.

Stefan


pgpGr2J18TCwu.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-22 Thread Wen Congyang
On 04/22/2015 05:18 PM, Stefan Hajnoczi wrote:
 On Tue, Apr 21, 2015 at 05:28:01PM +0200, Paolo Bonzini wrote:
 On 21/04/2015 03:25, Wen Congyang wrote:
 Please do not introduce name+colo block drivers.  This approach is
 invasive and makes block replication specific to only a few block
 drivers, e.g. NBD or qcow2.
 NBD is used to connect to secondary qemu, so it must be used. But the 
 primary
 qemu uses quorum, so the primary disk can be any format.
 The secondary disk is nbd target, and it can also be any format. The cache
 disk(active disk/hidden disk) is an empty disk, and it is created before run
 COLO. The cache disk format is qcow2 now. In theory, it can be ant format 
 which
 supports backing file. But the driver should be updated to support colo 
 mode.

 A cleaner approach is a QMP command or -drive options that work for any
 BlockDriverState.

 OK, I will add a new drive option to avoid use name+colo.

 Actually I liked the foo+colo names.

 These are just internal details of the implementations and the
 primary/secondary disks actually can be any format.

 Stefan, what was your worry with the +colo block drivers?
 
 Why does NBD need to know about COLO?  It should be possible to use
 iSCSI or other protocols too.

Hmm, if you want to use iSCSI or other protocols, you should update the driver
to implement block replication's control interface.

Currently, we only support nbd now.

Thanks
Wen Congyang

 
 Stefan
 




Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 09:25:59AM +0800, Wen Congyang wrote:
 On 04/20/2015 11:30 PM, Stefan Hajnoczi wrote:
  On Fri, Apr 03, 2015 at 06:01:07PM +0800, Wen Congyang wrote:
  One general question about the design: the Secondary host needs 3x
  storage space since it has the Secondary Disk, hidden-disk, and
  active-disk.  Each image requires a certain amount of space depending on
  writes or COW operations.  Is 3x the upper bound or is there a way to
  reduce the bound?
 
 active disk and hidden disk are temp file. It will be maked empty in
 bdrv_do_checkpoint(). Their format is qcow2 now, so it doesn't need too
 many spaces if we do checkpoint periodically.

A question related to checkpoints: both Primary and Secondary are active
(running) in COLO.  The Secondary will be slower since it performs extra
work; disk I/O on the Secondary has a COW overhead.

Does this force the Primary to wait for checkpoint commit so that the
Secondary can catch up?

I'm a little confused about that since the point of COLO is to avoid the
overheads of microcheckpointing, but there still seems to be a
checkpointing bottleneck for disk I/O-intensive applications.

  
  The bound is important since large amounts of data become a bottleneck
  for writeout/commit operations.  They could cause downtime if the guest
  is blocked until the entire Disk Buffer has been written to the
  Secondary Disk during failover, for example.
 
 OK, I will test it. In my test, vm_stop() will take about 2-3 seconds if
 I run filebench in the guest. Is there anyway to speed it up?

Is it necessary to commit the active disk and hidden disk to the
Secondary Disk on failover?  Maybe the VM could continue executing
immediately and run a block-commit job.  The active disk and hidden disk
files can be dropped once block-commit finishes.


pgpSmtN_bltYK.pgp
Description: PGP signature