Re: [Qemu-block] [PATCH 2/8] block: allow block jobs in any arbitrary node
On 16.04.2015 17:12, Alberto Garcia wrote: Currently, block jobs can only be owned by root nodes. This patch allows block jobs to be in any arbitrary node, by making the following changes: - Block jobs can now be identified by the node name of their BlockDriverState in addition to the device name. Since both device and node names live in the same namespace there's no ambiguity. - The device parameter used by all commands that operate on block jobs can also be a node name now. - The node name is used as a fallback to fill in the BlockJobInfo structure and all BLOCK_JOB_* events if there is no device name for that job. Signed-off-by: Alberto Garcia be...@igalia.com --- block/mirror.c| 5 +++-- blockdev.c| 16 blockjob.c| 18 ++ docs/qmp/qmp-events.txt | 8 include/qapi/qmp/qerror.h | 3 --- qapi/block-core.json | 20 ++-- 6 files changed, 35 insertions(+), 35 deletions(-) Reviewed-by: Max Reitz mre...@redhat.com
Re: [Qemu-block] [PATCH 8/8] qemu-iotests: test streaming to an intermediate layer
On 16.04.2015 17:12, Alberto Garcia wrote: This adds test_stream_intermediate(), similar to test_stream() but streams to the intermediate image instead. Signed-off-by: Alberto Garcia be...@igalia.com --- tests/qemu-iotests/030 | 18 +- tests/qemu-iotests/030.out | 4 ++-- 2 files changed, 19 insertions(+), 3 deletions(-) Reviewed-by: Max Reitz mre...@redhat.com
Re: [Qemu-block] [PATCH v6 01/21] docs: incremental backup documentation
On 04/17/2015 05:49 PM, John Snow wrote: Signed-off-by: John Snow js...@redhat.com --- docs/bitmaps.md | 352 1 file changed, 352 insertions(+) create mode 100644 docs/bitmaps.md Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/8] block: keep a list of block jobs
On 04/16/2015 09:12 AM, Alberto Garcia wrote: The current way to obtain the list of existing block jobs is to iterate over all root nodes and check which ones own a job. Since we want to be able to support block jobs in other nodes as well, this patch keeps a list of jobs that is updated everytime one is s/everytime/every time/ created or destroyed. This also updates qmp_query_block_jobs() to use this new list. Signed-off-by: Alberto Garcia be...@igalia.com Reviewed-by: Max Reitz mre...@redhat.com --- blockdev.c | 19 --- blockjob.c | 13 + include/block/blockjob.h | 14 ++ 3 files changed, 35 insertions(+), 11 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v3 10/10] iotests: 124 - transactional failure test
Use a transaction to request an incremental backup across two drives. Coerce one of the jobs to fail, and then re-run the transaction. Verify that no bitmap data was lost due to the partial transaction failure. Signed-off-by: John Snow js...@redhat.com --- tests/qemu-iotests/124 | 120 - tests/qemu-iotests/124.out | 4 +- 2 files changed, 121 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124 index 2d50594..772edd4 100644 --- a/tests/qemu-iotests/124 +++ b/tests/qemu-iotests/124 @@ -139,9 +139,12 @@ class TestIncrementalBackup(iotests.QMPTestCase): def do_qmp_backup(self, error='Input/output error', **kwargs): res = self.vm.qmp('drive-backup', **kwargs) self.assert_qmp(res, 'return', {}) +return self.wait_qmp_backup(kwargs['device'], error) + +def wait_qmp_backup(self, device, error='Input/output error'): event = self.vm.event_wait(name=BLOCK_JOB_COMPLETED, - match={'data': {'device': kwargs['device']}}) + match={'data': {'device': device}}) self.assertIsNotNone(event) try: @@ -375,6 +378,121 @@ class TestIncrementalBackup(iotests.QMPTestCase): self.check_backups() +def test_transaction_failure(self): +'''Test: Verify backups made from a transaction that partially fails. + +Add a second drive with its own unique pattern, and add a bitmap to each +drive. Use blkdebug to interfere with the backup on just one drive and +attempt to create a coherent incremental backup across both drives. + +verify a failure in one but not both, then delete the failed stubs and +re-run the same transaction. + +verify that both incrementals are created successfully. +''' + +# Create a second drive, with pattern: +drive1 = self.add_node('drive1') +self.img_create(drive1['file'], drive1['fmt']) +io_write_patterns(drive1['file'], (('0x14', 0, 512), + ('0x5d', '1M', '32k'), + ('0xcd', '32M', '124k'))) + +# Create a blkdebug interface to this img as 'drive1' +result = self.vm.qmp('blockdev-add', options={ +'id': drive1['id'], +'driver': drive1['fmt'], +'file': { +'driver': 'blkdebug', +'image': { +'driver': 'file', +'filename': drive1['file'] +}, +'set-state': [{ +'event': 'flush_to_disk', +'state': 1, +'new_state': 2 +}], +'inject-error': [{ +'event': 'read_aio', +'errno': 5, +'state': 2, +'immediately': False, +'once': True +}], +} +}) +self.assert_qmp(result, 'return', {}) + +# Create bitmaps and full backups for both drives +drive0 = self.drives[0] +dr0bm0 = self.add_bitmap('bitmap0', drive0) +dr1bm0 = self.add_bitmap('bitmap0', drive1) +self.create_anchor_backup(drive0) +self.create_anchor_backup(drive1) +self.assert_no_active_block_jobs() +self.assertFalse(self.vm.get_qmp_events(wait=False)) + +# Emulate some writes +self.hmp_io_writes(drive0['id'], (('0xab', 0, 512), + ('0xfe', '16M', '256k'), + ('0x64', '32736k', '64k'))) +self.hmp_io_writes(drive1['id'], (('0xba', 0, 512), + ('0xef', '16M', '256k'), + ('0x46', '32736k', '64k'))) + +# Create incremental backup targets +target0 = self.prepare_backup(dr0bm0) +target1 = self.prepare_backup(dr1bm0) + +# Ask for a new incremental backup per-each drive, +# expecting drive1's backup to fail: +transaction = [ +transaction_drive_backup(drive0['id'], target0, sync='dirty-bitmap', + format=drive0['fmt'], mode='existing', + bitmap=dr0bm0.name), +transaction_drive_backup(drive1['id'], target1, sync='dirty-bitmap', + format=drive1['fmt'], mode='existing', + bitmap=dr1bm0.name), +] +result = self.vm.qmp('transaction', actions=transaction) +self.assert_qmp(result, 'return', {}) + +# Observe that drive0's backup completes, but drive1's does not. +# Consume drive1's error and ensure all pending actions are completed. +self.assertTrue(self.wait_qmp_backup(drive0['id'])) +
[Qemu-block] [PATCH v3 05/10] block: add transactional callbacks feature
The goal here is to add a new method to transactions that allows developers to specify a callback that will get invoked only once all jobs spawned by a transaction are completed, allowing developers the chance to perform actions conditionally pending complete success, partial failure, or complete failure. In order to register the new callback to be invoked, a user must request a callback pointer and closure by calling new_action_cb_wrapper, which creates a wrapper around an opaque pointer and callback that would have originally been passed to e.g. backup_start(). The function will return a function pointer and a new opaque pointer to be passed instead. The transaction system will effectively intercept the original callbacks and perform book-keeping on the transaction after it has delivered the original enveloped callback. This means that Transaction Action callback methods will be called after all callbacks triggered by all Actions in the Transactional group have been received. This feature has no knowledge of any jobs spawned by Actions that do not inform the system via new_action_cb_wrapper(). For an example of how to use the feature, please skip ahead to: 'block: drive_backup transaction callback support' which serves as an example for how to hook up a post-transaction callback to the Drive Backup action. Note 1: Defining a callback method alone is not sufficient to have the new method invoked. You must call new_action_cb_wrapper() AND ensure the callback it returns is the one used as the callback for the job launched by the action. Note 2: You can use this feature for any system that registers completions of an asynchronous task via a callback of the form (void *opaque, int ret), not just block job callbacks. Signed-off-by: John Snow js...@redhat.com --- blockdev.c | 183 +++-- 1 file changed, 179 insertions(+), 4 deletions(-) diff --git a/blockdev.c b/blockdev.c index 2ab63ed..31ccb1b 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1240,6 +1240,8 @@ typedef struct BlkActionState BlkActionState; * @abort: Abort the changes on fail, can be NULL. * @clean: Clean up resources after all transaction actions have called * commit() or abort(). Can be NULL. + * @cb: Executed after all jobs launched by actions in the transaction finish, + * but only if requested by new_action_cb_wrapper() prior to clean(). * * Only prepare() may fail. In a single transaction, only one of commit() or * abort() will be called. clean() will always be called if it is present. @@ -1250,6 +1252,7 @@ typedef struct BlkActionOps { void (*commit)(BlkActionState *common); void (*abort)(BlkActionState *common); void (*clean)(BlkActionState *common); +void (*cb)(BlkActionState *common); } BlkActionOps; /** @@ -1258,19 +1261,46 @@ typedef struct BlkActionOps { * by a transaction group. * * @jobs: A reference count that tracks how many jobs still need to complete. + * @status: A cumulative return code for all actions that have reported + * a return code via callback in the transaction. * @actions: A list of all Actions in the Transaction. + * However, once the transaction has completed, it will be only a list + * of transactions that have registered a post-transaction callback. */ typedef struct BlkTransactionState { int jobs; +int status; QTAILQ_HEAD(actions, BlkActionState) actions; } BlkTransactionState; +typedef void (CallbackFn)(void *opaque, int ret); + +/** + * BlkActionCallbackData: + * Necessary state for intercepting and + * re-delivering a callback triggered by an Action. + * + * @opaque: The data to be given to the encapsulated callback when + * a job launched by an Action completes. + * @ret: The status code that was delivered to the encapsulated callback. + * @callback: The encapsulated callback to invoke upon completion of + *the Job launched by the Action. + */ +typedef struct BlkActionCallbackData { +void *opaque; +int ret; +CallbackFn *callback; +} BlkActionCallbackData; + /** * BlkActionState: * Describes one Action's state within a Transaction. * * @action: QAPI-defined enum identifying which Action to perform. * @ops: Table of ActionOps this Action can perform. + * @transaction: A pointer back to the Transaction this Action belongs to. + * @cb_data: Information on this Action's encapsulated callback, if any. + * @refcount: reference count, allowing access to this state beyond clean(). * @entry: List membership for all Actions in this Transaction. * * This structure must be arranged as first member in a subclassed type, @@ -1280,6 +1310,9 @@ typedef struct BlkTransactionState { struct BlkActionState { TransactionAction *action; const BlkActionOps *ops; +BlkTransactionState *transaction; +BlkActionCallbackData *cb_data; +int refcount;
[Qemu-block] [PATCH v3 06/10] block: add refcount to Job object
If we want to get at the job after the life of the job, we'll need a refcount for this object. This may occur for example if we wish to inspect the actions taken by a particular job after a transactional group of jobs runs, and further actions are required. Signed-off-by: John Snow js...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com --- blockjob.c | 18 -- include/block/blockjob.h | 21 + 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/blockjob.c b/blockjob.c index ba2255d..d620082 100644 --- a/blockjob.c +++ b/blockjob.c @@ -35,6 +35,19 @@ #include qemu/timer.h #include qapi-event.h +void block_job_incref(BlockJob *job) +{ +job-refcount++; +} + +void block_job_decref(BlockJob *job) +{ +job-refcount--; +if (job-refcount == 0) { +g_free(job); +} +} + void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, int64_t speed, BlockCompletionFunc *cb, void *opaque, Error **errp) @@ -57,6 +70,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, job-cb= cb; job-opaque= opaque; job-busy = true; +job-refcount = 1; bs-job = job; /* Only set speed when necessary to avoid NotSupported error */ @@ -68,7 +82,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, bs-job = NULL; bdrv_op_unblock_all(bs, job-blocker); error_free(job-blocker); -g_free(job); +block_job_decref(job); error_propagate(errp, local_err); return NULL; } @@ -85,7 +99,7 @@ void block_job_completed(BlockJob *job, int ret) bs-job = NULL; bdrv_op_unblock_all(bs, job-blocker); error_free(job-blocker); -g_free(job); +block_job_decref(job); } void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp) diff --git a/include/block/blockjob.h b/include/block/blockjob.h index b6d4ebb..dcc0596 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -116,6 +116,9 @@ struct BlockJob { /** The opaque value that is passed to the completion function. */ void *opaque; + +/** A reference count, allowing for post-job actions in e.g. transactions */ +int refcount; }; /** @@ -141,6 +144,24 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, void *opaque, Error **errp); /** + * block_job_incref: + * @job: The job to pick up a handle to + * + * Increment the refcount on @job, to be able to use it asynchronously + * from the job it is being used for. Put down the reference when done + * with @block_job_unref. + */ +void block_job_incref(BlockJob *job); + +/** + * block_job_decref: + * @job: The job to unreference and delete. + * + * Decrement the refcount on @job and delete it if there are no more references. + */ +void block_job_decref(BlockJob *job); + +/** * block_job_sleep_ns: * @job: The job that calls the function. * @clock: The clock to sleep on. -- 2.1.0
[Qemu-block] [PATCH v3 02/10] iotests: add transactional incremental backup test
Test simple usage cases for using transactions to create and synchronize incremental backups. Signed-off-by: John Snow js...@redhat.com --- tests/qemu-iotests/124 | 54 ++ tests/qemu-iotests/124.out | 4 ++-- 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124 index 3ee78cd..2d50594 100644 --- a/tests/qemu-iotests/124 +++ b/tests/qemu-iotests/124 @@ -36,6 +36,23 @@ def try_remove(img): pass +def transaction_action(action, **kwargs): +return { +'type': action, +'data': kwargs +} + + +def transaction_bitmap_clear(node, name, **kwargs): +return transaction_action('block-dirty-bitmap-clear', + node=node, name=name, **kwargs) + + +def transaction_drive_backup(device, target, **kwargs): +return transaction_action('drive-backup', device=device, target=target, + **kwargs) + + class Bitmap: def __init__(self, name, drive): self.name = name @@ -264,6 +281,43 @@ class TestIncrementalBackup(iotests.QMPTestCase): return self.do_incremental_simple(granularity=131072) +def test_incremental_transaction(self): +'''Test: Verify backups made from transactionally created bitmaps. + +Create a bitmap before VM execution begins, then create a second +bitmap AFTER writes have already occurred. Use transactions to create +a full backup and synchronize both bitmaps to this backup. +Create an incremental backup through both bitmaps and verify that +both backups match the current drive0 image. +''' + +drive0 = self.drives[0] +bitmap0 = self.add_bitmap('bitmap0', drive0) +self.hmp_io_writes(drive0['id'], (('0xab', 0, 512), + ('0xfe', '16M', '256k'), + ('0x64', '32736k', '64k'))) +bitmap1 = self.add_bitmap('bitmap1', drive0) + +result = self.vm.qmp('transaction', actions=[ +transaction_bitmap_clear(bitmap0.drive['id'], bitmap0.name), +transaction_bitmap_clear(bitmap1.drive['id'], bitmap1.name), +transaction_drive_backup(drive0['id'], drive0['backup'], + sync='full', format=drive0['fmt']) +]) +self.assert_qmp(result, 'return', {}) +self.wait_until_completed(drive0['id']) +self.files.append(drive0['backup']) + +self.hmp_io_writes(drive0['id'], (('0x9a', 0, 512), + ('0x55', '8M', '352k'), + ('0x78', '15872k', '1M'))) +# Both bitmaps should be correctly in sync. +self.create_incremental(bitmap0) +self.create_incremental(bitmap1) +self.vm.shutdown() +self.check_backups() + + def test_incremental_failure(self): '''Test: Verify backups made after a failure are correct. diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out index 2f7d390..594c16f 100644 --- a/tests/qemu-iotests/124.out +++ b/tests/qemu-iotests/124.out @@ -1,5 +1,5 @@ -... + -- -Ran 7 tests +Ran 8 tests OK -- 2.1.0
[Qemu-block] [PATCH v3 07/10] block: add delayed bitmap successor cleanup
Allow bitmap successors to carry reference counts. We can in a later patch use this ability to clean up the dirty bitmap according to both the individual job's success and the success of all jobs in the transaction group. The code for cleaning up a bitmap is also moved from backup_run to backup_complete. Signed-off-by: John Snow js...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com --- block.c | 65 ++- block/backup.c| 20 ++-- include/block/block.h | 10 3 files changed, 70 insertions(+), 25 deletions(-) diff --git a/block.c b/block.c index b29aafe..0e7308c 100644 --- a/block.c +++ b/block.c @@ -51,6 +51,12 @@ #include windows.h #endif +typedef enum BitmapSuccessorAction { +SUCCESSOR_ACTION_UNDEFINED = 0, +SUCCESSOR_ACTION_ABDICATE, +SUCCESSOR_ACTION_RECLAIM +} BitmapSuccessorAction; + /** * A BdrvDirtyBitmap can be in three possible states: * (1) successor is NULL and disabled is false: full r/w mode @@ -65,6 +71,8 @@ struct BdrvDirtyBitmap { char *name; /* Optional non-empty unique ID */ int64_t size; /* Size of the bitmap (Number of sectors) */ bool disabled; /* Bitmap is read-only */ +int successor_refcount; /* Number of active handles to the successor */ +BitmapSuccessorAction act; /* Action to take on successor upon release */ QLIST_ENTRY(BdrvDirtyBitmap) list; }; @@ -5540,6 +5548,7 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs, /* Install the successor and freeze the parent */ bitmap-successor = child; +bitmap-successor_refcount = 1; return 0; } @@ -5547,9 +5556,9 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs, * For a bitmap with a successor, yield our name to the successor, * delete the old bitmap, and return a handle to the new bitmap. */ -BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs, -BdrvDirtyBitmap *bitmap, -Error **errp) +static BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs, + BdrvDirtyBitmap *bitmap, + Error **errp) { char *name; BdrvDirtyBitmap *successor = bitmap-successor; @@ -5574,9 +5583,9 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs, * we may wish to re-join the parent and child/successor. * The merged parent will be un-frozen, but not explicitly re-enabled. */ -BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, - BdrvDirtyBitmap *parent, - Error **errp) +static BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, + BdrvDirtyBitmap *parent, + Error **errp) { BdrvDirtyBitmap *successor = parent-successor; @@ -5595,6 +5604,50 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, return parent; } +static BdrvDirtyBitmap *bdrv_free_bitmap_successor(BlockDriverState *bs, + BdrvDirtyBitmap *parent) +{ +assert(!parent-successor_refcount); + +switch (parent-act) { +case SUCCESSOR_ACTION_RECLAIM: +return bdrv_reclaim_dirty_bitmap(bs, parent, NULL); +case SUCCESSOR_ACTION_ABDICATE: +return bdrv_dirty_bitmap_abdicate(bs, parent, NULL); +case SUCCESSOR_ACTION_UNDEFINED: +default: +g_assert_not_reached(); +} +} + +BdrvDirtyBitmap *bdrv_frozen_bitmap_decref(BlockDriverState *bs, + BdrvDirtyBitmap *parent, + int ret) +{ +assert(bdrv_dirty_bitmap_frozen(parent)); +assert(parent-successor); + +if (ret) { +parent-act = SUCCESSOR_ACTION_RECLAIM; +} else if (parent-act != SUCCESSOR_ACTION_RECLAIM) { +parent-act = SUCCESSOR_ACTION_ABDICATE; +} + +parent-successor_refcount--; +if (parent-successor_refcount == 0) { +return bdrv_free_bitmap_successor(bs, parent); +} +return parent; +} + +void bdrv_dirty_bitmap_incref(BdrvDirtyBitmap *parent) +{ +assert(bdrv_dirty_bitmap_frozen(parent)); +assert(parent-successor); + +parent-successor_refcount++; +} + /** * Truncates _all_ bitmaps attached to a BDS. */ diff --git a/block/backup.c b/block/backup.c index a297df6..62f8d2b 100644 --- a/block/backup.c +++ b/block/backup.c @@ -240,6 +240,12 @@ static void backup_complete(BlockJob *job, void *opaque) bdrv_unref(s-target); +if (s-sync_bitmap) { +BdrvDirtyBitmap *bm; +bm = bdrv_frozen_bitmap_decref(job-bs, s-sync_bitmap, data-ret); +assert(bm); +} +
Re: [Qemu-block] [PATCH v6 07/21] hbitmap: add hbitmap_merge
On 04/17/2015 05:49 PM, John Snow wrote: We add a bitmap merge operation to assist in error cases where we wish to combine two bitmaps together. This is algorithmically O(bits) provided HBITMAP_LEVELS remains constant. For a full bitmap on a 64bit machine: sum(bits/64^k, k, 0, HBITMAP_LEVELS) ~= 1.01587 * bits We may be able to improve running speed for particularly sparse bitmaps by using iterators, but the running time for dense maps will be worse. We present the simpler solution first, and we can refine it later if needed. Signed-off-by: John Snow js...@redhat.com --- include/qemu/hbitmap.h | 13 + util/hbitmap.c | 33 + 2 files changed, 46 insertions(+) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v6 10/21] qmp: Add support of dirty-bitmap sync mode for drive-backup
On 04/17/2015 05:49 PM, John Snow wrote: For dirty-bitmap sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of top sync mode. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 5/8] block: Add QMP support for streaming to an intermediate layer
On 04/16/2015 09:12 AM, Alberto Garcia wrote: This patch makes the 'device' paramater of the 'block-stream' command s/paramater/parameter/ accept a node name as well as a device name. In addition to that, operation blockers will be checked in all intermediate nodes between the top and the base node. Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and no longer returns DeviceNotFound, iotest 030 is updated to expect GenericError instead. Signed-off-by: Alberto Garcia be...@igalia.com --- blockdev.c | 20 ++-- qapi/block-core.json | 10 +++--- tests/qemu-iotests/030 | 2 +- 3 files changed, 18 insertions(+), 14 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description
On 22/04/2015 11:31, Kevin Wolf wrote: Actually I liked the foo+colo names. These are just internal details of the implementations and the primary/secondary disks actually can be any format. Stefan, what was your worry with the +colo block drivers? I haven't read the patches yet, so I may be misunderstanding, but wouldn't a separate filter driver be more appropriate than modifying qcow2 with logic that has nothing to do with the image format? Possibly; on the other hand, why multiply the size of the test matrix with options that no one will use and that will bitrot? Paolo
Re: [Qemu-block] [PATCH] qmp: fill in the image field in BlockDeviceInfo
On Fri, Apr 17, 2015 at 02:52:43PM +0300, Alberto Garcia wrote: The image field in BlockDeviceInfo is supposed to contain an ImageInfo object. However that is being filled in by bdrv_query_info(), not by bdrv_block_device_info(), which is where BlockDeviceInfo is actually created. Anyone calling bdrv_block_device_info() directly will get a null image field. As a consequence of this, the HMP command 'info block -n -v' crashes QEMU. This patch moves the code that fills in that field from bdrv_query_info() to bdrv_block_device_info(). Signed-off-by: Alberto Garcia be...@igalia.com --- block.c | 9 +++-- block/qapi.c | 46 +- blockdev.c| 2 +- include/block/block.h | 2 +- include/block/qapi.h | 2 +- 5 files changed, 35 insertions(+), 26 deletions(-) Reverted the QEMU 2.3 fix and replaced it with this commit. Thanks, applied to my block-next tree: https://github.com/stefanha/qemu/commits/block-next Stefan pgpRI7MMF4bxs.pgp Description: PGP signature
Re: [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache
On Tue, Apr 21, 2015 at 06:20:39PM +0300, Alberto Garcia wrote: Large disk images need large L2 caches in order to maximize their I/O performance. However setting a correct size for the cache is not necessarily easy since apart from the image size, it also depends on other factors like its usage patterns or whether it's part of a backing chain. In order to be able to set a very large cache size to cover the worst-case scenarios and yet prevent an unnecessary waste of memory, this patch modifies the qcow2 cache algorithm so the memory for each entry is allocated only when it's actually needed. This also improves the scenarios with smaller images: the current default L2 cache size can map a whole 8GB disk, so those smaller than that are allocating cache memory that can never be used. What measurable improvement does this patch make? Buffers allocated upfront are not touched, so they don't actually consume physical memory. Stefan pgpGJehESII7n.pgp Description: PGP signature
Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description
On Tue, Apr 21, 2015 at 05:28:01PM +0200, Paolo Bonzini wrote: On 21/04/2015 03:25, Wen Congyang wrote: Please do not introduce name+colo block drivers. This approach is invasive and makes block replication specific to only a few block drivers, e.g. NBD or qcow2. NBD is used to connect to secondary qemu, so it must be used. But the primary qemu uses quorum, so the primary disk can be any format. The secondary disk is nbd target, and it can also be any format. The cache disk(active disk/hidden disk) is an empty disk, and it is created before run COLO. The cache disk format is qcow2 now. In theory, it can be ant format which supports backing file. But the driver should be updated to support colo mode. A cleaner approach is a QMP command or -drive options that work for any BlockDriverState. OK, I will add a new drive option to avoid use name+colo. Actually I liked the foo+colo names. These are just internal details of the implementations and the primary/secondary disks actually can be any format. Stefan, what was your worry with the +colo block drivers? Why does NBD need to know about COLO? It should be possible to use iSCSI or other protocols too. Stefan pgpGr2J18TCwu.pgp Description: PGP signature
Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description
On 04/22/2015 05:18 PM, Stefan Hajnoczi wrote: On Tue, Apr 21, 2015 at 05:28:01PM +0200, Paolo Bonzini wrote: On 21/04/2015 03:25, Wen Congyang wrote: Please do not introduce name+colo block drivers. This approach is invasive and makes block replication specific to only a few block drivers, e.g. NBD or qcow2. NBD is used to connect to secondary qemu, so it must be used. But the primary qemu uses quorum, so the primary disk can be any format. The secondary disk is nbd target, and it can also be any format. The cache disk(active disk/hidden disk) is an empty disk, and it is created before run COLO. The cache disk format is qcow2 now. In theory, it can be ant format which supports backing file. But the driver should be updated to support colo mode. A cleaner approach is a QMP command or -drive options that work for any BlockDriverState. OK, I will add a new drive option to avoid use name+colo. Actually I liked the foo+colo names. These are just internal details of the implementations and the primary/secondary disks actually can be any format. Stefan, what was your worry with the +colo block drivers? Why does NBD need to know about COLO? It should be possible to use iSCSI or other protocols too. Hmm, if you want to use iSCSI or other protocols, you should update the driver to implement block replication's control interface. Currently, we only support nbd now. Thanks Wen Congyang Stefan
Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description
On Tue, Apr 21, 2015 at 09:25:59AM +0800, Wen Congyang wrote: On 04/20/2015 11:30 PM, Stefan Hajnoczi wrote: On Fri, Apr 03, 2015 at 06:01:07PM +0800, Wen Congyang wrote: One general question about the design: the Secondary host needs 3x storage space since it has the Secondary Disk, hidden-disk, and active-disk. Each image requires a certain amount of space depending on writes or COW operations. Is 3x the upper bound or is there a way to reduce the bound? active disk and hidden disk are temp file. It will be maked empty in bdrv_do_checkpoint(). Their format is qcow2 now, so it doesn't need too many spaces if we do checkpoint periodically. A question related to checkpoints: both Primary and Secondary are active (running) in COLO. The Secondary will be slower since it performs extra work; disk I/O on the Secondary has a COW overhead. Does this force the Primary to wait for checkpoint commit so that the Secondary can catch up? I'm a little confused about that since the point of COLO is to avoid the overheads of microcheckpointing, but there still seems to be a checkpointing bottleneck for disk I/O-intensive applications. The bound is important since large amounts of data become a bottleneck for writeout/commit operations. They could cause downtime if the guest is blocked until the entire Disk Buffer has been written to the Secondary Disk during failover, for example. OK, I will test it. In my test, vm_stop() will take about 2-3 seconds if I run filebench in the guest. Is there anyway to speed it up? Is it necessary to commit the active disk and hidden disk to the Secondary Disk on failover? Maybe the VM could continue executing immediately and run a block-commit job. The active disk and hidden disk files can be dropped once block-commit finishes. pgpSmtN_bltYK.pgp Description: PGP signature