Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 01/17/2015 12:17 PM, Vladimir Sementsov-Ogievskiy wrote: Best regards, Vladimir On 09.01.2015 01:05, John Snow wrote: CCing migration maintainers, feedback otherwise in-line. On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- In terms of general approach, migrating the dirty bitmap alongside the blocks it describes makes sense when migrating both. Is this a lot of overhead when that's not the case, though? If we utilize the bitmap only pathways added here, don't we iterate through the savevm handlers a lot to only transfer very little data per section? If we really need migration of bitmaps apart from the data they describe, is it not worth developing a more condensed transfer mechanism to get more bitmap data per iteration, instead of just one block's worth? The first stage of migration can be easily optimized in this case - just take a larger bitmap pieces. For the second stage, it's not as simple. If we will take larger pieces on each step - we will just increase dirty data transfer. One approach is to maintain dirty-region - two numbers, representing the region of set bits in migration dirty bitmap, and send data only when this region is large enough. block-migration.c | 173 +++--- savevm.c | 1 + 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/block-migration.c b/block-migration.c index 908a66d..95d54a1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -32,6 +32,8 @@ #define BLK_MIG_FLAG_EOS0x02 #define BLK_MIG_FLAG_PROGRESS 0x04 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08 +#define BLK_MIG_FLAG_HAS_BLOCK 0x10 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20 OK: As a result of allowing bitmaps to be migrated without the block data itself, we now must acknowledge the concept that we can send either block data, bitmaps, both, or neither -- so new defines are added here to indicate what data can be found in the section following. #define MAX_IS_ALLOCATED_SEARCH 65536 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState { int shared_base; int64_t total_sectors; QSIMPLEQ_ENTRY(BlkMigDevState) entry; +int nr_bitmaps; +BdrvDirtyBitmap **dirty_bitmaps; /* Only used by migration thread. Does not need a lock. */ int bulk_completed; @@ -64,6 +68,11 @@ typedef struct BlkMigDevState { Error *blocker; } BlkMigDevState; +typedef struct BlkMigDirtyBitmap { +BdrvDirtyBitmap *bitmap; +uint8_t *buf; +} BlkMigDirtyBitmap; + typedef struct BlkMigBlock { /* Only used by migration thread. */ uint8_t *buf; @@ -74,6 +83,9 @@ typedef struct BlkMigBlock { QEMUIOVector qiov; BlockAIOCB *aiocb; +int nr_bitmaps; +BlkMigDirtyBitmap *dirty_bitmaps; + /* Protected by block migration lock. */ int ret; QSIMPLEQ_ENTRY(BlkMigBlock) entry; @@ -83,6 +95,7 @@ typedef struct BlkMigState { /* Written during setup phase. Can be read without a lock. */ int blk_enable; int shared_base; +int dbm_enable; Similar to the feedback in a previous patch, we may not want to use 'dbm' to mean Dirty Bitmap because we have not been applying the abbreviation consistently. For now, the recommendation from stefan is to use the full bdrv_dirty_bitmap or dirty_bitmap in function names. If we do want an acronym to refer to this particular type of dirty bitmap, we should apply it consistently to all functions that work with the BdrvDirtyBitmap type. For now, bdrv_dirty_bitmap_enable should suffice, even though it's a bit wordy. It's a flag, that enables the migration of dirty bitmaps, like blk_enable enables the migrtion of blocks.. Then, should it be dirty_bitmap_migration_enable? Isn't it too long for a simple flag variable? Yeah, I see your point. I think the only difference is that blk is used very widely throughout the code, but this is the first occurrence of dbm. even just bitmap_enable would be sufficiently more self-evident than dbm_enable, unless we rework all of the BdrvDirtyBitmap functions to use the dbm abbreviation. I'd also be happy with migrate_bitmaps which is fairly self-explanatory, too. QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; int64_t total_sector_sum; bool zero_blocks; @@ -116,27 +129,63 @@ static void blk_mig_unlock(void) /* Only allocating and initializing structure fields, not copying any data. */ static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector, -int nr_sectors) + int
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
Best regards, Vladimir On 09.01.2015 01:05, John Snow wrote: CCing migration maintainers, feedback otherwise in-line. On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- In terms of general approach, migrating the dirty bitmap alongside the blocks it describes makes sense when migrating both. Is this a lot of overhead when that's not the case, though? If we utilize the bitmap only pathways added here, don't we iterate through the savevm handlers a lot to only transfer very little data per section? If we really need migration of bitmaps apart from the data they describe, is it not worth developing a more condensed transfer mechanism to get more bitmap data per iteration, instead of just one block's worth? The first stage of migration can be easily optimized in this case - just take a larger bitmap pieces. For the second stage, it's not as simple. If we will take larger pieces on each step - we will just increase dirty data transfer. One approach is to maintain dirty-region - two numbers, representing the region of set bits in migration dirty bitmap, and send data only when this region is large enough. block-migration.c | 173 +++--- savevm.c | 1 + 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/block-migration.c b/block-migration.c index 908a66d..95d54a1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -32,6 +32,8 @@ #define BLK_MIG_FLAG_EOS0x02 #define BLK_MIG_FLAG_PROGRESS 0x04 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08 +#define BLK_MIG_FLAG_HAS_BLOCK 0x10 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20 OK: As a result of allowing bitmaps to be migrated without the block data itself, we now must acknowledge the concept that we can send either block data, bitmaps, both, or neither -- so new defines are added here to indicate what data can be found in the section following. #define MAX_IS_ALLOCATED_SEARCH 65536 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState { int shared_base; int64_t total_sectors; QSIMPLEQ_ENTRY(BlkMigDevState) entry; +int nr_bitmaps; +BdrvDirtyBitmap **dirty_bitmaps; /* Only used by migration thread. Does not need a lock. */ int bulk_completed; @@ -64,6 +68,11 @@ typedef struct BlkMigDevState { Error *blocker; } BlkMigDevState; +typedef struct BlkMigDirtyBitmap { +BdrvDirtyBitmap *bitmap; +uint8_t *buf; +} BlkMigDirtyBitmap; + typedef struct BlkMigBlock { /* Only used by migration thread. */ uint8_t *buf; @@ -74,6 +83,9 @@ typedef struct BlkMigBlock { QEMUIOVector qiov; BlockAIOCB *aiocb; +int nr_bitmaps; +BlkMigDirtyBitmap *dirty_bitmaps; + /* Protected by block migration lock. */ int ret; QSIMPLEQ_ENTRY(BlkMigBlock) entry; @@ -83,6 +95,7 @@ typedef struct BlkMigState { /* Written during setup phase. Can be read without a lock. */ int blk_enable; int shared_base; +int dbm_enable; Similar to the feedback in a previous patch, we may not want to use 'dbm' to mean Dirty Bitmap because we have not been applying the abbreviation consistently. For now, the recommendation from stefan is to use the full bdrv_dirty_bitmap or dirty_bitmap in function names. If we do want an acronym to refer to this particular type of dirty bitmap, we should apply it consistently to all functions that work with the BdrvDirtyBitmap type. For now, bdrv_dirty_bitmap_enable should suffice, even though it's a bit wordy. It's a flag, that enables the migration of dirty bitmaps, like blk_enable enables the migrtion of blocks.. Then, should it be dirty_bitmap_migration_enable? Isn't it too long for a simple flag variable? QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; int64_t total_sector_sum; bool zero_blocks; @@ -116,27 +129,63 @@ static void blk_mig_unlock(void) /* Only allocating and initializing structure fields, not copying any data. */ static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector, -int nr_sectors) + int nr_sectors, bool only_bitmaps) { +int i; BlkMigBlock *blk = g_new(BlkMigBlock, 1); -blk-buf = g_malloc(BLOCK_SIZE); +blk-buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE); blk-bmds = bmds; blk-sector = sector; blk-nr_sectors = nr_sectors; +blk-dirty_bitmaps = NULL; +blk-nr_bitmaps = 0; blk-iov.iov_base = blk-buf; blk-iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 12.01.2015 17:42, Paolo Bonzini wrote: On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote: On 09.01.2015 01:36, Paolo Bonzini wrote: The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. But what if the dirty bitmap is really big? For example, for 5 Pb drive the bitmap with granularity 65536 will be of 2 Gb size. So, shouldn't it be migrated iteratively for live migration? But your code is not attempting to do that. It is not attempting to track the dirtiness of the dirty bitmap, so to speak. For such a huge storage, in any case, I suspect the solution is to not use QEMU's live operations and, instead, operate at the storage level. Paolo On 12.01.2015 22:09, Paolo Bonzini wrote: On 12/01/2015 18:31, John Snow wrote: How do you track which parts of the disk have been acted upon (e.g. mirrored in the case of the drive-mirror command), so that they have become 0? Hm... so your question here is, What happens if the named bitmap you are transferring gets reset to 0 during migration? Will anybody track this change? As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is block/backup.c, and we do not support it for mirror (yet?) Right. So can a block-backup job be running WHILE we migrate? I think so. Paolo There is no dirty bitmap for dirty bitmap in my solution, actually block migration dirty bitmap is used for this. It provides tracking of setting bitmap bits to 1 (when corresponding blocks are written), and if the bitmap changes only is this way (no other block-jobs are working with the migrated bitmap) - then the bitmap will be migrated iteratively as it is. Such approach doesn't provide tracking of resetting the bitmap to 0, and such changes are not migrated. So if the bitmap is in use by any block-job, it may have additional set bits after migration. If we want to migrate more precisely we need separate dirty bitmap for every migrated bitmap. And tracking of this dirty-dirty bitmap should be injected into BdrvDirtyBitmap interface. But do we really need precise migration of the bitmap being in use by some block-job (backup for example)? In this case we should migrate the backup process to, migrate working block-job.. Is it possible and is it planned to be implemented? Best regards, Vladimir
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 01/13/2015 04:16 AM, Vladimir Sementsov-Ogievskiy wrote: On 12.01.2015 17:42, Paolo Bonzini wrote: On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote: On 09.01.2015 01:36, Paolo Bonzini wrote: The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. But what if the dirty bitmap is really big? For example, for 5 Pb drive the bitmap with granularity 65536 will be of 2 Gb size. So, shouldn't it be migrated iteratively for live migration? But your code is not attempting to do that. It is not attempting to track the dirtiness of the dirty bitmap, so to speak. For such a huge storage, in any case, I suspect the solution is to not use QEMU's live operations and, instead, operate at the storage level. Paolo On 12.01.2015 22:09, Paolo Bonzini wrote: On 12/01/2015 18:31, John Snow wrote: How do you track which parts of the disk have been acted upon (e.g. mirrored in the case of the drive-mirror command), so that they have become 0? Hm... so your question here is, What happens if the named bitmap you are transferring gets reset to 0 during migration? Will anybody track this change? As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is block/backup.c, and we do not support it for mirror (yet?) Right. So can a block-backup job be running WHILE we migrate? I think so. Paolo There is no dirty bitmap for dirty bitmap in my solution, actually block migration dirty bitmap is used for this. It provides tracking of setting bitmap bits to 1 (when corresponding blocks are written), and if the bitmap changes only is this way (no other block-jobs are working with the migrated bitmap) - then the bitmap will be migrated iteratively as it is. Such approach doesn't provide tracking of resetting the bitmap to 0, and such changes are not migrated. So if the bitmap is in use by any block-job, it may have additional set bits after migration. If we want to migrate more precisely we need separate dirty bitmap for every migrated bitmap. And tracking of this dirty-dirty bitmap should be injected into BdrvDirtyBitmap interface. But do we really need precise migration of the bitmap being in use by some block-job (backup for example)? In this case we should migrate the backup process to, migrate working block-job.. Is it possible and is it planned to be implemented? Best regards, Vladimir This is my question, too. The approach presented here will at least work correctly, as far as I can tell. The only problem may arise if we try to migrate while doing an incremental backup which will lead to too many dirty bits being migrated, which is just an inefficiency. I'm not convinced it's important ... Stefan, any thoughts?
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
Best regards, Vladimir On 09.01.2015 01:36, Paolo Bonzini wrote: The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. But what if the dirty bitmap is really big? For example, for 5 Pb drive the bitmap with granularity 65536 will be of 2 Gb size. So, shouldn't it be migrated iteratively for live migration?
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote: On 09.01.2015 01:36, Paolo Bonzini wrote: The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. But what if the dirty bitmap is really big? For example, for 5 Pb drive the bitmap with granularity 65536 will be of 2 Gb size. So, shouldn't it be migrated iteratively for live migration? But your code is not attempting to do that. It is not attempting to track the dirtiness of the dirty bitmap, so to speak. For such a huge storage, in any case, I suspect the solution is to not use QEMU's live operations and, instead, operate at the storage level. Paolo
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 12/01/2015 18:31, John Snow wrote: How do you track which parts of the disk have been acted upon (e.g. mirrored in the case of the drive-mirror command), so that they have become 0? Hm... so your question here is, What happens if the named bitmap you are transferring gets reset to 0 during migration? Will anybody track this change? As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is block/backup.c, and we do not support it for mirror (yet?) Right. So can a block-backup job be running WHILE we migrate? I think so. Paolo
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 01/12/2015 11:48 AM, Paolo Bonzini wrote: On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). So I have misread the patch. blk here: +static void blk_store_bitmaps(BlkMigBlock *blk) +{ +int i; +for (i = 0; i blk-nr_bitmaps; ++i) { +bdrv_dbm_store_data(blk-dirty_bitmaps[i].bitmap, +blk-dirty_bitmaps[i].buf, +blk-sector, blk-nr_sectors); +} refers to a BlkMigBlock, not a BlockBackend. Still, I need an explanation of the operation of this patch. During the bulk phase you migrate all of the dirty bitmap. This is okay. Then, during the iterative phase you mgirate parts of the dirty bitmap corresponding to sectors that are dirty for block migration. This means parts of the dirty bitmap that have become 1. How do you track which parts of the disk have been acted upon (e.g. mirrored in the case of the drive-mirror command), so that they have become 0? Hm... so your question here is, What happens if the named bitmap you are transferring gets reset to 0 during migration? Will anybody track this change? As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is block/backup.c, and we do not support it for mirror (yet?) So can a block-backup job be running WHILE we migrate? Otherwise, I don't think this situation (untracked bitmap resets) will arise here. This, strictly speaking, is conservative so it is not incorrect, but it basically throws away all the work done by the block job during the block migration---which might take several minutes. Is a non-live solution really that bad? If it is, and it is not feasible to just migrate the bitmaps non-live, can we do better to migrate bits of the dirty bitmap that have become 0? Paolo Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block-migration.c | 173 +++--- savevm.c | 1 + 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/block-migration.c b/block-migration.c index 908a66d..95d54a1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -32,6 +32,8 @@ #define BLK_MIG_FLAG_EOS0x02 #define BLK_MIG_FLAG_PROGRESS 0x04 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08 +#define BLK_MIG_FLAG_HAS_BLOCK 0x10 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20 #define MAX_IS_ALLOCATED_SEARCH 65536 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState { int shared_base; int64_t total_sectors; QSIMPLEQ_ENTRY(BlkMigDevState) entry; +int nr_bitmaps; +BdrvDirtyBitmap **dirty_bitmaps; /* Only used by migration thread. Does not need a lock. */ int bulk_completed; @@ -64,6 +68,11 @@ typedef struct BlkMigDevState { Error *blocker; } BlkMigDevState; +typedef struct BlkMigDirtyBitmap { +BdrvDirtyBitmap *bitmap; +uint8_t *buf; +} BlkMigDirtyBitmap; + typedef struct BlkMigBlock { /* Only used by migration thread. */ uint8_t *buf; @@ -74,6 +83,9 @@ typedef struct BlkMigBlock { QEMUIOVector qiov; BlockAIOCB *aiocb; +int nr_bitmaps; +BlkMigDirtyBitmap *dirty_bitmaps; + /* Protected by block migration lock. */ int ret; QSIMPLEQ_ENTRY(BlkMigBlock) entry; @@ -83,6 +95,7 @@ typedef struct BlkMigState { /* Written during setup phase. Can be read without a lock. */ int blk_enable; int shared_base; +int dbm_enable; QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; int64_t total_sector_sum; bool zero_blocks; @@ -116,27 +129,63 @@ static void blk_mig_unlock(void) /* Only allocating and initializing structure fields, not copying any data. */ static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector, -int nr_sectors) + int nr_sectors, bool only_bitmaps) { +int i; BlkMigBlock *blk = g_new(BlkMigBlock, 1); -blk-buf = g_malloc(BLOCK_SIZE); +blk-buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE); blk-bmds = bmds; blk-sector = sector; blk-nr_sectors = nr_sectors; +blk-dirty_bitmaps = NULL; +blk-nr_bitmaps = 0; blk-iov.iov_base = blk-buf; blk-iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE; qemu_iovec_init_external(blk-qiov, blk-iov, 1); +if (block_mig_state.dbm_enable) { +BlkMigDirtyBitmap *mig_bm; + +blk-nr_bitmaps = bmds-nr_bitmaps; +mig_bm = blk-dirty_bitmaps = g_new(BlkMigDirtyBitmap, +bmds-nr_bitmaps); +for (i = 0; i bmds-nr_bitmaps; ++i) { +BdrvDirtyBitmap *bm = bmds-dirty_bitmaps[i]; +
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). So I have misread the patch. blk here: +static void blk_store_bitmaps(BlkMigBlock *blk) +{ +int i; +for (i = 0; i blk-nr_bitmaps; ++i) { +bdrv_dbm_store_data(blk-dirty_bitmaps[i].bitmap, +blk-dirty_bitmaps[i].buf, +blk-sector, blk-nr_sectors); +} refers to a BlkMigBlock, not a BlockBackend. Still, I need an explanation of the operation of this patch. During the bulk phase you migrate all of the dirty bitmap. This is okay. Then, during the iterative phase you mgirate parts of the dirty bitmap corresponding to sectors that are dirty for block migration. This means parts of the dirty bitmap that have become 1. How do you track which parts of the disk have been acted upon (e.g. mirrored in the case of the drive-mirror command), so that they have become 0? This, strictly speaking, is conservative so it is not incorrect, but it basically throws away all the work done by the block job during the block migration---which might take several minutes. Is a non-live solution really that bad? If it is, and it is not feasible to just migrate the bitmaps non-live, can we do better to migrate bits of the dirty bitmap that have become 0? Paolo Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block-migration.c | 173 +++--- savevm.c | 1 + 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/block-migration.c b/block-migration.c index 908a66d..95d54a1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -32,6 +32,8 @@ #define BLK_MIG_FLAG_EOS0x02 #define BLK_MIG_FLAG_PROGRESS 0x04 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08 +#define BLK_MIG_FLAG_HAS_BLOCK 0x10 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20 #define MAX_IS_ALLOCATED_SEARCH 65536 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState { int shared_base; int64_t total_sectors; QSIMPLEQ_ENTRY(BlkMigDevState) entry; +int nr_bitmaps; +BdrvDirtyBitmap **dirty_bitmaps; /* Only used by migration thread. Does not need a lock. */ int bulk_completed; @@ -64,6 +68,11 @@ typedef struct BlkMigDevState { Error *blocker; } BlkMigDevState; +typedef struct BlkMigDirtyBitmap { +BdrvDirtyBitmap *bitmap; +uint8_t *buf; +} BlkMigDirtyBitmap; + typedef struct BlkMigBlock { /* Only used by migration thread. */ uint8_t *buf; @@ -74,6 +83,9 @@ typedef struct BlkMigBlock { QEMUIOVector qiov; BlockAIOCB *aiocb; +int nr_bitmaps; +BlkMigDirtyBitmap *dirty_bitmaps; + /* Protected by block migration lock. */ int ret; QSIMPLEQ_ENTRY(BlkMigBlock) entry; @@ -83,6 +95,7 @@ typedef struct BlkMigState { /* Written during setup phase. Can be read without a lock. */ int blk_enable; int shared_base; +int dbm_enable; QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; int64_t total_sector_sum; bool zero_blocks; @@ -116,27 +129,63 @@ static void blk_mig_unlock(void) /* Only allocating and initializing structure fields, not copying any data. */ static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector, -int nr_sectors) + int nr_sectors, bool only_bitmaps) { +int i; BlkMigBlock *blk = g_new(BlkMigBlock, 1); -blk-buf = g_malloc(BLOCK_SIZE); +blk-buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE); blk-bmds = bmds; blk-sector = sector; blk-nr_sectors = nr_sectors; +blk-dirty_bitmaps = NULL; +blk-nr_bitmaps = 0; blk-iov.iov_base = blk-buf; blk-iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE; qemu_iovec_init_external(blk-qiov, blk-iov, 1); +if (block_mig_state.dbm_enable) { +BlkMigDirtyBitmap *mig_bm; + +blk-nr_bitmaps = bmds-nr_bitmaps; +mig_bm = blk-dirty_bitmaps = g_new(BlkMigDirtyBitmap, +bmds-nr_bitmaps); +for (i = 0; i bmds-nr_bitmaps; ++i) { +BdrvDirtyBitmap *bm = bmds-dirty_bitmaps[i]; +mig_bm-buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors)); +mig_bm-bitmap = bm; +mig_bm++; +} +} + return blk; } static void blk_free(BlkMigBlock *blk) { +int i; g_free(blk-buf); + +if (blk-dirty_bitmaps) { +for (i = 0; i blk-nr_bitmaps; ++i) { +g_free(blk-dirty_bitmaps[i].buf); +} +
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
CCing migration maintainers, feedback otherwise in-line. On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- In terms of general approach, migrating the dirty bitmap alongside the blocks it describes makes sense when migrating both. Is this a lot of overhead when that's not the case, though? If we utilize the bitmap only pathways added here, don't we iterate through the savevm handlers a lot to only transfer very little data per section? If we really need migration of bitmaps apart from the data they describe, is it not worth developing a more condensed transfer mechanism to get more bitmap data per iteration, instead of just one block's worth? block-migration.c | 173 +++--- savevm.c | 1 + 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/block-migration.c b/block-migration.c index 908a66d..95d54a1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -32,6 +32,8 @@ #define BLK_MIG_FLAG_EOS0x02 #define BLK_MIG_FLAG_PROGRESS 0x04 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08 +#define BLK_MIG_FLAG_HAS_BLOCK 0x10 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20 OK: As a result of allowing bitmaps to be migrated without the block data itself, we now must acknowledge the concept that we can send either block data, bitmaps, both, or neither -- so new defines are added here to indicate what data can be found in the section following. #define MAX_IS_ALLOCATED_SEARCH 65536 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState { int shared_base; int64_t total_sectors; QSIMPLEQ_ENTRY(BlkMigDevState) entry; +int nr_bitmaps; +BdrvDirtyBitmap **dirty_bitmaps; /* Only used by migration thread. Does not need a lock. */ int bulk_completed; @@ -64,6 +68,11 @@ typedef struct BlkMigDevState { Error *blocker; } BlkMigDevState; +typedef struct BlkMigDirtyBitmap { +BdrvDirtyBitmap *bitmap; +uint8_t *buf; +} BlkMigDirtyBitmap; + typedef struct BlkMigBlock { /* Only used by migration thread. */ uint8_t *buf; @@ -74,6 +83,9 @@ typedef struct BlkMigBlock { QEMUIOVector qiov; BlockAIOCB *aiocb; +int nr_bitmaps; +BlkMigDirtyBitmap *dirty_bitmaps; + /* Protected by block migration lock. */ int ret; QSIMPLEQ_ENTRY(BlkMigBlock) entry; @@ -83,6 +95,7 @@ typedef struct BlkMigState { /* Written during setup phase. Can be read without a lock. */ int blk_enable; int shared_base; +int dbm_enable; Similar to the feedback in a previous patch, we may not want to use 'dbm' to mean Dirty Bitmap because we have not been applying the abbreviation consistently. For now, the recommendation from stefan is to use the full bdrv_dirty_bitmap or dirty_bitmap in function names. If we do want an acronym to refer to this particular type of dirty bitmap, we should apply it consistently to all functions that work with the BdrvDirtyBitmap type. For now, bdrv_dirty_bitmap_enable should suffice, even though it's a bit wordy. QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; int64_t total_sector_sum; bool zero_blocks; @@ -116,27 +129,63 @@ static void blk_mig_unlock(void) /* Only allocating and initializing structure fields, not copying any data. */ static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector, -int nr_sectors) + int nr_sectors, bool only_bitmaps) { +int i; BlkMigBlock *blk = g_new(BlkMigBlock, 1); -blk-buf = g_malloc(BLOCK_SIZE); +blk-buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE); blk-bmds = bmds; blk-sector = sector; blk-nr_sectors = nr_sectors; +blk-dirty_bitmaps = NULL; +blk-nr_bitmaps = 0; blk-iov.iov_base = blk-buf; blk-iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE; qemu_iovec_init_external(blk-qiov, blk-iov, 1); We can probably skip this block if only_bitmaps is true. +if (block_mig_state.dbm_enable) { +BlkMigDirtyBitmap *mig_bm; + +blk-nr_bitmaps = bmds-nr_bitmaps; +mig_bm = blk-dirty_bitmaps = g_new(BlkMigDirtyBitmap, +bmds-nr_bitmaps); +for (i = 0; i bmds-nr_bitmaps; ++i) { +BdrvDirtyBitmap *bm = bmds-dirty_bitmaps[i]; +mig_bm-buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors)); +mig_bm-bitmap = bm; +mig_bm++; +} +} + It's strange to me that we'd give it an only bitmaps boolean and then rely on
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. I think dirty bitmaps should be migrated separately from the block migration. It makes a lot of sense to migrate them even if you are not using block migration. Compared to new flags, the availability of migration capabilities can be queried via query-migrate-capabilities before actually invoking the migration. So they are preferred, and you should add a migration capability instead of a new argument. The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. Paolo
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
On 01/08/2015 03:36 PM, Paolo Bonzini wrote: On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote: Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. I think dirty bitmaps should be migrated separately from the block migration. It makes a lot of sense to migrate them even if you are not using block migration. Compared to new flags, the availability of migration capabilities can be queried via query-migrate-capabilities before actually invoking the migration. So they are preferred, and you should add a migration capability instead of a new argument. Good point - we still don't have argument introspection (only command introspection), so turning on bitmap migration with a capability is indeed preferable over having to probe whether attempting an optional parameter will induce failure. The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. If I understand correctly, you are saying that: block migration vs. assuming shared storage during migration is independent from: current behavior of resetting dirty tracking on destination vs. new one-shot non-live dirty bitmap migration and that all four combinations are potentially useful. Also, by doing dirty bitmap migration as one-shot at the tail end of migration (in the small window when things are no longer live) you have a lot less effort, although the window of non-live activity is now a bit longer if the dirty table migration is not efficient. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 08/01/2015 23:45, Eric Blake wrote: The bitmaps are transmitted many times in their entirety, but only the last copy actually means something. The others are lost. This means you should use the non-live interface (register_savevm). This will simplify the code a lot. If I understand correctly, you are saying that: block migration vs. assuming shared storage during migration is independent from: current behavior of resetting dirty tracking on destination vs. new one-shot non-live dirty bitmap migration and that all four combinations are potentially useful. Yes. For example, you could use drive-mirror to snoop writes to an NBD destination (e.g. to a malware scanner or to a secondary machine if you have fault tolerance). Then you need to migrate the dirty bitmap even if you have shared storage. Also, by doing dirty bitmap migration as one-shot at the tail end of migration (in the small window when things are no longer live) you have a lot less effort, although the window of non-live activity is now a bit longer if the dirty table migration is not efficient. As of this series, dirty bitmap migration is not attempting to track the dirtiness of the dirty bitmap itself (you might have to read that twice :)). So there's nothing to lose in terms of efficiency. Paolo -BEGIN PGP SIGNATURE- Version: GnuPG v2 iQEcBAEBAgAGBQJUrwlwAAoJEL/70l94x66DSN4H+gLqqkghOqdkPzduTmf1hzi1 pwLe/TGZv+gNNWu8G2hO2FfasMlmBxEOVtwODFcwGExvPnH2jv1N7/dplIwKqbLS g9Hq5cI3UagxfFQts7fyBhjCxeUrHuaKfIvc/A1kfMMwesn8PPGFUGEXNqIs1NGc I+3qTE5TlcOKWCfL+3zgLDUowvRACbTJngHfveOtoWscVz743yQxhH3NOKPu7jxR 8H5AC9TK7sr3azHrXFgMP53W2qHT0KHTUyLQem+iKagFLsxX6oyOEn0ueMooAndE tzvw+5EiFj8MfAzPAdQWEPYwxQyDBy/oXQizQTCmAQku2TZIlWi+kStJ3K/qEaM= =v5Gj -END PGP SIGNATURE-
[Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
Just migrate parts of dirty bitmaps, corresponding to the block being migrated. Also, skip block migration if it is disabled (blk parameter of migrate command is false). Skipping shared sectors: bitmaps are migrated independently of this, just send blk without block data. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block-migration.c | 173 +++--- savevm.c | 1 + 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/block-migration.c b/block-migration.c index 908a66d..95d54a1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -32,6 +32,8 @@ #define BLK_MIG_FLAG_EOS0x02 #define BLK_MIG_FLAG_PROGRESS 0x04 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08 +#define BLK_MIG_FLAG_HAS_BLOCK 0x10 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20 #define MAX_IS_ALLOCATED_SEARCH 65536 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState { int shared_base; int64_t total_sectors; QSIMPLEQ_ENTRY(BlkMigDevState) entry; +int nr_bitmaps; +BdrvDirtyBitmap **dirty_bitmaps; /* Only used by migration thread. Does not need a lock. */ int bulk_completed; @@ -64,6 +68,11 @@ typedef struct BlkMigDevState { Error *blocker; } BlkMigDevState; +typedef struct BlkMigDirtyBitmap { +BdrvDirtyBitmap *bitmap; +uint8_t *buf; +} BlkMigDirtyBitmap; + typedef struct BlkMigBlock { /* Only used by migration thread. */ uint8_t *buf; @@ -74,6 +83,9 @@ typedef struct BlkMigBlock { QEMUIOVector qiov; BlockAIOCB *aiocb; +int nr_bitmaps; +BlkMigDirtyBitmap *dirty_bitmaps; + /* Protected by block migration lock. */ int ret; QSIMPLEQ_ENTRY(BlkMigBlock) entry; @@ -83,6 +95,7 @@ typedef struct BlkMigState { /* Written during setup phase. Can be read without a lock. */ int blk_enable; int shared_base; +int dbm_enable; QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list; int64_t total_sector_sum; bool zero_blocks; @@ -116,27 +129,63 @@ static void blk_mig_unlock(void) /* Only allocating and initializing structure fields, not copying any data. */ static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector, -int nr_sectors) + int nr_sectors, bool only_bitmaps) { +int i; BlkMigBlock *blk = g_new(BlkMigBlock, 1); -blk-buf = g_malloc(BLOCK_SIZE); +blk-buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE); blk-bmds = bmds; blk-sector = sector; blk-nr_sectors = nr_sectors; +blk-dirty_bitmaps = NULL; +blk-nr_bitmaps = 0; blk-iov.iov_base = blk-buf; blk-iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE; qemu_iovec_init_external(blk-qiov, blk-iov, 1); +if (block_mig_state.dbm_enable) { +BlkMigDirtyBitmap *mig_bm; + +blk-nr_bitmaps = bmds-nr_bitmaps; +mig_bm = blk-dirty_bitmaps = g_new(BlkMigDirtyBitmap, +bmds-nr_bitmaps); +for (i = 0; i bmds-nr_bitmaps; ++i) { +BdrvDirtyBitmap *bm = bmds-dirty_bitmaps[i]; +mig_bm-buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors)); +mig_bm-bitmap = bm; +mig_bm++; +} +} + return blk; } static void blk_free(BlkMigBlock *blk) { +int i; g_free(blk-buf); + +if (blk-dirty_bitmaps) { +for (i = 0; i blk-nr_bitmaps; ++i) { +g_free(blk-dirty_bitmaps[i].buf); +} +g_free(blk-dirty_bitmaps); +} + g_free(blk); } +static void blk_store_bitmaps(BlkMigBlock *blk) +{ +int i; +for (i = 0; i blk-nr_bitmaps; ++i) { +bdrv_dbm_store_data(blk-dirty_bitmaps[i].bitmap, +blk-dirty_bitmaps[i].buf, +blk-sector, blk-nr_sectors); +} +} + /* Must run outside of the iothread lock during the bulk phase, * or the VM will stall. */ @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk) int len; uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK; -if (block_mig_state.zero_blocks +if (block_mig_state.zero_blocks blk-buf buffer_is_zero(blk-buf, BLOCK_SIZE)) { flags |= BLK_MIG_FLAG_ZERO_BLOCK; +} else if (blk-buf) { +flags |= BLK_MIG_FLAG_HAS_BLOCK; +} + +if (block_mig_state.dbm_enable) { +flags |= BLK_MIG_FLAG_HAS_BITMAPS; } /* sector number and flags */ @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk) qemu_put_byte(f, len); qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk-bmds-bs), len); +/* dirty bitmaps */ +if (flags BLK_MIG_FLAG_HAS_BITMAPS) { +int i; +qemu_put_be32(f, blk-nr_bitmaps); +for (i = 0; i blk-nr_bitmaps; ++i) { +BdrvDirtyBitmap *bm = blk-dirty_bitmaps[i].bitmap; +int buf_size =