Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-20 Thread John Snow



On 01/17/2015 12:17 PM, Vladimir Sementsov-Ogievskiy wrote:


Best regards,
Vladimir

On 09.01.2015 01:05, John Snow wrote:

CCing migration maintainers, feedback otherwise in-line.

On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Just migrate parts of dirty bitmaps, corresponding to the block being
migrated. Also, skip block migration if it is disabled (blk parameter
of migrate command is false).

Skipping shared sectors: bitmaps are migrated independently of this,
just send blk without block data.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---


In terms of general approach, migrating the dirty bitmap alongside the
blocks it describes makes sense when migrating both.

Is this a lot of overhead when that's not the case, though? If we
utilize the bitmap-only pathways added here, don't we iterate
through the savevm handlers a lot to only transfer very little data
per section?

If we really need migration of bitmaps apart from the data they
describe, is it not worth developing a more condensed transfer
mechanism to get more bitmap data per iteration, instead of just one
block's worth?

The first stage of migration can be easily optimized in this case: just
take larger bitmap pieces. For the second stage, it's not as simple.
If we take larger pieces at each step, we will just increase the dirty
data transfer. One approach is to maintain a dirty region - two numbers
representing the region of set bits in the migration dirty bitmap - and
send data only when this region is large enough.




  block-migration.c | 173 +++---
  savevm.c  |   1 +
  2 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 908a66d..95d54a1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -32,6 +32,8 @@
  #define BLK_MIG_FLAG_EOS0x02
  #define BLK_MIG_FLAG_PROGRESS   0x04
  #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
+#define BLK_MIG_FLAG_HAS_BLOCK  0x10
+#define BLK_MIG_FLAG_HAS_BITMAPS0x20



OK: As a result of allowing bitmaps to be migrated without the block
data itself, we now must acknowledge the concept that we can send
either block data, bitmaps, both, or neither -- so new defines are
added here to indicate what data can be found in the section following.


  #define MAX_IS_ALLOCATED_SEARCH 65536

@@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
  int shared_base;
  int64_t total_sectors;
  QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+int nr_bitmaps;
+BdrvDirtyBitmap **dirty_bitmaps;

  /* Only used by migration thread.  Does not need a lock. */
  int bulk_completed;
@@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
  Error *blocker;
  } BlkMigDevState;

+typedef struct BlkMigDirtyBitmap {
+BdrvDirtyBitmap *bitmap;
+uint8_t *buf;
+} BlkMigDirtyBitmap;
+
  typedef struct BlkMigBlock {
  /* Only used by migration thread.  */
  uint8_t *buf;
@@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
  QEMUIOVector qiov;
  BlockAIOCB *aiocb;

+int nr_bitmaps;
+BlkMigDirtyBitmap *dirty_bitmaps;
+
  /* Protected by block migration lock.  */
  int ret;
  QSIMPLEQ_ENTRY(BlkMigBlock) entry;
@@ -83,6 +95,7 @@ typedef struct BlkMigState {
  /* Written during setup phase.  Can be read without a lock.  */
  int blk_enable;
  int shared_base;
+int dbm_enable;


Similar to the feedback in a previous patch, we may not want to use
'dbm' to mean Dirty Bitmap because we have not been applying the
abbreviation consistently.

For now, the recommendation from Stefan is to use the full
bdrv_dirty_bitmap or dirty_bitmap in function names.

If we do want an acronym to refer to this particular type of dirty
bitmap, we should apply it consistently to all functions that work
with the BdrvDirtyBitmap type.

For now, bdrv_dirty_bitmap_enable should suffice, even though it's a
bit wordy.

It's a flag that enables the migration of dirty bitmaps, just as
blk_enable enables the migration of blocks. Then, should it be
dirty_bitmap_migration_enable? Isn't that too long for a simple flag
variable?


Yeah, I see your point. I think the only difference is that blk is 
used very widely throughout the code, but this is the first occurrence 
of dbm.


Even just bitmap_enable would be sufficiently more self-evident than 
dbm_enable, unless we rework all of the BdrvDirtyBitmap functions to 
use the dbm abbreviation.


I'd also be happy with migrate_bitmaps which is fairly 
self-explanatory, too.





  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
  int64_t total_sector_sum;
  bool zero_blocks;
@@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
  /* Only allocating and initializing structure fields, not copying
any data. */

  static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
-int nr_sectors)
+   int 

Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-17 Thread Vladimir Sementsov-Ogievskiy


Best regards,
Vladimir

On 09.01.2015 01:05, John Snow wrote:

CCing migration maintainers, feedback otherwise in-line.

On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Just migrate parts of dirty bitmaps, corresponding to the block being
migrated. Also, skip block migration if it is disabled (blk parameter
of migrate command is false).

Skipping shared sectors: bitmaps are migrated independently of this,
just send blk without block data.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---


In terms of general approach, migrating the dirty bitmap alongside the 
blocks it describes makes sense when migrating both.


Is this a lot of overhead when that's not the case, though? If we 
utilize the bitmap-only pathways added here, don't we iterate 
through the savevm handlers a lot to only transfer very little data 
per section?


If we really need migration of bitmaps apart from the data they 
describe, is it not worth developing a more condensed transfer 
mechanism to get more bitmap data per iteration, instead of just one 
block's worth?

The first stage of migration can be easily optimized in this case: just 
take larger bitmap pieces. For the second stage, it's not as simple. 
If we take larger pieces at each step, we will just increase the dirty 
data transfer. One approach is to maintain a dirty region - two numbers 
representing the region of set bits in the migration dirty bitmap - and 
send data only when this region is large enough.




  block-migration.c | 173 +++---

  savevm.c  |   1 +
  2 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 908a66d..95d54a1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -32,6 +32,8 @@
  #define BLK_MIG_FLAG_EOS0x02
  #define BLK_MIG_FLAG_PROGRESS   0x04
  #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
+#define BLK_MIG_FLAG_HAS_BLOCK  0x10
+#define BLK_MIG_FLAG_HAS_BITMAPS0x20



OK: As a result of allowing bitmaps to be migrated without the block 
data itself, we now must acknowledge the concept that we can send 
either block data, bitmaps, both, or neither -- so new defines are 
added here to indicate what data can be found in the section following.



  #define MAX_IS_ALLOCATED_SEARCH 65536

@@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
  int shared_base;
  int64_t total_sectors;
  QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+int nr_bitmaps;
+BdrvDirtyBitmap **dirty_bitmaps;

  /* Only used by migration thread.  Does not need a lock. */
  int bulk_completed;
@@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
  Error *blocker;
  } BlkMigDevState;

+typedef struct BlkMigDirtyBitmap {
+BdrvDirtyBitmap *bitmap;
+uint8_t *buf;
+} BlkMigDirtyBitmap;
+
  typedef struct BlkMigBlock {
  /* Only used by migration thread.  */
  uint8_t *buf;
@@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
  QEMUIOVector qiov;
  BlockAIOCB *aiocb;

+int nr_bitmaps;
+BlkMigDirtyBitmap *dirty_bitmaps;
+
  /* Protected by block migration lock.  */
  int ret;
  QSIMPLEQ_ENTRY(BlkMigBlock) entry;
@@ -83,6 +95,7 @@ typedef struct BlkMigState {
  /* Written during setup phase.  Can be read without a lock.  */
  int blk_enable;
  int shared_base;
+int dbm_enable;


Similar to the feedback in a previous patch, we may not want to use 
'dbm' to mean Dirty Bitmap because we have not been applying the 
abbreviation consistently.


For now, the recommendation from Stefan is to use the full 
bdrv_dirty_bitmap or dirty_bitmap in function names.


If we do want an acronym to refer to this particular type of dirty 
bitmap, we should apply it consistently to all functions that work 
with the BdrvDirtyBitmap type.


For now, bdrv_dirty_bitmap_enable should suffice, even though it's a 
bit wordy.

It's a flag that enables the migration of dirty bitmaps, just as 
blk_enable enables the migration of blocks. Then, should it be 
dirty_bitmap_migration_enable? Isn't that too long for a simple flag 
variable?



  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
  int64_t total_sector_sum;
  bool zero_blocks;
@@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
  /* Only allocating and initializing structure fields, not copying 
any data. */


  static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
-int nr_sectors)
+   int nr_sectors, bool only_bitmaps)
  {
+int i;
  BlkMigBlock *blk = g_new(BlkMigBlock, 1);
-blk->buf = g_malloc(BLOCK_SIZE);
+blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
  blk->bmds = bmds;
  blk->sector = sector;
  blk->nr_sectors = nr_sectors;
+blk->dirty_bitmaps = NULL;
+blk->nr_bitmaps = 0;

  blk->iov.iov_base = blk->buf;
  blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
  

Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-13 Thread Vladimir Sementsov-Ogievskiy


On 12.01.2015 17:42, Paolo Bonzini wrote:

On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote:

On 09.01.2015 01:36, Paolo Bonzini wrote:

The bitmaps are transmitted many times in their entirety, but only the
last copy actually means something. The others are lost. This means
you should use the non-live interface (register_savevm). This will
simplify the code a lot.

But what if the dirty bitmap is really big?

For example, for a 5 Pb drive the bitmap with granularity 65536 will be
2 Gb in size. So, shouldn't it be migrated iteratively for live migration?

But your code is not attempting to do that.  It is not attempting to
track the dirtiness of the dirty bitmap, so to speak.

For such a huge storage, in any case, I suspect the solution is to not
use QEMU's live operations and, instead, operate at the storage level.

Paolo


On 12.01.2015 22:09, Paolo Bonzini wrote:

On 12/01/2015 18:31, John Snow wrote:

How do you track which parts of the disk have been acted upon (e.g.
mirrored in the case of the drive-mirror command), so that they have
become 0?

Hm... so your question here is, What happens if the named bitmap you
are transferring gets reset to 0 during migration? Will anybody track
this change?

As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is
block/backup.c, and we do not support it for mirror (yet?)

Right.


So can a block-backup job be running WHILE we migrate?

I think so.

Paolo


There is no dirty bitmap for the dirty bitmap in my solution; actually, the 
block migration dirty bitmap is used for this. It provides tracking of 
setting bitmap bits to 1 (when the corresponding blocks are written), and 
if the bitmap changes only in this way (no other block jobs are working 
with the migrated bitmap), then the bitmap will be migrated iteratively as 
it is.


Such an approach doesn't provide tracking of resetting the bitmap to 0, 
and such changes are not migrated. So if the bitmap is in use by any 
block job, it may have additional set bits after migration.


If we want to migrate more precisely, we need a separate dirty bitmap for 
every migrated bitmap, and tracking of this dirty-dirty bitmap would have 
to be added to the BdrvDirtyBitmap interface.


But do we really need precise migration of a bitmap that is in use by 
some block job (backup, for example)? In that case we should migrate 
the backup process too - migrate the working block job. Is that possible, 
and is it planned to be implemented?


Best regards,
Vladimir





Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-13 Thread John Snow



On 01/13/2015 04:16 AM, Vladimir Sementsov-Ogievskiy wrote:


On 12.01.2015 17:42, Paolo Bonzini wrote:

On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote:

On 09.01.2015 01:36, Paolo Bonzini wrote:

The bitmaps are transmitted many times in their entirety, but only the
last copy actually means something. The others are lost. This means
you should use the non-live interface (register_savevm). This will
simplify the code a lot.

But what if the dirty bitmap is really big?

For example, for a 5 Pb drive the bitmap with granularity 65536 will be
2 Gb in size. So, shouldn't it be migrated iteratively for live migration?

But your code is not attempting to do that.  It is not attempting to
track the dirtiness of the dirty bitmap, so to speak.

For such a huge storage, in any case, I suspect the solution is to not
use QEMU's live operations and, instead, operate at the storage level.

Paolo


On 12.01.2015 22:09, Paolo Bonzini wrote:

On 12/01/2015 18:31, John Snow wrote:

How do you track which parts of the disk have been acted upon (e.g.
mirrored in the case of the drive-mirror command), so that they have
become 0?

Hm... so your question here is, What happens if the named bitmap you
are transferring gets reset to 0 during migration? Will anybody track
this change?

As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is
block/backup.c, and we do not support it for mirror (yet?)

Right.


So can a block-backup job be running WHILE we migrate?

I think so.

Paolo


There is no dirty bitmap for the dirty bitmap in my solution; actually, the
block migration dirty bitmap is used for this. It provides tracking of
setting bitmap bits to 1 (when the corresponding blocks are written), and
if the bitmap changes only in this way (no other block jobs are working
with the migrated bitmap), then the bitmap will be migrated iteratively as
it is.

Such an approach doesn't provide tracking of resetting the bitmap to 0,
and such changes are not migrated. So if the bitmap is in use by any
block job, it may have additional set bits after migration.

If we want to migrate more precisely, we need a separate dirty bitmap for
every migrated bitmap, and tracking of this dirty-dirty bitmap would have
to be added to the BdrvDirtyBitmap interface.

But do we really need precise migration of a bitmap that is in use by
some block job (backup, for example)? In that case we should migrate
the backup process too - migrate the working block job. Is that possible,
and is it planned to be implemented?

Best regards,
Vladimir





This is my question, too. The approach presented here will at least work 
correctly, as far as I can tell. The only problem may arise if we try to 
migrate while doing an incremental backup, which will lead to too many 
dirty bits being migrated - but that is just an inefficiency.


I'm not convinced it's important ...
Stefan, any thoughts?



Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-12 Thread Vladimir Sementsov-Ogievskiy


Best regards,
Vladimir

On 09.01.2015 01:36, Paolo Bonzini wrote:
The bitmaps are transmitted many times in their entirety, but only the 
last copy actually means something. The others are lost. This means 
you should use the non-live interface (register_savevm). This will 
simplify the code a lot.


But what if the dirty bitmap is really big?

For example, for a 5 Pb drive the bitmap with granularity 65536 will be 
2 Gb in size. So, shouldn't it be migrated iteratively for live migration?





Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-12 Thread Paolo Bonzini


On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote:
 
 On 09.01.2015 01:36, Paolo Bonzini wrote:
 The bitmaps are transmitted many times in their entirety, but only the
 last copy actually means something. The others are lost. This means
 you should use the non-live interface (register_savevm). This will
 simplify the code a lot.
 
 But what if the dirty bitmap is really big?
 
 For example, for a 5 Pb drive the bitmap with granularity 65536 will be
 2 Gb in size. So, shouldn't it be migrated iteratively for live migration?

But your code is not attempting to do that.  It is not attempting to
track the dirtiness of the dirty bitmap, so to speak.

For such a huge storage, in any case, I suspect the solution is to not
use QEMU's live operations and, instead, operate at the storage level.

Paolo



Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-12 Thread Paolo Bonzini


On 12/01/2015 18:31, John Snow wrote:
 How do you track which parts of the disk have been acted upon (e.g.
 mirrored in the case of the drive-mirror command), so that they have
 become 0?
 
 Hm... so your question here is, What happens if the named bitmap you
 are transferring gets reset to 0 during migration? Will anybody track
 this change?
 
 As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is
 block/backup.c, and we do not support it for mirror (yet?)

Right.

 So can a block-backup job be running WHILE we migrate?

I think so.

Paolo



Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-12 Thread John Snow



On 01/12/2015 11:48 AM, Paolo Bonzini wrote:



On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:

Just migrate parts of dirty bitmaps, corresponding to the block being
migrated. Also, skip block migration if it is disabled (blk parameter
of migrate command is false).


So I have misread the patch. "blk" here:

+static void blk_store_bitmaps(BlkMigBlock *blk)
+{
+int i;
+for (i = 0; i < blk->nr_bitmaps; ++i) {
+bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
+blk->dirty_bitmaps[i].buf,
+blk->sector, blk->nr_sectors);
+}

refers to a BlkMigBlock, not a BlockBackend.

Still, I need an explanation of the operation of this patch.

During the bulk phase you migrate all of the dirty bitmap.  This is okay.

Then, during the iterative phase you migrate parts of the dirty bitmap
corresponding to sectors that are dirty for block migration.  This means
parts of the dirty bitmap that have become 1.

How do you track which parts of the disk have been acted upon (e.g.
mirrored in the case of the drive-mirror command), so that they have
become 0?


Hm... so your question here is, What happens if the named bitmap you 
are transferring gets reset to 0 during migration? Will anybody track 
this change?


As far as I know, the only consumer of _named_ BdrvDirtyBitmaps is 
block/backup.c, and we do not support it for mirror (yet?)


So can a block-backup job be running WHILE we migrate? Otherwise, I 
don't think this situation (untracked bitmap resets) will arise here.



This, strictly speaking, is conservative so it is not incorrect, but it
basically throws away all the work done by the block job during the
block migration---which might take several minutes.  Is a non-live
solution really that bad?  If it is, and it is not feasible to just
migrate the bitmaps non-live, can we do better to migrate bits of the
dirty bitmap that have become 0?

Paolo


Skipping shared sectors: bitmaps are migrated independently of this,
just send blk without block data.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
  block-migration.c | 173 +++---
  savevm.c  |   1 +
  2 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 908a66d..95d54a1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -32,6 +32,8 @@
  #define BLK_MIG_FLAG_EOS0x02
  #define BLK_MIG_FLAG_PROGRESS   0x04
  #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
+#define BLK_MIG_FLAG_HAS_BLOCK  0x10
+#define BLK_MIG_FLAG_HAS_BITMAPS0x20

  #define MAX_IS_ALLOCATED_SEARCH 65536

@@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
  int shared_base;
  int64_t total_sectors;
  QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+int nr_bitmaps;
+BdrvDirtyBitmap **dirty_bitmaps;

  /* Only used by migration thread.  Does not need a lock.  */
  int bulk_completed;
@@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
  Error *blocker;
  } BlkMigDevState;

+typedef struct BlkMigDirtyBitmap {
+BdrvDirtyBitmap *bitmap;
+uint8_t *buf;
+} BlkMigDirtyBitmap;
+
  typedef struct BlkMigBlock {
  /* Only used by migration thread.  */
  uint8_t *buf;
@@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
  QEMUIOVector qiov;
  BlockAIOCB *aiocb;

+int nr_bitmaps;
+BlkMigDirtyBitmap *dirty_bitmaps;
+
  /* Protected by block migration lock.  */
  int ret;
  QSIMPLEQ_ENTRY(BlkMigBlock) entry;
@@ -83,6 +95,7 @@ typedef struct BlkMigState {
  /* Written during setup phase.  Can be read without a lock.  */
  int blk_enable;
  int shared_base;
+int dbm_enable;
  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
  int64_t total_sector_sum;
  bool zero_blocks;
@@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
  /* Only allocating and initializing structure fields, not copying any data. */

  static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
-int nr_sectors)
+   int nr_sectors, bool only_bitmaps)
  {
+int i;
  BlkMigBlock *blk = g_new(BlkMigBlock, 1);
-blk->buf = g_malloc(BLOCK_SIZE);
+blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
  blk->bmds = bmds;
  blk->sector = sector;
  blk->nr_sectors = nr_sectors;
+blk->dirty_bitmaps = NULL;
+blk->nr_bitmaps = 0;

  blk->iov.iov_base = blk->buf;
  blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
  qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);

+if (block_mig_state.dbm_enable) {
+BlkMigDirtyBitmap *mig_bm;
+
+blk->nr_bitmaps = bmds->nr_bitmaps;
+mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
+bmds->nr_bitmaps);
+for (i = 0; i < bmds->nr_bitmaps; ++i) {
+BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
+  

Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-12 Thread Paolo Bonzini


On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
 Just migrate parts of dirty bitmaps, corresponding to the block being
 migrated. Also, skip block migration if it is disabled (blk parameter
 of migrate command is false).

So I have misread the patch. "blk" here:

+static void blk_store_bitmaps(BlkMigBlock *blk)
+{
+int i;
+for (i = 0; i < blk->nr_bitmaps; ++i) {
+bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
+blk->dirty_bitmaps[i].buf,
+blk->sector, blk->nr_sectors);
+}

refers to a BlkMigBlock, not a BlockBackend.

Still, I need an explanation of the operation of this patch.

During the bulk phase you migrate all of the dirty bitmap.  This is okay.

Then, during the iterative phase you migrate parts of the dirty bitmap
corresponding to sectors that are dirty for block migration.  This means
parts of the dirty bitmap that have become 1.

How do you track which parts of the disk have been acted upon (e.g.
mirrored in the case of the drive-mirror command), so that they have
become 0?

This, strictly speaking, is conservative so it is not incorrect, but it
basically throws away all the work done by the block job during the
block migration---which might take several minutes.  Is a non-live
solution really that bad?  If it is, and it is not feasible to just
migrate the bitmaps non-live, can we do better to migrate bits of the
dirty bitmap that have become 0?

Paolo

 Skipping shared sectors: bitmaps are migrated independently of this,
 just send blk without block data.
 
 Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
 ---
  block-migration.c | 173 +++---
  savevm.c  |   1 +
  2 files changed, 154 insertions(+), 20 deletions(-)
 
 diff --git a/block-migration.c b/block-migration.c
 index 908a66d..95d54a1 100644
 --- a/block-migration.c
 +++ b/block-migration.c
 @@ -32,6 +32,8 @@
  #define BLK_MIG_FLAG_EOS0x02
  #define BLK_MIG_FLAG_PROGRESS   0x04
  #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
 +#define BLK_MIG_FLAG_HAS_BLOCK  0x10
 +#define BLK_MIG_FLAG_HAS_BITMAPS0x20
  
  #define MAX_IS_ALLOCATED_SEARCH 65536
  
 @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
  int shared_base;
  int64_t total_sectors;
  QSIMPLEQ_ENTRY(BlkMigDevState) entry;
 +int nr_bitmaps;
 +BdrvDirtyBitmap **dirty_bitmaps;
  
  /* Only used by migration thread.  Does not need a lock.  */
  int bulk_completed;
 @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
  Error *blocker;
  } BlkMigDevState;
  
 +typedef struct BlkMigDirtyBitmap {
 +BdrvDirtyBitmap *bitmap;
 +uint8_t *buf;
 +} BlkMigDirtyBitmap;
 +
  typedef struct BlkMigBlock {
  /* Only used by migration thread.  */
  uint8_t *buf;
 @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
  QEMUIOVector qiov;
  BlockAIOCB *aiocb;
  
 +int nr_bitmaps;
 +BlkMigDirtyBitmap *dirty_bitmaps;
 +
  /* Protected by block migration lock.  */
  int ret;
  QSIMPLEQ_ENTRY(BlkMigBlock) entry;
 @@ -83,6 +95,7 @@ typedef struct BlkMigState {
  /* Written during setup phase.  Can be read without a lock.  */
  int blk_enable;
  int shared_base;
 +int dbm_enable;
  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
  int64_t total_sector_sum;
  bool zero_blocks;
 @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
  /* Only allocating and initializing structure fields, not copying any data. 
 */
  
  static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
 -int nr_sectors)
 +   int nr_sectors, bool only_bitmaps)
  {
 +int i;
  BlkMigBlock *blk = g_new(BlkMigBlock, 1);
 -blk->buf = g_malloc(BLOCK_SIZE);
 +blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
  blk->bmds = bmds;
  blk->sector = sector;
  blk->nr_sectors = nr_sectors;
 +blk->dirty_bitmaps = NULL;
 +blk->nr_bitmaps = 0;
  
  blk->iov.iov_base = blk->buf;
  blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
  qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
  
 +if (block_mig_state.dbm_enable) {
 +BlkMigDirtyBitmap *mig_bm;
 +
 +blk->nr_bitmaps = bmds->nr_bitmaps;
 +mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
 +bmds->nr_bitmaps);
 +for (i = 0; i < bmds->nr_bitmaps; ++i) {
 +BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
 +mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
 +mig_bm->bitmap = bm;
 +mig_bm++;
 +}
 +}
 +
  return blk;
  }
  
  static void blk_free(BlkMigBlock *blk)
  {
 +int i;
  g_free(blk->buf);
 +
 +if (blk->dirty_bitmaps) {
 +for (i = 0; i < blk->nr_bitmaps; ++i) {
 +g_free(blk->dirty_bitmaps[i].buf);
 +}
 +

Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-08 Thread John Snow

CCing migration maintainers, feedback otherwise in-line.

On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Just migrate parts of dirty bitmaps, corresponding to the block being
migrated. Also, skip block migration if it is disabled (blk parameter
of migrate command is false).

Skipping shared sectors: bitmaps are migrated independently of this,
just send blk without block data.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---


In terms of general approach, migrating the dirty bitmap alongside the 
blocks it describes makes sense when migrating both.


Is this a lot of overhead when that's not the case, though? If we 
utilize the bitmap-only pathways added here, don't we iterate through 
the savevm handlers a lot to only transfer very little data per section?


If we really need migration of bitmaps apart from the data they 
describe, is it not worth developing a more condensed transfer mechanism 
to get more bitmap data per iteration, instead of just one block's worth?



  block-migration.c | 173 +++---
  savevm.c  |   1 +
  2 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 908a66d..95d54a1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -32,6 +32,8 @@
  #define BLK_MIG_FLAG_EOS0x02
  #define BLK_MIG_FLAG_PROGRESS   0x04
  #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
+#define BLK_MIG_FLAG_HAS_BLOCK  0x10
+#define BLK_MIG_FLAG_HAS_BITMAPS0x20



OK: As a result of allowing bitmaps to be migrated without the block 
data itself, we now must acknowledge the concept that we can send either 
block data, bitmaps, both, or neither -- so new defines are added here 
to indicate what data can be found in the section following.



  #define MAX_IS_ALLOCATED_SEARCH 65536

@@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
  int shared_base;
  int64_t total_sectors;
  QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+int nr_bitmaps;
+BdrvDirtyBitmap **dirty_bitmaps;

  /* Only used by migration thread.  Does not need a lock.  */
  int bulk_completed;
@@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
  Error *blocker;
  } BlkMigDevState;

+typedef struct BlkMigDirtyBitmap {
+BdrvDirtyBitmap *bitmap;
+uint8_t *buf;
+} BlkMigDirtyBitmap;
+
  typedef struct BlkMigBlock {
  /* Only used by migration thread.  */
  uint8_t *buf;
@@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
  QEMUIOVector qiov;
  BlockAIOCB *aiocb;

+int nr_bitmaps;
+BlkMigDirtyBitmap *dirty_bitmaps;
+
  /* Protected by block migration lock.  */
  int ret;
  QSIMPLEQ_ENTRY(BlkMigBlock) entry;
@@ -83,6 +95,7 @@ typedef struct BlkMigState {
  /* Written during setup phase.  Can be read without a lock.  */
  int blk_enable;
  int shared_base;
+int dbm_enable;


Similar to the feedback in a previous patch, we may not want to use 
'dbm' to mean Dirty Bitmap because we have not been applying the 
abbreviation consistently.


For now, the recommendation from Stefan is to use the full 
bdrv_dirty_bitmap or dirty_bitmap in function names.


If we do want an acronym to refer to this particular type of dirty 
bitmap, we should apply it consistently to all functions that work with 
the BdrvDirtyBitmap type.


For now, bdrv_dirty_bitmap_enable should suffice, even though it's a 
bit wordy.



  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
  int64_t total_sector_sum;
  bool zero_blocks;
@@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
  /* Only allocating and initializing structure fields, not copying any data. */

  static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
-int nr_sectors)
+   int nr_sectors, bool only_bitmaps)
  {
+int i;
  BlkMigBlock *blk = g_new(BlkMigBlock, 1);
-blk->buf = g_malloc(BLOCK_SIZE);
+blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
  blk->bmds = bmds;
  blk->sector = sector;
  blk->nr_sectors = nr_sectors;
+blk->dirty_bitmaps = NULL;
+blk->nr_bitmaps = 0;

  blk->iov.iov_base = blk->buf;
  blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
  qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);



We can probably skip this block if only_bitmaps is true.


+if (block_mig_state.dbm_enable) {
+BlkMigDirtyBitmap *mig_bm;
+
+blk->nr_bitmaps = bmds->nr_bitmaps;
+mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
+bmds->nr_bitmaps);
+for (i = 0; i < bmds->nr_bitmaps; ++i) {
+BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
+mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
+mig_bm->bitmap = bm;
+mig_bm++;
+}
+}
+


It's strange to me that we'd give it an only_bitmaps boolean and then 
rely on 

Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-08 Thread Paolo Bonzini


On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
 Just migrate parts of dirty bitmaps, corresponding to the block being
 migrated. Also, skip block migration if it is disabled (blk parameter
 of migrate command is false).
 
 Skipping shared sectors: bitmaps are migrated independently of this,
 just send blk without block data.

I think dirty bitmaps should be migrated separately from the block
migration.  It makes a lot of sense to migrate them even if you are not
using block migration.

Compared to new flags, the availability of migration capabilities can be
queried via query-migrate-capabilities before actually invoking the
migration.  So they are preferred, and you should add a migration
capability instead of a new argument.

The bitmaps are transmitted many times in their entirety, but only the
last copy actually means something.  The others are lost.  This means
you should use the non-live interface (register_savevm).  This will
simplify the code a lot.

Paolo



Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-08 Thread Eric Blake
On 01/08/2015 03:36 PM, Paolo Bonzini wrote:
> 
> 
> On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
>> Just migrate parts of dirty bitmaps, corresponding to the block being
>> migrated. Also, skip block migration if it is disabled (blk parameter
>> of migrate command is false).
>>
>> Skipping shared sectors: bitmaps are migrated independently of this,
>> just send blk without block data.
> 
> I think dirty bitmaps should be migrated separately from the block
> migration.  It makes a lot of sense to migrate them even if you are not
> using block migration.
> 
> Compared to new flags, the availability of migration capabilities can be
> queried via query-migrate-capabilities before actually invoking the
> migration.  So they are preferred, and you should add a migration
> capability instead of a new argument.

Good point - we still don't have argument introspection (only command
introspection), so turning on bitmap migration with a capability is
indeed preferable over having to probe whether attempting an optional
parameter will induce failure.

 
> The bitmaps are transmitted many times in their entirety, but only the
> last copy actually means something.  The others are lost.  This means
> you should use the non-live interface (register_savevm).  This will
> simplify the code a lot.

If I understand correctly, you are saying that:

block migration vs. assuming shared storage during migration

is independent from:

current behavior of resetting dirty tracking on destination vs. new
one-shot non-live dirty bitmap migration

and that all four combinations are potentially useful.  Also, by doing
dirty bitmap migration as one-shot at the tail end of migration (in the
small window when things are no longer live) you have a lot less effort,
although the window of non-live activity is now a bit longer if the
dirty table migration is not efficient.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2015-01-08 Thread Paolo Bonzini



On 08/01/2015 23:45, Eric Blake wrote:
>> The bitmaps are transmitted many times in their entirety, but
>> only the last copy actually means something.  The others are
>> lost.  This means you should use the non-live interface
>> (register_savevm).  This will simplify the code a lot.
> If I understand correctly, you are saying that:
> 
> block migration vs. assuming shared storage during migration
> 
> is independent from:
> 
> current behavior of resetting dirty tracking on destination vs.
> new one-shot non-live dirty bitmap migration
> 
> and that all four combinations are potentially useful.

Yes.  For example, you could use drive-mirror to snoop writes to an
NBD destination (e.g. to a malware scanner or to a secondary machine
if you have fault tolerance).  Then you need to migrate the dirty
bitmap even if you have shared storage.

> Also, by doing dirty bitmap migration as one-shot at the tail end
> of migration (in the small window when things are no longer live)
> you have a lot less effort, although the window of non-live
> activity is now a bit longer if the dirty table migration is not
> efficient.

As of this series, dirty bitmap migration is not attempting to track
the dirtiness of the dirty bitmap itself (you might have to read that
twice :)).  So there's nothing to lose in terms of efficiency.

Paolo



[Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration

2014-12-11 Thread Vladimir Sementsov-Ogievskiy
Just migrate parts of dirty bitmaps, corresponding to the block being
migrated. Also, skip block migration if it is disabled (blk parameter
of migrate command is false).

Skipping shared sectors: bitmaps are migrated independently of this,
just send blk without block data.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 block-migration.c | 173 +++---
 savevm.c  |   1 +
 2 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 908a66d..95d54a1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -32,6 +32,8 @@
 #define BLK_MIG_FLAG_EOS        0x02
 #define BLK_MIG_FLAG_PROGRESS   0x04
 #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
+#define BLK_MIG_FLAG_HAS_BLOCK  0x10
+#define BLK_MIG_FLAG_HAS_BITMAPS 0x20
 
 #define MAX_IS_ALLOCATED_SEARCH 65536
 
@@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
     int shared_base;
     int64_t total_sectors;
     QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+    int nr_bitmaps;
+    BdrvDirtyBitmap **dirty_bitmaps;
 
     /* Only used by migration thread.  Does not need a lock.  */
     int bulk_completed;
@@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
     Error *blocker;
 } BlkMigDevState;
 
+typedef struct BlkMigDirtyBitmap {
+    BdrvDirtyBitmap *bitmap;
+    uint8_t *buf;
+} BlkMigDirtyBitmap;
+
 typedef struct BlkMigBlock {
     /* Only used by migration thread.  */
     uint8_t *buf;
@@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
     QEMUIOVector qiov;
     BlockAIOCB *aiocb;
 
+    int nr_bitmaps;
+    BlkMigDirtyBitmap *dirty_bitmaps;
+
     /* Protected by block migration lock.  */
     int ret;
     QSIMPLEQ_ENTRY(BlkMigBlock) entry;
@@ -83,6 +95,7 @@ typedef struct BlkMigState {
     /* Written during setup phase.  Can be read without a lock.  */
     int blk_enable;
     int shared_base;
+    int dbm_enable;
     QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
     int64_t total_sector_sum;
     bool zero_blocks;
@@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
 /* Only allocating and initializing structure fields, not copying any data. */
 
 static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
-                               int nr_sectors)
+                               int nr_sectors, bool only_bitmaps)
 {
+    int i;
     BlkMigBlock *blk = g_new(BlkMigBlock, 1);
-    blk->buf = g_malloc(BLOCK_SIZE);
+    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
     blk->bmds = bmds;
     blk->sector = sector;
     blk->nr_sectors = nr_sectors;
+    blk->dirty_bitmaps = NULL;
+    blk->nr_bitmaps = 0;
 
     blk->iov.iov_base = blk->buf;
     blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
     qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
 
+    if (block_mig_state.dbm_enable) {
+        BlkMigDirtyBitmap *mig_bm;
+
+        blk->nr_bitmaps = bmds->nr_bitmaps;
+        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
+                                            bmds->nr_bitmaps);
+        for (i = 0; i < bmds->nr_bitmaps; ++i) {
+            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
+            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
+            mig_bm->bitmap = bm;
+            mig_bm++;
+        }
+    }
+
     return blk;
 }
 
 static void blk_free(BlkMigBlock *blk)
 {
+    int i;
     g_free(blk->buf);
+
+    if (blk->dirty_bitmaps) {
+        for (i = 0; i < blk->nr_bitmaps; ++i) {
+            g_free(blk->dirty_bitmaps[i].buf);
+        }
+        g_free(blk->dirty_bitmaps);
+    }
+
     g_free(blk);
 }
 
+static void blk_store_bitmaps(BlkMigBlock *blk)
+{
+    int i;
+    for (i = 0; i < blk->nr_bitmaps; ++i) {
+        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
+                            blk->dirty_bitmaps[i].buf,
+                            blk->sector, blk->nr_sectors);
+    }
+}
+
 /* Must run outside of the iothread lock during the bulk phase,
  * or the VM will stall.
  */
@@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
     int len;
     uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
 
-    if (block_mig_state.zero_blocks &&
+    if (block_mig_state.zero_blocks && blk->buf &&
         buffer_is_zero(blk->buf, BLOCK_SIZE)) {
         flags |= BLK_MIG_FLAG_ZERO_BLOCK;
+    } else if (blk->buf) {
+        flags |= BLK_MIG_FLAG_HAS_BLOCK;
+    }
+
+    if (block_mig_state.dbm_enable) {
+        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
     }
 
     /* sector number and flags */
@@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
     qemu_put_byte(f, len);
     qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk->bmds->bs), len);
 
+    /* dirty bitmaps */
+    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
+        int i;
+        qemu_put_be32(f, blk->nr_bitmaps);
+        for (i = 0; i < blk->nr_bitmaps; ++i) {
+            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
+            int buf_size =