Re: [Qemu-devel] [PATCH RESEND for 2.3 5/6] migration: optimize xbzrle by reducing data copy

2014-12-11 Thread Dr. David Alan Gilbert
* ChenLiang (chenlian...@huawei.com) wrote:
 On 2014/12/10 18:39, Dr. David Alan Gilbert wrote:
 
  * Juan Quintela (quint...@redhat.com) wrote:
  arei.gong...@huawei.com wrote:
  From: ChenLiang chenlian...@huawei.com
 
  Signed-off-by: ChenLiang chenlian...@huawei.com
  Signed-off-by: Gonglei arei.gong...@huawei.com
  Reviewed-by: Dr. David Alan Gilbert dgilb...@redhat.com
  ---
   arch_init.c | 8 +++-
   1 file changed, 3 insertions(+), 5 deletions(-)
 
  diff --git a/arch_init.c b/arch_init.c
  index 846e4c5..0d0ba4a 100644
  --- a/arch_init.c
  +++ b/arch_init.c
  @@ -376,11 +376,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
  **current_data,
   
   prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
   
  -/* save current buffer into memory */
  -memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
  -
 
  I think this is wrong.
  Remember that now migration is done in parallel with the guest running.
  If the guest modifies the page while we are encoding it, we end with a
  different contents in the cache and in the real page, and that causes
  corruption.
 
  This way, what we encoded is a private copy of the page, so we don't
  have that problem.
 
  Makes sense?
  
  Kind of; see back in March I hit this while testing the 1st version of this
  patch:
  https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05631.html
  
  but then we had some patches that fixed it; and the discussion was here:
  https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05677.html
  and then I summarized it as:
  https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05768.html
  
  * It's an interesting, if unusual, observation; it means we can send
  * completely bogus data at this point because we know it will get
  * overwritten later; I think the requirements are:
  * 
  *   1) That we meet the protocol (which seems to require that the run 
  lengths are
  *  not allowed to be 0)
  *   2) That we don't get stuck in any loops or go over the end of the page
  *  (I think this means we have to be careful of those byte loops within
  *  the word-at-a-time cases)
  *   3) The page that ends up in our xbzrle cache must match the destination
  *  page, since the next cycle of xbzrle will use it as reference.
  * 
  
  Dave
 
 
 Hi
 The content that is discussed above is helpful to understand
 the principle of xbzrle. Do you mind that I add it into xbzrle.txt?

That's fine.

Dave

 
 Best regards
 ChenLiang
 
   /* XBZRLE encoding (if there is no overflow) */
  -encoded_len = xbzrle_encode_buffer(prev_cached_page, 
  XBZRLE.current_buf,
  +encoded_len = xbzrle_encode_buffer(prev_cached_page, *current_data,
  TARGET_PAGE_SIZE, 
  XBZRLE.encoded_buf,
  TARGET_PAGE_SIZE);
   if (encoded_len == 0) {
  @@ -399,7 +396,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
  **current_data,
   
   /* we need to update the data in the cache, in order to get the same 
  data */
   if (!last_stage) {
  -memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
  +xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len, 
  prev_cached_page,
  + TARGET_PAGE_SIZE);
   }
   
   /* Send XBZRLE based compressed page */
  --
  Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
  
  .
  
 
 
 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH RESEND for 2.3 5/6] migration: optimize xbzrle by reducing data copy

2014-12-10 Thread Juan Quintela
arei.gong...@huawei.com wrote:
 From: ChenLiang chenlian...@huawei.com

 Signed-off-by: ChenLiang chenlian...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Reviewed-by: Dr. David Alan Gilbert dgilb...@redhat.com
 ---
  arch_init.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

 diff --git a/arch_init.c b/arch_init.c
 index 846e4c5..0d0ba4a 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -376,11 +376,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
 **current_data,
  
  prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
  
 -/* save current buffer into memory */
 -memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
 -

I think this is wrong.
Remember that now migration is done in parallel with the guest running.
If the guest modifies the page while we are encoding it, we end with a
different contents in the cache and in the real page, and that causes
corruption.

This way, what we encoded is a private copy of the page, so we don't
have that problem.

Makes sense?

  /* XBZRLE encoding (if there is no overflow) */
 -encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
 +encoded_len = xbzrle_encode_buffer(prev_cached_page, *current_data,
 TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
 TARGET_PAGE_SIZE);
  if (encoded_len == 0) {
 @@ -399,7 +396,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
 **current_data,
  
  /* we need to update the data in the cache, in order to get the same 
 data */
  if (!last_stage) {
 -memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
 +xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len, 
 prev_cached_page,
 + TARGET_PAGE_SIZE);
  }
  
  /* Send XBZRLE based compressed page */



Re: [Qemu-devel] [PATCH RESEND for 2.3 5/6] migration: optimize xbzrle by reducing data copy

2014-12-10 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
 arei.gong...@huawei.com wrote:
  From: ChenLiang chenlian...@huawei.com
 
  Signed-off-by: ChenLiang chenlian...@huawei.com
  Signed-off-by: Gonglei arei.gong...@huawei.com
  Reviewed-by: Dr. David Alan Gilbert dgilb...@redhat.com
  ---
   arch_init.c | 8 +++-
   1 file changed, 3 insertions(+), 5 deletions(-)
 
  diff --git a/arch_init.c b/arch_init.c
  index 846e4c5..0d0ba4a 100644
  --- a/arch_init.c
  +++ b/arch_init.c
  @@ -376,11 +376,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
  **current_data,
   
   prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
   
  -/* save current buffer into memory */
  -memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
  -
 
 I think this is wrong.
 Remember that now migration is done in parallel with the guest running.
 If the guest modifies the page while we are encoding it, we end with a
 different contents in the cache and in the real page, and that causes
 corruption.
 
 This way, what we encoded is a private copy of the page, so we don't
 have that problem.
 
 Makes sense?

Kind of; see back in March I hit this while testing the 1st version of this
patch:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05631.html

but then we had some patches that fixed it; and the discussion was here:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05677.html
and then I summarized it as:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05768.html

* It's an interesting, if unusual, observation; it means we can send
* completely bogus data at this point because we know it will get
* overwritten later; I think the requirements are:
* 
*   1) That we meet the protocol (which seems to require that the run lengths 
are
*  not allowed to be 0)
*   2) That we don't get stuck in any loops or go over the end of the page
*  (I think this means we have to be careful of those byte loops within
*  the word-at-a-time cases)
*   3) The page that ends up in our xbzrle cache must match the destination
*  page, since the next cycle of xbzrle will use it as reference.
* 

Dave

   /* XBZRLE encoding (if there is no overflow) */
  -encoded_len = xbzrle_encode_buffer(prev_cached_page, 
  XBZRLE.current_buf,
  +encoded_len = xbzrle_encode_buffer(prev_cached_page, *current_data,
  TARGET_PAGE_SIZE, 
  XBZRLE.encoded_buf,
  TARGET_PAGE_SIZE);
   if (encoded_len == 0) {
  @@ -399,7 +396,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
  **current_data,
   
   /* we need to update the data in the cache, in order to get the same 
  data */
   if (!last_stage) {
  -memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
  +xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len, 
  prev_cached_page,
  + TARGET_PAGE_SIZE);
   }
   
   /* Send XBZRLE based compressed page */
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH RESEND for 2.3 5/6] migration: optimize xbzrle by reducing data copy

2014-12-10 Thread ChenLiang
On 2014/12/10 18:39, Dr. David Alan Gilbert wrote:

 * Juan Quintela (quint...@redhat.com) wrote:
 arei.gong...@huawei.com wrote:
 From: ChenLiang chenlian...@huawei.com

 Signed-off-by: ChenLiang chenlian...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Reviewed-by: Dr. David Alan Gilbert dgilb...@redhat.com
 ---
  arch_init.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

 diff --git a/arch_init.c b/arch_init.c
 index 846e4c5..0d0ba4a 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -376,11 +376,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
 **current_data,
  
  prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
  
 -/* save current buffer into memory */
 -memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
 -

 I think this is wrong.
 Remember that now migration is done in parallel with the guest running.
 If the guest modifies the page while we are encoding it, we end with a
 different contents in the cache and in the real page, and that causes
 corruption.

 This way, what we encoded is a private copy of the page, so we don't
 have that problem.

 Makes sense?
 
 Kind of; see back in March I hit this while testing the 1st version of this
 patch:
 https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05631.html
 
 but then we had some patches that fixed it; and the discussion was here:
 https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05677.html
 and then I summarized it as:
 https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg05768.html
 
 * It's an interesting, if unusual, observation; it means we can send
 * completely bogus data at this point because we know it will get
 * overwritten later; I think the requirements are:
 * 
 *   1) That we meet the protocol (which seems to require that the run lengths 
 are
 *  not allowed to be 0)
 *   2) That we don't get stuck in any loops or go over the end of the page
 *  (I think this means we have to be careful of those byte loops within
 *  the word-at-a-time cases)
 *   3) The page that ends up in our xbzrle cache must match the destination
 *  page, since the next cycle of xbzrle will use it as reference.
 * 
 
 Dave


Hi
The content that is discussed above is helpful to understand
the principle of xbzrle. Do you mind that I add it into xbzrle.txt?

Best regards
ChenLiang

  /* XBZRLE encoding (if there is no overflow) */
 -encoded_len = xbzrle_encode_buffer(prev_cached_page, 
 XBZRLE.current_buf,
 +encoded_len = xbzrle_encode_buffer(prev_cached_page, *current_data,
 TARGET_PAGE_SIZE, 
 XBZRLE.encoded_buf,
 TARGET_PAGE_SIZE);
  if (encoded_len == 0) {
 @@ -399,7 +396,8 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
 **current_data,
  
  /* we need to update the data in the cache, in order to get the same 
 data */
  if (!last_stage) {
 -memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
 +xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len, 
 prev_cached_page,
 + TARGET_PAGE_SIZE);
  }
  
  /* Send XBZRLE based compressed page */
 --
 Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
 
 .