Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-12 Thread Paolo Bonzini
On 08/06/2013 22:53, Benjamin Herrenschmidt wrote:
> On Sun, 2013-06-09 at 10:16 +0800, Wenchao Xia wrote:
>> If a page was not received and destination knows that page should
>> exist according to total size, fill it with zero at destination, would
>> it solve the problem?
>
> The easiest way to do that is to not write to those pages at the
> destination to begin with, when initializing the VM... Is there any way
> to know that a VM is being setup as a migration target or not ?

There is the incoming variable in vl.c (currently not a global), but I
suspect Peter's patch could have also broken loadvm.  It could quickly
become a rat hole.

The only bug we have is not a performance bug related to compression;
it's that writing zero pages breaks overcommit.  Let's fix that, and
only that.

Paolo



Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-12 Thread Benjamin Herrenschmidt
On Wed, 2013-06-12 at 10:00 -0400, Paolo Bonzini wrote:
> The only bug we have is not a performance bug related to compression;
> it's that writing zero pages breaks overcommit.  Let's fix that, and
> only that.

Right, do we have a way to madvise throw away these instead ? Or do we
have a way to track that the platform init code did write something
there and only clear *those* pages ?

Cheers,
Ben.





Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-12 Thread Paolo Bonzini
On 12/06/2013 10:11, Benjamin Herrenschmidt wrote:
> On Wed, 2013-06-12 at 10:00 -0400, Paolo Bonzini wrote:
>> The only bug we have is not a performance bug related to compression;
>> it's that writing zero pages breaks overcommit.  Let's fix that, and
>> only that.
>
> Right, do we have a way to madvise throw away these instead ?

We already do that, but apparently that madvise is asynchronous.
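
For readers unfamiliar with the call, here is a minimal Linux-only sketch
of discarding a range with madvise(). The helper name discard_range is
hypothetical and is not taken from QEMU; MADV_DONTNEED is assumed to be
the advice in question. The sketch does not settle when the kernel
actually frees the memory.

```c
#define _DEFAULT_SOURCE     /* for madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: tell the kernel that [addr, addr + len) is no
 * longer needed.  On Linux, for private anonymous memory, the next read
 * of the range then sees zero-filled pages.
 */
static int discard_range(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    /* madvise() requires a page-aligned start address. */
    assert(page > 0 && (uintptr_t)addr % (uintptr_t)page == 0);
    return madvise(addr, len, MADV_DONTNEED);
}
```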

> Or do we
> have a way to track that the platform init code did write something
> there and only clear *those* pages ?

No need for that; since it's copy-on-write, not copy-on-read :) we can just
check for pages that are zero and not rewrite them with zeros.  That's
what Peter's patches do, I'll review them right away.
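
A minimal sketch of that check, for illustration only. The names
page_is_zero and load_zero_page are hypothetical and are not the code
from Peter's patches; the byte-wise scan stands in for whatever
optimized zero test the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if all len bytes at buf are zero. */
static bool page_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical destination-side handler for a "this page is zero" record.
 * Following the copy-on-write argument above, the page is only written
 * when it currently holds non-zero data, so untouched pages stay untouched.
 * Returns true if the page had to be written.
 */
static bool load_zero_page(unsigned char *host_page, size_t page_size)
{
    if (page_is_zero(host_page, page_size)) {
        return false;           /* already zero: do not write to it */
    }
    memset(host_page, 0, page_size);
    return true;
}
```

The cost is one read pass over each page announced as zero; the gain is
that pages never written by the destination are not written now either.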

Paolo




Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-12 Thread Wenchao Xia

On 2013-6-13 4:10, Paolo Bonzini wrote:
> On 12/06/2013 10:11, Benjamin Herrenschmidt wrote:
>> On Wed, 2013-06-12 at 10:00 -0400, Paolo Bonzini wrote:
>>> The only bug we have is not a performance bug related to compression;
>>> it's that writing zero pages breaks overcommit.  Let's fix that, and
>>> only that.
>>
>> Right, do we have a way to madvise throw away these instead ?
>
> We already do that, but apparently that madvise is asynchronous.
>
>> Or do we
>> have a way to track that the platform init code did write something
>> there and only clear *those* pages ?
>
> No need for that; since it's copy-on-write, not copy-on-read :) we can just
> check for pages that are zero and not rewrite them with zeros.  That's

  I think it is the right way to improve overcommit without breaking
anything.

> what Peter's patches do, I'll review them right away.
>
> Paolo


--
Best Regards

Wenchao Xia




Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-10 Thread Benjamin Herrenschmidt
On Mon, 2013-06-10 at 19:10 +1000, Alexey Kardashevskiy wrote:
>> I would prefer not to completely drop the patch since it saves bandwidth and
>> resources.
>
> I would like migration to do what it should do - send pages no matter what,
> this is exactly what migration is for. If there are many, many empty pages
> (which I doubt is a very common real-life case), they could all be merged into
> big consecutive chunks and sent at the end of migration.

I tend to agree. The problem of sending empty pages is purely a problem of
compression. If the current mechanism is deemed not efficient enough
in the case of having lots of zero pages, then by all means invent a better
packet format for more tightly representing them on the wire, but don't
break things by not sending them at all.
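
To make the "tighter packet format" idea concrete, here is a rough
sketch that collapses consecutive zero pages into (first, count) runs.
The struct zero_run record and the collect_zero_runs helper are purely
hypothetical and do not correspond to any existing migration stream
format. Whether long runs occur often enough to justify this is a
separate question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire record: `count` consecutive zero pages starting at
 * page index `first`.  Not part of any real migration stream format. */
struct zero_run {
    uint64_t first;
    uint64_t count;
};

/*
 * Scan a per-page "is zero" map and emit one record per run of zero
 * pages.  Returns the number of records written (at most max_runs).
 */
static size_t collect_zero_runs(const bool *is_zero, size_t npages,
                                struct zero_run *out, size_t max_runs)
{
    size_t nruns = 0;
    size_t i = 0;

    while (i < npages && nruns < max_runs) {
        if (!is_zero[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < npages && is_zero[i]) {
            i++;
        }
        out[nruns].first = start;
        out[nruns].count = i - start;
        nruns++;
    }
    return nruns;
}
```

Every zero page is still announced to the destination, so nothing is
skipped; a run of N pages simply costs one record instead of N.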

Cheers,
Ben.





Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-10 Thread Peter Lieven

On 10.06.2013 11:33, Benjamin Herrenschmidt wrote:
> On Mon, 2013-06-10 at 19:10 +1000, Alexey Kardashevskiy wrote:
>>> I would prefer not to completely drop the patch since it saves bandwidth and
>>> resources.
>>
>> I would like migration to do what it should do - send pages no matter what,
>> this is exactly what migration is for. If there are many, many empty pages
>> (which I doubt is a very common real-life case), they could all be merged into
>> big consecutive chunks and sent at the end of migration.
>
> I tend to agree. The problem of sending empty pages is purely a problem of
> compression. If the current mechanism is deemed not efficient enough
> in the case of having lots of zero pages, then by all means invent a better
> packet format for more tightly representing them on the wire, but don't
> break things by not sending them at all.

Ok, I see the point. I think the paradigm that the destination
should decide whether it needs a page or not is a sound one.

Zero pages are quite common, depending on the lifetime and the operating
system used. But a consecutive range of zero pages is only likely
in the bulk stage. I don't know if it's reasonable to add a special encoding
for that.

I will send a v2 of my previous revert patch addressing Eric's concerns shortly.

Peter




Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-08 Thread Benjamin Herrenschmidt
On Sun, 2013-06-09 at 12:34 +1000, Alexey Kardashevskiy wrote:

> It is _live_ migration, the source sends changes, same pages can change and
> be sent several times. So we would need to turn tracking on on the
> destination to know if some page was received from the source or changed by
> the destination itself (by writing there bios/firmware images, etc) and
> then clear pages which were touched by the destination and were not sent by
> the source.

Or we can set some kind of flag so that when creating a migration
target VM we don't load all these things into memory.

> Or we do not make guesses, the source sends everything and the destination
> simply checks if a page which is empty on the source is empty on the
> destination and avoids writing zeroes to it. Looks simpler to me and this is
> what the new patch does.

But you end up sending a lot of zeros ... is the migration compressed
(I am not familiar with it at all) ? If it is, that shouldn't be a big
deal, but otherwise it feels to me that you should be able to send a special
packet instead that says "all zeros" because you'll potentially have an
awful lot of these.

Ben.

  
 
>> Also, you mean the following code is from qemu and it does not allocate
>> memory with your gcc, right? Maybe it is related to KVM; how about
>> turning off KVM and retrying the following code in qemu?
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <assert.h>
>> #include <unistd.h>
>> #include <sys/resource.h>
>> #include <inttypes.h>
>> #include <string.h>
>> #include <sys/mman.h>
>> #include <errno.h>
>>
>> #if defined __SSE2__
>> #include <emmintrin.h>
>> #define VECTYPE        __m128i
>> #define SPLAT(p)       _mm_set1_epi8(*(p))
>> #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
>> #else
>> #define VECTYPE        unsigned long
>> #define SPLAT(p)       (*(p) * (~0UL / 255))
>> #define ALL_EQ(v1, v2) ((v1) == (v2))
>> #endif
>>
>> #define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
>>
>> /* Round number down to multiple */
>> #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
>>
>> /* Round number up to multiple */
>> #define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
>>
>> #define QEMU_VMALLOC_ALIGN (256 * 4096)
>>
>> /* alloc shared memory pages */
>> void *qemu_anon_ram_alloc(size_t size)
>> {
>>     size_t align = QEMU_VMALLOC_ALIGN;
>>     size_t total = size + align - getpagesize();
>>     void *ptr = mmap(0, total, PROT_READ | PROT_WRITE,
>>                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>     size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) -
>>                     (uintptr_t)ptr;
>>
>>     if (ptr == MAP_FAILED) {
>>         fprintf(stderr, "Failed to allocate %zu B: %s\n",
>>                 size, strerror(errno));
>>         abort();
>>     }
>>
>>     ptr += offset;
>>     total -= offset;
>>
>>     if (offset > 0) {
>>         munmap(ptr - offset, offset);
>>     }
>>     if (total > size) {
>>         munmap(ptr + size, total - size);
>>     }
>>
>>     return ptr;
>> }
>>
>> static inline int
>> can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>> {
>>     return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>>                    * sizeof(VECTYPE)) == 0
>>             && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
>> }
>>
>> size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>> {
>>     const VECTYPE *p = buf;
>>     const VECTYPE zero = (VECTYPE){0};
>>     size_t i;
>>
>>     if (!len) {
>>         return 0;
>>     }
>>
>>     assert(can_use_buffer_find_nonzero_offset(buf, len));
>>
>>     for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
>>         if (!ALL_EQ(p[i], zero)) {
>>             return i * sizeof(VECTYPE);
>>         }
>>     }
>>
>>     for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>>          i < len / sizeof(VECTYPE);
>>          i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>>         VECTYPE tmp0 = p[i + 0] | p[i + 1];
>>         VECTYPE tmp1 = p[i + 2] | p[i + 3];
>>         VECTYPE tmp2 = p[i + 4] | p[i + 5];
>>         VECTYPE tmp3 = p[i + 6] | p[i + 7];
>>         VECTYPE tmp01 = tmp0 | tmp1;
>>         VECTYPE tmp23 = tmp2 | tmp3;
>>         if (!ALL_EQ(tmp01 | tmp23, zero)) {
>>             break;
>>         }
>>     }
>>
>>     return i * sizeof(VECTYPE);
>> }
>>
>> int main()
>> {
>>     //char *x = malloc(1024 << 20);
>>     char *x = qemu_anon_ram_alloc(1024 << 20);
>>
>>     int i, j;
>>     int ret = 0;
>>     struct rusage rusage;
>>     for (i = 0; i < 500; i ++) {
>>         for (j = 0; j < 10 << 20; j += 4096) {
>>             ret += buffer_find_nonzero_offset((char*) (x + (i << 20)
>>                                               + j), 4096);
>>         }
>>         getrusage( RUSAGE_SELF, &rusage );
>>         printf("read offset: %d kB, RSS size: %ld kB", ((i+1) << 10),
>>                rusage.ru_maxrss);
>>         getchar();
>>     }
>>     printf("%d zero pages\n", ret);
>> }
 
 
 
 
 
  
  
 
 





Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-08 Thread Benjamin Herrenschmidt
On Sun, 2013-06-09 at 10:16 +0800, Wenchao Xia wrote:
> If a page was not received and destination knows that page should
> exist according to total size, fill it with zero at destination, would
> it solve the problem?

The easiest way to do that is to not write to those pages at the
destination to begin with, when initializing the VM... Is there any way
to know that a VM is being setup as a migration target or not ?

Ben.





Re: [Qemu-devel] [Qemu-ppc] broken incoming migration

2013-06-08 Thread Alexey Kardashevskiy
On 06/09/2013 12:52 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2013-06-09 at 12:34 +1000, Alexey Kardashevskiy wrote:
>
>> It is _live_ migration, the source sends changes, same pages can change and
>> be sent several times. So we would need to turn tracking on on the
>> destination to know if some page was received from the source or changed by
>> the destination itself (by writing there bios/firmware images, etc) and
>> then clear pages which were touched by the destination and were not sent by
>> the source.
>
> Or we can set some kind of flag so that when creating a migration
> target VM we don't load all these things into memory.

How would we do that? The platform initialization code does not have a clue
whether it is going to receive a migrated host or not.

>> Or we do not make guesses, the source sends everything and the destination
>> simply checks if a page which is empty on the source is empty on the
>> destination and avoids writing zeroes to it. Looks simpler to me and this is
>> what the new patch does.
>
> But you end up sending a lot of zeros ... is the migration compressed
> (I am not familiar with it at all) ? If it is, that shouldn't be a big
> deal, but otherwise it feels to me that you should be able to send a special
> packet instead that says "all zeros" because you'll potentially have an
> awful lot of these.

It is compressed exactly as you described.


 

-- 
Alexey