RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-15 Thread Li, Liang Z
> > > > > >   I'm just catching back up on this thread; so without
> > > > > > reference to any particular previous mail in the thread.
> > > > > >
> > > > > >   1) How many of the free pages do we tell the host about?
> > > > > >  Your main change is telling the host about all the
> > > > > >  free pages.
> > > > >
> > > > > Yes, all the guest's free pages.
> > > > >
> > > > > >  If we tell the host about all the free pages, then we might
> > > > > >  end up needing to allocate more pages and update the host
> > > > > >  with pages we now want to use; that would have to wait for the
> > > > > >  host to acknowledge that use of these pages, since if we don't
> > > > > >  wait for it then it might have skipped migrating a page we
> > > > > >  just started using (I don't understand how your series solves that).
> > > > > >  So the guest probably needs to keep some free pages - how many?
> > > > >
> > > > > Actually, there is no need to care about whether the free pages
> > > > > will be used by the host.
> > > > > We only care about whether some of the free pages we got are reused
> > > > > by the guest, right?
> > > > >
> > > > > Dirty page logging can be used to solve this: start dirty page
> > > > > logging before getting the free page information from the guest.
> > > > > Even if some of the free pages are modified by the guest while the
> > > > > free page information is being gathered, those modified pages will
> > > > > be tracked by the dirty page logging mechanism. So in the following
> > > > > migration_bitmap_sync(), pages that are in the free page bitmap but
> > > > > were later modified will be reset to dirty. We won't omit any
> > > > > dirtied pages.
> > > > >
> > > > > So the guest doesn't need to keep any free pages.
> > > >
> > > > OK, yes, that works; so we do:
> > > >   * enable dirty logging
> > > >   * ask guest for free pages
> > > >   * initialise the migration bitmap as everything-free
> > > >   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> > > >
> > > > That's nice and simple.
> > >
> > > This works once, sure. But there's an issue: you have to
> > > defer migration until you get the free page list, and this only
> > > works once. So you end up with heuristics about how long to wait.
> > >
> > > Instead I propose:
> > >
> > > - mark all pages dirty as we do now.
> > >
> > > - at start of migration, start tracking dirty
> > >   pages in kvm, and tell guest to start tracking free pages
> > >
> > > we can now introduce any kind of delay, for example wait for ack
> > > from guest, or do whatever else, or even just start migrating pages
> > >
> > > - repeatedly:
> > >   - get list of free pages from guest
> > >   - clear them in migration bitmap
> > >   - get dirty list from kvm
> > >
> > > - at end of migration, stop tracking writes in kvm,
> > >   and tell guest to stop tracking free pages
> >
> > I had thought of filtering out the free pages in each migration bitmap
> > synchronization.
> > The advantage is that we can skip processing as many free pages as
> > possible, not just once.
> > The disadvantage is that we would have to change the current memory
> > management code to track the free pages, instead of traversing the free
> > page list to construct the free page bitmap, in order to reduce the
> > overhead of getting the free page bitmap.
> > I am not sure whether the kernel people would like it.
> >
> > If we keep the traversal mechanism, then because of its overhead it may
> > not be worth filtering out the free pages repeatedly.
> 
> Well, Michael's idea of not waiting for the dirty bitmap to be filled does
> make that idea of constantly using the free-bitmap better.
> 

Not waiting is a good idea.
Actually, we could shorten the waiting time by pre-allocating the free page
bitmap and updating it whenever the guest allocates or frees pages. That
requires modifying the mm-related code, and I don't know whether the kernel
people would like it.
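A minimal sketch of the pre-allocated bitmap idea, in C. Everything here is hypothetical: struct free_page_bitmap, on_pages_freed() and on_pages_allocated() are illustrative names, not kernel functions, and the order argument assumes a buddy-style allocator that frees and allocates blocks of 2^order pages. A real version would also need atomic bit operations or the allocator's own locking, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <limits.h>

/* Number of bits in one bitmap word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical guest-side bitmap, allocated once up front:
 * bit N is set while guest page frame N is on the free list. */
struct free_page_bitmap {
    unsigned long *bits;
    size_t nr_pages;
};

void fpb_set(struct free_page_bitmap *b, size_t pfn)
{
    b->bits[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

void fpb_clear(struct free_page_bitmap *b, size_t pfn)
{
    b->bits[pfn / BITS_PER_WORD] &= ~(1UL << (pfn % BITS_PER_WORD));
}

bool fpb_test(const struct free_page_bitmap *b, size_t pfn)
{
    return (b->bits[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}

/* Hypothetical hook for the allocator's free path: a block of
 * 2^order contiguous pages starting at pfn has just become free. */
void on_pages_freed(struct free_page_bitmap *b, size_t pfn, unsigned int order)
{
    for (size_t i = 0; i < ((size_t)1 << order) && pfn + i < b->nr_pages; i++)
        fpb_set(b, pfn + i);
}

/* Hypothetical hook for the allocation path: the block is in use again. */
void on_pages_allocated(struct free_page_bitmap *b, size_t pfn, unsigned int order)
{
    for (size_t i = 0; i < ((size_t)1 << order) && pfn + i < b->nr_pages; i++)
        fpb_clear(b, pfn + i);
}
```

With the bitmap kept current on every allocation and free, the guest could hand it over at any point without first walking its free lists; the cost moves into the allocator's hot paths instead.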

> In that case, is it easier if something (guest/host?) allocates some memory in
> the guest's physical RAM space and just points the host to it, rather than
> having an explicit 'send'?
> 

Good idea too.

Liang
> Dave


Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-15 Thread Dr. David Alan Gilbert
* Li, Liang Z (liang.z...@intel.com) wrote:
> > On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> > > * Li, Liang Z (liang.z...@intel.com) wrote:
> > > > >
> > > > > Hi,
> > > > >   I'm just catching back up on this thread; so without reference
> > > > > to any particular previous mail in the thread.
> > > > >
> > > > >   1) How many of the free pages do we tell the host about?
> > > > >  Your main change is telling the host about all the
> > > > >  free pages.
> > > >
> > > > Yes, all the guest's free pages.
> > > >
> > > > >  If we tell the host about all the free pages, then we might
> > > > >  end up needing to allocate more pages and update the host
> > > > >  with pages we now want to use; that would have to wait for the
> > > > >  host to acknowledge that use of these pages, since if we don't
> > > > >  wait for it then it might have skipped migrating a page we
> > > > >  just started using (I don't understand how your series solves 
> > > > > that).
> > > > >  So the guest probably needs to keep some free pages - how many?
> > > >
> > > > Actually, there is no need to care about whether the free pages will be
> > > > used by the host.
> > > > We only care about whether some of the free pages we got are reused by
> > > > the guest, right?
> > > >
> > > > Dirty page logging can be used to solve this: start dirty page logging
> > > > before getting the free page information from the guest.
> > > > Even if some of the free pages are modified by the guest while the free
> > > > page information is being gathered, those modified pages will be
> > > > tracked by the dirty page logging mechanism. So in the following
> > > > migration_bitmap_sync(), pages that are in the free page bitmap but
> > > > were later modified will be reset to dirty. We won't omit any dirtied
> > > > pages.
> > > >
> > > > So the guest doesn't need to keep any free pages.
> > >
> > > OK, yes, that works; so we do:
> > >   * enable dirty logging
> > >   * ask guest for free pages
> > >   * initialise the migration bitmap as everything-free
> > >   * then later we do the normal sync-dirty bitmap stuff and it all just 
> > > works.
> > >
> > > That's nice and simple.
> > 
> > This works once, sure. But there's an issue: you have to defer migration
> > until you get the free page list, and this only works once. So you end up
> > with heuristics about how long to wait.
> > 
> > Instead I propose:
> > 
> > - mark all pages dirty as we do now.
> > 
> > - at start of migration, start tracking dirty
> >   pages in kvm, and tell guest to start tracking free pages
> > 
> > we can now introduce any kind of delay, for example wait for ack from guest,
> > or do whatever else, or even just start migrating pages
> > 
> > - repeatedly:
> >   - get list of free pages from guest
> >   - clear them in migration bitmap
> >   - get dirty list from kvm
> > 
> > - at end of migration, stop tracking writes in kvm,
> >   and tell guest to stop tracking free pages
> 
> I had thought of filtering out the free pages in each migration bitmap
> synchronization.
> The advantage is that we can skip processing as many free pages as possible,
> not just once.
> The disadvantage is that we would have to change the current memory
> management code to track the free pages, instead of traversing the free page
> list to construct the free page bitmap, in order to reduce the overhead of
> getting the free page bitmap.
> I am not sure whether the kernel people would like it.
>
> If we keep the traversal mechanism, then because of its overhead it may not
> be worth filtering out the free pages repeatedly.

Well, Michael's idea of not waiting for the dirty
bitmap to be filled does make that idea of constantly
using the free-bitmap better.

In that case, is it easier if something (guest/host?)
allocates some memory in the guest's physical RAM space
and just points the host to it, rather than having an
explicit 'send'?

Dave

> Liang
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-15 Thread Li, Liang Z
> On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> > * Li, Liang Z (liang.z...@intel.com) wrote:
> > > >
> > > > Hi,
> > > >   I'm just catching back up on this thread; so without reference
> > > > to any particular previous mail in the thread.
> > > >
> > > >   1) How many of the free pages do we tell the host about?
> > > >  Your main change is telling the host about all the
> > > >  free pages.
> > >
> > > Yes, all the guest's free pages.
> > >
> > > >  If we tell the host about all the free pages, then we might
> > > >  end up needing to allocate more pages and update the host
> > > >  with pages we now want to use; that would have to wait for the
> > > >  host to acknowledge that use of these pages, since if we don't
> > > >  wait for it then it might have skipped migrating a page we
> > > >  just started using (I don't understand how your series solves 
> > > > that).
> > > >  So the guest probably needs to keep some free pages - how many?
> > >
> > > Actually, there is no need to care about whether the free pages will be
> > > used by the host.
> > > We only care about whether some of the free pages we got are reused by
> > > the guest, right?
> > >
> > > Dirty page logging can be used to solve this: start dirty page logging
> > > before getting the free page information from the guest.
> > > Even if some of the free pages are modified by the guest while the free
> > > page information is being gathered, those modified pages will be
> > > tracked by the dirty page logging mechanism. So in the following
> > > migration_bitmap_sync(), pages that are in the free page bitmap but
> > > were later modified will be reset to dirty. We won't omit any dirtied
> > > pages.
> > >
> > > So the guest doesn't need to keep any free pages.
> >
> > OK, yes, that works; so we do:
> >   * enable dirty logging
> >   * ask guest for free pages
> >   * initialise the migration bitmap as everything-free
> >   * then later we do the normal sync-dirty bitmap stuff and it all just 
> > works.
> >
> > That's nice and simple.
> 
> This works once, sure. But there's an issue: you have to defer migration
> until you get the free page list, and this only works once. So you end up with
> heuristics about how long to wait.
> 
> Instead I propose:
> 
> - mark all pages dirty as we do now.
> 
> - at start of migration, start tracking dirty
>   pages in kvm, and tell guest to start tracking free pages
> 
> we can now introduce any kind of delay, for example wait for ack from guest,
> or do whatever else, or even just start migrating pages
> 
> - repeatedly:
>   - get list of free pages from guest
>   - clear them in migration bitmap
>   - get dirty list from kvm
> 
> - at end of migration, stop tracking writes in kvm,
>   and tell guest to stop tracking free pages

I had thought of filtering out the free pages in each migration bitmap
synchronization.
The advantage is that we can skip processing as many free pages as possible,
not just once.
The disadvantage is that we would have to change the current memory management
code to track the free pages, instead of traversing the free page list to
construct the free page bitmap, in order to reduce the overhead of getting the
free page bitmap.
I am not sure whether the kernel people would like it.

If we keep the traversal mechanism, then because of its overhead it may not be
worth filtering out the free pages repeatedly.
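For comparison, a sketch of the traversal approach: rebuilding the free page bitmap from the free list on every sync costs time proportional to the number of free pages, and that cost is paid again each round, which is the overhead that makes repeated filtering questionable. The single linked list of free blocks (struct free_block) and build_free_bitmap() are hypothetical, simplified stand-ins for the allocator's real free lists.

```c
#include <stddef.h>
#include <string.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of a free list: each node is a free block of
 * 2^order contiguous pages starting at pfn. */
struct free_block {
    size_t pfn;
    unsigned int order;
    const struct free_block *next;
};

/* Rebuild the free page bitmap from scratch by walking the free list.
 * Returns the number of page bits set, i.e. the work done, which grows
 * with the number of free pages and is repeated on every call. */
size_t build_free_bitmap(const struct free_block *head,
                         unsigned long *bits, size_t nr_words)
{
    size_t nr_pages = nr_words * BITS_PER_WORD;
    size_t marked = 0;

    memset(bits, 0, nr_words * sizeof(*bits));
    for (const struct free_block *blk = head; blk != NULL; blk = blk->next) {
        for (size_t i = 0; i < ((size_t)1 << blk->order); i++) {
            size_t pfn = blk->pfn + i;
            if (pfn >= nr_pages)
                break;                      /* ignore pages past the bitmap */
            bits[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
            marked++;
        }
    }
    return marked;
}
```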

Liang






Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-15 Thread Michael S. Tsirkin
On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> * Li, Liang Z (liang.z...@intel.com) wrote:
> > > 
> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to any
> > > particular previous mail in the thread.
> > > 
> > >   1) How many of the free pages do we tell the host about?
> > >  Your main change is telling the host about all the
> > >  free pages.
> > 
> > Yes, all the guest's free pages.
> > 
> > >  If we tell the host about all the free pages, then we might
> > >  end up needing to allocate more pages and update the host
> > >  with pages we now want to use; that would have to wait for the
> > >  host to acknowledge that use of these pages, since if we don't
> > >  wait for it then it might have skipped migrating a page we
> > >  just started using (I don't understand how your series solves that).
> > >  So the guest probably needs to keep some free pages - how many?
> > 
> > Actually, there is no need to care about whether the free pages will be
> > used by the host.
> > We only care about whether some of the free pages we got are reused by the
> > guest, right?
> >
> > Dirty page logging can be used to solve this: start dirty page logging
> > before getting the free page information from the guest. Even if some of
> > the free pages are modified by the guest while the free page information is
> > being gathered, those modified pages will be tracked by the dirty page
> > logging mechanism. So in the following migration_bitmap_sync(), pages that
> > are in the free page bitmap but were later modified will be reset to dirty.
> > We won't omit any dirtied pages.
> >
> > So the guest doesn't need to keep any free pages.
> 
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> 
> That's nice and simple.

This works once, sure. But there's an issue: you have
to defer migration until you get the free page list,
and this only works once. So you end up with heuristics
about how long to wait.

Instead I propose:

- mark all pages dirty as we do now.

- at start of migration, start tracking dirty
  pages in kvm, and tell guest to start tracking free pages

we can now introduce any kind of delay, for
example wait for ack from guest, or do whatever else,
or even just start migrating pages

- repeatedly:
  - get list of free pages from guest
  - clear them in migration bitmap
  - get dirty list from kvm

- at end of migration, stop tracking writes in kvm,
  and tell guest to stop tracking free pages
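A sketch of the host-side bitmap arithmetic in the proposal above. The helper names are hypothetical and do not reflect QEMU's actual migration code (including migration_bitmap_sync()); a set bit means the page still has to be sent. What makes it safe is the ordering Liang describes earlier in the thread: KVM dirty tracking is already running before the guest takes its free-page snapshot, and the dirty log is fetched after that report, so a page reported free and then reused is put back.

```c
#include <stddef.h>

/* Hypothetical host-side model, not QEMU's real data structures:
 * one bit per guest page; a set bit in the migration bitmap means
 * "this page still has to be sent". */

/* First step: mark all pages dirty, as migration does today. */
void mig_mark_all_dirty(unsigned long *mig, size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        mig[i] = ~0UL;
}

/* One iteration of the "repeatedly" loop.  free_bits is the guest's
 * free-page report; dirty is the KVM dirty log fetched after that
 * report.  Clearing first and OR-ing the dirty log second means a page
 * the guest reported as free but then wrote to is still sent. */
void mig_sync_round(unsigned long *mig, const unsigned long *free_bits,
                    const unsigned long *dirty, size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++) {
        mig[i] &= ~free_bits[i];   /* skip pages the guest says are free */
        mig[i] |= dirty[i];        /* but resend anything written meanwhile */
    }
}
```

Pages that were written and then freed before the report are sent unnecessarily, but never lost; since every round is just a clear followed by a merge, migration can start sending pages without waiting for the first report.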



> > >   2) Clearing out caches
> > >  Does it make sense to clean caches?  They're apparently useful data
> > >  so if we clean them it's likely to slow the guest down; I guess
> > >  they're also likely to be fairly static data - so at least fairly
> > >  easy to migrate.
> > >  The answer here partially depends on what you want from your 
> > > migration;
> > >  if you're after the fastest possible migration time it might make
> > >  sense to clean the caches and avoid migrating them; but that might
> > >  be at the cost of more disruption to the guest - there's a trade off
> > >  somewhere and it's not clear to me how you set that depending on your
> > >  guest/network/requirements.
> > > 
> > 
> > Yes, cleaning the caches is an option. Let the users decide whether to use it or not.
> > 
> > >   3) Why is ballooning slow?
> > >  You've got a figure of 5s to balloon on an 8GB VM - but an
> > >  8GB VM isn't huge; so I worry about how long it would take
> > >  on a big VM.   We need to understand why it's slow
> > >* is it due to the guest shuffling pages around?
> > >* is it due to the virtio-balloon protocol sending one page
> > >  at a time?
> > >  + Do balloon pages normally clump in physical memory
> > > - i.e. would a 'large balloon' message help
> > > - or do we need a bitmap because it tends not to clump?
> > > 
> > 
> > I didn't do a comprehensive test. But I found most of the time is spent
> > on allocating the pages and sending the PFNs to the host; I don't know which
> > is the most time-consuming operation, allocating the pages or sending the PFNs.
> 
> It might be a good idea to analyse it a bit more to convince people where
> the problem is.
> 
> > >* is it due to the madvise on the host?
> > >  If we were using the normal balloon messages, then we
> > >  could, during migration, just route those to the migration
> > >  code rather than bothering with the madvise.
> > >  If they're clumping together we could just turn that into
> > >  one big madvise; if they're not then would we benefit from
> > >  a call that lets us madvise lots of areas?
> > > 
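On the clumping question in 3): a sketch of how, if the ballooned PFNs arrive sorted and tend to be contiguous, the host could merge them into ranges and issue one madvise() per range instead of one per page. coalesce_pfns(), discard_ranges() and struct pfn_range are hypothetical names; madvise(2) with MADV_DONTNEED is the real Linux call. The single base pointer assumes guest PFN 0 maps to a page-aligned host address with all guest RAM contiguous behind it, which is a simplification.

```c
#define _DEFAULT_SOURCE 1   /* expose madvise()/MADV_DONTNEED in glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

struct pfn_range {
    uint64_t start;
    uint64_t count;
};

/* Merge a sorted, duplicate-free PFN array into contiguous ranges.
 * out must have room for n entries (the worst case: nothing clumps). */
size_t coalesce_pfns(const uint64_t *pfns, size_t n, struct pfn_range *out)
{
    size_t nr = 0;
    for (size_t i = 0; i < n; i++) {
        if (nr > 0 && out[nr - 1].start + out[nr - 1].count == pfns[i]) {
            out[nr - 1].count++;          /* extends the current run */
        } else {
            out[nr].start = pfns[i];      /* starts a new run */
            out[nr].count = 1;
            nr++;
        }
    }
    return nr;
}

/* Issue one madvise(MADV_DONTNEED) per range instead of one per page.
 * base is assumed to be the page-aligned host mapping of guest PFN 0. */
int discard_ranges(void *base, size_t page_size,
                   const struct pfn_range *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        char *addr = (char *)base + r[i].start * page_size;
        if (madvise(addr, r[i].count * page_size, MADV_DONTNEED) != 0)
            return -1;
    }
    return 0;
}
```

If the pages turn out not to clump, the number of ranges approaches the number of PFNs and this degenerates to one call per page, which is the case where a bitmap-based message would matter more.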

Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-15 Thread Michael S. Tsirkin
On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> * Li, Liang Z (liang.z...@intel.com) wrote:
> > > 
> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to any
> > > particular previous mail in the thread.
> > > 
> > >   1) How many of the free pages do we tell the host about?
> > >  Your main change is telling the host about all the
> > >  free pages.
> > 
> > Yes, all the guest's free pages.
> > 
> > >  If we tell the host about all the free pages, then we might
> > >  end up needing to allocate more pages and update the host
> > >  with pages we now want to use; that would have to wait for the
> > >  host to acknowledge that use of these pages, since if we don't
> > >  wait for it then it might have skipped migrating a page we
> > >  just started using (I don't understand how your series solves that).
> > >  So the guest probably needs to keep some free pages - how many?
> > 
> > Actually, there is no need to care about whether the free pages will be 
> > used by the host.
> > We only care about some of the free pages we get reused by the guest, right?
> > 
> > The dirty page logging can be used to solve this, starting the dirty page 
> > logging before getting
> > the free pages informant from guest. Even some of the free pages are 
> > modified by the guest
> > during the process of getting the free pages information, these modified 
> > pages will be traced
> > by the dirty page logging mechanism. So in the following 
> > migration_bitmap_sync() function.
> > The pages in the free pages bitmap, but latter was modified, will be reset 
> > to dirty. We won't
> > omit any dirtied pages.
> > 
> > So, guest doesn't need to keep any free pages.
> 
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> 
> That's nice and simple.

This works once, sure. But there's an issue is that you have
to defer migration until you get the free page list,
and this only works once. So you end up with heuristics
about how long to wait.

Instead I propose:

- mark all pages dirty as we do now.

- at start of migration, start tracking dirty
  pages in kvm, and tell guest to start tracking free pages

we can now introduce any kind of delay, for
example wait for ack from guest, or do whatever else,
or even just start migrating pages

- repeatedly:
- get list of free pages from guest
- clear them in migration bitmap
- get dirty list from kvm

- at end of migration, stop tracking writes in kvm,
  and tell guest to stop tracking free pages



> > >   2) Clearing out caches
> > >  Does it make sense to clean caches?  They're apparently useful data
> > >  so if we clean them it's likely to slow the guest down; I guess
> > >  they're also likely to be fairly static data - so at least fairly
> > >  easy to migrate.
> > >  The answer here partially depends on what you want from your migration;
> > >  if you're after the fastest possible migration time it might make
> > >  sense to clean the caches and avoid migrating them; but that might
> > >  be at the cost of more disruption to the guest - there's a trade off
> > >  somewhere and it's not clear to me how you set that depending on your
> > >  guest/network/requirements.
> > > 
> > 
> > Yes, cleaning the caches is an option; let the users decide whether to use it.
> > 
> > >   3) Why is ballooning slow?
> > >  You've got a figure of 5s to balloon on an 8GB VM - but an
> > >  8GB VM isn't huge; so I worry about how long it would take
> > >  on a big VM.   We need to understand why it's slow
> > >* is it due to the guest shuffling pages around?
> > >* is it due to the virtio-balloon protocol sending one page
> > >  at a time?
> > >  + Do balloon pages normally clump in physical memory
> > > - i.e. would a 'large balloon' message help
> > > - or do we need a bitmap because it tends not to clump?
> > > 
> > 
> > I didn't do a comprehensive test, but I found that most of the time was
> > spent allocating the pages and sending the PFNs to the host; I don't know
> > which is the more time-consuming operation, allocating the pages or
> > sending the PFNs.
> 
> It might be a good idea to analyse it a bit more to convince people where
> the problem is.
> 
> > >* is it due to the madvise on the host?
> > >  If we were using the normal balloon messages, then we
> > >  could, during migration, just route those to the migration
> > >  code rather than bothering with the madvise.
> > >  If they're clumping together we could just turn that into
> > >  one big madvise; if they're not then would we benefit from
> > >  a call that lets us madvise lots of areas?
> > > 

RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-14 Thread Li, Liang Z
> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to
> > > any particular previous mail in the thread.
> > >
> > >   1) How many of the free pages do we tell the host about?
> > >  Your main change is telling the host about all the
> > >  free pages.
> >
> > Yes, all the guest's free pages.
> >
> > >  If we tell the host about all the free pages, then we might
> > >  end up needing to allocate more pages and update the host
> > >  with pages we now want to use; that would have to wait for the
> > >  host to acknowledge that use of these pages, since if we don't
> > >  wait for it then it might have skipped migrating a page we
> > >  just started using (I don't understand how your series solves that).
> > >  So the guest probably needs to keep some free pages - how many?
> >
> > Actually, there is no need to care about whether the free pages will be
> > used by the host. We only care about whether some of the free pages we
> > get are reused by the guest, right?
> >
> > The dirty page logging can be used to solve this, by starting the dirty
> > page logging before getting the free pages information from the guest.
> > Even if some of the free pages are modified by the guest during the
> > process of getting the free pages information, these modified pages will
> > be traced by the dirty page logging mechanism. So in the following
> > migration_bitmap_sync() call, pages that are in the free pages bitmap but
> > were later modified will be reset to dirty. We won't omit any dirtied pages.
> >
> > So, guest doesn't need to keep any free pages.
> 
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> 
> That's nice and simple.
> 
> > >   2) Clearing out caches
> > >  Does it make sense to clean caches?  They're apparently useful data
> > >  so if we clean them it's likely to slow the guest down; I guess
> > >  they're also likely to be fairly static data - so at least fairly
> > >  easy to migrate.
> > >  The answer here partially depends on what you want from your migration;
> > >  if you're after the fastest possible migration time it might make
> > >  sense to clean the caches and avoid migrating them; but that might
> > >  be at the cost of more disruption to the guest - there's a trade off
> > >  somewhere and it's not clear to me how you set that depending on your
> > >  guest/network/requirements.
> > >
> >
> > Yes, cleaning the caches is an option; let the users decide whether to use it.
> >
> > >   3) Why is ballooning slow?
> > >  You've got a figure of 5s to balloon on an 8GB VM - but an
> > >  8GB VM isn't huge; so I worry about how long it would take
> > >  on a big VM.   We need to understand why it's slow
> > >* is it due to the guest shuffling pages around?
> > >* is it due to the virtio-balloon protocol sending one page
> > >  at a time?
> > >  + Do balloon pages normally clump in physical memory
> > > - i.e. would a 'large balloon' message help
> > > - or do we need a bitmap because it tends not to clump?
> > >
> >
> > I didn't do a comprehensive test, but I found that most of the time was
> > spent allocating the pages and sending the PFNs to the host; I don't know
> > which is the more time-consuming operation, allocating the pages or
> > sending the PFNs.
> 
> It might be a good idea to analyse it a bit more to convince people where the
> problem is.
> 

Yes, I will try to measure the time spent on the different parts.

> > >* is it due to the madvise on the host?
> > >  If we were using the normal balloon messages, then we
> > >  could, during migration, just route those to the migration
> > >  code rather than bothering with the madvise.
> > >  If they're clumping together we could just turn that into
> > >  one big madvise; if they're not then would we benefit from
> > >  a call that lets us madvise lots of areas?
> > >
> >
> > My test showed that madvise() is not the main reason for the long time;
> > it takes only 10% of the total balloon inflation time. A big madvise
> > could improve the performance somewhat.
> 
> OK; 10% of the total is still pretty big even for your 8GB VM.
> 
> > >   4) Speeding up the migration of those free pages
> > > You're using the bitmap to avoid migrating those free pages; HPe's
> > > patchset is reconstructing a bitmap from the balloon data;  OK, so
> > > this all makes sense to avoid migrating them - I'd also been thinking
> > > of using pagemap to spot zero pages that would help find other zero'd
> > > pages, but perhaps ballooned is enough?
> > >
> > Could you describe your idea in more detail?
> 
> At the moment the migration code spends a fair amount of time 

Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-14 Thread Dr. David Alan Gilbert
* Li, Liang Z (liang.z...@intel.com) wrote:
> > 
> > Hi,
> >   I'm just catching back up on this thread; so without reference to any
> > particular previous mail in the thread.
> > 
> >   1) How many of the free pages do we tell the host about?
> >  Your main change is telling the host about all the
> >  free pages.
> 
> Yes, all the guest's free pages.
> 
> >  If we tell the host about all the free pages, then we might
> >  end up needing to allocate more pages and update the host
> >  with pages we now want to use; that would have to wait for the
> >  host to acknowledge that use of these pages, since if we don't
> >  wait for it then it might have skipped migrating a page we
> >  just started using (I don't understand how your series solves that).
> >  So the guest probably needs to keep some free pages - how many?
> 
> Actually, there is no need to care about whether the free pages will be used
> by the host. We only care about whether some of the free pages we get are
> reused by the guest, right?
> 
> The dirty page logging can be used to solve this, by starting the dirty page
> logging before getting the free pages information from the guest. Even if
> some of the free pages are modified by the guest during the process of
> getting the free pages information, these modified pages will be traced by
> the dirty page logging mechanism. So in the following migration_bitmap_sync()
> call, pages that are in the free pages bitmap but were later modified will be
> reset to dirty. We won't omit any dirtied pages.
> 
> So, guest doesn't need to keep any free pages.

OK, yes, that works; so we do:
  * enable dirty logging
  * ask guest for free pages
  * initialise the migration bitmap as everything-free
  * then later we do the normal sync-dirty bitmap stuff and it all just works.

That's nice and simple.
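The four steps can be sketched as a Python toy. `first_pass_pages` and its parameters are hypothetical, and sets stand in for the dirty log and the migration bitmap. For simplicity it assumes that every write to a reported-free page happens after the guest takes its free-page snapshot. It contrasts enabling logging before the request with enabling it afterwards.

```python
"""Toy model of the one-shot sequence (all names are hypothetical)."""


def first_pass_pages(num_pages, reported_free, writes_after_report,
                     log_before_report=True):
    """Return the set of PFNs the first migration pass will send.

    reported_free:       PFNs in the guest's free-page report (a snapshot).
    writes_after_report: PFNs the guest writes after taking that snapshot.
    log_before_report:   whether dirty logging was enabled before asking.
    """
    # Steps 1-2: with logging enabled first, every later write is logged.
    # In this toy, enabling it afterwards is assumed to miss all of them.
    dirty_log = set(writes_after_report) if log_before_report else set()
    # Step 3: initialise the migration bitmap with the reported free pages clear.
    bitmap = set(range(num_pages)) - set(reported_free)
    # Step 4: the normal dirty-bitmap sync sets the logged pages again.
    return bitmap | dirty_log
```

Consider a free list of {2, 3, 5, 6} where the guest then reuses pages 3 and 6. The safe ordering sends {0, 1, 3, 4, 6, 7}. With logging enabled too late, the result is {0, 1, 4, 7}, which silently drops the two reused pages.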

> >   2) Clearing out caches
> >  Does it make sense to clean caches?  They're apparently useful data
> >  so if we clean them it's likely to slow the guest down; I guess
> >  they're also likely to be fairly static data - so at least fairly
> >  easy to migrate.
> >  The answer here partially depends on what you want from your migration;
> >  if you're after the fastest possible migration time it might make
> >  sense to clean the caches and avoid migrating them; but that might
> >  be at the cost of more disruption to the guest - there's a trade off
> >  somewhere and it's not clear to me how you set that depending on your
> >  guest/network/requirements.
> > 
> 
> Yes, cleaning the caches is an option; let the users decide whether to use it.
> 
> >   3) Why is ballooning slow?
> >  You've got a figure of 5s to balloon on an 8GB VM - but an
> >  8GB VM isn't huge; so I worry about how long it would take
> >  on a big VM.   We need to understand why it's slow
> >* is it due to the guest shuffling pages around?
> >* is it due to the virtio-balloon protocol sending one page
> >  at a time?
> >  + Do balloon pages normally clump in physical memory
> > - i.e. would a 'large balloon' message help
> > - or do we need a bitmap because it tends not to clump?
> > 
> 
> I didn't do a comprehensive test, but I found that most of the time was spent
> allocating the pages and sending the PFNs to the host; I don't know which is
> the more time-consuming operation, allocating the pages or sending the PFNs.

It might be a good idea to analyse it a bit more to convince people where
the problem is.

> >* is it due to the madvise on the host?
> >  If we were using the normal balloon messages, then we
> >  could, during migration, just route those to the migration
> >  code rather than bothering with the madvise.
> >  If they're clumping together we could just turn that into
> >  one big madvise; if they're not then would we benefit from
> >  a call that lets us madvise lots of areas?
> > 
> 
> My test showed that madvise() is not the main reason for the long time; it
> takes only 10% of the total balloon inflation time.
> A big madvise could improve the performance somewhat.

OK; 10% of the total is still pretty big even for your 8GB VM.
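If balloon pages do clump, the host could coalesce them into runs and issue one madvise() per run instead of one per page. Below is a minimal Python sketch, assuming Linux and Python 3.8+ (for `mmap.madvise`). `coalesce` and `discard_pages` are hypothetical helpers. Page indices within a private anonymous mapping stand in for guest PFNs; QEMU would first have to translate PFNs to host virtual addresses.

```python
import mmap


def coalesce(pfns):
    """Group PFNs into sorted (first_pfn, npages) runs of contiguous pages."""
    runs = []
    for pfn in sorted(set(pfns)):
        if runs and runs[-1][0] + runs[-1][1] == pfn:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)   # extend the current run
        else:
            runs.append((pfn, 1))           # start a new run
    return runs


def discard_pages(region, pfns):
    """Discard pages of a private anonymous mmap, one madvise() per run.

    Returns the number of madvise() calls made; a page-at-a-time approach
    would make len(set(pfns)) calls instead.
    """
    runs = coalesce(pfns)
    for first, count in runs:
        # MADV_DONTNEED on private anonymous memory frees the pages; they
        # read back as zeros on the next access.
        region.madvise(mmap.MADV_DONTNEED, first * mmap.PAGESIZE,
                       count * mmap.PAGESIZE)
    return len(runs)
```

The ratio of runs to PFNs is also a rough measure of how much the pages clump. That bears on whether a 'large balloon' message or a bitmap would pay off.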

> >   4) Speeding up the migration of those free pages
> > You're using the bitmap to avoid migrating those free pages; HPe's
> > patchset is reconstructing a bitmap from the balloon data;  OK, so
> > this all makes sense to avoid migrating them - I'd also been thinking
> > of using pagemap to spot zero pages that would help find other zero'd
> > pages, but perhaps ballooned is enough?
> > 
> Could you describe your idea in more detail?

At the moment the migration code spends a fair amount of time checking if a page
is zero; I was thinking perhaps QEMU could just open /proc/self/pagemap
and check if the page was mapped; that would seem cheap if we're checking big
ranges; and that 

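Below is a minimal sketch of the pagemap idea in Python. It assumes Linux, a private anonymous RAM mapping, and the documented /proc/self/pagemap layout: one 64-bit entry per virtual page, with bit 63 meaning present and bit 62 meaning swapped. `untouched_pages` and `region_address` are hypothetical helpers. In such a mapping, a page that is neither present nor swapped has never been written, or was discarded with MADV_DONTNEED, so it reads back as zeros without being scanned. This does not hold for file-backed or shared mappings.

```python
import ctypes
import mmap
import os
import struct

PAGE = mmap.PAGESIZE
PM_PRESENT = 1 << 63   # pagemap bit 63: page present in RAM
PM_SWAPPED = 1 << 62   # pagemap bit 62: page swapped out


def region_address(region):
    """Return the virtual address at which an mmap object starts."""
    view = (ctypes.c_char * len(region)).from_buffer(region)
    addr = ctypes.addressof(view)
    del view  # drop the buffer export so the mmap can still be closed
    return addr


def untouched_pages(region, npages):
    """Indices of pages in `region` that are neither present nor swapped.

    For a private anonymous mapping these pages read back as zeros, so a
    byte-by-byte zero check could be skipped for them.
    """
    first_vpn = region_address(region) // PAGE
    fd = os.open("/proc/self/pagemap", os.O_RDONLY)
    try:
        # One native-endian 64-bit entry per virtual page number.
        data = os.pread(fd, 8 * npages, first_vpn * 8)
    finally:
        os.close(fd)
    entries = struct.unpack(f"{npages}Q", data)
    return [i for i, entry in enumerate(entries)
            if not entry & (PM_PRESENT | PM_SWAPPED)]
```

A single read covers a whole range of pages, which is what would make this cheap when checking big ranges.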
RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-10 Thread Li, Liang Z
> 
> Hi,
>   I'm just catching back up on this thread; so without reference to any
> particular previous mail in the thread.
> 
>   1) How many of the free pages do we tell the host about?
>  Your main change is telling the host about all the
>  free pages.

Yes, all the guest's free pages.

>  If we tell the host about all the free pages, then we might
>  end up needing to allocate more pages and update the host
>  with pages we now want to use; that would have to wait for the
>  host to acknowledge that use of these pages, since if we don't
>  wait for it then it might have skipped migrating a page we
>  just started using (I don't understand how your series solves that).
>  So the guest probably needs to keep some free pages - how many?

Actually, there is no need to care about whether the free pages will be used by
the host. We only care about whether some of the free pages we get are reused
by the guest, right?

The dirty page logging can be used to solve this, by starting the dirty page
logging before getting the free pages information from the guest. Even if some
of the free pages are modified by the guest during the process of getting the
free pages information, these modified pages will be traced by the dirty page
logging mechanism. So in the following migration_bitmap_sync() call, pages
that are in the free pages bitmap but were later modified will be reset to
dirty. We won't omit any dirtied pages.

So, guest doesn't need to keep any free pages.

>   2) Clearing out caches
>  Does it make sense to clean caches?  They're apparently useful data
>  so if we clean them it's likely to slow the guest down; I guess
>  they're also likely to be fairly static data - so at least fairly
>  easy to migrate.
>  The answer here partially depends on what you want from your migration;
>  if you're after the fastest possible migration time it might make
>  sense to clean the caches and avoid migrating them; but that might
>  be at the cost of more disruption to the guest - there's a trade off
>  somewhere and it's not clear to me how you set that depending on your
>  guest/network/requirements.
> 

Yes, cleaning the caches is an option; let the users decide whether to use it.

>   3) Why is ballooning slow?
>  You've got a figure of 5s to balloon on an 8GB VM - but an
>  8GB VM isn't huge; so I worry about how long it would take
>  on a big VM.   We need to understand why it's slow
>* is it due to the guest shuffling pages around?
>* is it due to the virtio-balloon protocol sending one page
>  at a time?
>  + Do balloon pages normally clump in physical memory
> - i.e. would a 'large balloon' message help
> - or do we need a bitmap because it tends not to clump?
> 

I didn't do a comprehensive test, but I found that most of the time was spent
allocating the pages and sending the PFNs to the host; I don't know which is
the more time-consuming operation, allocating the pages or sending the PFNs.

>* is it due to the madvise on the host?
>  If we were using the normal balloon messages, then we
>  could, during migration, just route those to the migration
>  code rather than bothering with the madvise.
>  If they're clumping together we could just turn that into
>  one big madvise; if they're not then would we benefit from
>  a call that lets us madvise lots of areas?
> 

My test showed that madvise() is not the main reason for the long time; it
takes only 10% of the total balloon inflation time.
A big madvise could improve the performance somewhat.

>   4) Speeding up the migration of those free pages
> You're using the bitmap to avoid migrating those free pages; HPe's
> patchset is reconstructing a bitmap from the balloon data;  OK, so
> this all makes sense to avoid migrating them - I'd also been thinking
> of using pagemap to spot zero pages that would help find other zero'd
> pages, but perhaps ballooned is enough?
> 
Could you describe your idea in more detail?

>   5) Second-migrate
> Given a VM on which you've done all those tricks, what happens when
> you migrate it a second time?   I guess you're aiming for the guest
> to update its bitmap;  HPe's solution is to migrate its balloon
> bitmap along with the migration data.

Nothing is special in the second migration: QEMU will request the free pages
information from the guest, and the guest will traverse its current free page
list to construct a new free page bitmap and send it to QEMU, just like in the
first migration.
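Here is a Python sketch of rebuilding the bitmap from scratch on each request. Several parts are assumptions, not the format used by the patches: the `(start_pfn, order)` free-block representation (modelled on a buddy allocator's free lists), the bit layout, and the helper names. With 4 KiB pages an 8 GiB guest has 2,097,152 pages, so the bitmap is 256 KiB however many pages are free.

```python
"""Free-page bitmap built from free blocks (representation is hypothetical)."""


def free_page_bitmap(num_pages, free_blocks):
    """Build a bitmap from (start_pfn, order) blocks of 2**order free pages.

    Bit p (byte p // 8, bit p % 8) is set when page p is free.
    """
    bitmap = bytearray((num_pages + 7) // 8)
    for start, order in free_blocks:
        for pfn in range(start, start + (1 << order)):
            bitmap[pfn // 8] |= 1 << (pfn % 8)
    return bitmap


def is_free(bitmap, pfn):
    """Test the bit for one page."""
    return bool(bitmap[pfn // 8] >> (pfn % 8) & 1)
```

Because nothing is carried over from the previous migration, a second migration simply calls the same construction again on the current free lists.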

Liang
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-10 Thread Dr. David Alan Gilbert
Hi,
  I'm just catching back up on this thread; so without reference to any
particular previous mail in the thread.

  1) How many of the free pages do we tell the host about?
 Your main change is telling the host about all the
 free pages.
 If we tell the host about all the free pages, then we might
 end up needing to allocate more pages and update the host
 with pages we now want to use; that would have to wait for the
 host to acknowledge that use of these pages, since if we don't
 wait for it then it might have skipped migrating a page we
 just started using (I don't understand how your series solves that).
 So the guest probably needs to keep some free pages - how many?

  2) Clearing out caches
 Does it make sense to clean caches?  They're apparently useful data
 so if we clean them it's likely to slow the guest down; I guess
 they're also likely to be fairly static data - so at least fairly
 easy to migrate.
 The answer here partially depends on what you want from your migration;
 if you're after the fastest possible migration time it might make
 sense to clean the caches and avoid migrating them; but that might
 be at the cost of more disruption to the guest - there's a trade off
 somewhere and it's not clear to me how you set that depending on your
 guest/network/requirements.

  3) Why is ballooning slow?
 You've got a figure of 5s to balloon on an 8GB VM - but an 
 8GB VM isn't huge; so I worry about how long it would take
 on a big VM.   We need to understand why it's slow:
   * is it due to the guest shuffling pages around?
   * is it due to the virtio-balloon protocol sending one page
     at a time?
       + Do balloon pages normally clump in physical memory?
         - i.e. would a 'large balloon' message help,
         - or do we need a bitmap because it tends not to clump?

   * is it due to the madvise on the host?
 If we were using the normal balloon messages, then we
 could, during migration, just route those to the migration
 code rather than bothering with the madvise.
 If they're clumping together we could just turn that into
 one big madvise; if they're not then would we benefit from
 a call that lets us madvise lots of areas?
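One way to answer the clumping question, and to get the "one big madvise" mentioned above, is to sort the PFNs from balloon messages and coalesce them into contiguous ranges. The number of ranges compared with the number of pages shows how much the pages clump, and each range can then be discarded with a single madvise(MADV_DONTNEED) call. This is a minimal sketch assuming guest and host page sizes are equal and that `base` is the host virtual address of guest PFN 0; `struct pfn_range` and the function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* A run of npages consecutive guest page frames starting at start. */
struct pfn_range {
    uint64_t start;
    uint64_t npages;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Sort pfns in place and merge consecutive PFNs into ranges; duplicate PFNs
 * fold into the range that already covers them. out needs room for n
 * entries. Returns the number of ranges: close to 1 means the pages clump,
 * close to n means they are scattered.
 */
size_t coalesce_pfns(uint64_t *pfns, size_t n, struct pfn_range *out)
{
    size_t last = 0;

    if (n == 0)
        return 0;
    qsort(pfns, n, sizeof(*pfns), cmp_u64);
    out[0].start = pfns[0];
    out[0].npages = 1;
    for (size_t i = 1; i < n; i++) {
        uint64_t end = out[last].start + out[last].npages;
        if (pfns[i] < end)
            continue;                 /* duplicate PFN */
        if (pfns[i] == end) {
            out[last].npages++;       /* extends the current run */
            continue;
        }
        last++;                       /* gap: start a new run */
        out[last].start = pfns[i];
        out[last].npages = 1;
    }
    return last + 1;
}

size_t host_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/*
 * Discard the host memory behind each range with one madvise() call per
 * range rather than one per page. base is the host address of guest PFN 0.
 */
int discard_ranges(void *base, size_t page_size,
                   const struct pfn_range *ranges, size_t nranges)
{
    for (size_t i = 0; i < nranges; i++) {
        char *addr = (char *)base + ranges[i].start * page_size;
        if (madvise(addr, ranges[i].npages * page_size, MADV_DONTNEED) != 0)
            return -1;
    }
    return 0;
}
```

During migration, the same ranges could instead be handed to the migration code as "don't send" hints, as suggested above, rather than being madvised.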

  4) Speeding up the migration of those free pages
You're using the bitmap to avoid migrating those free pages; HPe's
patchset is reconstructing a bitmap from the balloon data;  OK, so
this all makes sense to avoid migrating them - I'd also been thinking
of using pagemap to spot zero pages that would help find other zero'd
pages, but perhaps ballooned is enough?
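The pagemap idea could be prototyped on the host by reading QEMU's own /proc/self/pagemap for the guest RAM mapping. Each page has a 64-bit entry at offset (virtual address / page size) * 8. Per the kernel's pagemap documentation, bit 63 means "present" and bit 62 means "swapped". The sketch below counts populated pages in a region; the function names are hypothetical. Note that a page the guest has freed but not zeroed still counts as populated, which is the limitation Liang raises for case 2.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* pagemap entry flags, per the kernel's pagemap documentation. */
#define PM_PRESENT (UINT64_C(1) << 63) /* page present in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62) /* page swapped out */

/*
 * Count how many of the npages pages starting at the page-aligned address
 * addr are populated (present or swapped). Never-touched pages are neither.
 * Returns -1 if pagemap cannot be read (e.g. restricted /proc).
 */
long count_populated_pages(const void *addr, size_t npages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    uint64_t first = (uint64_t)(uintptr_t)addr / (uint64_t)page_size;
    long populated = 0;
    for (size_t i = 0; i < npages; i++) {
        uint64_t entry;
        off_t off = (off_t)((first + i) * sizeof(entry));
        if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
            close(fd);
            return -1;
        }
        if (entry & (PM_PRESENT | PM_SWAPPED))
            populated++;
    }
    close(fd);
    return populated;
}

/* Demo: map npages fresh anonymous pages, write to the first `touched` of
   them, and ask pagemap how many are populated. */
long demo_populated(size_t npages, size_t touched)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *mem = mmap(NULL, npages * ps, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < touched && i < npages; i++)
        mem[i * ps] = 1;
    long n = count_populated_pages(mem, npages);
    munmap(mem, npages * ps);
    return n;
}
```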

  5) Second-migrate
Given a VM on which you've done all those tricks, what happens when
you migrate it a second time?   I guess you're aiming for the guest
to update its bitmap;  HPe's solution is to migrate its balloon
bitmap along with the migration data.
 
Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-10 Thread Li, Liang Z
> >  Could you provide more information on how to use virtio-serial to exchange
> > data?  Threads, wiki pages or code are all OK.
> >  I haven't found any useful information yet.
> 
> See this commit in the Linux sources:
> 
> 108fc82596e3b66b819df9d28c1ebbc9ab5de14c
> 
> that adds a way to send guest trace data over to the host.  I think that's the
> most relevant to your use-case.  However, you'll have to add an in-kernel
> user of virtio-serial (like the virtio-console code
> -- the code that deals with tty and hvc currently).  There's no other non-tty
> user right now, and this is the right kind of use-case to add one for!
> 
> For many other (userspace) use-cases, see the qemu-guest-agent in the
> qemu sources.
> 
> The API is documented in the wiki:
> 
> http://www.linux-kvm.org/page/Virtio-serial_API
> 
> and the feature pages have some information that may help as well:
> 
> https://fedoraproject.org/wiki/Features/VirtioSerial
> 
> There are some links in here too:
> 
> http://log.amitshah.net/2010/09/communication-between-guests-and-hosts/
> 
> Hope this helps.
> 
> 
>   Amit

Thanks a lot!!

Liang


Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-09 Thread Amit Shah
On (Thu) 10 Mar 2016 [07:44:19], Li, Liang Z wrote:
> 
> Hi Amit,
> 
>  Could you provide more information on how to use virtio-serial to exchange data?
>  Threads, wiki pages or code are all OK.
>  I haven't found any useful information yet.

See this commit in the Linux sources:

108fc82596e3b66b819df9d28c1ebbc9ab5de14c

that adds a way to send guest trace data over to the host.  I think
that's the most relevant to your use-case.  However, you'll have to
add an in-kernel user of virtio-serial (like the virtio-console code
-- the code that deals with tty and hvc currently).  There's no other
non-tty user right now, and this is the right kind of use-case to add
one for!

For many other (userspace) use-cases, see the qemu-guest-agent in the
qemu sources.

The API is documented in the wiki:

http://www.linux-kvm.org/page/Virtio-serial_API

and the feature pages have some information that may help as well:

https://fedoraproject.org/wiki/Features/VirtioSerial

There are some links in here too:

http://log.amitshah.net/2010/09/communication-between-guests-and-hosts/

Hope this helps.


Amit


RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-09 Thread Li, Liang Z
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take post-copy and RDMA into
> > consideration; maybe both of them can benefit from this PV solution
> > with some extra modifications.
> 
> I like the idea, just have to prove (review) and test it a lot to ensure we
> don't end up skipping pages that matter.
> 
> However, there are a couple of points:
> 
> In my opinion, the information that's exchanged between the guest and the
> host should be exchanged over a virtio-serial channel rather than virtio-
> balloon.  First, there's nothing related to the balloon here.
> It just happens to be memory info.  Second, I would never enable balloon in
> a guest that I want to be performance-sensitive.  So even if you add this as
> part of balloon, you'll find no one is using this solution.
> 
> Secondly, I suggest virtio-serial, because it's meant exactly to exchange
> free-flowing information between a host and a guest, and you don't need to
> extend any part of the protocol for it (hence no changes necessary to the
> spec).  You can see how spice, vnc, etc., use virtio-serial to exchange data.
> 
> 
>   Amit

Hi Amit,

 Could you provide more information on how to use virtio-serial to exchange data?
 Threads, wiki pages or code are all OK.
 I haven't found any useful information yet.

Thanks
Liang



RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-08 Thread Li, Liang Z
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
> 
> On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > will be processed, and that takes quite a lot of CPU cycles.
> >
> > From the guest's point of view, the content of free pages doesn't
> > matter. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, which saves a lot of CPU cycles, reduces the
> > network traffic significantly and clearly speeds up the live migration
> > process.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take post-copy and RDMA into
> > consideration; maybe both of them can benefit from this PV solution
> > with some extra modifications.
> 
> I like the idea, just have to prove (review) and test it a lot to ensure we
> don't end up skipping pages that matter.
> 
> However, there are a couple of points:
> 
> In my opinion, the information that's exchanged between the guest and the
> host should be exchanged over a virtio-serial channel rather than virtio-
> balloon.  First, there's nothing related to the balloon here.
> It just happens to be memory info.  Second, I would never enable balloon in
> a guest that I want to be performance-sensitive.  So even if you add this as
> part of balloon, you'll find no one is using this solution.
> 
> Secondly, I suggest virtio-serial, because it's meant exactly to exchange
> free-flowing information between a host and a guest, and you don't need to
> extend any part of the protocol for it (hence no changes necessary to the
> spec).  You can see how spice, vnc, etc., use virtio-serial to exchange data.
> 
> 
>   Amit

I don't like using virtio-balloon for this either; it's confusing.
It would be great if virtio-serial can be used; I will take a look at it.

Thanks for your suggestion!

Liang


Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-08 Thread Amit Shah
On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> The current QEMU live migration implementation marks all the
> guest's RAM pages as dirtied in the ram bulk stage; all these pages
> will be processed, and that takes quite a lot of CPU cycles.
> 
> From the guest's point of view, the content of free pages doesn't
> matter. We can make use of this fact and skip processing the free
> pages in the ram bulk stage, which saves a lot of CPU cycles, reduces
> the network traffic significantly and clearly speeds up the live
> migration process.
> 
> This patch set is the QEMU side implementation.
> 
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
> 
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This makes
> the live migration process much more efficient.
> 
> This RFC version doesn't take post-copy and RDMA into
> consideration; maybe both of them can benefit from this PV solution
> with some extra modifications.

I like the idea, just have to prove (review) and test it a lot to
ensure we don't end up skipping pages that matter.

However, there are a couple of points:

In my opinion, the information that's exchanged between the guest and
the host should be exchanged over a virtio-serial channel rather than
virtio-balloon.  First, there's nothing related to the balloon here.
It just happens to be memory info.  Second, I would never enable
balloon in a guest that I want to be performance-sensitive.  So even
if you add this as part of balloon, you'll find no one is using this
solution.

Secondly, I suggest virtio-serial, because it's meant exactly to
exchange free-flowing information between a host and a guest, and you
don't need to extend any part of the protocol for it (hence no changes
necessary to the spec).  You can see how spice, vnc, etc., use
virtio-serial to exchange data.
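As a rough illustration of this suggestion: in a Linux guest, a named virtio-serial port shows up as /dev/virtio-ports/<name>, and a userspace agent can write to it like any file, while QEMU reads the data from the chardev backing the port. The sketch below frames a free-page bitmap behind a small header. The message layout, the magic value and the function names are hypothetical (not part of any spec), fields are in host byte order, and the function takes any file descriptor so it can be exercised with a pipe instead of a real port.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wire format for a free-page report; not part of any spec. */
#define FREE_PAGE_MAGIC 0x46524545u /* "FREE" */

struct free_page_msg_hdr {
    uint32_t magic;
    uint32_t reserved;
    uint64_t start_pfn;   /* PFN represented by bit 0 of the bitmap */
    uint64_t nbits;       /* number of pages covered by the bitmap */
};

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Send the header followed by the bitmap, rounded up to whole 64-bit words.
 * In a guest, fd would come from something like
 * open("/dev/virtio-ports/<port-name>", O_WRONLY), with <port-name> being
 * whatever name the port is given on the QEMU command line.
 */
int send_free_page_report(int fd, uint64_t start_pfn, const uint64_t *bitmap,
                          uint64_t nbits)
{
    struct free_page_msg_hdr hdr;

    memset(&hdr, 0, sizeof(hdr));
    hdr.magic = FREE_PAGE_MAGIC;
    hdr.start_pfn = start_pfn;
    hdr.nbits = nbits;
    if (write_all(fd, &hdr, sizeof(hdr)) != 0)
        return -1;
    return write_all(fd, bitmap, (size_t)((nbits + 63) / 64) * sizeof(uint64_t));
}
```

Because the framing lives entirely in the guest agent and in QEMU, this approach needs no change to the virtio protocol itself, which is the advantage Amit points out.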


Amit


Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-08 Thread Amit Shah
On (Fri) 04 Mar 2016 [15:02:47], Jitendra Kolhe wrote:
> > >
> > > * Liang Li (liang.z...@intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > > > will be processed, and that takes quite a lot of CPU cycles.
> > > >
> > > > From the guest's point of view, the content of free pages doesn't
> > > > matter. We can make use of this fact and skip processing the free pages
> > > > in the ram bulk stage, which saves a lot of CPU cycles, reduces the
> > > > network traffic significantly and clearly speeds up the live migration
> > > > process.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > to filter out the guest's free pages in the ram bulk stage. This makes
> > > > the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the 
> > balloon.
> 
> We were also trying to address a similar problem, without actually needing to
> modify the guest driver. Please find the patch details in the mail with subject:
> migration: skip sending ram pages released by virtio-balloon driver

The scope of this patch series seems to be wider: don't send free
pages to a dest at all, vs. don't send pages that are ballooned out.

Amit


RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-04 Thread Li, Liang Z
> > > * Liang Li (liang.z...@intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage; all these
> > > > pages will be processed, and that takes quite a lot of CPU cycles.
> > > >
> > > > From the guest's point of view, the content of free pages doesn't
> > > > matter. We can make use of this fact and skip processing the
> > > > free pages in the ram bulk stage, which saves a lot of CPU cycles,
> > > > reduces the network traffic significantly and clearly speeds up
> > > > the live migration process.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use
> > > > it to filter out the guest's free pages in the ram bulk stage.
> > > > This makes the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the
> > balloon.
> 
> We were also trying to address a similar problem, without actually needing to
> modify the guest driver. Please find the patch details in the mail with subject:
> migration: skip sending ram pages released by virtio-balloon driver
> 
> Thanks,
> - Jitendra
> 

Great! Thanks for the information.

Liang
> >
> > >   I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > same result?
> > >


RE: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-03 Thread Li, Liang Z
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
> 
> * Liang Li (liang.z...@intel.com) wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > will be processed, and that takes quite a lot of CPU cycles.
> >
> > From the guest's point of view, the content of free pages doesn't
> > matter. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, which saves a lot of CPU cycles, reduces the
> > network traffic significantly and clearly speeds up the live migration
> > process.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
> 
> Hi,
>   An interesting solution; I know a few different people have been looking at
> how to speed up ballooned VM migration.
> 

Ooh, different solutions for the same purpose, and both based on the balloon.

>   I wonder if it would be possible to avoid the kernel changes by parsing
> /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> pages in the guest ram, would it achieve the same result?
> 

Detecting only the unmapped/zero-mapped pages is not enough. In a situation
like case 2 it can't achieve the same result, because the pages freed by the
terminated workload are still mapped and still hold non-zero data.

> > This RFC version doesn't take post-copy and RDMA into
> > consideration; maybe both of them can benefit from this PV solution
> > with some extra modifications.
> 
> For postcopy to be safe, you would still need to send a message to the
> destination telling it that there were zero pages, otherwise the destination
> can't tell if it's supposed to request the page from the source or treat the
> page as zero.
> 
> Dave

I will consider this later, thanks, Dave.
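Dave's postcopy point comes down to a three-way decision on the destination whenever the guest touches a page that has not arrived yet. The sketch below models that decision. It assumes the source has sent its free-page bitmap as an extra message, with any page dirtied after being reported already cleared from it. The enum and function names are hypothetical, and the real postcopy fault path (userfaultfd) is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

enum page_action {
    PAGE_ALREADY_RECEIVED, /* contents arrived during the precopy phase */
    PAGE_FILL_ZERO,        /* source said the page is free: zero-fill locally */
    PAGE_REQUEST_SOURCE    /* unknown: must be fetched from the source */
};

static int test_bit(const uint64_t *bm, uint64_t nr)
{
    return (int)((bm[nr / 64] >> (nr % 64)) & 1);
}

/*
 * Decide how the destination resolves an access to pfn.
 * received:  pages already received on the destination.
 * free_hint: pages the source reported as free (NULL if never sent).
 * Without free_hint, the destination cannot tell a free page from one that
 * simply hasn't been sent, so it must request every missing page.
 */
enum page_action resolve_fault(uint64_t pfn, const uint64_t *received,
                               const uint64_t *free_hint)
{
    if (test_bit(received, pfn))
        return PAGE_ALREADY_RECEIVED;
    if (free_hint != NULL && test_bit(free_hint, pfn))
        return PAGE_FILL_ZERO;
    return PAGE_REQUEST_SOURCE;
}
```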

Liang

> 
> >
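> > Performance data
> > ================
> >
> > Test environment:
> >
> > CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> > Host RAM: 64GB
> > Host Linux kernel: 4.2.0
> > Host OS: CentOS 7.1
> > Guest Linux kernel: 4.5-rc6
> > Guest OS: CentOS 6.6
> > Guest RAM: 8GB
> > Network: X540-AT2 with 10 Gigabit connection
> >
> > Case 1: Idle guest that has just booted:
> > ================================================
> >                       | original  |    pv
> > ------------------------------------------------
> > total time (ms)       |    1894   |    421
> > ------------------------------------------------
> > transferred RAM (KB)  |  398017   |  353242
> > ================================================
> >
> > Case 2: The guest has previously run a memory-consuming workload,
> > which is terminated just before live migration.
> > ================================================
> >                       | original  |    pv
> > ------------------------------------------------
> > total time (ms)       |    7436   |    552
> > ------------------------------------------------
> > transferred RAM (KB)  | 8146291   |  361375
> > ================================================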
> >
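For a quick reading of the two tables, the reduction factors below are computed directly from the reported figures; no new measurements are involved. The struct and function names are only illustrative.

```c
#include <stdio.h>

/* Figures copied from the two performance tables (original vs. PV). */
struct migration_result {
    const char *name;
    double orig_time_ms, pv_time_ms;
    double orig_ram_kb, pv_ram_kb;
};

static const struct migration_result results[] = {
    { "Case 1: idle guest",           1894.0, 421.0,  398017.0, 353242.0 },
    { "Case 2: after workload ended", 7436.0, 552.0, 8146291.0, 361375.0 },
};

/* How many times shorter the PV total migration time is. */
double time_speedup(const struct migration_result *r)
{
    return r->orig_time_ms / r->pv_time_ms;
}

/* How many times less RAM the PV run transferred. */
double traffic_reduction(const struct migration_result *r)
{
    return r->orig_ram_kb / r->pv_ram_kb;
}

void print_summary(void)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%s: %.1fx faster, %.2fx less RAM sent\n", results[i].name,
               time_speedup(&results[i]), traffic_reduction(&results[i]));
}
```

The ratios come out at about 4.5x shorter time and 1.13x less traffic for case 1, and about 13.5x and 22.5x for case 2. The PV approach pays off most when the guest holds a lot of freed but non-zero memory, as in case 2.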


Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-03 Thread Dr. David Alan Gilbert
* Liang Li (liang.z...@intel.com) wrote:
> The current QEMU live migration implementation marks all of the
> guest's RAM pages as dirty in the RAM bulk stage; all these pages
> will be processed, and that takes quite a lot of CPU cycles.
> 
> From the guest's point of view, the content of free pages doesn't
> matter. We can make use of this fact and skip processing the free
> pages in the RAM bulk stage; this can save a lot of CPU cycles,
> reduce the network traffic significantly, and noticeably speed up
> the live migration process.
> 
> This patch set is the QEMU side implementation.
> 
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
> 
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the RAM bulk stage. This makes
> the live migration process much more efficient.
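The filtering step described in the cover letter amounts to clearing, in the migration bitmap, every bit that is also set in the guest's free-page bitmap. A minimal sketch, assuming both bitmaps use one bit per page with the same indexing; filter_free_pages() and merge_dirty_log() are hypothetical names, not the functions in the patch set. merge_dirty_log() models dirty logging being enabled before the free-page request, so a page the guest reports as free and then reuses is marked dirty again at the next bitmap sync.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned count_bits(unsigned long x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* dirty &= ~free_pages over the first nbits bits.
 * Returns how many dirty pages were dropped because the guest reported
 * them as free. */
static size_t filter_free_pages(unsigned long *dirty,
                                const unsigned long *free_pages,
                                size_t nbits)
{
    size_t nlongs = BITS_TO_LONGS(nbits);
    size_t dropped = 0;

    assert(dirty != NULL && free_pages != NULL);
    for (size_t i = 0; i < nlongs; i++) {
        unsigned long drop = dirty[i] & free_pages[i];
        /* Ignore bits past nbits in the last word. */
        if (i == nlongs - 1 && nbits % BITS_PER_LONG != 0) {
            drop &= (1UL << (nbits % BITS_PER_LONG)) - 1UL;
        }
        dropped += count_bits(drop);
        dirty[i] &= ~drop;
    }
    return dropped;
}

/* dirty |= log: pages written after the free-page report become dirty
 * again, so a reused free page is never skipped. */
static void merge_dirty_log(unsigned long *dirty, const unsigned long *log,
                            size_t nbits)
{
    assert(dirty != NULL && log != NULL);
    for (size_t i = 0; i < BITS_TO_LONGS(nbits); i++) {
        dirty[i] |= log[i];
    }
}
```

For eight pages that all start dirty, with pages 4-7 reported free, filtering leaves only pages 0-3 dirty; if the guest then writes page 4, the merged dirty log sets it again.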

Hi,
  An interesting solution; I know a few different people have been looking
at how to speed up ballooned VM migration.

  I wonder if it would be possible to avoid the kernel changes by
parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
mapped pages in the guest ram, would it achieve the same result?
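The pagemap idea can be sketched in C, assuming the entry layout described in the kernel's pagemap documentation: one 64-bit entry per virtual page, bit 63 = page present, bit 62 = page swapped. The helper names are hypothetical. For anonymous, privately mapped guest RAM, an entry with neither bit set means the host never populated that page, so its contents read as zero; a page the guest touched and later freed is still present, which is the limitation raised for case 2.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT (UINT64_C(1) << 63) /* page resident in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62) /* page swapped out */

/* True if the page behind this entry is neither resident nor swapped. */
static bool pagemap_entry_unmapped(uint64_t entry)
{
    return (entry & (PM_PRESENT | PM_SWAPPED)) == 0;
}

/* Read the entry for vaddr from an open /proc/self/pagemap descriptor.
 * Each entry is 8 bytes at offset (vaddr / page_size) * 8.
 * Returns 0 on success, -1 on a short read or error. */
static int pagemap_read_entry(int fd, uintptr_t vaddr, size_t page_size,
                              uint64_t *entry)
{
    assert(entry != NULL && page_size != 0);
    off_t off = (off_t)(vaddr / page_size) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```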

> This RFC version doesn't take post-copy and RDMA into
> consideration; maybe both of them can benefit from this PV solution
> with some extra modifications.

For postcopy to be safe, you would still need to send a message to the
destination telling it that there were zero pages, otherwise the destination
can't tell if it's supposed to request the page from the source or
treat the page as zero.
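The postcopy concern can be sketched as the decision the destination makes when the guest touches a page that has not arrived yet. This is a hypothetical model, not QEMU's postcopy code: the enum names are invented, and DST_PAGE_ZERO stands for a page the source has explicitly reported as zero/free. Without that explicit message, a skipped free page looks exactly like a page that is still in flight, so the destination can only request it from the source.

```c
#include <assert.h>

/* Hypothetical per-page state on the postcopy destination. */
enum dst_page_state {
    DST_PAGE_UNKNOWN,  /* nothing received yet for this page */
    DST_PAGE_RECEIVED, /* contents already arrived from the source */
    DST_PAGE_ZERO,     /* source explicitly said: treat as zero/free */
};

enum fault_action {
    FAULT_NONE,       /* page already in place */
    FAULT_PLACE_ZERO, /* install a zeroed page locally, no round trip */
    FAULT_REQUEST,    /* ask the source for the page */
};

/* Decide what to do when the guest accesses a page on the destination. */
static enum fault_action on_page_fault(enum dst_page_state s)
{
    switch (s) {
    case DST_PAGE_RECEIVED:
        return FAULT_NONE;
    case DST_PAGE_ZERO:
        return FAULT_PLACE_ZERO;
    case DST_PAGE_UNKNOWN:
        return FAULT_REQUEST;
    }
    assert(!"invalid page state");
    return FAULT_REQUEST;
}
```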

Dave

> 
> Performance data
> 
> 
> Test environment:
> 
> CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> Host RAM: 64GB
> Host Linux Kernel: 4.2.0
> Host OS: CentOS 7.1
> Guest Linux Kernel: 4.5-rc6
> Guest OS: CentOS 6.6
> Network: X540-AT2 with a 10 Gigabit connection
> Guest RAM: 8GB
> 
> Case 1: Idle guest that has just booted:
> 
>                      | original |     pv
> ---------------------+----------+-------
> total time (ms)      |     1894 |    421
> transferred RAM (KB) |   398017 | 353242
> 
> 
> 
> Case 2: The guest has run a memory-consuming workload, which was
> terminated just before live migration.
> 
>                      | original |     pv
> ---------------------+----------+-------
> total time (ms)      |     7436 |    552
> transferred RAM (KB) |  8146291 | 361375
> 
> 
> Liang Li (4):
>   pc: Add code to get the lowmem form PCMachineState
>   virtio-balloon: Add a new feature to balloon device
>   migration: not set migration bitmap in setup stage
>   migration: filter out guest's free pages in ram bulk stage
> 
>  balloon.c                                       | 30 -
>  hw/i386/pc.c                                    |  5 ++
>  hw/i386/pc_piix.c                               |  1 +
>  hw/i386/pc_q35.c                                |  1 +
>  hw/virtio/virtio-balloon.c                      | 81 -
>  include/hw/i386/pc.h                            |  3 +-
>  include/hw/virtio/virtio-balloon.h              | 17 +-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  migration/ram.c                                 | 64 +++
>  10 files changed, 195 insertions(+), 18 deletions(-)
> 
> -- 
> 1.8.3.1
> 
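The first patch touches the lowmem size in PCMachineState. A plausible reason, stated here as an assumption rather than as the patch's documented purpose: a free-page bitmap from the guest is naturally indexed by guest physical page number, while the migration bitmap follows offsets in QEMU's RAM block, and on a PC machine the two diverge above lowmem because the remaining RAM is mapped starting at 4 GiB, above the PCI hole. A minimal sketch of that translation; the function name and the 3 GiB lowmem in the example are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)
#define ABOVE_4G_BASE (4 * GIB)

/* Translate a guest physical address to an offset in the guest RAM block,
 * assuming [0, lowmem) is mapped 1:1 and the remaining ram_size - lowmem
 * bytes are mapped starting at 4 GiB.  Returns UINT64_MAX for addresses
 * that are not RAM (e.g. the PCI hole below 4 GiB). */
static uint64_t gpa_to_ram_offset(uint64_t gpa, uint64_t lowmem,
                                  uint64_t ram_size)
{
    assert(lowmem <= ram_size && lowmem <= ABOVE_4G_BASE);
    if (gpa < lowmem) {
        return gpa;
    }
    if (gpa >= ABOVE_4G_BASE && gpa - ABOVE_4G_BASE < ram_size - lowmem) {
        return gpa - ABOVE_4G_BASE + lowmem;
    }
    return UINT64_MAX;
}
```

Dividing the returned offset by the page size gives the bit to clear in the migration bitmap for a guest page reported as free.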
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

