RE: [RFC qemu 0/4] A PV solution for live migration optimization
> > > > > > I'm just catching back up on this thread; so without
> > > > > > reference to any particular previous mail in the thread.
> > > > > >
> > > > > > 1) How many of the free pages do we tell the host about?
> > > > > >    Your main change is telling the host about all the
> > > > > >    free pages.
> > > > >
> > > > > Yes, all the guest's free pages.
> > > > >
> > > > > >    If we tell the host about all the free pages, then we might
> > > > > >    end up needing to allocate more pages and update the host
> > > > > >    with pages we now want to use; that would have to wait for the
> > > > > >    host to acknowledge that use of these pages, since if we don't
> > > > > >    wait for it then it might have skipped migrating a page we
> > > > > >    just started using (I don't understand how your series solves
> > > > > >    that).
> > > > > >    So the guest probably needs to keep some free pages - how many?
> > > > >
> > > > > Actually, there is no need to care about whether the free pages
> > > > > will be used by the host.
> > > > > We only care about some of the free pages we get reused by the
> > > > > guest, right?
> > > > >
> > > > > The dirty page logging can be used to solve this, starting the
> > > > > dirty page logging before getting the free pages information from guest.
> > > > > Even some of the free pages are modified by the guest during the
> > > > > process of getting the free pages information, these modified
> > > > > pages will be traced by the dirty page logging mechanism. So in the
> > > > > following migration_bitmap_sync() function.
> > > > > The pages in the free pages bitmap, but later was modified,
> > > > > will be reset to dirty. We won't omit any dirtied pages.
> > > > >
> > > > > So, guest doesn't need to keep any free pages.
> > > >
> > > > OK, yes, that works; so we do:
> > > >   * enable dirty logging
> > > >   * ask guest for free pages
> > > >   * initialise the migration bitmap as everything-free
> > > >   * then later we do the normal sync-dirty bitmap stuff and it all just
> > > >     works.
> > > >
> > > > That's nice and simple.
> > >
> > > This works once, sure. But there's an issue: you have to
> > > defer migration until you get the free page list, and this only
> > > works once. So you end up with heuristics about how long to wait.
> > >
> > > Instead I propose:
> > >
> > > - mark all pages dirty as we do now.
> > >
> > > - at start of migration, start tracking dirty
> > >   pages in kvm, and tell guest to start tracking free pages
> > >
> > > we can now introduce any kind of delay, for example wait for ack
> > > from guest, or do whatever else, or even just start migrating pages
> > >
> > > - repeatedly:
> > >     - get list of free pages from guest
> > >     - clear them in migration bitmap
> > >     - get dirty list from kvm
> > >
> > > - at end of migration, stop tracking writes in kvm,
> > >   and tell guest to stop tracking free pages
> >
> > I had thought of filtering out the free pages in each migration bitmap
> > synchronization.
> > The advantage is that we can skip processing as many free pages as
> > possible, not just once.
> > The disadvantage is that we would have to change the current memory
> > management code to track the free pages, instead of traversing the free
> > page list to construct the free page bitmap, in order to reduce the
> > overhead of getting the free page bitmap.
> > I am not sure if the kernel people would like it.
> >
> > If we keep the traversing mechanism, then because of the overhead it may
> > not be worth filtering out the free pages repeatedly.
>
> Well, Michael's idea of not waiting for the dirty bitmap to be filled does
> make that idea of constantly using the free-bitmap better.
>

Not waiting is a good idea.

Actually, we could shorten the waiting time by pre-allocating the free page
bitmap and updating it whenever the guest allocates or frees pages. That
requires modifying the mm-related code, and I don't know whether the kernel
people would like this.

> In that case, is it easier if something (guest/host?) allocates some memory in
> the guest's physical RAM space and just points the host to it, rather than
> having an explicit 'send'.
>

Good idea too.

Liang

> Dave
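
A minimal sketch of the pre-allocated bitmap idea, not taken from the patch series: the class and the `on_free`/`on_alloc` hooks are hypothetical names standing in for whatever the mm code would call, and a Python `bytearray` stands in for the real bitmap. The point is only that the bitmap is allocated once and kept current on every allocation and free, so reading it out needs no free-list traversal.

```python
class FreePageBitmap:
    """Pre-allocated bitmap with one bit per guest page frame (1 = free)."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # Allocated once, up front; never rebuilt during migration.
        self.bits = bytearray((num_pages + 7) // 8)

    def on_free(self, pfn):
        # Hypothetical hook: called when the allocator frees a page.
        self.bits[pfn // 8] |= 1 << (pfn % 8)

    def on_alloc(self, pfn):
        # Hypothetical hook: called when the allocator hands a page out.
        self.bits[pfn // 8] &= ~(1 << (pfn % 8)) & 0xFF

    def free_pfns(self):
        # Reading the state is a plain scan of the bitmap.
        return [p for p in range(self.num_pages)
                if self.bits[p // 8] & (1 << (p % 8))]


bm = FreePageBitmap(16)
bm.on_free(3)
bm.on_free(10)
bm.on_alloc(3)          # page 3 is reused, so its bit is cleared again
print(bm.free_pfns())
```

The trade-off discussed in the thread is visible here: every allocation and free pays a small cost so that the host never has to wait for the bitmap to be built.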
Re: [RFC qemu 0/4] A PV solution for live migration optimization
* Li, Liang Z (liang.z...@intel.com) wrote:
> > On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> > > * Li, Liang Z (liang.z...@intel.com) wrote:
> > > > >
> > > > > Hi,
> > > > >   I'm just catching back up on this thread; so without reference
> > > > > to any particular previous mail in the thread.
> > > > >
> > > > >   1) How many of the free pages do we tell the host about?
> > > > >      Your main change is telling the host about all the
> > > > >      free pages.
> > > >
> > > > Yes, all the guest's free pages.
> > > >
> > > > >      If we tell the host about all the free pages, then we might
> > > > >      end up needing to allocate more pages and update the host
> > > > >      with pages we now want to use; that would have to wait for the
> > > > >      host to acknowledge that use of these pages, since if we don't
> > > > >      wait for it then it might have skipped migrating a page we
> > > > >      just started using (I don't understand how your series solves
> > > > >      that).
> > > > >      So the guest probably needs to keep some free pages - how many?
> > > >
> > > > Actually, there is no need to care about whether the free pages will be
> > > > used by the host.
> > > > We only care about some of the free pages we get reused by the guest,
> > > > right?
> > > >
> > > > The dirty page logging can be used to solve this, starting the dirty
> > > > page logging before getting the free pages information from guest.
> > > > Even some of the free pages are modified by the guest during the
> > > > process of getting the free pages information, these modified pages will
> > > > be traced by the dirty page logging mechanism. So in the following
> > > > migration_bitmap_sync() function.
> > > > The pages in the free pages bitmap, but later was modified, will be
> > > > reset to dirty. We won't omit any dirtied pages.
> > > >
> > > > So, guest doesn't need to keep any free pages.
> > >
> > > OK, yes, that works; so we do:
> > >   * enable dirty logging
> > >   * ask guest for free pages
> > >   * initialise the migration bitmap as everything-free
> > >   * then later we do the normal sync-dirty bitmap stuff and it all just
> > >     works.
> > >
> > > That's nice and simple.
> >
> > This works once, sure. But there's an issue: you have to defer migration
> > until you get the free page list, and this only works once. So you end up
> > with heuristics about how long to wait.
> >
> > Instead I propose:
> >
> > - mark all pages dirty as we do now.
> >
> > - at start of migration, start tracking dirty
> >   pages in kvm, and tell guest to start tracking free pages
> >
> > we can now introduce any kind of delay, for example wait for ack from guest,
> > or do whatever else, or even just start migrating pages
> >
> > - repeatedly:
> >     - get list of free pages from guest
> >     - clear them in migration bitmap
> >     - get dirty list from kvm
> >
> > - at end of migration, stop tracking writes in kvm,
> >   and tell guest to stop tracking free pages
>
> I had thought of filtering out the free pages in each migration bitmap
> synchronization.
> The advantage is that we can skip processing as many free pages as possible,
> not just once.
> The disadvantage is that we would have to change the current memory management
> code to track the free pages, instead of traversing the free page list to
> construct the free page bitmap, in order to reduce the overhead of getting
> the free page bitmap.
> I am not sure if the kernel people would like it.
>
> If we keep the traversing mechanism, then because of the overhead it may not
> be worth filtering out the free pages repeatedly.

Well, Michael's idea of not waiting for the dirty bitmap to be filled does make
that idea of constantly using the free-bitmap better.

In that case, is it easier if something (guest/host?) allocates some memory in
the guest's physical RAM space and just points the host to it, rather than
having an explicit 'send'.

Dave

> Liang
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
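
A rough illustration of the "point the host at guest memory" suggestion above, under loud assumptions: a `bytearray` stands in for guest physical RAM, `BITMAP_OFFSET` and `BITMAP_LEN` are made-up values, and a `memoryview` stands in for the host's mapping. Nothing here reflects an actual virtio or QEMU interface; it only shows that once the host knows where the region is, guest updates are visible without any explicit transfer.

```python
PAGE_SIZE = 4096
guest_ram = bytearray(PAGE_SIZE * 4)      # stand-in for guest physical RAM

# Guest side: reserve a region and tell the host only where it is.
BITMAP_OFFSET = PAGE_SIZE                  # hypothetical guest-physical address
BITMAP_LEN = 8                             # hypothetical size: 64 pages tracked


def guest_mark_free(pfn):
    # The guest updates the bitmap in place inside its own RAM.
    guest_ram[BITMAP_OFFSET + pfn // 8] |= 1 << (pfn % 8)


# Host side: map the region once; no copy is made by the slice.
host_view = memoryview(guest_ram)[BITMAP_OFFSET:BITMAP_OFFSET + BITMAP_LEN]


def host_is_free(pfn):
    # Later reads observe guest updates directly, with no 'send' step.
    return bool(host_view[pfn // 8] & (1 << (pfn % 8)))


guest_mark_free(5)
print(host_is_free(5), host_is_free(6))
```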
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> > * Li, Liang Z (liang.z...@intel.com) wrote:
> > > >
> > > > Hi,
> > > >   I'm just catching back up on this thread; so without reference
> > > > to any particular previous mail in the thread.
> > > >
> > > >   1) How many of the free pages do we tell the host about?
> > > >      Your main change is telling the host about all the
> > > >      free pages.
> > >
> > > Yes, all the guest's free pages.
> > >
> > > >      If we tell the host about all the free pages, then we might
> > > >      end up needing to allocate more pages and update the host
> > > >      with pages we now want to use; that would have to wait for the
> > > >      host to acknowledge that use of these pages, since if we don't
> > > >      wait for it then it might have skipped migrating a page we
> > > >      just started using (I don't understand how your series solves
> > > >      that).
> > > >      So the guest probably needs to keep some free pages - how many?
> > >
> > > Actually, there is no need to care about whether the free pages will be
> > > used by the host.
> > > We only care about some of the free pages we get reused by the guest,
> > > right?
> > >
> > > The dirty page logging can be used to solve this, starting the dirty
> > > page logging before getting the free pages information from guest.
> > > Even some of the free pages are modified by the guest during the
> > > process of getting the free pages information, these modified pages will
> > > be traced by the dirty page logging mechanism. So in the following
> > > migration_bitmap_sync() function.
> > > The pages in the free pages bitmap, but later was modified, will be
> > > reset to dirty. We won't omit any dirtied pages.
> > >
> > > So, guest doesn't need to keep any free pages.
> >
> > OK, yes, that works; so we do:
> >   * enable dirty logging
> >   * ask guest for free pages
> >   * initialise the migration bitmap as everything-free
> >   * then later we do the normal sync-dirty bitmap stuff and it all just
> >     works.
> >
> > That's nice and simple.
>
> This works once, sure. But there's an issue: you have to defer migration
> until you get the free page list, and this only works once. So you end up with
> heuristics about how long to wait.
>
> Instead I propose:
>
> - mark all pages dirty as we do now.
>
> - at start of migration, start tracking dirty
>   pages in kvm, and tell guest to start tracking free pages
>
> we can now introduce any kind of delay, for example wait for ack from guest,
> or do whatever else, or even just start migrating pages
>
> - repeatedly:
>     - get list of free pages from guest
>     - clear them in migration bitmap
>     - get dirty list from kvm
>
> - at end of migration, stop tracking writes in kvm,
>   and tell guest to stop tracking free pages

I had thought of filtering out the free pages in each migration bitmap
synchronization.
The advantage is that we can skip processing as many free pages as possible,
not just once.
The disadvantage is that we would have to change the current memory management
code to track the free pages, instead of traversing the free page list to
construct the free page bitmap, in order to reduce the overhead of getting
the free page bitmap.
I am not sure if the kernel people would like it.

If we keep the traversing mechanism, then because of the overhead it may not
be worth filtering out the free pages repeatedly.

Liang
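
To make the overhead of the traversing mechanism concrete, here is a small sketch of building a free page bitmap by walking free lists. It assumes a simplified buddy-style layout in which each free block of order `n` covers `2**n` contiguous pages; `build_free_bitmap` and the `free_lists` dictionary are illustrative names, not kernel interfaces.

```python
def build_free_bitmap(free_lists, num_pages):
    """Construct a per-page free bitmap by traversing the free lists.

    free_lists maps a block order to the start PFNs of the free blocks
    of that order (assumed layout: a block of order n spans 2**n pages).
    """
    bitmap = [False] * num_pages
    for order, starts in free_lists.items():
        for start in starts:
            # Every page of every free block is visited on every call.
            for pfn in range(start, start + (1 << order)):
                bitmap[pfn] = True
    return bitmap


bitmap = build_free_bitmap({0: [5], 2: [8]}, 16)
print([pfn for pfn, is_free in enumerate(bitmap) if is_free])
```

The work is proportional to the amount of free memory and is repeated in full for each synchronization, which is why doing it once may be acceptable while doing it on every bitmap sync may not be.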
Re: [RFC qemu 0/4] A PV solution for live migration optimization
On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> * Li, Liang Z (liang.z...@intel.com) wrote:
> > >
> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to any
> > > particular previous mail in the thread.
> > >
> > >   1) How many of the free pages do we tell the host about?
> > >      Your main change is telling the host about all the
> > >      free pages.
> >
> > Yes, all the guest's free pages.
> >
> > >      If we tell the host about all the free pages, then we might
> > >      end up needing to allocate more pages and update the host
> > >      with pages we now want to use; that would have to wait for the
> > >      host to acknowledge that use of these pages, since if we don't
> > >      wait for it then it might have skipped migrating a page we
> > >      just started using (I don't understand how your series solves that).
> > >      So the guest probably needs to keep some free pages - how many?
> >
> > Actually, there is no need to care about whether the free pages will be
> > used by the host.
> > We only care about some of the free pages we get reused by the guest, right?
> >
> > The dirty page logging can be used to solve this, starting the dirty page
> > logging before getting the free pages information from guest. Even some of
> > the free pages are modified by the guest during the process of getting the
> > free pages information, these modified pages will be traced by the dirty
> > page logging mechanism. So in the following migration_bitmap_sync() function.
> > The pages in the free pages bitmap, but later was modified, will be reset
> > to dirty. We won't omit any dirtied pages.
> >
> > So, guest doesn't need to keep any free pages.
>
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
>
> That's nice and simple.

This works once, sure. But there's an issue: you have to defer migration
until you get the free page list, and this only works once. So you end up with
heuristics about how long to wait.

Instead I propose:

- mark all pages dirty as we do now.

- at start of migration, start tracking dirty
  pages in kvm, and tell guest to start tracking free pages

we can now introduce any kind of delay, for example wait for ack from guest,
or do whatever else, or even just start migrating pages

- repeatedly:
    - get list of free pages from guest
    - clear them in migration bitmap
    - get dirty list from kvm

- at end of migration, stop tracking writes in kvm,
  and tell guest to stop tracking free pages

> > > 2) Clearing out caches
> > >    Does it make sense to clean caches?  They're apparently useful data
> > >    so if we clean them it's likely to slow the guest down; I guess
> > >    they're also likely to be fairly static data - so at least fairly
> > >    easy to migrate.
> > >    The answer here partially depends on what you want from your
> > >    migration;
> > >    if you're after the fastest possible migration time it might make
> > >    sense to clean the caches and avoid migrating them; but that might
> > >    be at the cost of more disruption to the guest - there's a trade off
> > >    somewhere and it's not clear to me how you set that depending on your
> > >    guest/network/requirements.
> > >
> >
> > Yes, clean the caches is an option.  Let the users decide using it or not.
> >
> > > 3) Why is ballooning slow?
> > >    You've got a figure of 5s to balloon on an 8GB VM - but an
> > >    8GB VM isn't huge; so I worry about how long it would take
> > >    on a big VM.   We need to understand why it's slow
> > >      * is it due to the guest shuffling pages around?
> > >      * is it due to the virtio-balloon protocol sending one page
> > >        at a time?
> > >        + Do balloon pages normally clump in physical memory
> > >           - i.e. would a 'large balloon' message help
> > >           - or do we need a bitmap because it tends not to clump?
> > >
> >
> > I didn't do a comprehensive test. But I found most of the time spending
> > on allocating the pages and sending the PFNs to guest, I don't know that's
> > the most time consuming operation, allocating the pages or sending the PFNs.
>
> It might be a good idea to analyse it a bit more to convince people where
> the problem is.
>
> > >      * is it due to the madvise on the host?
> > >        If we were using the normal balloon messages, then we
> > >        could, during migration, just route those to the migration
> > >        code rather than bothering with the madvise.
> > >        If they're clumping together we could just turn that into
> > >        one big madvise; if they're not then would we benefit from
> > >        a call that lets us madvise lots of areas?
> > >
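
The repeated scheme proposed in this message can be sketched as a toy simulation. This is an assumption-laden model, not QEMU code: pages are plain integers, `migrate` is a hypothetical name, each round is assumed to send everything still marked, and the free and dirty sets are supplied by the caller instead of coming from the guest and from KVM.

```python
def migrate(num_pages, rounds):
    """Simulate the proposed loop.

    rounds is a list of (free_pfns, dirty_pfns) pairs, one per sync:
    what the guest reports as free, and what the dirty log reports as
    written since the previous sync.
    """
    to_send = set(range(num_pages))       # mark all pages dirty as we do now
    sent_per_round = []
    for free_pfns, dirty_pfns in rounds:
        to_send -= set(free_pfns)          # clear free pages in the bitmap
        to_send |= set(dirty_pfns)         # then apply the dirty list
        sent_per_round.append(sorted(to_send))
        to_send.clear()                    # assume the round sent them all
    return sent_per_round


# Round 1: pages 4-7 are reported free, but page 5 was written meanwhile.
# Round 2: page 6 is reported free yet also shows up in the dirty log.
print(migrate(8, [({4, 5, 6, 7}, {5}), ({6}, {1, 6})]))
```

Applying the dirty list after clearing the free pages is what keeps the scheme safe: a page that was reported free and then written again is put back into the set, so a stale free report can cost an extra transfer but not a missed page.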
Re: [RFC qemu 0/4] A PV solution for live migration optimization
On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote: > * Li, Liang Z (liang.z...@intel.com) wrote: > > > > > > Hi, > > > I'm just catching back up on this thread; so without reference to any > > > particular previous mail in the thread. > > > > > > 1) How many of the free pages do we tell the host about? > > > Your main change is telling the host about all the > > > free pages. > > > > Yes, all the guest's free pages. > > > > > If we tell the host about all the free pages, then we might > > > end up needing to allocate more pages and update the host > > > with pages we now want to use; that would have to wait for the > > > host to acknowledge that use of these pages, since if we don't > > > wait for it then it might have skipped migrating a page we > > > just started using (I don't understand how your series solves that). > > > So the guest probably needs to keep some free pages - how many? > > > > Actually, there is no need to care about whether the free pages will be > > used by the host. > > We only care about some of the free pages we get reused by the guest, right? > > > > The dirty page logging can be used to solve this, starting the dirty page > > logging before getting > > the free pages informant from guest. Even some of the free pages are > > modified by the guest > > during the process of getting the free pages information, these modified > > pages will be traced > > by the dirty page logging mechanism. So in the following > > migration_bitmap_sync() function. > > The pages in the free pages bitmap, but latter was modified, will be reset > > to dirty. We won't > > omit any dirtied pages. > > > > So, guest doesn't need to keep any free pages. > > OK, yes, that works; so we do: > * enable dirty logging > * ask guest for free pages > * initialise the migration bitmap as everything-free > * then later we do the normal sync-dirty bitmap stuff and it all just works. > > That's nice and simple. This works once, sure. 
But there's an issue is that you have to defer migration until you get the free page list, and this only works once. So you end up with heuristics about how long to wait. Instead I propose: - mark all pages dirty as we do now. - at start of migration, start tracking dirty pages in kvm, and tell guest to start tracking free pages we can now introduce any kind of delay, for example wait for ack from guest, or do whatever else, or even just start migrating pages - repeatedly: - get list of free pages from guest - clear them in migration bitmap - get dirty list from kvm - at end of migration, stop tracking writes in kvm, and tell guest to stop tracking free pages > > > 2) Clearing out caches > > > Does it make sense to clean caches? They're apparently useful data > > > so if we clean them it's likely to slow the guest down; I guess > > > they're also likely to be fairly static data - so at least fairly > > > easy to migrate. > > > The answer here partially depends on what you want from your > > > migration; > > > if you're after the fastest possible migration time it might make > > > sense to clean the caches and avoid migrating them; but that might > > > be at the cost of more disruption to the guest - there's a trade off > > > somewhere and it's not clear to me how you set that depending on your > > > guest/network/reqirements. > > > > > > > Yes, clean the caches is an option. Let the users decide using it or not. > > > > > 3) Why is ballooning slow? > > > You've got a figure of 5s to balloon on an 8GB VM - but an > > > 8GB VM isn't huge; so I worry about how long it would take > > > on a big VM. We need to understand why it's slow > > >* is it due to the guest shuffling pages around? > > >* is it due to the virtio-balloon protocol sending one page > > > at a time? > > > + Do balloon pages normally clump in physical memory > > > - i.e. would a 'large balloon' message help > > > - or do we need a bitmap because it tends not to clump? 
> > > > > > > I didn't do a comprehensive test. But I found most of the time spending > > on allocating the pages and sending the PFNs to guest, I don't know that's > > the most time consuming operation, allocating the pages or sending the PFNs. > > It might be a good idea to analyse it a bit more to convince people where > the problem is. > > > >* is it due to the madvise on the host? > > > If we were using the normal balloon messages, then we > > > could, during migration, just route those to the migration > > > code rather than bothering with the madvise. > > > If they're clumping together we could just turn that into > > > one big madvise; if they're not then would we benefit from > > > a call that lets us madvise lots of areas? > > >
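The repeated scheme proposed in the message above can be sketched roughly as follows. This is only an illustration, not QEMU code: pages are modelled as a Python set of page frame numbers, and sync_round and run_migration are hypothetical names. The point it tries to show is the ordering: free pages are cleared first and the KVM dirty log is merged afterwards, so a page that was reported free but written later is marked for sending again.

```python
def sync_round(migration_bitmap, guest_free, kvm_dirty):
    """One pass of the proposed loop (hypothetical helper).

    migration_bitmap: set of PFNs still to be sent
    guest_free:       set of PFNs the guest reported as free
    kvm_dirty:        set of PFNs KVM logged as written since the last pass
    """
    # Clear free pages first, then merge the dirty log, so that a page
    # which was free when reported but written afterwards is re-added.
    return (migration_bitmap - guest_free) | kvm_dirty


def run_migration(total_pages, rounds):
    """rounds: iterable of (guest_free, kvm_dirty, sent) sets, one per pass."""
    bitmap = set(range(total_pages))  # "mark all pages dirty as we do now"
    for guest_free, kvm_dirty, sent in rounds:
        bitmap = sync_round(bitmap, guest_free, kvm_dirty)
        bitmap -= sent  # pages transmitted during this pass
    return bitmap


if __name__ == "__main__":
    rounds = [
        ({4, 5, 6, 7}, {5}, {0, 1}),  # page 5 was free, then got written
        ({3, 4, 6, 7}, {0}, {2}),     # page 0 was sent, then dirtied again
    ]
    print(sorted(run_migration(8, rounds)))
```

The sketch assumes dirty tracking is already enabled before the first free page list is collected; without that, a write landing between the two steps would be lost.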
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> > > Hi, > > > I'm just catching back up on this thread; so without reference to > > > any particular previous mail in the thread. > > > > > > 1) How many of the free pages do we tell the host about? > > > Your main change is telling the host about all the > > > free pages. > > > > Yes, all the guest's free pages. > > > > > If we tell the host about all the free pages, then we might > > > end up needing to allocate more pages and update the host > > > with pages we now want to use; that would have to wait for the > > > host to acknowledge that use of these pages, since if we don't > > > wait for it then it might have skipped migrating a page we > > > just started using (I don't understand how your series solves that). > > > So the guest probably needs to keep some free pages - how many? > > > > Actually, there is no need to care about whether the free pages will be > used by the host. > > We only care about some of the free pages we get reused by the guest, > right? > > > > The dirty page logging can be used to solve this, starting the dirty > > page logging before getting the free pages informant from guest. Even > > some of the free pages are modified by the guest during the process of > > getting the free pages information, these modified pages will be traced by > the dirty page logging mechanism. So in the following > migration_bitmap_sync() function. > > The pages in the free pages bitmap, but latter was modified, will be > > reset to dirty. We won't omit any dirtied pages. > > > > So, guest doesn't need to keep any free pages. > > OK, yes, that works; so we do: > * enable dirty logging > * ask guest for free pages > * initialise the migration bitmap as everything-free > * then later we do the normal sync-dirty bitmap stuff and it all just works. > > That's nice and simple. > > > > 2) Clearing out caches > > > Does it make sense to clean caches? 
They're apparently useful data > > > so if we clean them it's likely to slow the guest down; I guess > > > they're also likely to be fairly static data - so at least fairly > > > easy to migrate. > > > The answer here partially depends on what you want from your > migration; > > > if you're after the fastest possible migration time it might make > > > sense to clean the caches and avoid migrating them; but that might > > > be at the cost of more disruption to the guest - there's a trade off > > > somewhere and it's not clear to me how you set that depending on > your > > > guest/network/reqirements. > > > > > > > Yes, clean the caches is an option. Let the users decide using it or not. > > > > > 3) Why is ballooning slow? > > > You've got a figure of 5s to balloon on an 8GB VM - but an > > > 8GB VM isn't huge; so I worry about how long it would take > > > on a big VM. We need to understand why it's slow > > >* is it due to the guest shuffling pages around? > > >* is it due to the virtio-balloon protocol sending one page > > > at a time? > > > + Do balloon pages normally clump in physical memory > > > - i.e. would a 'large balloon' message help > > > - or do we need a bitmap because it tends not to clump? > > > > > > > I didn't do a comprehensive test. But I found most of the time > > spending on allocating the pages and sending the PFNs to guest, I > > don't know that's the most time consuming operation, allocating the pages > or sending the PFNs. > > It might be a good idea to analyse it a bit more to convince people where the > problem is. > Yes, I will try to measure the time spending on different parts. > > >* is it due to the madvise on the host? > > > If we were using the normal balloon messages, then we > > > could, during migration, just route those to the migration > > > code rather than bothering with the madvise. 
> > > If they're clumping together we could just turn that into > > > one big madvise; if they're not then would we benefit from > > > a call that lets us madvise lots of areas? > > > > > > > My test showed madvise() is not the main reason for the long time, > > only taken 10% of the total inflating balloon operation time. > > Big madvise can more or less improve the performance. > > OK; 10% of the total is still pretty big even for your 8GB VM. > > > > 4) Speeding up the migration of those free pages > > > You're using the bitmap to avoid migrating those free pages; HPe's > > > patchset is reconstructing a bitmap from the balloon data; OK, so > > > this all makes sense to avoid migrating them - I'd also been thinking > > > of using pagemap to spot zero pages that would help find other zero'd > > > pages, but perhaps ballooned is enough? > > > > > Could you describe your ideal with more details? > > At the moment the migration code spends a fair amount of time
Re: [RFC qemu 0/4] A PV solution for live migration optimization
* Li, Liang Z (liang.z...@intel.com) wrote: > > > > Hi, > > I'm just catching back up on this thread; so without reference to any > > particular previous mail in the thread. > > > > 1) How many of the free pages do we tell the host about? > > Your main change is telling the host about all the > > free pages. > > Yes, all the guest's free pages. > > > If we tell the host about all the free pages, then we might > > end up needing to allocate more pages and update the host > > with pages we now want to use; that would have to wait for the > > host to acknowledge that use of these pages, since if we don't > > wait for it then it might have skipped migrating a page we > > just started using (I don't understand how your series solves that). > > So the guest probably needs to keep some free pages - how many? > > Actually, there is no need to care about whether the free pages will be used > by the host. > We only care about some of the free pages we get reused by the guest, right? > > The dirty page logging can be used to solve this, starting the dirty page > logging before getting > the free pages informant from guest. Even some of the free pages are modified > by the guest > during the process of getting the free pages information, these modified > pages will be traced > by the dirty page logging mechanism. So in the following > migration_bitmap_sync() function. > The pages in the free pages bitmap, but latter was modified, will be reset to > dirty. We won't > omit any dirtied pages. > > So, guest doesn't need to keep any free pages. OK, yes, that works; so we do: * enable dirty logging * ask guest for free pages * initialise the migration bitmap as everything-free * then later we do the normal sync-dirty bitmap stuff and it all just works. That's nice and simple. > > 2) Clearing out caches > > Does it make sense to clean caches? 
They're apparently useful data > > so if we clean them it's likely to slow the guest down; I guess > > they're also likely to be fairly static data - so at least fairly > > easy to migrate. > > The answer here partially depends on what you want from your migration; > > if you're after the fastest possible migration time it might make > > sense to clean the caches and avoid migrating them; but that might > > be at the cost of more disruption to the guest - there's a trade off > > somewhere and it's not clear to me how you set that depending on your > > guest/network/reqirements. > > > > Yes, clean the caches is an option. Let the users decide using it or not. > > > 3) Why is ballooning slow? > > You've got a figure of 5s to balloon on an 8GB VM - but an > > 8GB VM isn't huge; so I worry about how long it would take > > on a big VM. We need to understand why it's slow > >* is it due to the guest shuffling pages around? > >* is it due to the virtio-balloon protocol sending one page > > at a time? > > + Do balloon pages normally clump in physical memory > > - i.e. would a 'large balloon' message help > > - or do we need a bitmap because it tends not to clump? > > > > I didn't do a comprehensive test. But I found most of the time spending > on allocating the pages and sending the PFNs to guest, I don't know that's > the most time consuming operation, allocating the pages or sending the PFNs. It might be a good idea to analyse it a bit more to convince people where the problem is. > >* is it due to the madvise on the host? > > If we were using the normal balloon messages, then we > > could, during migration, just route those to the migration > > code rather than bothering with the madvise. > > If they're clumping together we could just turn that into > > one big madvise; if they're not then would we benefit from > > a call that lets us madvise lots of areas? 
> > > > My test showed madvise() is not the main reason for the long time, only taken > 10% of the total inflating balloon operation time. > Big madvise can more or less improve the performance. OK; 10% of the total is still pretty big even for your 8GB VM. > > 4) Speeding up the migration of those free pages > > You're using the bitmap to avoid migrating those free pages; HPe's > > patchset is reconstructing a bitmap from the balloon data; OK, so > > this all makes sense to avoid migrating them - I'd also been thinking > > of using pagemap to spot zero pages that would help find other zero'd > > pages, but perhaps ballooned is enough? > > > Could you describe your ideal with more details? At the moment the migration code spends a fair amount of time checking if a page is zero; I was thinking perhaps the qemu could just open /proc/self/pagemap and check if the page was mapped; that would seem cheap if we're checking big ranges; and that
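A rough sketch of the pagemap check mentioned in the message above, assuming the documented Linux pagemap layout (one 64-bit entry per virtual page, bit 63 = present in RAM, bit 62 = swapped). The function names are hypothetical and this is not what QEMU does today. A page that is neither present nor swapped has no backing; for untouched anonymous memory that means it would read as zero, but that conclusion does not carry over to file-backed mappings.

```python
import os
import struct

PAGEMAP_ENTRY_BYTES = 8
PM_PRESENT = 1 << 63  # page is present in RAM
PM_SWAPPED = 1 << 62  # page is in swap


def entry_is_backed(entry):
    """True if a pagemap entry refers to a page in RAM or in swap."""
    return bool(entry & (PM_PRESENT | PM_SWAPPED))


def unbacked_pages(start_addr, n_pages, pagemap_path="/proc/self/pagemap"):
    """Return indexes (relative to start_addr) of pages with no backing.

    Reads the whole range with one seek and one read, which is what
    would make the check cheap for big ranges.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (start_addr // page_size) * PAGEMAP_ENTRY_BYTES
    with open(pagemap_path, "rb") as f:
        f.seek(offset)
        raw = f.read(n_pages * PAGEMAP_ENTRY_BYTES)
    count = len(raw) // PAGEMAP_ENTRY_BYTES
    entries = struct.unpack("={}Q".format(count), raw[:count * PAGEMAP_ENTRY_BYTES])
    return [i for i, e in enumerate(entries) if not entry_is_backed(e)]
```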
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> Hi,
>   I'm just catching back up on this thread; so without reference to any
> particular previous mail in the thread.
>
>   1) How many of the free pages do we tell the host about?
>      Your main change is telling the host about all the
>      free pages.

Yes, all the guest's free pages.

>      If we tell the host about all the free pages, then we might
>      end up needing to allocate more pages and update the host
>      with pages we now want to use; that would have to wait for the
>      host to acknowledge that use of these pages, since if we don't
>      wait for it then it might have skipped migrating a page we
>      just started using (I don't understand how your series solves that).
>      So the guest probably needs to keep some free pages - how many?

Actually, there is no need to care about whether the free pages will be
used by the host. We only care about whether some of the free pages we
get are reused by the guest, right?

Dirty page logging can be used to solve this, as long as it is started
before the free page information is requested from the guest. Even if some
of the free pages are modified by the guest while the free page
information is being collected, these modified pages will be traced by the
dirty page logging mechanism. So in the following migration_bitmap_sync()
call, pages that are in the free page bitmap but were modified later will
be reset to dirty. We won't omit any dirtied pages.

So the guest doesn't need to keep any free pages.

>   2) Clearing out caches
>      Does it make sense to clean caches?  They're apparently useful data
>      so if we clean them it's likely to slow the guest down; I guess
>      they're also likely to be fairly static data - so at least fairly
>      easy to migrate.
>      The answer here partially depends on what you want from your migration;
>      if you're after the fastest possible migration time it might make
>      sense to clean the caches and avoid migrating them; but that might
>      be at the cost of more disruption to the guest - there's a trade off
>      somewhere and it's not clear to me how you set that depending on your
>      guest/network/requirements.

Yes, cleaning the caches is an option. Let the users decide whether to use
it or not.

>   3) Why is ballooning slow?
>      You've got a figure of 5s to balloon on an 8GB VM - but an
>      8GB VM isn't huge; so I worry about how long it would take
>      on a big VM.  We need to understand why it's slow
>      * is it due to the guest shuffling pages around?
>      * is it due to the virtio-balloon protocol sending one page
>        at a time?
>        + Do balloon pages normally clump in physical memory
>          - i.e. would a 'large balloon' message help
>          - or do we need a bitmap because it tends not to clump?

I didn't do a comprehensive test, but I found that most of the time is
spent on allocating the pages and sending the PFNs to the host. I don't
know which of the two is the more time-consuming operation.

>      * is it due to the madvise on the host?
>        If we were using the normal balloon messages, then we
>        could, during migration, just route those to the migration
>        code rather than bothering with the madvise.
>        If they're clumping together we could just turn that into
>        one big madvise; if they're not then would we benefit from
>        a call that lets us madvise lots of areas?

My test showed that madvise() is not the main reason for the long time; it
takes only 10% of the total balloon inflation time. A big madvise can
improve the performance more or less.

>   4) Speeding up the migration of those free pages
>      You're using the bitmap to avoid migrating those free pages; HPe's
>      patchset is reconstructing a bitmap from the balloon data;  OK, so
>      this all makes sense to avoid migrating them - I'd also been thinking
>      of using pagemap to spot zero pages that would help find other zero'd
>      pages, but perhaps ballooned is enough?

Could you describe your idea in more detail?

>   5) Second-migrate
>      Given a VM where you've done all those tricks on, what happens when
>      you migrate it a second time?  I guess you're aiming for the guest
>      to update its bitmap;  HPe's solution is to migrate its balloon
>      bitmap along with the migration data.

Nothing is special in the second migration: QEMU will ask the guest for
free page information, and the guest will traverse its current free page
list to construct a new free page bitmap and send it to QEMU, just like in
the first migration.

Liang

> Dave
>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
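To make the "traverse the free page list to construct a free page bitmap" step above a bit more concrete, here is a small sketch. It is an assumption-laden model, not the patch's code: free memory is represented as (start_pfn, order) blocks in the style of a buddy allocator, and build_free_bitmap and is_free are hypothetical names.

```python
def build_free_bitmap(free_blocks, total_pages):
    """Build a one-bit-per-page bitmap from a list of free blocks.

    free_blocks: iterable of (start_pfn, order); a block covers
                 2**order contiguous pages starting at start_pfn
    total_pages: number of guest pages the bitmap has to cover
    """
    bitmap = bytearray((total_pages + 7) // 8)
    for start_pfn, order in free_blocks:
        for pfn in range(start_pfn, start_pfn + (1 << order)):
            bitmap[pfn // 8] |= 1 << (pfn % 8)  # set bit = page is free
    return bitmap


def is_free(bitmap, pfn):
    """Check whether a page is marked free in the bitmap."""
    return bool(bitmap[pfn // 8] & (1 << (pfn % 8)))
```

Because the bitmap is rebuilt from scratch on every request, nothing has to be carried over from a previous migration, which matches the answer to question 5 above.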
Re: [RFC qemu 0/4] A PV solution for live migration optimization
Hi,
  I'm just catching back up on this thread; so without reference to any
particular previous mail in the thread.

  1) How many of the free pages do we tell the host about?
     Your main change is telling the host about all the
     free pages.
     If we tell the host about all the free pages, then we might
     end up needing to allocate more pages and update the host
     with pages we now want to use; that would have to wait for the
     host to acknowledge that use of these pages, since if we don't
     wait for it then it might have skipped migrating a page we
     just started using (I don't understand how your series solves that).
     So the guest probably needs to keep some free pages - how many?

  2) Clearing out caches
     Does it make sense to clean caches?  They're apparently useful data
     so if we clean them it's likely to slow the guest down; I guess
     they're also likely to be fairly static data - so at least fairly
     easy to migrate.
     The answer here partially depends on what you want from your migration;
     if you're after the fastest possible migration time it might make
     sense to clean the caches and avoid migrating them; but that might
     be at the cost of more disruption to the guest - there's a trade off
     somewhere and it's not clear to me how you set that depending on your
     guest/network/requirements.

  3) Why is ballooning slow?
     You've got a figure of 5s to balloon on an 8GB VM - but an
     8GB VM isn't huge; so I worry about how long it would take
     on a big VM.  We need to understand why it's slow
     * is it due to the guest shuffling pages around?
     * is it due to the virtio-balloon protocol sending one page
       at a time?
       + Do balloon pages normally clump in physical memory
         - i.e. would a 'large balloon' message help
         - or do we need a bitmap because it tends not to clump?
     * is it due to the madvise on the host?
       If we were using the normal balloon messages, then we
       could, during migration, just route those to the migration
       code rather than bothering with the madvise.
       If they're clumping together we could just turn that into
       one big madvise; if they're not then would we benefit from
       a call that lets us madvise lots of areas?

  4) Speeding up the migration of those free pages
     You're using the bitmap to avoid migrating those free pages; HPe's
     patchset is reconstructing a bitmap from the balloon data;  OK, so
     this all makes sense to avoid migrating them - I'd also been thinking
     of using pagemap to spot zero pages that would help find other zero'd
     pages, but perhaps ballooned is enough?

  5) Second-migrate
     Given a VM on which you've done all those tricks, what happens when
     you migrate it a second time?  I guess you're aiming for the guest
     to update its bitmap;  HPe's solution is to migrate its balloon
     bitmap along with the migration data.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
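The "one big madvise" question in point 3 above comes down to how well the ballooned pages clump. A minimal sketch of that measurement, with a hypothetical helper name and made-up PFNs: merge the reported PFNs into contiguous (start, length) runs, where each run would correspond to one madvise call instead of one call per page.

```python
def coalesce_pfns(pfns):
    """Merge page frame numbers into sorted (start_pfn, n_pages) runs."""
    runs = []
    for pfn in sorted(set(pfns)):
        if runs and pfn == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1       # extends the current contiguous run
        else:
            runs.append([pfn, 1])  # starts a new run
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    reported = [5, 3, 4, 10, 11, 20]
    runs = coalesce_pfns(reported)
    # Six pages collapse into three ranges in this made-up example.
    print(runs, len(reported), "pages ->", len(runs), "ranges")
```

If the number of runs stays close to the number of pages on real guests, the pages are not clumping and a bitmap is probably the better representation.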
Re: [RFC qemu 0/4] A PV solution for live migration optimization
Hi,
  I'm just catching back up on this thread; so without reference to any
particular previous mail in the thread.

1) How many of the free pages do we tell the host about?
   Your main change is telling the host about all the free pages.
   If we tell the host about all the free pages, then we might end up
   needing to allocate more pages and update the host with pages we now
   want to use; that would have to wait for the host to acknowledge that
   use of these pages, since if we don't wait for it then it might have
   skipped migrating a page we just started using (I don't understand
   how your series solves that).
   So the guest probably needs to keep some free pages - how many?

2) Clearing out caches
   Does it make sense to clean caches? They're apparently useful data
   so if we clean them it's likely to slow the guest down; I guess
   they're also likely to be fairly static data - so at least fairly
   easy to migrate.
   The answer here partially depends on what you want from your
   migration; if you're after the fastest possible migration time it
   might make sense to clean the caches and avoid migrating them; but
   that might be at the cost of more disruption to the guest - there's
   a trade off somewhere and it's not clear to me how you set that
   depending on your guest/network/requirements.

3) Why is ballooning slow?
   You've got a figure of 5s to balloon on an 8GB VM - but an 8GB VM
   isn't huge; so I worry about how long it would take on a big VM.
   We need to understand why it's slow
     * is it due to the guest shuffling pages around?
     * is it due to the virtio-balloon protocol sending one page
       at a time?
       + Do balloon pages normally clump in physical memory
         - i.e. would a 'large balloon' message help
         - or do we need a bitmap because it tends not to clump?
     * is it due to the madvise on the host?
       If we were using the normal balloon messages, then we could,
       during migration, just route those to the migration code rather
       than bothering with the madvise.
       If they're clumping together we could just turn that into one
       big madvise; if they're not then would we benefit from a call
       that lets us madvise lots of areas?

4) Speeding up the migration of those free pages
   You're using the bitmap to avoid migrating those free pages; HPe's
   patchset is reconstructing a bitmap from the balloon data; OK, so
   this all makes sense to avoid migrating them - I'd also been thinking
   of using pagemap to spot zero pages that would help find other zero'd
   pages, but perhaps ballooned is enough?

5) Second-migrate
   Given a VM on which you've done all those tricks, what happens when
   you migrate it a second time? I guess you're aiming for the guest to
   update its bitmap; HPe's solution is to migrate its balloon bitmap
   along with the migration data.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
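The clumping question in point 3 can be made concrete with a small sketch. It is only an illustration, not code from either patch set: `coalesce_pfns` is a hypothetical helper that groups page frame numbers into contiguous runs. A few long runs would suggest that a 'large balloon' message (or one big madvise per run) is enough; many single-page runs would argue for a bitmap.

```python
def coalesce_pfns(pfns):
    """Group page frame numbers into (start, count) runs of contiguous pages.

    Hypothetical helper for illustration only; duplicates are ignored.
    """
    runs = []
    for pfn in sorted(set(pfns)):
        if runs and pfn == runs[-1][0] + runs[-1][1]:
            # pfn directly follows the last run, so extend that run by one page.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            # Gap before this pfn: start a new run.
            runs.append((pfn, 1))
    return runs


if __name__ == "__main__":
    for start, count in coalesce_pfns([5, 3, 4, 10, 11, 20]):
        print(f"run at pfn {start}: {count} page(s)")
```

Each run would map to a single madvise-style call covering `count` pages starting at `start`, which is why the number of runs, rather than the number of pages, is the interesting cost here.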
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> > Could you provide more information on how to use virtio-serial to
> > exchange data? Thread, Wiki or code are all OK.
> > I have not found any useful information yet.
>
> See this commit in the Linux sources:
>
> 108fc82596e3b66b819df9d28c1ebbc9ab5de14c
>
> that adds a way to send guest trace data over to the host. I think
> that's the most relevant to your use-case. However, you'll have to add
> an in-kernel user of virtio-serial (like the virtio-console code
> -- the code that deals with tty and hvc currently). There's no other
> non-tty user right now, and this is the right kind of use-case to add
> one for!
>
> For many other (userspace) use-cases, see the qemu-guest-agent in the
> qemu sources.
>
> The API is documented in the wiki:
>
> http://www.linux-kvm.org/page/Virtio-serial_API
>
> and the feature pages have some information that may help as well:
>
> https://fedoraproject.org/wiki/Features/VirtioSerial
>
> There are some links in here too:
>
> http://log.amitshah.net/2010/09/communication-between-guests-and-hosts/
>
> Hope this helps.
>
>               Amit

Thanks a lot !!

Liang
Re: [RFC qemu 0/4] A PV solution for live migration optimization
On (Thu) 10 Mar 2016 [07:44:19], Li, Liang Z wrote:
>
> Hi Amit,
>
> Could provide more information on how to use virtio-serial to exchange
> data? Thread , Wiki or code are all OK.
> I have not find some useful information yet.

See this commit in the Linux sources:

108fc82596e3b66b819df9d28c1ebbc9ab5de14c

that adds a way to send guest trace data over to the host. I think
that's the most relevant to your use-case. However, you'll have to add
an in-kernel user of virtio-serial (like the virtio-console code
-- the code that deals with tty and hvc currently). There's no other
non-tty user right now, and this is the right kind of use-case to add
one for!

For many other (userspace) use-cases, see the qemu-guest-agent in the
qemu sources.

The API is documented in the wiki:

http://www.linux-kvm.org/page/Virtio-serial_API

and the feature pages have some information that may help as well:

https://fedoraproject.org/wiki/Features/VirtioSerial

There are some links in here too:

http://log.amitshah.net/2010/09/communication-between-guests-and-hosts/

Hope this helps.

                Amit
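For the userspace style of use-case mentioned above (the qemu-guest-agent kind, not the in-kernel user that the free-page work would need), a virtio-serial port is just a byte stream, so both ends have to agree on message framing. The following is a minimal sketch under stated assumptions: the 4-byte length prefix, the helper names and the port name `org.example.free-pages` are all hypothetical and appear in neither patch set; only the `/dev/virtio-ports/` directory for named ports in a Linux guest is standard.

```python
import struct

# Assumed framing: 4-byte little-endian payload length, then the payload.
HEADER = struct.Struct("<I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the peer can find message boundaries."""
    return HEADER.pack(len(payload)) + payload


def decode_frames(buf: bytes):
    """Split a received byte stream into complete payloads.

    Returns (frames, leftover); leftover holds any trailing partial frame.
    """
    frames = []
    off = 0
    while len(buf) - off >= HEADER.size:
        (length,) = HEADER.unpack_from(buf, off)
        if len(buf) - off - HEADER.size < length:
            break  # payload not fully received yet
        start = off + HEADER.size
        frames.append(buf[start:start + length])
        off = start + length
    return frames, buf[off:]


def send_to_host(payload: bytes,
                 port_path: str = "/dev/virtio-ports/org.example.free-pages") -> None:
    """Write one framed message to a virtio-serial port (hypothetical port name)."""
    with open(port_path, "wb", buffering=0) as port:
        port.write(encode_frame(payload))
```

The host side would read the same stream from whatever chardev backs the port and run it through `decode_frames`, keeping the leftover bytes for the next read.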
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution
> > with some extra modifications.
>
> I like the idea, just have to prove (review) and test it a lot to
> ensure we don't end up skipping pages that matter.
>
> However, there are a couple of points:
>
> In my opinion, the information that's exchanged between the guest and
> the host should be exchanged over a virtio-serial channel rather than
> virtio-balloon. First, there's nothing related to the balloon here.
> It just happens to be memory info. Second, I would never enable
> balloon in a guest that I want to be performance-sensitive. So even if
> you add this as part of balloon, you'll find no one is using this
> solution.
>
> Secondly, I suggest virtio-serial, because it's meant exactly to
> exchange free-flowing information between a host and a guest, and you
> don't need to extend any part of the protocol for it (hence no changes
> necessary to the spec). You can see how spice, vnc, etc., use
> virtio-serial to exchange data.
>
>               Amit

Hi Amit,

Could you provide more information on how to use virtio-serial to
exchange data? Thread, Wiki or code are all OK.
I have not found any useful information yet.

Thanks
Liang
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
>
> On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > will be processed and that takes quite a lot of CPU cycles.
> >
> > From the guest's point of view, it doesn't care about the content in
> > free pages. We can make use of this fact and skip processing the
> > free pages in the ram bulk stage; it can save a lot of CPU cycles
> > and reduce the network traffic significantly while speeding up the
> > live migration process.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution
> > with some extra modifications.
>
> I like the idea, just have to prove (review) and test it a lot to
> ensure we don't end up skipping pages that matter.
>
> However, there are a couple of points:
>
> In my opinion, the information that's exchanged between the guest and
> the host should be exchanged over a virtio-serial channel rather than
> virtio-balloon. First, there's nothing related to the balloon here.
> It just happens to be memory info. Second, I would never enable
> balloon in a guest that I want to be performance-sensitive. So even if
> you add this as part of balloon, you'll find no one is using this
> solution.
>
> Secondly, I suggest virtio-serial, because it's meant exactly to
> exchange free-flowing information between a host and a guest, and you
> don't need to extend any part of the protocol for it (hence no changes
> necessary to the spec). You can see how spice, vnc, etc., use
> virtio-serial to exchange data.
>
>               Amit

I don't like using the virtio-balloon either; it's confusing.
It would be great if virtio-serial can be used, I will take a look at it.

Thanks for your suggestion!

Liang
Re: [RFC qemu 0/4] A PV solution for live migration optimization
On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> The current QEMU live migration implementation marks all the
> guest's RAM pages as dirtied in the ram bulk stage; all these pages
> will be processed and that takes quite a lot of CPU cycles.
>
> From the guest's point of view, it doesn't care about the content in
> free pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage; it can save a lot of CPU cycles and
> reduce the network traffic significantly while speeding up the live
> migration process.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This makes
> the live migration process much more efficient.
>
> This RFC version doesn't take the post-copy and RDMA into
> consideration, maybe both of them can benefit from this PV solution
> with some extra modifications.

I like the idea, just have to prove (review) and test it a lot to
ensure we don't end up skipping pages that matter.

However, there are a couple of points:

In my opinion, the information that's exchanged between the guest and
the host should be exchanged over a virtio-serial channel rather than
virtio-balloon. First, there's nothing related to the balloon here.
It just happens to be memory info. Second, I would never enable
balloon in a guest that I want to be performance-sensitive. So even if
you add this as part of balloon, you'll find no one is using this
solution.

Secondly, I suggest virtio-serial, because it's meant exactly to
exchange free-flowing information between a host and a guest, and you
don't need to extend any part of the protocol for it (hence no changes
necessary to the spec). You can see how spice, vnc, etc., use
virtio-serial to exchange data.

                Amit
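The worry about "skipping pages that matter" comes down to how the free-page bitmap is combined with dirty logging. The sketch below models that bookkeeping with Python integers as bitmaps (bit i set means page i still has to be sent). It is a toy model of the idea discussed in this thread, not QEMU's code; `filter_free_pages` and `sync_dirty` are hypothetical names.

```python
def filter_free_pages(migration_bitmap: int, free_bitmap: int) -> int:
    """Drop pages the guest reported as free from the set of pages to send."""
    return migration_bitmap & ~free_bitmap


def sync_dirty(migration_bitmap: int, dirty_bitmap: int) -> int:
    """Re-add every page written since dirty logging was enabled."""
    return migration_bitmap | dirty_bitmap


if __name__ == "__main__":
    # Toy guest with 8 pages; the bulk stage starts with every page marked.
    to_send = 0b11111111
    # Guest reports pages 2..5 as free (dirty logging is already enabled).
    to_send = filter_free_pages(to_send, 0b00111100)
    # Page 3 was reused by the guest after the report; dirty logging saw it.
    to_send = sync_dirty(to_send, 0b00001000)
    print(f"{to_send:08b}")
```

In this model a page that was free when reported but written afterwards is put back by the dirty sync, provided dirty logging was enabled before the free-page information was collected.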
Re: [RFC qemu 0/4] A PV solution for live migration optimization
On (Fri) 04 Mar 2016 [15:02:47], Jitendra Kolhe wrote:
> > > * Liang Li (liang.z...@intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage; all these
> > > > pages will be processed and that takes quite a lot of CPU cycles.
> > > >
> > > > From the guest's point of view, it doesn't care about the content
> > > > in free pages. We can make use of this fact and skip processing
> > > > the free pages in the ram bulk stage; it can save a lot of CPU
> > > > cycles and reduce the network traffic significantly while
> > > > speeding up the live migration process.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use
> > > > it to filter out the guest's free pages in the ram bulk stage.
> > > > This makes the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the
> > balloon.
>
> We were also trying to address a similar problem, without actually
> needing to modify the guest driver. Please find patch details under
> mail with subject.
> migration: skip sending ram pages released by virtio-balloon driver

The scope of this patch series seems to be wider: don't send free
pages to a dest at all, vs. don't send pages that are ballooned out.

                Amit
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> > > * Liang Li (liang.z...@intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage; all these
> > > > pages will be processed and that takes quite a lot of CPU cycles.
> > > >
> > > > From the guest's point of view, it doesn't care about the content
> > > > in free pages. We can make use of this fact and skip processing
> > > > the free pages in the ram bulk stage; it can save a lot of CPU
> > > > cycles and reduce the network traffic significantly while
> > > > speeding up the live migration process.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use
> > > > it to filter out the guest's free pages in the ram bulk stage.
> > > > This makes the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the
> > balloon.
>
> We were also trying to address a similar problem, without actually
> needing to modify the guest driver. Please find patch details under
> mail with subject.
> migration: skip sending ram pages released by virtio-balloon driver
>
> Thanks,
> - Jitendra
>

Great! Thanks for your information.

Liang

> > > I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > same result?
RE: [RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
>
> * Liang Li (liang.z...@intel.com) wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > will be processed and that takes quite a lot of CPU cycles.
> >
> > From the guest's point of view, it doesn't care about the content in
> > free pages. We can make use of this fact and skip processing the
> > free pages in the ram bulk stage; it can save a lot of CPU cycles
> > and reduce the network traffic significantly while speeding up the
> > live migration process.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
>
> Hi,
>   An interesting solution; I know a few different people have been
> looking at how to speed up ballooned VM migration.
>

Ooh, different solutions for the same purpose, and both based on the
balloon.

> I wonder if it would be possible to avoid the kernel changes by parsing
> /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> pages in the guest ram, would it achieve the same result?
>

Detecting only the unmapped/zero mapped pages is not enough. Consider a
situation like case 2; it can't achieve the same result there.

> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution
> > with some extra modifications.
>
> For postcopy to be safe, you would still need to send a message to the
> destination telling it that there were zero pages, otherwise the
> destination can't tell if it's supposed to request the page from the
> source or treat the page as zero.
>
> Dave

I will consider this later, thanks, Dave.

Liang

> > Performance data
> > ================
> >
> > Test environment:
> >
> > CPU: Intel (R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> > Host RAM: 64GB
> > Host Linux Kernel: 4.2.0        Host OS: CentOS 7.1
> > Guest Linux Kernel: 4.5.rc6     Guest OS: CentOS 6.6
> > Network: X540-AT2 with 10 Gigabit connection
> > Guest RAM: 8GB
> >
> > Case 1: Idle guest just boots:
> >
> >                     | original |    pv
> > -------------------------------------------
> > total time(ms)      |     1894 |    421
> > transferred ram(KB) |   398017 | 353242
> >
> > Case 2: The guest has run some memory consuming workload, and the
> > workload is terminated just before live migration.
> >
> >                     | original |    pv
> > -------------------------------------------
> > total time(ms)      |     7436 |    552
> > transferred ram(KB) |  8146291 | 361375
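As a worked reading of the performance data quoted above, the sketch below derives speedup and traffic reduction from the four table rows. The figures are copied from the table; the helper names `speedup` and `reduction_pct` are only for illustration.

```python
# (original, pv) pairs taken from the quoted performance table.
RESULTS = {
    "case 1 (idle guest)": {"time_ms": (1894, 421), "ram_kb": (398017, 353242)},
    "case 2 (after workload)": {"time_ms": (7436, 552), "ram_kb": (8146291, 361375)},
}


def speedup(original: float, pv: float) -> float:
    """How many times faster the pv run completed."""
    return original / pv


def reduction_pct(original: float, pv: float) -> float:
    """Percentage of the original amount that the pv run avoided."""
    return 100.0 * (original - pv) / original


if __name__ == "__main__":
    for name, row in RESULTS.items():
        s = speedup(*row["time_ms"])
        r = reduction_pct(*row["ram_kb"])
        print(f"{name}: {s:.1f}x faster, {r:.1f}% less RAM transferred")
```

That works out to roughly 4.5x for the idle guest and roughly 13.5x for case 2. Case 2 is also where the transferred RAM drops from about 8GB to about 350MB, which is the situation the pagemap objection above refers to: the host still sees those pages as mapped after the workload has exited.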
Re: [RFC qemu 0/4] A PV solution for live migration optimization
* Liang Li (liang.z...@intel.com) wrote: > The current QEMU live migration implementation mark the all the > guest's RAM pages as dirtied in the ram bulk stage, all these pages > will be processed and that takes quit a lot of CPU cycles. > > From guest's point of view, it doesn't care about the content in free > pages. We can make use of this fact and skip processing the free > pages in the ram bulk stage, it can save a lot CPU cycles and reduce > the network traffic significantly while speed up the live migration > process obviously. > > This patch set is the QEMU side implementation. > > The virtio-balloon is extended so that QEMU can get the free pages > information from the guest through virtio. > > After getting the free pages information (a bitmap), QEMU can use it > to filter out the guest's free pages in the ram bulk stage. This make > the live migration process much more efficient. Hi, An interesting solution; I know a few different people have been looking at how to speed up ballooned VM migration. I wonder if it would be possible to avoid the kernel changes by parsing /proc/self/pagemap - if that can be used to detect unmapped/zero mapped pages in the guest ram, would it achieve the same result? > This RFC version doesn't take the post-copy and RDMA into > consideration, maybe both of them can benefit from this PV solution > by with some extra modifications. For postcopy to be safe, you would still need to send a message to the destination telling it that there were zero pages, otherwise the destination can't tell if it's supposed to request the page from the source or treat the page as zero. 

Dave

> Performance data
>
> Test environment:
>
> CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
> Host RAM: 64GB
> Host Linux Kernel: 4.2.0
> Host OS: CentOS 7.1
> Guest Linux Kernel: 4.5.rc6
> Guest OS: CentOS 6.6
> Network: X540-AT2 with 10 Gigabit connection
> Guest RAM: 8GB
>
> Case 1: Idle guest just boots:
>
>                     | original | pv
> --------------------+----------+--------
> total time(ms)      |     1894 |    421
> transferred ram(KB) |   398017 | 353242
>
> Case 2: The guest has ever run some memory consuming workload, the
> workload is terminated just before live migration.
>
>                     | original | pv
> --------------------+----------+--------
> total time(ms)      |     7436 |    552
> transferred ram(KB) |  8146291 | 361375
>
> Liang Li (4):
>   pc: Add code to get the lowmem form PCMachineState
>   virtio-balloon: Add a new feature to balloon device
>   migration: not set migration bitmap in setup stage
>   migration: filter out guest's free pages in ram bulk stage
>
>  balloon.c                                       | 30 -
>  hw/i386/pc.c                                    |  5 ++
>  hw/i386/pc_piix.c                               |  1 +
>  hw/i386/pc_q35.c                                |  1 +
>  hw/virtio/virtio-balloon.c                      | 81 -
>  include/hw/i386/pc.h                            |  3 +-
>  include/hw/virtio/virtio-balloon.h              | 17 +-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  migration/ram.c                                 | 64 +++
>  10 files changed, 195 insertions(+), 18 deletions(-)
>
> --
> 1.8.3.1
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
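
As an illustration of the scheme discussed in this thread (start dirty logging, fetch the guest's free-page bitmap, skip those pages in the bulk stage, then let the normal dirty sync re-mark anything the guest reused), here is a small model in Python. It is only a sketch under simplifying assumptions: the bitmap is a plain integer, pages are identified by index, and the function names `bulk_stage_bitmap`, `sync_dirty` and `pages_to_send` are hypothetical. None of this is the actual code in migration/ram.c.

```python
def bulk_stage_bitmap(num_pages, free_pages):
    """Mark every page dirty, then clear the pages the guest reported as free."""
    bitmap = (1 << num_pages) - 1          # all pages start out marked for sending
    for pfn in free_pages:
        bitmap &= ~(1 << pfn)              # free page: contents do not matter, skip it
    return bitmap


def sync_dirty(bitmap, dirty_log):
    """Re-mark pages written since dirty logging was enabled."""
    for pfn in dirty_log:
        bitmap |= 1 << pfn                 # a reused free page becomes dirty again
    return bitmap


def pages_to_send(bitmap, num_pages):
    """List the page indices still marked in the migration bitmap."""
    return [pfn for pfn in range(num_pages) if (bitmap >> pfn) & 1]


# Example: 8 guest pages, the guest reports pages 2, 3, 5 and 6 as free,
# then starts using page 5 again while the free-page list is being collected.
bitmap = bulk_stage_bitmap(8, [2, 3, 5, 6])
bitmap = sync_dirty(bitmap, [5])
print(pages_to_send(bitmap, 8))
```

In this model page 5 is sent even though it was reported free, because dirty logging was already active when the guest wrote to it. That ordering is the reason the guest does not need to hold any free pages in reserve.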