Re: Inspect pages created after a vm_object is marked as copy-on-write

2018-07-01 Thread Elena Mihailescu
On 1 July 2018 at 11:29, Mark Johnston  wrote:
>
> On Sun, Jul 01, 2018 at 01:34:01AM +0300, Konstantin Belousov wrote:
> > On Sat, Jun 30, 2018 at 05:59:56PM -0400, Mark Johnston wrote:
> > > On Sat, Jun 30, 2018 at 10:38:21AM +0300, Mihai Carabas wrote:
> > > > > On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston wrote:
> > > > > > On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
> > > > > >> Is there anything I am doing wrong? Maybe I misunderstood something about
> > > > > >> the way the virtual memory works in FreeBSD.
> > > > > >
> > > > > > I'll note that inspecting and manipulating vm_map_entry and vm_object
> > > > > > structures in the bhyve code constitutes something of an abstraction
> > > > > > violation, though it's reasonable to proceed this way while working on a
> > > > > > prototype of the feature.  That is, I think you should keep trying your
> > > > > > current approach, but just be aware that you are using the copy-on-write
> > > > > > mechanism in a way that the VM system isn't really expecting.
> > > > >
> > > >
> > > > Can you point out the right approach in our case?
> > >
> > > I am merely suggesting that once the required VM interactions are fully
> > > understood, the mechanism implemented for bhyve should be generalized
> > > and lifted into the VM code.  It's hard to say what the "right" approach
> > > is, since I don't fully understand the proposed algorithm.  It sounds
> > > like you might be attempting something like:
> > >
> > > 1. mark the mappings of to-be-migrated objects as NEEDS_COW, so that a
> > > >    subsequent write fault triggers creation of a shadow object
> > It is actually MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY.
>
> Indeed.
>
> > Note that setting an entry to COW changes the behaviour of mprotect(2),
> > at least.
> >
> > > 2. invalidate all physical mappings of pages in the object to be copied,
> > > >    so that subsequent writes trigger a fault
> > I do not think this is needed to detect writes after the COW is set.
> > It is enough to remove the write permissions.  Same as fork() does,
> > see the vm_map_copy_entry() code for the handling of MAP_ENTRY_NEEDS_COPY
> > case.
>
> Ah, right.
>
> > > 3. copy pages from the backing object to the destination
> > As I understand, this is done right after the entry is marked
> > as COW.
> >
> > > > 4. copy any pages from the shadow object to the destination
> > And this is done after all backing data is copied and the process is
> > suspended.
>
> Right, I see.  Some reading suggests that in general we might perform
> multiple iterations of this procedure before suspending the process and
> performing any remaining copies.
>

For the live migration implementation, we only need this approach to
send the guest's physical memory. In my tests, the guest's memory was
represented by a single vm_map_entry for a virtual machine with 512MB
of RAM. I found this by inspecting each vm_map_entry of the bhyve
instance and comparing its start and end addresses with the virtual
machine's base address and memory size.
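
A minimal user-space sketch of that lookup, for illustration only. The
struct toy_entry type and the toy_find_guest_entry() helper are
hypothetical stand-ins, not the kernel's vm_map_entry or any existing
bhyve function; the sketch only shows the comparison of an entry's
start and end addresses against the guest's base address and size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a map entry: only the address range and a link. */
struct toy_entry {
	uintptr_t start;		/* first address covered */
	uintptr_t end;			/* one past the last address covered */
	struct toy_entry *next;		/* next entry in the list, or NULL */
};

/*
 * Return the entry whose range is exactly [base, base + size),
 * or NULL if no entry matches.
 */
static struct toy_entry *
toy_find_guest_entry(struct toy_entry *head, uintptr_t base, size_t size)
{
	struct toy_entry *e;

	assert(size > 0);
	for (e = head; e != NULL; e = e->next) {
		if (e->start == base && e->end == base + size)
			return (e);
	}
	return (NULL);
}
```

The exact-match test reflects the observation above that the guest's
memory was covered by a single entry; a guest whose memory is split
across several entries would need a range-overlap test instead.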

What we are trying to implement is to send the initial memory while
the virtual machine is running (that is why we need to mark that entry
copy-on-write, so that the copy is consistent), then stop the virtual
machine's execution (freeze the vCPUs) and send only the pages/memory
areas that were modified in the meantime (using the copy-on-write
mechanism, we can determine which pages were modified). This is "a
simplified live migration". A full live-migration feature will send
the guest's memory in multiple rounds (round 1: send the entire memory;
round 2: send only the pages modified since round 1; round 3: send
only the pages modified since round 2; and so on), until a certain
point when you stop the virtual machine and send to the destination
only the remaining differences and the guest's CPU state.
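
The rounds described above can be sketched in user space roughly as
follows. This is a toy model under stated assumptions, not bhyve code:
struct toy_guest, toy_guest_write() and toy_send_round() are
hypothetical names, the per-page dirty flag stands in for whatever the
copy-on-write mechanism would report, and the guest's writes happen
between rounds rather than concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NPAGES	8
#define TOY_PAGESZ	16

struct toy_guest {
	unsigned char mem[TOY_NPAGES][TOY_PAGESZ];
	bool dirty[TOY_NPAGES];	/* set whenever the guest writes a page */
};

/* Guest-side write: fill a page and record it as modified. */
static void
toy_guest_write(struct toy_guest *g, size_t page, unsigned char val)
{
	assert(page < TOY_NPAGES);
	memset(g->mem[page], val, TOY_PAGESZ);
	g->dirty[page] = true;
}

/*
 * One migration round.  With all == true every page is copied (round 1);
 * otherwise only the pages modified since the previous round are copied.
 * Each copied page has its dirty flag cleared.  Returns the number of
 * pages sent.
 */
static size_t
toy_send_round(struct toy_guest *g, unsigned char dst[][TOY_PAGESZ], bool all)
{
	size_t i, sent = 0;

	for (i = 0; i < TOY_NPAGES; i++) {
		if (all || g->dirty[i]) {
			memcpy(dst[i], g->mem[i], TOY_PAGESZ);
			g->dirty[i] = false;
			sent++;
		}
	}
	return (sent);
}
```

A caller would run toy_send_round(&g, dst, true) once, then repeat
toy_send_round(&g, dst, false) until the return value is small enough,
and perform the last such round with the guest stopped.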

After the memory migration is completed (and, of course, after
migrating the guest's CPU state), the virtual machine will be destroyed,
so the objects created by the copy-on-write mechanism will be
destroyed as well.

As for the number of rounds, it will be chosen later, based on
performance tests, but it shouldn't be too large.

Elena

> > > 5. collapse the backing object into the shadow
> > > 6. if the shadow object exists and was non-empty before the collapse,
> > >    goto 1
> > Are you trying to describe how to undo the COW marking?  Marking an
> > entry as COW really changes its semantics, and we do not need the undo
> > operation in the base so far.  Collapsing the objects would lessen
> > the pollution of the system with objects, but it does not
> > change back the meaning of mappings, e.g. their behaviour on inheritance
> > on fork.
>
> I was just thinking about how to avoid creating a long chain of objects
> if multiple iterations are in fact needed.

Re: Inspect pages created after a vm_object is marked as copy-on-write

2018-07-01 Thread Mark Johnston
On Sun, Jul 01, 2018 at 01:34:01AM +0300, Konstantin Belousov wrote:
> On Sat, Jun 30, 2018 at 05:59:56PM -0400, Mark Johnston wrote:
> > On Sat, Jun 30, 2018 at 10:38:21AM +0300, Mihai Carabas wrote:
> > > On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston  wrote:
> > > > On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
> > > >> Is there anything I am doing wrong? Maybe I misunderstood something 
> > > >> about
> > > >> the way the virtual memory works in FreeBSD.
> > > >
> > > > I'll note that inspecting and manipulating vm_map_entry and vm_object
> > > > structures in the bhyve code constitutes something of an abstraction
> > > > violation, though it's reasonable to proceed this way while working on a
> > > > prototype of the feature.  That is, I think you should keep trying your
> > > > current approach, but just be aware that you are using the copy-on-write
> > > > mechanism in a way that the VM system isn't really expecting.
> > > >
> > > 
> > > Can you point out the right approach in our case?
> > 
> > I am merely suggesting that once the required VM interactions are fully
> > understood, the mechanism implemented for bhyve should be generalized
> > and lifted into the VM code.  It's hard to say what the "right" approach
> > is, since I don't fully understand the proposed algorithm.  It sounds
> > like you might be attempting something like:
> > 
> > 1. mark the mappings of to-be-migrated objects as NEEDS_COW, so that a
> >    subsequent write fault triggers creation of a shadow object
> It is actually MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY.

Indeed.

> Note that setting an entry to COW changes the behaviour of mprotect(2),
> at least.
> 
> > 2. invalidate all physical mappings of pages in the object to be copied,
> >    so that subsequent writes trigger a fault
> I do not think this is needed to detect writes after the COW is set.
> It is enough to remove the write permissions.  Same as fork() does,
> see the vm_map_copy_entry() code for the handling of MAP_ENTRY_NEEDS_COPY
> case.

Ah, right.

> > 3. copy pages from the backing object to the destination
> As I understand, this is done right after the entry is marked
> as COW.
> 
> > 4. copy any pages from the shadow object to the destination
> And this is done after all backing data is copied and the process is
> suspended.

Right, I see.  Some reading suggests that in general we might perform
multiple iterations of this procedure before suspending the process and
performing any remaining copies.

> > 5. collapse the backing object into the shadow
> > 6. if the shadow object exists and was non-empty before the collapse,
> >    goto 1
> Are you trying to describe how to undo the COW marking?  Marking an
> entry as COW really changes its semantics, and we do not need the undo
> operation in the base so far.  Collapsing the objects would lessen
> the pollution of the system with objects, but it does not
> change back the meaning of mappings, e.g. their behaviour on inheritance
> on fork.

I was just thinking about how to avoid creating a long chain of objects
if multiple iterations are in fact needed.
___
freebsd-amd64@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-amd64
To unsubscribe, send any mail to "freebsd-amd64-unsubscr...@freebsd.org"


Re: Inspect pages created after a vm_object is marked as copy-on-write

2018-06-30 Thread Konstantin Belousov
On Sat, Jun 30, 2018 at 05:59:56PM -0400, Mark Johnston wrote:
> On Sat, Jun 30, 2018 at 10:38:21AM +0300, Mihai Carabas wrote:
> > On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston  wrote:
> > > On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
> > >> Is there anything I am doing wrong? Maybe I misunderstood something about
> > >> the way the virtual memory works in FreeBSD.
> > >
> > > I'll note that inspecting and manipulating vm_map_entry and vm_object
> > > structures in the bhyve code constitutes something of an abstraction
> > > violation, though it's reasonable to proceed this way while working on a
> > > prototype of the feature.  That is, I think you should keep trying your
> > > current approach, but just be aware that you are using the copy-on-write
> > > mechanism in a way that the VM system isn't really expecting.
> > >
> > 
> > Can you point out the right approach in our case?
> 
> I am merely suggesting that once the required VM interactions are fully
> understood, the mechanism implemented for bhyve should be generalized
> and lifted into the VM code.  It's hard to say what the "right" approach
> is, since I don't fully understand the proposed algorithm.  It sounds
> like you might be attempting something like:
> 
> 1. mark the mappings of to-be-migrated objects as NEEDS_COW, so that a
>    subsequent write fault triggers creation of a shadow object
It is actually MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY.

Note that setting an entry to COW changes the behaviour of mprotect(2),
at least.

> 2. invalidate all physical mappings of pages in the object to be copied,
>    so that subsequent writes trigger a fault
I do not think this is needed to detect writes after the COW is set.
It is enough to remove the write permissions.  Same as fork() does,
see the vm_map_copy_entry() code for the handling of MAP_ENTRY_NEEDS_COPY
case.
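
A toy user-space model of the flag handling discussed for steps 1 and 2,
offered only as a sketch. The TOY_* constants reuse the kernel's flag
names but their values are arbitrary, and struct toy_map_entry,
toy_mark_cow() and toy_write_fault() are hypothetical. The write-fault
behaviour follows the description of vm_map_lookup() given earlier in
this thread: the first write fault creates a shadow object and clears
NEEDS_COPY while leaving COW set.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bits only; the values are not the kernel's. */
#define TOY_ENTRY_COW		0x1u
#define TOY_ENTRY_NEEDS_COPY	0x2u
#define TOY_PROT_READ		0x1u
#define TOY_PROT_WRITE		0x2u

struct toy_map_entry {
	unsigned int eflags;	/* COW / NEEDS_COPY state */
	unsigned int pmap_prot;	/* permissions installed in the page tables */
	int shadow_objects;	/* shadow objects created so far */
};

/* Mark the entry copy-on-write and drop only the write permission. */
static void
toy_mark_cow(struct toy_map_entry *e)
{
	e->eflags |= TOY_ENTRY_COW | TOY_ENTRY_NEEDS_COPY;
	e->pmap_prot &= ~TOY_PROT_WRITE;	/* reads keep working */
}

/*
 * Model a write fault on the entry.  Returns true if this fault
 * created a shadow object, false if one already existed.
 */
static bool
toy_write_fault(struct toy_map_entry *e)
{
	assert((e->eflags & TOY_ENTRY_COW) != 0);
	if ((e->eflags & TOY_ENTRY_NEEDS_COPY) != 0) {
		e->shadow_objects++;
		e->eflags &= ~TOY_ENTRY_NEEDS_COPY;
		return (true);
	}
	return (false);
}
```

The model is per entry, not per page: it does not track which pages
have already been copied into the shadow, only when the shadow itself
comes into existence.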

> 3. copy pages from the backing object to the destination
As I understand, this is done right after the entry is marked
as COW.

> 4. copy any pages from the shadow object to the destination
And this is done after all backing data is copied and the process is
suspended.

> 5. collapse the backing object into the shadow
> 6. if the shadow object exists and was non-empty before the collapse,
>    goto 1
Are you trying to describe how to undo the COW marking?  Marking an
entry as COW really changes its semantics, and we do not need the undo
operation in the base so far.  Collapsing the objects would lessen
the pollution of the system with objects, but it does not
change back the meaning of mappings, e.g. their behaviour on inheritance
on fork.

> 
> Is that at all accurate?  I'm not familiar with the mechanisms used to
> implement live migration in other hypervisors.


Re: Inspect pages created after a vm_object is marked as copy-on-write

2018-06-30 Thread Mark Johnston
On Sat, Jun 30, 2018 at 10:38:21AM +0300, Mihai Carabas wrote:
> On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston  wrote:
> > On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
> >> Is there anything I am doing wrong? Maybe I misunderstood something about
> >> the way the virtual memory works in FreeBSD.
> >
> > I'll note that inspecting and manipulating vm_map_entry and vm_object
> > structures in the bhyve code constitutes something of an abstraction
> > violation, though it's reasonable to proceed this way while working on a
> > prototype of the feature.  That is, I think you should keep trying your
> > current approach, but just be aware that you are using the copy-on-write
> > mechanism in a way that the VM system isn't really expecting.
> >
> 
> Can you point out the right approach in our case?

I am merely suggesting that once the required VM interactions are fully
understood, the mechanism implemented for bhyve should be generalized
and lifted into the VM code.  It's hard to say what the "right" approach
is, since I don't fully understand the proposed algorithm.  It sounds
like you might be attempting something like:

1. mark the mappings of to-be-migrated objects as NEEDS_COW, so that a
   subsequent write fault triggers creation of a shadow object
2. invalidate all physical mappings of pages in the object to be copied,
   so that subsequent writes trigger a fault
3. copy pages from the backing object to the destination
4. copy any pages from the shadow object to the destination
5. collapse the backing object into the shadow
6. if the shadow object exists and was non-empty before the collapse,
   goto 1

Is that at all accurate?  I'm not familiar with the mechanisms used to
implement live migration in other hypervisors.
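
To make steps 3 and 4 concrete, here is a toy user-space model of a
shadow object and its backing object. It is only a sketch: struct
toy_object and the toy_* functions are hypothetical and far simpler
than the kernel's vm_object. It assumes that a page written after the
shadow was created is resident in the shadow, so the set of modified
pages is the set of pages resident there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES	8

struct toy_object {
	bool resident[TOY_NPAGES];	/* page present in this object? */
	int data[TOY_NPAGES];		/* one int stands in for page contents */
	struct toy_object *backing_object;	/* object that this one shadows */
};

/*
 * Fault-handler style lookup: search the top object first, then fall
 * back along the chain of backing objects.  Returns the object that
 * holds the page, or NULL if none does.
 */
static const struct toy_object *
toy_lookup(const struct toy_object *obj, size_t idx)
{
	assert(idx < TOY_NPAGES);
	for (; obj != NULL; obj = obj->backing_object) {
		if (obj->resident[idx])
			return (obj);
	}
	return (NULL);
}

/* Copy-on-write store: the new contents land in the top object only. */
static void
toy_cow_write(struct toy_object *top, size_t idx, int val)
{
	assert(idx < TOY_NPAGES);
	top->resident[idx] = true;
	top->data[idx] = val;
}

/*
 * Step 4: collect the indices of pages resident in the shadow, i.e.
 * the pages written since the shadow was created.  Returns the count.
 */
static size_t
toy_modified_pages(const struct toy_object *shadow, size_t out[TOY_NPAGES])
{
	size_t i, n = 0;

	for (i = 0; i < TOY_NPAGES; i++) {
		if (shadow->resident[i])
			out[n++] = i;
	}
	return (n);
}
```

In this model the backing object is never modified after the shadow is
created, which is what lets step 3 copy it while writes continue.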


Re: Inspect pages created after a vm_object is marked as copy-on-write

2018-06-30 Thread Mihai Carabas
On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston  wrote:
> On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
>>  Hello,
>>
>> I am interested in whether there is a method to inspect what pages/objects
>> were created after a vm_object (the vm_map_entry associated with the object)
>> is marked as copy-on-write. More specifically, I'm interested only in the
>> pages that were copied when a write operation was performed on a page that
>> belongs to the object marked copy-on-write.
>>
>> I need this for a live migration feature for bhyve in order to send the
>> pages that were modified between the iterations in which I migrate the
>> guest's memory (the guest's memory will be migrated in rounds: first, all
>> memory will be sent to the remote host, then only the pages that were
>> modified, and so on).
>>
>> What I want to implement is the following:
>> Step 1: Given a vm_object *obj, mark its associated vm_map_entry *entry as
>> copy-on-write.
>> Step 2: After a while (a non-deterministic amount of time),
>> inspect/retrieve the pages that were created, based on the information
>> available in the object.
>>
>> What I tried until now:
>>
>> I implemented a function in kernel that:
>> - gets the vmspace structure pointer for the current process
>> - gets the vm_map structure pointer for the vmspace
>> - iterates through each vm_map_entry and, based on the vm_offset_start and
>> vm_offset_end, determines the vm_map_entry that contains the object I am
>> interested in.
>> - for this object, it prints some debug information such as: shadow_count,
>> ref_count, whether it has a backing_object or not.
>> The code written is similar to the code from here (the way in which I get
>> vmspace for the current process and the way I am iterating through
>> vm_map_entry and objects):
>> [0]
>> https://github.com/FreeBSD-UPB/freebsd/blob/projects/bhyve_migration/sys/amd64/vmm/vmm_dev.c#L979
>>
>> I have read the following documentation about FreeBSD's implementation for
>> virtual memory:
>> [1] https://www.freebsd.org/doc/en/books/arch-handbook/vm.html
>> [2]
>> https://www.freebsd.org/doc/en_US.ISO8859-1/articles/vm-design/article.html
>> [3] https://people.freebsd.org/~neel/bhyve/bhyve_nested_paging.pdf
>> [4] http://www.cse.chalmers.se/edu/year/2011/course/EDA203/unix4.pdf
>>
>> As far as I could tell from the documentation listed above, I should look
>> either for the object that my object is a shadow of, or for an object
>> that shadows my object.
>
> Right. When a copy-on-write fault results in the creation of a new
> object, the new object is said to shadow the original object, which
> becomes the backing object for the shadow object.  When faults in the
> corresponding map entry occur, the fault handler first searches for a
> page in the map entry's object, and then falls back to the backing
> object if necessary.
>
>> To do that, I should inspect the following fields from the vm_object
>> structure, among others
>> (https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_object.h#L98):
>>
>> - int shadow_count; /* how many objects that this is a shadow for */
>> - struct vm_object *backing_object; /* object that I'm a shadow of */
>>
>> But in all my tests, for the object I am interested in, the shadow_count is
>> 0 and the backing_object is NULL.
>>
>> The code I use to mark the vm_map_entry for the object I am interested in
>> copy-on-write is here:
>> [5]
>> https://github.com/FreeBSD-UPB/freebsd/blob/projects/bhyve_migration/sys/amd64/vmm/vmm_dev.c#L949
>
> MAP_ENTRY_NEEDS_COPY needs to be set in order for the copy-on-write
> machinery to work the way you expect.  Take a look at vm_map_lookup():
> when it sees that flag and the caller is attempting a write fault, it
> creates a shadow object, updates the entry and clears
> MAP_ENTRY_NEEDS_COPY, leaving MAP_ENTRY_COW set.
>
>> Is there anything I am doing wrong? Maybe I misunderstood something about
>> the way the virtual memory works in FreeBSD.
>
> I'll note that inspecting and manipulating vm_map_entry and vm_object
> structures in the bhyve code constitutes something of an abstraction
> violation, though it's reasonable to proceed this way while working on a
> prototype of the feature.  That is, I think you should keep trying your
> current approach, but just be aware that you are using the copy-on-write
> mechanism in a way that the VM system isn't really expecting.
>

Can you point out the right approach in our case?

Thanks,
Mihai

>> Is there another way I could inspect what pages were created between the
>> moment I mark an object (its vm_map_entry) as copy-on-write and a later
>> moment?
> ___
> freebsd-virtualizat...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
> To unsubscribe, send any mail to 
> "freebsd-virtualization-unsubscr...@freebsd.org"