Re: [Xen-devel] [PATCH v5 15/18] xen/mem_sharing: VM forking

2020-01-23 Thread Tamas K Lengyel
On Thu, Jan 23, 2020 at 10:21 AM George Dunlap  wrote:
>
> On 1/21/20 5:49 PM, Tamas K Lengyel wrote:
> > VM forking is the process of creating a domain with an empty memory space and a
> > parent domain specified from which to populate the memory when necessary. For
> > the new domain to be functional the VM state is copied over as part of the fork
> > operation (HVM params, hap allocation, etc).
> >
> > Signed-off-by: Tamas K Lengyel 
>
> Overall this looks really good.  Just a few questions.
>
> > ---
> >  xen/arch/x86/domain.c             |   9 ++
> >  xen/arch/x86/hvm/hvm.c            |   2 +-
> >  xen/arch/x86/mm/mem_sharing.c     | 220 ++
> >  xen/arch/x86/mm/p2m.c             |  11 +-
> >  xen/include/asm-x86/mem_sharing.h |  20 ++-
> >  xen/include/public/memory.h       |   5 +
> >  xen/include/xen/sched.h           |   2 +
> >  7 files changed, 265 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index 28fefa1f81..953abcc1fc 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -2198,6 +2198,15 @@ int domain_relinquish_resources(struct domain *d)
> >  ret = relinquish_shared_pages(d);
> >  if ( ret )
> >  return ret;
> > +
> > +/*
> > + * If the domain is forked, decrement the parent's pause count.
> > + */
> > +if ( d->parent )
> > +{
> > +domain_unpause(d->parent);
> > +d->parent = NULL;
>
> Did this want to be `if ( d->parent_paused )`?

If the domain has the parent pointer set, it's guaranteed that the
parent is paused. It's paused before the parent pointer is set during
the fork hypercall.
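
Pulled together from the patch (nothing new here, just the two pieces side
by side), the ordering is:

    /* mem_sharing_fork(): the parent is paused (at most once) before the
     * parent pointer is ever published. */
    if ( !cd->parent_paused )
    {
        domain_pause(d);
        cd->parent_paused = true;
    }

    /* ... hap allocation, vCPU bring-up, context copy ... */

    cd->parent = d;    /* only reached after the pause above */

    /* domain_relinquish_resources() for the fork: the unpause is keyed off
     * the same pointer, so it balances exactly one pause. */
    if ( d->parent )
    {
        domain_unpause(d->parent);
        d->parent = NULL;
    }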

>
> > +static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
> > +{
> > +int ret;
> > +unsigned int i;
> > +
> > +if ( (ret = cpupool_move_domain(cd, cpupool)) )
> > +return ret;
> > +
> > +for ( i = 0; i < cd->max_vcpus; i++ )
> > +{
> > +if ( cd->vcpu[i] )
> > +continue;
> > +
> > +if ( !vcpu_create(cd, i) )
> > +return -EINVAL;
>
> You're not copying the contents of the vcpu registers or anything here
> -- is that something you're leaving to your dom0 agent?

The registers are being copied as part of the HVM contexts.
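
For the curious, a rough sketch of what the context copy amounts to. This is
illustrative only (it is not the hvm_copy_context_and_params() implementation
from earlier in the series, and the helper name below is made up); the point
is that the architectural register state of every vCPU travels inside the HVM
domain context, so saving the parent's context and loading it into the fork
is what populates the registers of the vCPUs created by bring_up_vcpus():

    /* Sketch only; needs xen/hvm/save.h and xen/xmalloc.h.  The real
     * helper in the series also copies the HVM params. */
    static int copy_vcpu_state_sketch(struct domain *parent, struct domain *cd)
    {
        int rc;
        hvm_domain_context_t c = { 0 };

        if ( (c.size = hvm_save_size(parent)) == 0 )
            return -EINVAL;

        if ( (c.data = xmalloc_bytes(c.size)) == NULL )
            return -ENOMEM;

        /* Serialize the parent's HVM state (vCPU registers included)... */
        if ( (rc = hvm_save(parent, &c)) == 0 )
        {
            /* ...and replay it into the fork from the beginning. */
            c.cur = 0;
            rc = hvm_load(cd, &c);
        }

        xfree(c.data);
        return rc;
    }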

>
> > +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > +{
> > +int rc = -EINVAL;
> > +
> > +if ( !cd->controller_pause_count )
> > +return rc;
> > +
> > +/*
> > + * We only want to pause the parent once, not each time this
> > + * operation is restarted due to preemption.
> > + */
> > +if ( !cd->parent_paused )
> > +{
> > +domain_pause(d);
> > +cd->parent_paused = true;
> > +}
> > +
> > +cd->max_pages = d->max_pages;
> > +cd->max_vcpus = d->max_vcpus;
> > +
> > +/* this is preemptible so it's the first to get done */
> > +if ( (rc = fork_hap_allocation(d, cd)) )
> > +goto done;
> > +
> > +if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
> > +goto done;
> > +
> > +if ( (rc = hvm_copy_context_and_params(d, cd)) )
> > +goto done;
> > +
> > +fork_tsc(d, cd);
> > +
> > +cd->parent = d;
>
> What happens if the parent dies?
>
> It seems like we might want to do get_domain(parent) here, and
> put_domain(parent) in domain_destroy.

If forks are still active when someone destroys the parent, then I assume
Xen will crash. Right now it's a requirement that the parent remains in
existence, and paused, while there are forks active. We enforce the pause
state, but making the parent undestroyable is not implemented right now. We
just trust that the user of this experimental system won't do that.

But yes, doing the get_domain()/put_domain() dance would be an easy way to
do that. Will add it, and then we don't have to worry about the parent
getting pulled out from under our feet.
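
For reference, a rough sketch of that (illustrative only, not the actual
follow-up patch); it assumes the d->parent / parent_paused fields from this
patch and only adds the existing get_domain()/put_domain() refcount helpers
around them:

    /* mem_sharing_fork(): once the fork has succeeded, take a reference
     * so the parent cannot be fully destroyed while the fork is alive. */
    if ( !get_domain(d) )
        return -ESRCH;
    cd->parent = d;

    /* Fork teardown (domain_relinquish_resources()): drop the pause and
     * the reference together. */
    if ( d->parent )
    {
        domain_unpause(d->parent);
        put_domain(d->parent);
        d->parent = NULL;
    }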

>
> I'll probably need to come back to this, at which point I may have more
> questions.

Thanks!
Tamas


Re: [Xen-devel] [PATCH v5 15/18] xen/mem_sharing: VM forking

2020-01-23 Thread George Dunlap
On 1/21/20 5:49 PM, Tamas K Lengyel wrote:
> VM forking is the process of creating a domain with an empty memory space and a
> parent domain specified from which to populate the memory when necessary. For
> the new domain to be functional the VM state is copied over as part of the fork
> operation (HVM params, hap allocation, etc).
> 
> Signed-off-by: Tamas K Lengyel 

Overall this looks really good.  Just a few questions.

> ---
>  xen/arch/x86/domain.c             |   9 ++
>  xen/arch/x86/hvm/hvm.c            |   2 +-
>  xen/arch/x86/mm/mem_sharing.c     | 220 ++
>  xen/arch/x86/mm/p2m.c             |  11 +-
>  xen/include/asm-x86/mem_sharing.h |  20 ++-
>  xen/include/public/memory.h       |   5 +
>  xen/include/xen/sched.h           |   2 +
>  7 files changed, 265 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 28fefa1f81..953abcc1fc 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2198,6 +2198,15 @@ int domain_relinquish_resources(struct domain *d)
>  ret = relinquish_shared_pages(d);
>  if ( ret )
>  return ret;
> +
> +/*
> + * If the domain is forked, decrement the parent's pause count.
> + */
> +if ( d->parent )
> +{
> +domain_unpause(d->parent);
> +d->parent = NULL;

Did this want to be `if ( d->parent_paused )`?

> +static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
> +{
> +int ret;
> +unsigned int i;
> +
> +if ( (ret = cpupool_move_domain(cd, cpupool)) )
> +return ret;
> +
> +for ( i = 0; i < cd->max_vcpus; i++ )
> +{
> +if ( cd->vcpu[i] )
> +continue;
> +
> +if ( !vcpu_create(cd, i) )
> +return -EINVAL;

You're not copying the contents of the vcpu registers or anything here
-- is that something you're leaving to your dom0 agent?

> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> +{
> +int rc = -EINVAL;
> +
> +if ( !cd->controller_pause_count )
> +return rc;
> +
> +/*
> + * We only want to pause the parent once, not each time this
> + * operation is restarted due to preemption.
> + */
> +if ( !cd->parent_paused )
> +{
> +domain_pause(d);
> +cd->parent_paused = true;
> +}
> +
> +cd->max_pages = d->max_pages;
> +cd->max_vcpus = d->max_vcpus;
> +
> +/* this is preemptible so it's the first to get done */
> +if ( (rc = fork_hap_allocation(d, cd)) )
> +goto done;
> +
> +if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
> +goto done;
> +
> +if ( (rc = hvm_copy_context_and_params(d, cd)) )
> +goto done;
> +
> +fork_tsc(d, cd);
> +
> +cd->parent = d;

What happens if the parent dies?

It seems like we might want to do get_domain(parent) here, and
put_domain(parent) in domain_destroy.

I'll probably need to come back to this, at which point I may have more
questions.

 -George


[Xen-devel] [PATCH v5 15/18] xen/mem_sharing: VM forking

2020-01-21 Thread Tamas K Lengyel
VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/domain.c             |   9 ++
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/mem_sharing.c     | 220 ++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/include/asm-x86/mem_sharing.h |  20 ++-
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   2 +
 7 files changed, 265 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 28fefa1f81..953abcc1fc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2198,6 +2198,15 @@ int domain_relinquish_resources(struct domain *d)
 ret = relinquish_shared_pages(d);
 if ( ret )
 return ret;
+
+/*
+ * If the domain is forked, decrement the parent's pause count.
+ */
+if ( d->parent )
+{
+domain_unpause(d->parent);
+d->parent = NULL;
+}
 }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e60b4931bf..6012b88845 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1906,7 +1906,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 }
 #endif
 
-/* Spurious fault? PoD and log-dirty also take this path. */
+/* Spurious fault? PoD, log-dirty and VM forking also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
 rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e3ddb63b4f..749305417c 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,11 +22,13 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +38,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 
 #include "mm-locks.h"
@@ -1421,6 +1426,191 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
 return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if it's a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+int rc = -ENOENT;
+shr_handle_t handle;
+struct domain *parent;
+struct p2m_domain *p2m;
+unsigned long gfn_l = gfn_x(gfn);
+mfn_t mfn, new_mfn;
+p2m_type_t p2mt;
+struct page_info *page;
+
+if ( !mem_sharing_is_fork(d) )
+return -ENOENT;
+
+parent = d->parent;
+
+if ( !unsharing )
+{
+/* For read-only accesses we just add a shared entry to the physmap */
+while ( parent )
+{
+if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+break;
+
+parent = parent->parent;
+}
+
+if ( !rc )
+{
+/* The client's p2m is already locked */
+struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+p2m_lock(pp2m);
+rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+p2m_unlock(pp2m);
+
+if ( !rc )
+return 0;
+}
+}
+
+/*
+ * If it's a write access (ie. unsharing) or if adding a shared entry to
+ * the physmap failed we'll fork the page directly.
+ */
+p2m = p2m_get_hostp2m(d);
+parent = d->parent;
+
+while ( parent )
+{
+mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+break;
+
+put_gfn(parent, gfn_l);
+parent = parent->parent;
+}
+
+if ( !parent )
+return -ENOENT;
+
+if ( !(page = alloc_domheap_page(d, 0)) )
+{
+put_gfn(parent, gfn_l);
+return -ENOMEM;
+}
+
+new_mfn = page_to_mfn(page);
+copy_domain_page(new_mfn, mfn);
+set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+put_gfn(parent, gfn_l);
+
+return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+  p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+int ret;
+unsigned int i;
+
+if ( (ret = cpupool_move_domain(cd, cpupool)) )
+return ret;
+
+for ( i = 0; i < cd->max_vcpus; i++ )
+{
+if ( cd->vcpu[i] )
+continue;
+
+if ( !vcpu_create(cd, i) )
+return -EINVAL;
+}
+
+