Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-16 Thread Dongli Zhang
> What is the scenario that you would want toolstack to set such flag?
> 
> Shouldn't hypervisor always set the flag when the guest is never
> unpaused and always clear / ignore that flag if the guest is ever
> unpaused? If that's all is needed, why does toolstack need to get
> involved?

You are right. I will not expose the flag to toolstack.

- Original Message -
From: wei.l...@citrix.com
To: dongli.zh...@oracle.com
Cc: jbeul...@suse.com, wei.l...@citrix.com, konrad.w...@oracle.com, 
sstabell...@kernel.org, t...@xen.org, dario.faggi...@citrix.com, 
ian.jack...@eu.citrix.com, george.dun...@eu.citrix.com, 
david.vra...@citrix.com, xen-devel@lists.xen.org, andrew.coop...@citrix.com
Sent: Friday, September 16, 2016 6:55:33 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
Subject: Re: [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

On Fri, Sep 16, 2016 at 03:47:23AM -0700, Dongli Zhang wrote:
> > > +/*
> > > + * MEMF_no_tlbflush can be set only during vm creation phase when
> > > + * is_ever_unpaused is still false before this domain gets unpaused for
> > > + * the first time.
> > > + */
> > > +if ( unlikely(!d->is_ever_unpaused) )
> > > +a->memflags |= MEMF_no_tlbflush;
> > 
> > So you no longer mean to expose this to the caller?
> 
> hmmm I would prefer to expose this to the toolstack if it is OK for
> maintainers.
> 
> I copy and paste Wei's comments below:
> 
> ==
> 
> > Rule 1. It is toolstack's responsibility to set the "MEMF_no_tlbflush" bit
> > in memflags. The toolstack developers should be careful that
> > "MEMF_no_tlbflush" should never be used after vm creation is finished.
> > 
> 
> Is it possible to have a safety catch for this in the hypervisor? In
> general IMHO we should avoid providing an interface that is possible to
> create a security problem.
> 
> ==
> 
> Hi Wei, since it is possible to have a safety catch now in the hypervisor (the
> bit is allowed only before VM creation is finished), is it OK for you to expose
> MEMF_no_tlbflush bit to toolstack?
> 

What is the scenario that you would want toolstack to set such flag?

Shouldn't hypervisor always set the flag when the guest is never
unpaused and always clear / ignore that flag if the guest is ever
unpaused? If that's all is needed, why does toolstack need to get
involved?

Do I miss something here?

Wei.


> Thank you very much!
> 
> Dongli Zhang

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-16 Thread Wei Liu
On Fri, Sep 16, 2016 at 03:47:23AM -0700, Dongli Zhang wrote:
> > > +/*
> > > + * MEMF_no_tlbflush can be set only during vm creation phase when
> > > + * is_ever_unpaused is still false before this domain gets unpaused for
> > > + * the first time.
> > > + */
> > > +if ( unlikely(!d->is_ever_unpaused) )
> > > +a->memflags |= MEMF_no_tlbflush;
> > 
> > So you no longer mean to expose this to the caller?
> 
> hmmm I would prefer to expose this to the toolstack if it is OK for
> maintainers.
> 
> I copy and paste Wei's comments below:
> 
> ==
> 
> > Rule 1. It is toolstack's responsibility to set the "MEMF_no_tlbflush" bit
> > in memflags. The toolstack developers should be careful that
> > "MEMF_no_tlbflush" should never be used after vm creation is finished.
> > 
> 
> Is it possible to have a safety catch for this in the hypervisor? In
> general IMHO we should avoid providing an interface that is possible to
> create a security problem.
> 
> ==
> 
> Hi Wei, since it is possible to have a safety catch now in the hypervisor (the
> bit is allowed only before VM creation is finished), is it OK for you to expose
> MEMF_no_tlbflush bit to toolstack?
> 

What is the scenario that you would want toolstack to set such flag?

Shouldn't hypervisor always set the flag when the guest is never
unpaused and always clear / ignore that flag if the guest is ever
unpaused? If that's all is needed, why does toolstack need to get
involved?

Do I miss something here?

Wei.
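
The hypervisor-only approach asked about above can be sketched as follows.
This is only an illustration under stated assumptions, not code from the
patch: struct sketch_domain, its creation_finished field,
SKETCH_MEMF_NO_TLBFLUSH and sketch_effective_memflags() are hypothetical
names, and the bit position is arbitrary. The point is that whatever the
caller passes is dropped and the flag is derived purely from domain state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real flag; the bit position is arbitrary. */
#define SKETCH_MEMF_NO_TLBFLUSH (1u << 0)

/* Hypothetical, heavily reduced stand-in for struct domain. */
struct sketch_domain {
    bool creation_finished;   /* set on the first unpause, never cleared */
};

/*
 * Compute the flags the allocator would see. A caller-supplied
 * no-tlbflush bit is always discarded; the bit is set only while the
 * domain has never been unpaused.
 */
static inline unsigned int
sketch_effective_memflags(const struct sketch_domain *d,
                          unsigned int caller_flags)
{
    unsigned int flags = caller_flags & ~SKETCH_MEMF_NO_TLBFLUSH;

    if ( !d->creation_finished )
        flags |= SKETCH_MEMF_NO_TLBFLUSH;

    return flags;
}
```

With this shape the toolstack has no way to request the flag after the
guest has run, so no separate safety catch is needed.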


> Thank you very much!
> 
> Dongli Zhang



Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-16 Thread Dongli Zhang
> > +/*
> > + * MEMF_no_tlbflush can be set only during vm creation phase when
> > + * is_ever_unpaused is still false before this domain gets unpaused for
> > + * the first time.
> > + */
> > +if ( unlikely(!d->is_ever_unpaused) )
> > +a->memflags |= MEMF_no_tlbflush;
> 
> So you no longer mean to expose this to the caller?

hmmm I would prefer to expose this to the toolstack if it is OK for
maintainers.

I copy and paste Wei's comments below:

==

> Rule 1. It is toolstack's responsibility to set the "MEMF_no_tlbflush" bit
> in memflags. The toolstack developers should be careful that
> "MEMF_no_tlbflush" should never be used after vm creation is finished.
> 

Is it possible to have a safety catch for this in the hypervisor? In
general IMHO we should avoid providing an interface that is possible to
create a security problem.

==

Hi Wei, since it is possible to have a safety catch now in the hypervisor (the
bit is allowed only before VM creation is finished), is it OK for you to expose
MEMF_no_tlbflush bit to toolstack?

Thank you very much!

Dongli Zhang



Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-15 Thread Jan Beulich
>>> On 12.09.16 at 10:16,  wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>  if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>  goto fail;
>  
> +d->is_ever_unpaused = false;

This is not needed - struct domain starts out as all zeros anyway.

> @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>  {
>  int old, new, prev = d->controller_pause_count;
>  
> +/*
> + * Set is_ever_unpaused to true when this domain gets unpaused for the
> + * first time. We record this information here to help populate_physmap
> + * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
> + * is allowed to be set by populate_physmap only during vm creation.
> + */
> +if ( unlikely(!d->is_ever_unpaused) )
> +d->is_ever_unpaused = true;

As mentioned before, the conditional is pointless. And just like Dario,
I dislike the name of the field. How about "has_run", "was_unpaused",
or "is_alive"? Or even better, how about combining this with the
is_shutting_down and is_shut_down into an enum? For that latter
variant, that would presumably better be a patch on its own then.
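
The enum variant suggested above might look roughly like the sketch below.
All names here (enum sketch_domain_state and the two helpers) are
hypothetical, and the state set is a simplification: a real conversion
would also have to cover cases such as a domain resuming after shutdown,
which this toy model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle states replacing three separate booleans. */
enum sketch_domain_state {
    SKETCH_DOM_CREATING,       /* built but never unpaused */
    SKETCH_DOM_RUNNING,        /* unpaused at least once */
    SKETCH_DOM_SHUTTING_DOWN,
    SKETCH_DOM_SHUT_DOWN,
};

/* Deferring TLB flushes is only safe while the domain has never run. */
static inline bool sketch_may_defer_tlbflush(enum sketch_domain_state s)
{
    return s == SKETCH_DOM_CREATING;
}

/* Unconditional transition on unpause: only CREATING changes state. */
static inline enum sketch_domain_state
sketch_on_unpause(enum sketch_domain_state s)
{
    return s == SKETCH_DOM_CREATING ? SKETCH_DOM_RUNNING : s;
}
```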

> @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
>  max_order(curr_d)) )
>  return;
>  
> +/*
> + * MEMF_no_tlbflush can be set only during vm creation phase when
> + * is_ever_unpaused is still false before this domain gets unpaused for
> + * the first time.
> + */
> +if ( unlikely(!d->is_ever_unpaused) )
> +a->memflags |= MEMF_no_tlbflush;

So you no longer mean to expose this to the caller?

> @@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a)
>  goto out;
>  }
>  
> +if ( unlikely(!d->is_ever_unpaused) )

Please check MEMF_no_tlbflush here instead.

> +{
> +for ( j = 0; j < (1U << a->extent_order); j++ )
> +{
> +if ( page_needs_tlbflush(&page[j], need_tlbflush,
> + tlbflush_timestamp,
> + tlbflush_current_time()) )
> +{
> +need_tlbflush = true;
> +tlbflush_timestamp = page[j].tlbflush_timestamp;
> +}
> +}
> +}
> +
>  mfn = page_to_mfn(page);
>  }
>  
> @@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a)
>  }
>  
>  out:
> +if ( need_tlbflush )
> +{
> +cpumask_t mask = cpu_online_map;
> +tlbflush_filter(mask, tlbflush_timestamp);

Blank line between declarations and statements please. Also,
considering this repeats what gets done in page_alloc.c, I think
it should also be factored out into a function. And along those
lines I think the other abstraction should then also go further
and take care of the updating of need_tlbflush and
tlbflush_timestamp.
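
The two abstractions asked for above could be shaped roughly as in the
following standalone sketch. It is a model under assumptions, not the Xen
implementation: struct sketch_page, sketch_accumulate_tlbflush() and
sketch_cpus_needing_flush() are hypothetical names, the CPU mask is
modelled as a plain 64-bit integer, and timestamp wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct page_info. */
struct sketch_page {
    bool need_tlbflush;
    uint32_t tlbflush_timestamp;
};

/*
 * Fold one page into the running state, so callers no longer update
 * need_tlbflush and tlbflush_timestamp by hand. The newest timestamp
 * among pages that need a flush is kept.
 */
static inline void
sketch_accumulate_tlbflush(bool *need_tlbflush,
                           const struct sketch_page *page,
                           uint32_t *tlbflush_timestamp)
{
    if ( page->need_tlbflush &&
         (!*need_tlbflush ||
          page->tlbflush_timestamp > *tlbflush_timestamp) )
    {
        *need_tlbflush = true;
        *tlbflush_timestamp = page->tlbflush_timestamp;
    }
}

/*
 * Model of the filter step: drop every CPU whose last flush is newer
 * than the page timestamp and return the CPUs that still need flushing.
 * A shared helper would then flush exactly this mask if it is non-empty.
 */
static inline uint64_t
sketch_cpus_needing_flush(uint64_t online_mask,
                          const uint32_t *cpu_flush_time,
                          unsigned int nr_cpus,
                          uint32_t page_timestamp)
{
    uint64_t mask = online_mask;
    unsigned int cpu;

    for ( cpu = 0; cpu < nr_cpus && cpu < 64; cpu++ )
        if ( cpu_flush_time[cpu] > page_timestamp )
            mask &= ~(UINT64_C(1) << cpu);

    return mask;
}
```

Both populate_physmap and the allocator could then call the same pair of
helpers instead of repeating the filter-and-flush sequence.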

Jan




Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-14 Thread Dario Faggioli
On Mon, 2016-09-12 at 16:16 +0800, Dongli Zhang wrote:
> This patch implemented parts of TODO left in commit id
> a902c12ee45fc9389eb8fe54eeddaf267a555c58. 
>
We usually put both the (not necessarily full) hash and the subject
line of the commit in here.

> Signed-off-by: Dongli Zhang 
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index a8804e4..7be1bee 100644
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>  if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>  goto fail;
>  
> +d->is_ever_unpaused = false;
> +
>
I'd go for something like "first_unpaused" or "creation_finished", but
if maintainers are happy with this one already, I'm fine too.

> @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>  {
>  int old, new, prev = d->controller_pause_count;
>  
> +/*
> + * Set is_ever_unpaused to true when this domain gets unpaused for the
> + * first time. We record this information here to help populate_physmap
> + * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
> + * is allowed to be set by populate_physmap only during vm creation.
> + */

"We record this information here for populate_physmap to figure out
 that the domain has already been unpaused, after finishing being
 created. That's because we're allowed to set MEMF_no_tlbflush only
 during VM creation."

Or, de-focusing the unpausing even more:

"We record this information here for populate_physmap to figure out
 that the domain has finished being created. In fact, we're only
 allowed to set the MEMF_no_tlbflush flag during VM creation."

I.e., the important thing is not really the unpausing (that's where we
found it handy to put the check), it's the fact that something should
only happen at creation time and why (see below).

> +if ( unlikely(!d->is_ever_unpaused) )
> +d->is_ever_unpaused = true;
> +
>  do
>  {
>  old = prev;

> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index cc0f69e..f3a733b 100644
> @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
>  max_order(curr_d)) )
>  return;
>  
> +/*
> + * MEMF_no_tlbflush can be set only during vm creation phase when
> + * is_ever_unpaused is still false before this domain gets unpaused for
> + * the first time.
> + */
>
What about, 'citing' from the changelog:

"With MEMF_no_tlbflush set, alloc_heap_pages() will ignore TLB-flushes.
 After VM creation, this is a security issue (it can make pages
 accessible to guest B, when guest A may still have a cached mapping
 to them). So we do this only during domain creation, when the domain
 itself has not yet been unpaused for the first time."

> +if ( unlikely(!d->is_ever_unpaused) )
> +a->memflags |= MEMF_no_tlbflush;
> +
>  for ( i = a->nr_done; i < a->nr_extents; i++ )
>  {
>  if ( i != a->nr_done && hypercall_preempt_check() )

> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 2f9c15f..7fe8841 100644
> @@ -474,6 +474,9 @@ struct domain
>  unsigned int guest_request_enabled   : 1;
>  unsigned int guest_request_sync  : 1;
>  } monitor;
> +
> +/* set to true the first time this domain gets unpaused. */
>
I think it's relevant to say _when_ that is. What about:

/*
 * Set to true at the very end of domain creation, when the domain is 
 * unpaused for the first time by the systemcontroller.
 */

(not 100% happy about the "by the systemcontroller" part... but that's
the idea.)

> +bool_t is_ever_unpaused;
>
As said by Jan already --here and elsewhere-- new code should use
'bool'.

Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





[Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-12 Thread Dongli Zhang
This patch implements part of the TODO left in commit
a902c12ee45fc9389eb8fe54eeddaf267a555c58. It moves TLB-flush filtering out
of alloc_heap_pages and into populate_physmap. Because of the TLB-flush in
alloc_heap_pages, it is very slow to create a guest with more than 100GB of
memory on a host with 100+ cpus.

This patch introduces a "MEMF_no_tlbflush" bit in memflags to indicate
whether the TLB-flush should be done in alloc_heap_pages or in its caller
populate_physmap. Once this bit is set in memflags, alloc_heap_pages will
skip the TLB-flush. Using this bit after the vm is created might lead to a
security issue: it would make pages accessible to guest B while guest A
may still have a cached mapping to them.

Therefore, this patch also introduces an "is_ever_unpaused" field in struct
domain to indicate whether this domain has ever been unpaused by the
hypervisor. MEMF_no_tlbflush can be set only during the vm creation phase,
while is_ever_unpaused is still false, before this domain gets unpaused for
the first time.

Signed-off-by: Dongli Zhang 
---
Changed since v3:
  * Set the flag to true in domain_unpause_by_systemcontroller when
unpausing the guest domain for the first time.
  * Use true/false for all bool_t variables.
  * Add unlikely to optimize "if statement".
  * Correct comment style.

Changed since v2:
  * Limit this optimization to domain creation time.

---
 xen/common/domain.c | 11 +++
 xen/common/memory.c | 34 ++
 xen/common/page_alloc.c |  3 ++-
 xen/include/xen/mm.h|  2 ++
 xen/include/xen/sched.h |  3 +++
 5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a8804e4..7be1bee 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
 if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
 goto fail;
 
+d->is_ever_unpaused = false;
+
 if ( domcr_flags & DOMCRF_hvm )
 d->guest_type = guest_type_hvm;
 else if ( domcr_flags & DOMCRF_pvh )
@@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
 {
 int old, new, prev = d->controller_pause_count;
 
+/*
+ * Set is_ever_unpaused to true when this domain gets unpaused for the
+ * first time. We record this information here to help populate_physmap
+ * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
+ * is allowed to be set by populate_physmap only during vm creation.
+ */
+if ( unlikely(!d->is_ever_unpaused) )
+d->is_ever_unpaused = true;
+
 do
 {
 old = prev;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index cc0f69e..f3a733b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -141,6 +141,8 @@ static void populate_physmap(struct memop_args *a)
 unsigned int i, j;
 xen_pfn_t gpfn, mfn;
 struct domain *d = a->domain, *curr_d = current->domain;
+bool_t need_tlbflush = false;
+uint32_t tlbflush_timestamp = 0;
 
 if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done,
  a->nr_extents-1) )
@@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
 max_order(curr_d)) )
 return;
 
+/*
+ * MEMF_no_tlbflush can be set only during vm creation phase when
+ * is_ever_unpaused is still false before this domain gets unpaused for
+ * the first time.
+ */
+if ( unlikely(!d->is_ever_unpaused) )
+a->memflags |= MEMF_no_tlbflush;
+
 for ( i = a->nr_done; i < a->nr_extents; i++ )
 {
 if ( i != a->nr_done && hypercall_preempt_check() )
@@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a)
 goto out;
 }
 
+if ( unlikely(!d->is_ever_unpaused) )
+{
+for ( j = 0; j < (1U << a->extent_order); j++ )
+{
+if ( page_needs_tlbflush(&page[j], need_tlbflush,
+ tlbflush_timestamp,
+ tlbflush_current_time()) )
+{
+need_tlbflush = true;
+tlbflush_timestamp = page[j].tlbflush_timestamp;
+}
+}
+}
+
 mfn = page_to_mfn(page);
 }
 
@@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a)
 }
 
 out:
+if ( need_tlbflush )
+{
+cpumask_t mask = cpu_online_map;
+tlbflush_filter(mask, tlbflush_timestamp);
+if ( !cpumask_empty(&mask) )
+{
+perfc_incr(need_flush_tlb_flush);
+flush_tlb_mask(&mask);
+}
+}
 a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5b93a01..04ca26a