Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
> What is the scenario that you would want toolstack to set such flag? > > Shouldn't hypervisor always set the flag when the guest is never > unpaused and always clear / ignore that flag if the guest is ever > unpaused? If that's all is needed, why does toolstack need to get > involved? You are right. I will not expose the flag to toolstack. - Original Message - From: wei.l...@citrix.com To: dongli.zh...@oracle.com Cc: jbeul...@suse.com, wei.l...@citrix.com, konrad.w...@oracle.com, sstabell...@kernel.org, t...@xen.org, dario.faggi...@citrix.com, ian.jack...@eu.citrix.com, george.dun...@eu.citrix.com, david.vra...@citrix.com, xen-devel@lists.xen.org, andrew.coop...@citrix.com Sent: Friday, September 16, 2016 6:55:33 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi Subject: Re: [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation On Fri, Sep 16, 2016 at 03:47:23AM -0700, Dongli Zhang wrote: > > > +/* > > > + * MEMF_no_tlbflush can be set only during vm creation phase when > > > + * is_ever_unpaused is still false before this domain gets unpaused > > > for > > > + * the first time. > > > + */ > > > +if ( unlikely(!d->is_ever_unpaused) ) > > > +a->memflags |= MEMF_no_tlbflush; > > > > So you no longer mean to expose this to the caller? > > hmmm I would prefer to expose this to the toolstack if it is OK for > maintainers. > > I copy and paste Wei's comments below: > > == > > > Rule 1. It is toolstack's responsibility to set the "MEMF_no_tlbflush" bit > > in memflags. The toolstack developers should be careful that > > "MEMF_no_tlbflush" should never be used after vm creation is finished. > > > > Is it possible to have a safety catch for this in the hypervisor? In > general IMHO we should avoid providing an interface that is possible to > create a security problem. > > == > > Hi Wei, since it is possible to have a safety catch now in the hypervisor (the > bit is allowed only before VM creation is finished), is it OK for you to > expose > MEMF_no_tlbflush bit to toolstack? > What is the scenario that you would want toolstack to set such flag? Shouldn't hypervisor always set the flag when the guest is never unpaused and always clear / ignore that flag if the guest is ever unpaused? If that's all is needed, why does toolstack need to get involved? Do I miss something here? Wei. > Thank you very much! > > Dongli Zhang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
On Fri, Sep 16, 2016 at 03:47:23AM -0700, Dongli Zhang wrote: > > > +/* > > > + * MEMF_no_tlbflush can be set only during vm creation phase when > > > + * is_ever_unpaused is still false before this domain gets unpaused > > > for > > > + * the first time. > > > + */ > > > +if ( unlikely(!d->is_ever_unpaused) ) > > > +a->memflags |= MEMF_no_tlbflush; > > > > So you no longer mean to expose this to the caller? > > hmmm I would prefer to expose this to the toolstack if it is OK for > maintainers. > > I copy and paste Wei's comments below: > > == > > > Rule 1. It is toolstack's responsibility to set the "MEMF_no_tlbflush" bit > > in memflags. The toolstack developers should be careful that > > "MEMF_no_tlbflush" should never be used after vm creation is finished. > > > > Is it possible to have a safety catch for this in the hypervisor? In > general IMHO we should avoid providing an interface that is possible to > create a security problem. > > == > > Hi Wei, since it is possible to have a safety catch now in the hypervisor (the > bit is allowed only before VM creation is finished), is it OK for you to > expose > MEMF_no_tlbflush bit to toolstack? > What is the scenario that you would want toolstack to set such flag? Shouldn't hypervisor always set the flag when the guest is never unpaused and always clear / ignore that flag if the guest is ever unpaused? If that's all is needed, why does toolstack need to get involved? Do I miss something here? Wei. > Thank you very much! > > Dongli Zhang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
> > +/* > > + * MEMF_no_tlbflush can be set only during vm creation phase when > > + * is_ever_unpaused is still false before this domain gets unpaused for > > + * the first time. > > + */ > > +if ( unlikely(!d->is_ever_unpaused) ) > > +a->memflags |= MEMF_no_tlbflush; > > So you no longer mean to expose this to the caller? hmmm I would prefer to expose this to the toolstack if it is OK for maintainers. I copy and paste Wei's comments below: == > Rule 1. It is toolstack's responsibility to set the "MEMF_no_tlbflush" bit > in memflags. The toolstack developers should be careful that > "MEMF_no_tlbflush" should never be used after vm creation is finished. > Is it possible to have a safety catch for this in the hypervisor? In general IMHO we should avoid providing an interface that is possible to create a security problem. == Hi Wei, since it is possible to have a safety catch now in the hypervisor (the bit is allowed only before VM creation is finished), is it OK for you to expose MEMF_no_tlbflush bit to toolstack? Thank you very much! Dongli Zhang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
>>> On 12.09.16 at 10:16,wrote: > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int > domcr_flags, > if ( !zalloc_cpumask_var(>domain_dirty_cpumask) ) > goto fail; > > +d->is_ever_unpaused = false; This it not needed - struct domain starts out as all zeros anyway. > @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain > *d) > { > int old, new, prev = d->controller_pause_count; > > +/* > + * Set is_ever_unpaused to true when this domain gets unpaused for the > + * first time. We record this information here to help populate_physmap > + * verify whether the domain has ever been unpaused. MEMF_no_tlbflush > + * is allowed to be set by populate_physmap only during vm creation. > + */ > +if ( unlikely(!d->is_ever_unpaused) ) > +d->is_ever_unpaused = true; As mentioned before, the conditional is pointless. And just like Dario, I dislike the name of the field. How about "has_run", "was_unpaused", or "is_alive"? Or even better, how about combining this with the is_shutting_down and is_shut_down into an enum? For that latter variant, that would presumably better be a patch on its own then. > @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a) > max_order(curr_d)) ) > return; > > +/* > + * MEMF_no_tlbflush can be set only during vm creation phase when > + * is_ever_unpaused is still false before this domain gets unpaused for > + * the first time. > + */ > +if ( unlikely(!d->is_ever_unpaused) ) > +a->memflags |= MEMF_no_tlbflush; So you no longer mean to expose this to the caller? > @@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a) > goto out; > } > > +if ( unlikely(!d->is_ever_unpaused) ) Please check MEMF_no_tlbflush here instead. > +{ > +for ( j = 0; j < (1U << a->extent_order); j++ ) > +{ > +if ( page_needs_tlbflush([j], need_tlbflush, > + tlbflush_timestamp, > + tlbflush_current_time()) ) > +{ > +need_tlbflush = true; > +tlbflush_timestamp = page[j].tlbflush_timestamp; > +} > +} > +} > + > mfn = page_to_mfn(page); > } > > @@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a) > } > > out: > +if ( need_tlbflush ) > +{ > +cpumask_t mask = cpu_online_map; > +tlbflush_filter(mask, tlbflush_timestamp); Blank line between declarations and statements please. Also, considering this repeats what gets done in page_alloc.c, I think it should also be factored out into a function. And along those lines I think the other abstraction should then also go further and take care of the updating of need_tlbflush and tlbflush_timestamp. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
On Mon, 2016-09-12 at 16:16 +0800, Dongli Zhang wrote: > This patch implemented parts of TODO left in commit id > a902c12ee45fc9389eb8fe54eeddaf267a555c58. > We usually put both the (not necessarily full) hash and the subject line of the commit in here. > Signed-off-by: Dongli Zhang> > diff --git a/xen/common/domain.c b/xen/common/domain.c > index a8804e4..7be1bee 100644 > @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, > unsigned int domcr_flags, > if ( !zalloc_cpumask_var(>domain_dirty_cpumask) ) > goto fail; > > +d->is_ever_unpaused = false; > + > I'd go for something like "first_unpaused" or "creation_finished", but if maintainers are happy with this one already, I'm fine too. > @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct > domain *d) > { > int old, new, prev = d->controller_pause_count; > > +/* > + * Set is_ever_unpaused to true when this domain gets unpaused > for the > + * first time. We record this information here to help > populate_physmap > + * verify whether the domain has ever been unpaused. > MEMF_no_tlbflush > + * is allowed to be set by populate_physmap only during vm > creation. > + */ "We record this information here for populate_physmap to figure out that the domain has already been unpaused, after finishing being created. That's because we're allowed to set MEMF_no_tlbflush only during VM creation." Or, de-focusing the unpausing even more: "We record this information here for populate_physmap to figure out tha t the domain has finished being created. In fact, we're only allowed to set the MEMF_no_tlbflush flag during VM creation." I.e., the important thing is not really the unpausing (that's where we found it handy to put the check), it's the fact that something should only happen at creation time and why (see below). > +if ( unlikely(!d->is_ever_unpaused) ) > +d->is_ever_unpaused = true; > + > do > { > old = prev; > diff --git a/xen/common/memory.c b/xen/common/memory.c > index cc0f69e..f3a733b 100644 > @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args > *a) > max_order(curr_d)) ) > return; > > +/* > + * MEMF_no_tlbflush can be set only during vm creation phase > when > + * is_ever_unpaused is still false before this domain gets > unpaused for > + * the first time. > + */ > What about, 'citing' from the changelog: "With MEMF_no_tlbflush set, alloc_heap_pages() will ignore TLB- flushes. After VM creation, this is a security issue (it can make pages accessible to guest B, when guest A may still have a cached mapping to them). So we only do this only during domain creation, when the domain itself has not yet been unpaused for the first time." > +if ( unlikely(!d->is_ever_unpaused) ) > +a->memflags |= MEMF_no_tlbflush; > + > for ( i = a->nr_done; i < a->nr_extents; i++ ) > { > if ( i != a->nr_done && hypercall_preempt_check() ) > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index 2f9c15f..7fe8841 100644 > @@ -474,6 +474,9 @@ struct domain > unsigned int guest_request_enabled : 1; > unsigned int guest_request_sync : 1; > } monitor; > + > +/* set to true the first time this domain gets unpaused. */ > I think it's relevant to say _when_ that is. What about: /* * Set to true at the very end of domain creation, when the domain is * unpaused for the first time by the systemcontroller. */ (not 100% happy about the "by the systemcontroller" part... but that's the idea.) > +bool_t is_ever_unpaused; > As said by Jan already --here and elsewhere-- new code should use 'bool'. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
This patch implemented parts of TODO left in commit id a902c12ee45fc9389eb8fe54eeddaf267a555c58. It moved TLB-flush filtering out into populate_physmap. Because of TLB-flush in alloc_heap_pages, it's very slow to create a guest with memory size of more than 100GB on host with 100+ cpus. This patch introduced a "MEMF_no_tlbflush" bit to memflags to indicate whether TLB-flush should be done in alloc_heap_pages or its caller populate_physmap. Once this bit is set in memflags, alloc_heap_pages will ignore TLB-flush. To use this bit after vm is created might lead to security issue, that is, this would make pages accessible to the guest B, when guest A may still have a cached mapping to them. Therefore, this patch also introduced a "is_ever_unpaused" field to struct domain to indicate whether this domain has ever got unpaused by hypervisor. MEMF_no_tlbflush can be set only during vm creation phase when is_ever_unpaused is still false before this domain gets unpaused for the first time. Signed-off-by: Dongli Zhang--- Changed since v3: * Set the flag to true in domain_unpause_by_systemcontroller when unpausing the guest domain for the first time. * Use true/false for all boot_t variables. * Add unlikely to optimize "if statement". * Correct comment style. Changed since v2: * Limit this optimization to domain creation time. --- xen/common/domain.c | 11 +++ xen/common/memory.c | 34 ++ xen/common/page_alloc.c | 3 ++- xen/include/xen/mm.h| 2 ++ xen/include/xen/sched.h | 3 +++ 5 files changed, 52 insertions(+), 1 deletion(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index a8804e4..7be1bee 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags, if ( !zalloc_cpumask_var(>domain_dirty_cpumask) ) goto fail; +d->is_ever_unpaused = false; + if ( domcr_flags & DOMCRF_hvm ) d->guest_type = guest_type_hvm; else if ( domcr_flags & DOMCRF_pvh ) @@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d) { int old, new, prev = d->controller_pause_count; +/* + * Set is_ever_unpaused to true when this domain gets unpaused for the + * first time. We record this information here to help populate_physmap + * verify whether the domain has ever been unpaused. MEMF_no_tlbflush + * is allowed to be set by populate_physmap only during vm creation. + */ +if ( unlikely(!d->is_ever_unpaused) ) +d->is_ever_unpaused = true; + do { old = prev; diff --git a/xen/common/memory.c b/xen/common/memory.c index cc0f69e..f3a733b 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -141,6 +141,8 @@ static void populate_physmap(struct memop_args *a) unsigned int i, j; xen_pfn_t gpfn, mfn; struct domain *d = a->domain, *curr_d = current->domain; +bool_t need_tlbflush = false; +uint32_t tlbflush_timestamp = 0; if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done, a->nr_extents-1) ) @@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a) max_order(curr_d)) ) return; +/* + * MEMF_no_tlbflush can be set only during vm creation phase when + * is_ever_unpaused is still false before this domain gets unpaused for + * the first time. + */ +if ( unlikely(!d->is_ever_unpaused) ) +a->memflags |= MEMF_no_tlbflush; + for ( i = a->nr_done; i < a->nr_extents; i++ ) { if ( i != a->nr_done && hypercall_preempt_check() ) @@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a) goto out; } +if ( unlikely(!d->is_ever_unpaused) ) +{ +for ( j = 0; j < (1U << a->extent_order); j++ ) +{ +if ( page_needs_tlbflush([j], need_tlbflush, + tlbflush_timestamp, + tlbflush_current_time()) ) +{ +need_tlbflush = true; +tlbflush_timestamp = page[j].tlbflush_timestamp; +} +} +} + mfn = page_to_mfn(page); } @@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a) } out: +if ( need_tlbflush ) +{ +cpumask_t mask = cpu_online_map; +tlbflush_filter(mask, tlbflush_timestamp); +if ( !cpumask_empty() ) +{ +perfc_incr(need_flush_tlb_flush); +flush_tlb_mask(); +} +} a->nr_done = i; } diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 5b93a01..04ca26a