Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-09 Thread Andrew Cooper
On 09/09/16 10:46, Jan Beulich wrote:
 On 09.09.16 at 11:34,  wrote:
>> On 09/09/16 08:56, Jan Beulich wrote:
>> On 09.09.16 at 07:37,  wrote:
>> Did you go through and check whether there is anything this information
>> can already be derived from? I can't immediately point you at
>> anything, but it feels like there should be.
>>
> Indeed. And if there isn't, and we need to add our own flagging,
> isn't there a better way and place to put it (e.g., what Juergen
> and Andrew are hinting at)?
 I would prefer that the way we derive whether guest boot has finished be
 limited neither to a specific domain type (pv, hvm, pvhvm or pvh) nor to a
 specific arch (i386, x86_64, arm32, arm64 or more in the future). Ideally
 there would be one field, or a combination of fields, belonging to
 "struct domain", but there isn't one.
>>> Of course.
>>>
 Can we use the field in vcpu[0]?
>>> I don't think so: What if another vCPU was scheduled to run first?
>> Use the XEN_DOMCTL_unpausedomain hypercall.  It must be called by
>> toolstacks on all architectures to complete domain construction.
> I think we had settled on this one already. The question now being
> discussed is whether a new field in the domain structure is needed,
> or whether the information he's after can be derived from already
> existing fields.

If the information were derivable from other means, there would be no
need to alter XEN_DOMCTL_unpausedomain in the first place.

I can't think of any option other than to add something new.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-09 Thread Jan Beulich
>>> On 09.09.16 at 11:34,  wrote:
> On 09/09/16 08:56, Jan Beulich wrote:
> On 09.09.16 at 07:37,  wrote:
> Did you go through and check whether there is anything this information
> can already be derived from? I can't immediately point you at
> anything, but it feels like there should be.
>
 Indeed. And if there isn't, and we need to add our own flagging,
 isn't there a better way and place to put it (e.g., what Juergen
 and Andrew are hinting at)?
>>> I would prefer that the way we derive whether guest boot has finished be
>>> limited neither to a specific domain type (pv, hvm, pvhvm or pvh) nor to a
>>> specific arch (i386, x86_64, arm32, arm64 or more in the future). Ideally
>>> there would be one field, or a combination of fields, belonging to
>>> "struct domain", but there isn't one.
>> Of course.
>>
>>> Can we use the field in vcpu[0]?
>> I don't think so: What if another vCPU was scheduled to run first?
> 
> Use the XEN_DOMCTL_unpausedomain hypercall.  It must be called by
> toolstacks on all architectures to complete domain construction.

I think we had settled on this one already. The question now being
discussed is whether a new field in the domain structure is needed,
or whether the information he's after can be derived from already
existing fields.

Jan




Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-09 Thread Andrew Cooper
On 09/09/16 08:56, Jan Beulich wrote:
 On 09.09.16 at 07:37,  wrote:
 Did you go through and check whether there is anything this information
 can already be derived from? I can't immediately point you at
 anything, but it feels like there should be.

>>> Indeed. And if there isn't, and we need to add our own flagging,
>>> isn't there a better way and place to put it (e.g., what Juergen
>>> and Andrew are hinting at)?
>> I would prefer that the way we derive whether guest boot has finished be
>> limited neither to a specific domain type (pv, hvm, pvhvm or pvh) nor to a
>> specific arch (i386, x86_64, arm32, arm64 or more in the future). Ideally
>> there would be one field, or a combination of fields, belonging to
>> "struct domain", but there isn't one.
> Of course.
>
>> Can we use the field in vcpu[0]?
> I don't think so: What if another vCPU was scheduled to run first?

Use the XEN_DOMCTL_unpausedomain hypercall.  It must be called by
toolstacks on all architectures to complete domain construction.

~Andrew



Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-09 Thread Jan Beulich
>>> On 09.09.16 at 07:37,  wrote:
>> > Did you go through and check whether there is anything this information
>> > can already be derived from? I can't immediately point you at
>> > anything, but it feels like there should be.
>> >
>> Indeed. And if there isn't, and we need to add our own flagging,
>> isn't there a better way and place to put it (e.g., what Juergen
>> and Andrew are hinting at)?
> 
> I would prefer that the way we derive whether guest boot has finished be
> limited neither to a specific domain type (pv, hvm, pvhvm or pvh) nor to a
> specific arch (i386, x86_64, arm32, arm64 or more in the future). Ideally
> there would be one field, or a combination of fields, belonging to
> "struct domain", but there isn't one.

Of course.

> Can we use the field in vcpu[0]?

I don't think so: What if another vCPU was scheduled to run first?

Jan




Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Dongli Zhang
Thank you all very much for the feedback. I will address them in the next
version.

> > Did you go through and check whether there is anything this information
> > can already be derived from? I can't immediately point you at
> > anything, but it feels like there should be.
> >
> Indeed. And if there isn't, and we need to add our own flagging,
> isn't there a better way and place to put it (e.g., what Juergen
> and Andrew are hinting at)?

I would prefer that the way we derive whether guest boot has finished be
limited neither to a specific domain type (pv, hvm, pvhvm or pvh) nor to a
specific arch (i386, x86_64, arm32, arm64 or more in the future). Ideally
there would be one field, or a combination of fields, belonging to
"struct domain", but there isn't one.

Can we use the field in vcpu[0]? How about domain->vcpu[0]->last_run_time?  The
only line of code updating the field is at line 1411, in the function
"schedule":

    prev->last_run_time = now;

I am afraid there might be some existing interface (I did not find one), or a
future one, that could reset the entire vcpu[0] to 0.

Dongli Zhang



Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Dario Faggioli
On Thu, 2016-09-08 at 09:36 -0600, Jan Beulich wrote:
> > > > On 08.09.16 at 07:30,  wrote:
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -474,6 +474,9 @@ struct domain
> >  unsigned int guest_request_enabled   : 1;
> >  unsigned int guest_request_sync  : 1;
> >  } monitor;
> > +
> > +/* set to 1 the first time this domain gets scheduled. */
> > +bool_t already_scheduled;
> Did you go through and check whether there is anything this information
> can already be derived from? I can't immediately point you at
> anything, but it feels like there should be.
>
Indeed. And if there isn't, and we need to add our own flagging,
isn't there a better way and place to put it (e.g., what Juergen
and Andrew are hinting at)?

> And if indeed there isn't,
> then - to extend on someone else's comments (I think it was Dario)
> - please use plain bool in new additions.
> 
It's Wei who commented about the use of bool in the patch. :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Jan Beulich
>>> On 08.09.16 at 07:30,  wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -474,6 +474,9 @@ struct domain
>  unsigned int guest_request_enabled   : 1;
>  unsigned int guest_request_sync  : 1;
>  } monitor;
> +
> +/* set to 1 the first time this domain gets scheduled. */
> +bool_t already_scheduled;

Did you go through and check whether there is anything this information
can already be derived from? I can't immediately point you at
anything, but it feels like there should be. And if indeed there isn't,
then - to extend on someone else's comments (I think it was Dario)
- please use plain bool in new additions.

Jan




Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Andrew Cooper
On 08/09/16 15:49, Dario Faggioli wrote:
> On Thu, 2016-09-08 at 13:19 +0200, Juergen Gross wrote:
>> The first scheduling is done via unpausing the domain. Why not
>> setting
>> the flag to true in that path?
>>
> That could be a good idea.
>
> And in general, I'm all for finding a place and/or a state that better
> represents the condition of "setting to run this vcpu for the first
> time" and set the flag there, rather than re-assigning 1 to an
> "already_scheduled" flag each and every time we go through schedule()
> (and not just for performance or optimization reasons).
>
> Which domain_unpause() (or whatever) do you have in mind precisely?

Domains are created with a single systemcontroller pause refcount, which
must be undone by the use of XEN_DOMCTL_unpausedomain when construction
is complete.

That would be the appropriate place to set such a variable.

~Andrew



Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Dario Faggioli
On Thu, 2016-09-08 at 13:19 +0200, Juergen Gross wrote:
> The first scheduling is done via unpausing the domain. Why not
> setting
> the flag to true in that path?
> 
That could be a good idea.

And in general, I'm all for finding a place and/or a state that better
represents the condition of "setting to run this vcpu for the first
time" and set the flag there, rather than re-assigning 1 to an
"already_scheduled" flag each and every time we go through schedule()
(and not just for performance or optimization reasons).

Which domain_unpause() (or whatever) do you have in mind precisely?

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Ian Jackson
Wei Liu writes ("Re: [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation"):
> On Thu, Sep 08, 2016 at 01:01:40PM +0200, Dario Faggioli wrote:
> > On Thu, 2016-09-08 at 11:50 +0100, Wei Liu wrote:
> > > On Thu, Sep 08, 2016 at 01:30:03PM +0800, Dongli Zhang wrote:
> > > > +if ( next->domain->already_scheduled == 0 )
> > > > +next->domain->already_scheduled = 1;
> > > > +
> > > Can be simplified by omitting the "if" altogether.  
> > >
> > Are you sure? I mean looking at the cases when the flag is already true
> > (which means, during the life of a domain, basically **always** except
> > a handful of instances after creation), what costs less, a check that
> > is always false, or a write that is always updating a value with its
> > current value?
> 
> Omitting the check certainly results in fewer instructions. And it would
> probably eliminate misses in instruction cache and branch prediction
> logic in the processor.
> 
> In the grand scheme of things, this is a rather minor optimisation, so I
> wouldn't argue strongly for this.

Are we sure we ought to be discussing this in terms of optimisation?
I doubt it makes any significant difference either way.

But there is a difference in clarity.  I would not normally expect to
see this:

    bool x;

    ...

    if (!x)
        x = 1;

If I saw that I would wonder if the programmer was confused, or
whether I was missing something.

Looking at it without the benefit of the definition of x, it looks
more like x might be a non-boolean type.

Thanks,
Ian.



Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Juergen Gross
On 08/09/16 13:11, Wei Liu wrote:
> On Thu, Sep 08, 2016 at 01:01:40PM +0200, Dario Faggioli wrote:
>> On Thu, 2016-09-08 at 11:50 +0100, Wei Liu wrote:
>>> On Thu, Sep 08, 2016 at 01:30:03PM +0800, Dongli Zhang wrote:
  
 diff --git a/xen/common/schedule.c b/xen/common/schedule.c
 index 32a300f..593541a 100644
 --- a/xen/common/schedule.c
 +++ b/xen/common/schedule.c
 @@ -1376,6 +1376,11 @@ static void schedule(void)
  
  next = next_slice.task;
  
 +/* Set already_scheduled to 1 when this domain gets scheduled for the
 + * first time */
 +if ( next->domain->already_scheduled == 0 )
 +next->domain->already_scheduled = 1;
 +
>>> Can be simplified by omitting the "if" altogether.  
>>>
>> Are you sure? I mean looking at the cases when the flag is already true
>> (which means, during the life of a domain, basically **always** except
>> a handful of instances after creation), what costs less, a check that
>> is always false, or a write that is always updating a value with its
>> current value?
> 
> Omitting the check certainly results in fewer instructions. And it would
> probably eliminate misses in instruction cache and branch prediction
> logic in the processor.

The first scheduling is done via unpausing the domain. Why not setting
the flag to true in that path?

Juergen



Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Wei Liu
On Thu, Sep 08, 2016 at 01:01:40PM +0200, Dario Faggioli wrote:
> On Thu, 2016-09-08 at 11:50 +0100, Wei Liu wrote:
> > On Thu, Sep 08, 2016 at 01:30:03PM +0800, Dongli Zhang wrote:
> > > 
> > > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > > index 32a300f..593541a 100644
> > > --- a/xen/common/schedule.c
> > > +++ b/xen/common/schedule.c
> > > @@ -1376,6 +1376,11 @@ static void schedule(void)
> > >  
> > >  next = next_slice.task;
> > >  
> > > +/* Set already_scheduled to 1 when this domain gets scheduled for the
> > > + * first time */
> > > +if ( next->domain->already_scheduled == 0 )
> > > +next->domain->already_scheduled = 1;
> > > +
> > Can be simplified by omitting the "if" altogether.  
> >
> Are you sure? I mean looking at the cases when the flag is already true
> (which means, during the life of a domain, basically **always** except
> a handful of instances after creation), what costs less, a check that
> is always false, or a write that is always updating a value with its
> current value?

Omitting the check certainly results in fewer instructions. And it would
probably eliminate misses in instruction cache and branch prediction
logic in the processor.

In the grand scheme of things, this is a rather minor optimisation, so I
wouldn't argue strongly for this.

Wei.

> 
> And I'm not being ironic or anything, I honestly am not sure and this
> is a genuine question.
> 
> > And use "true" here.
> > 
> Yeah, or just:
> 
>  if ( unlikely(!next->domain->already_scheduled) )
>      ...
> 
> > Wei.
> -- 
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> 





Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Dario Faggioli
On Thu, 2016-09-08 at 11:50 +0100, Wei Liu wrote:
> On Thu, Sep 08, 2016 at 01:30:03PM +0800, Dongli Zhang wrote:
> > 
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 32a300f..593541a 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -1376,6 +1376,11 @@ static void schedule(void)
> >  
> >  next = next_slice.task;
> >  
> > +/* Set already_scheduled to 1 when this domain gets scheduled for the
> > + * first time */
> > +if ( next->domain->already_scheduled == 0 )
> > +next->domain->already_scheduled = 1;
> > +
> Can be simplified by omitting the "if" altogether.  
>
Are you sure? I mean looking at the cases when the flag is already true
(which means, during the life of a domain, basically **always** except
a handful of instances after creation), what costs less, a check that
is always false, or a write that is always updating a value with its
current value?

And I'm not being ironic or anything, I honestly am not sure and this
is a genuine question.

> And use "true" here.
> 
Yeah, or just:

 if ( unlikely(!next->domain->already_scheduled) )
     ...

> Wei.
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Wei Liu
On Thu, Sep 08, 2016 at 01:30:03PM +0800, Dongli Zhang wrote:
> This patch implements part of the TODO left in commit
> a902c12ee45fc9389eb8fe54eeddaf267a555c58: it moves the TLB-flush filtering
> out into populate_physmap. Because of the TLB flush in alloc_heap_pages, it
> is very slow to create a guest with more than 100GB of memory on a host
> with 100+ CPUs.
> 
> This patch introduces a "MEMF_no_tlbflush" bit in memflags to indicate
> whether the TLB flush should be done in alloc_heap_pages or in its caller,
> populate_physmap. Once this bit is set in memflags, alloc_heap_pages will
> skip the TLB flush. Using this bit after the VM has been created could lead
> to a security issue: it would make pages accessible to guest B while guest
> A may still have a cached mapping to them.
> 
> Therefore, this patch also introduces an "already_scheduled" field in
> struct domain to indicate whether this domain has ever been scheduled by
> the hypervisor. MEMF_no_tlbflush can only be set during the VM creation
> phase, while already_scheduled is still 0, before the domain is scheduled
> for the first time.
> 
> TODO: ballooning a very large amount of memory cannot benefit from this
> patch and might still be slow.
> 
> Signed-off-by: Dongli Zhang 
> 
> ---
> Changed since v2:
>   * Limit this optimization to domain creation time.
> 
> ---
>  xen/common/domain.c |  2 ++
>  xen/common/memory.c | 33 +
>  xen/common/page_alloc.c |  3 ++-
>  xen/common/schedule.c   |  5 +
>  xen/include/xen/mm.h|  2 ++
>  xen/include/xen/sched.h |  3 +++
>  6 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index a8804e4..611a471 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>  if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
>  goto fail;
>  
> +d->already_scheduled = 0;
> +

Use false please -- this is a bool_t.

[...]
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 32a300f..593541a 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1376,6 +1376,11 @@ static void schedule(void)
>  
>  next = next_slice.task;
>  
> +/* Set already_scheduled to 1 when this domain gets scheduled for the
> + * first time */
> +if ( next->domain->already_scheduled == 0 )
> +next->domain->already_scheduled = 1;
> +

Can be simplified by omitting the "if" altogether.  And use "true" here.

Wei.



Re: [Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-08 Thread Dario Faggioli
On Thu, 2016-09-08 at 13:30 +0800, Dongli Zhang wrote:
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index f34dd56..3641469 100644
> @@ -150,6 +152,12 @@ static void populate_physmap(struct memop_args *a)
>  max_order(curr_d)) )
>  return;
>  
> +/* MEMF_no_tlbflush can be set only during vm creation phase when
> + * already_scheduled is still 0 before this domain gets scheduled for
> + * the first time. */
>
/*
 * Comment style for multi line comments in Xen
 * includes the 'wings'. :-)
 */

Yes, I know there's some inconsistency in this file (and in many others
:-/), but still.

> +if ( d->already_scheduled == 0 )
>
unlikely() maybe?

> +a->memflags |= MEMF_no_tlbflush;
> +
>  for ( i = a->nr_done; i < a->nr_extents; i++ )
>  {
>  if ( i != a->nr_done && hypercall_preempt_check() )
> @@ -214,6 +222,21 @@ static void populate_physmap(struct memop_args *a)
>  goto out;
>  }
>  
> +if ( d->already_scheduled == 0 )
> +{
> +for ( j = 0; j < (1U << a->extent_order); j++ )
> +{
> +if ( page[j].u.free.need_tlbflush &&
> + (page[j].tlbflush_timestamp <= tlbflush_current_time()) &&
> + (!need_tlbflush ||
> + (page[j].tlbflush_timestamp > tlbflush_timestamp)) )
>
This check is long, complicated to read (at least to a non TLBflush
guru), and also appear twice.. can it be put in an inline function with
a talking name?

Oh, and I think you don't need the parenthesis around these twos:

 (page[j].tlbflush_timestamp <= tlbflush_current_time())
 (page[j].tlbflush_timestamp > tlbflush_timestamp)

> +{
> +need_tlbflush = 1;
> +tlbflush_timestamp =
> page[j].tlbflush_timestamp;
> +}
> +}
> +}
> +
>  mfn = page_to_mfn(page);
>  }

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 32a300f..593541a 100644
> @@ -1376,6 +1376,11 @@ static void schedule(void)
>  
>  next = next_slice.task;
>  
> +/* Set already_scheduled to 1 when this domain gets scheduled for the
> + * first time */
>
Wings again.

And, about the content, it's already clear from the code that this gets
set when a vcpu of a domain is scheduled. What we want here is a
_quick_ explanation of why we need the scheduler to record this
information.

> +if ( next->domain->already_scheduled == 0 )
>
unlikely() (and here I'm sure :-)).

> +next->domain->already_scheduled = 1;
> +
>
And, finally, I'd move this toward the bottom of the function, outside
of the pcpu_schedule_lock() critical section, e.g., around the call to
vcpu_periodic_timer_work(next);

>  sd->curr = next;
>  
>  if ( next_slice.time >= 0 ) /* -ve means no limit */

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





[Xen-devel] [PATCH v3 1/1] xen: move TLB-flush filtering out into populate_physmap during vm creation

2016-09-07 Thread Dongli Zhang
This patch implements part of the TODO left in commit
a902c12ee45fc9389eb8fe54eeddaf267a555c58: it moves the TLB-flush filtering
out into populate_physmap. Because of the TLB flush in alloc_heap_pages, it
is very slow to create a guest with more than 100GB of memory on a host with
100+ CPUs.

This patch introduces a "MEMF_no_tlbflush" bit in memflags to indicate
whether the TLB flush should be done in alloc_heap_pages or in its caller,
populate_physmap. Once this bit is set in memflags, alloc_heap_pages will
skip the TLB flush. Using this bit after the VM has been created could lead
to a security issue: it would make pages accessible to guest B while guest A
may still have a cached mapping to them.

Therefore, this patch also introduces an "already_scheduled" field in struct
domain to indicate whether this domain has ever been scheduled by the
hypervisor. MEMF_no_tlbflush can only be set during the VM creation phase,
while already_scheduled is still 0, before the domain is scheduled for the
first time.

TODO: ballooning a very large amount of memory cannot benefit from this
patch and might still be slow.
Signed-off-by: Dongli Zhang 

---
Changed since v2:
  * Limit this optimization to domain creation time.

---
 xen/common/domain.c |  2 ++
 xen/common/memory.c | 33 +
 xen/common/page_alloc.c |  3 ++-
 xen/common/schedule.c   |  5 +
 xen/include/xen/mm.h|  2 ++
 xen/include/xen/sched.h |  3 +++
 6 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a8804e4..611a471 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
 if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
 goto fail;
 
+d->already_scheduled = 0;
+
 if ( domcr_flags & DOMCRF_hvm )
 d->guest_type = guest_type_hvm;
 else if ( domcr_flags & DOMCRF_pvh )
diff --git a/xen/common/memory.c b/xen/common/memory.c
index f34dd56..3641469 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -141,6 +141,8 @@ static void populate_physmap(struct memop_args *a)
 unsigned int i, j;
 xen_pfn_t gpfn, mfn;
 struct domain *d = a->domain, *curr_d = current->domain;
+bool_t need_tlbflush = 0;
+uint32_t tlbflush_timestamp = 0;
 
 if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done,
  a->nr_extents-1) )
@@ -150,6 +152,12 @@ static void populate_physmap(struct memop_args *a)
 max_order(curr_d)) )
 return;
 
+/* MEMF_no_tlbflush can be set only during vm creation phase when
+ * already_scheduled is still 0 before this domain gets scheduled for
+ * the first time. */
+if ( d->already_scheduled == 0 )
+a->memflags |= MEMF_no_tlbflush;
+
 for ( i = a->nr_done; i < a->nr_extents; i++ )
 {
 if ( i != a->nr_done && hypercall_preempt_check() )
@@ -214,6 +222,21 @@ static void populate_physmap(struct memop_args *a)
 goto out;
 }
 
+if ( d->already_scheduled == 0 )
+{
+for ( j = 0; j < (1U << a->extent_order); j++ )
+{
+if ( page[j].u.free.need_tlbflush &&
+ (page[j].tlbflush_timestamp <= tlbflush_current_time()) &&
+ (!need_tlbflush ||
+ (page[j].tlbflush_timestamp > tlbflush_timestamp)) )
+{
+need_tlbflush = 1;
+tlbflush_timestamp = page[j].tlbflush_timestamp;
+}
+}
+}
+
 mfn = page_to_mfn(page);
 }
 
@@ -232,6 +255,16 @@ static void populate_physmap(struct memop_args *a)
 }
 
 out:
+if ( need_tlbflush )
+{
+cpumask_t mask = cpu_online_map;
+tlbflush_filter(mask, tlbflush_timestamp);
+if ( !cpumask_empty(&mask) )
+{
+perfc_incr(need_flush_tlb_flush);
+flush_tlb_mask(&mask);
+}
+}
 a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 18ff6cf..e0283fc 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -827,7 +827,8 @@ static struct page_info *alloc_heap_pages(
 BUG_ON(pg[i].count_info != PGC_state_free);
 pg[i].count_info = PGC_state_inuse;
 
-if ( pg[i].u.free.need_tlbflush &&
+if ( !(memflags & MEMF_no_tlbflush) &&
+ pg[i].u.free.need_tlbflush &&
  (pg[i].tlbflush_timestamp <= tlbflush_current_time()) &&
  (!need_tlbflush ||
   (pg[i].tlbflush_timestamp > tlbflush_timestamp)) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 32a300f..593541a 100644
--- a/xen/common/schedule.c
+++