Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-12-07 Thread Marcelo Tosatti
On Thu, Dec 06, 2012 at 12:29:02PM +0530, Raghavendra K T wrote:
> On 12/04/2012 01:26 AM, Marcelo Tosatti wrote:
> >On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:
> >>On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:
> >>>
> >>>Don't understand the reasoning behind why 3 is a good choice.
> >>
> >>Here is where I came from. (Explaining from scratch for
> >>completeness, forgive me :))
> >>In moderate overcommits, we can falsely exit from the ple handler even
> >>when we have a preempted task of the same VM waiting on other cpus. To
> >>reduce this problem, we try a few times before exiting.
> >>The problem boils down to:
> >>what is the probability that we exit the ple handler even when we have
> >>more than 1 task on other cpus? The theoretical worst case should be
> >>around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
> >>the practical worst case may be around 2x, 3x overcommit, as indicated
> >>by the results for the patch series.]
> >>
> >>So if p is the probability of finding rq length one on a particular
> >>cpu, and we do n tries, then the probability of exiting the ple
> >>handler is:
> >>
> >>  p^(n+1) [ because we would have come across one source with rq
> >>length 1 and n target cpu rqs with length 1 ]
> >>
> >>so
> >>num tries: probability of aborting ple handler (1.5x overcommit)
> >>  1 1/4
> >>  2 1/8
> >>  3 1/16
> >>
> >>We can reduce this probability further with more tries, but the
> >>problem is the overhead.
> >>Also, if we have tried three times, that means we would have iterated
> >>over 3 good eligible vcpus along with many non-eligible candidates. In
> >>the worst case, if we iterate over all the vcpus, we reduce 1x
> >>performance, and overcommit performance gets hit [ as in the results ].
> >>
> >>I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
> >>concluded 3 is enough.
> >>
> >>In fact, I have also run kernbench and hackbench, which are giving
> >>5-20% improvement.
> >>
> >>[ As a side note, I also thought about having num_tries = f(n) =
> >>ceil(log(num_online_cpus)/2), but I thought the calculation is too
> >>much overhead, and there is probably no point in making it dependent
> >>on online cpus. ]
> >>
> >>Please let me know if you are happy with this rationale, or correct me
> >>if you foresee some problem. (In fact, Avi's and Rik's concern about
> >>false exiting made me arrive at the 'try' logic, which I did not have
> >>earlier.)
> >>
> >>I am currently trying out the 1.5x overcommit case and will post the
> >>result.
> >
> >Raghavendra
> >
> >Makes sense to me. Thanks.
> >
> 
> Marcelo,
> Do you think this can be considered for the next merge window, or are you
> expecting anything else on this patchset?

Nope, not expecting anything else. As for the merge window, it depends on
upstream.

Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-12-05 Thread Raghavendra K T

On 12/04/2012 01:26 AM, Marcelo Tosatti wrote:

On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:

On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:


Don't understand the reasoning behind why 3 is a good choice.


Here is where I came from. (Explaining from scratch for
completeness, forgive me :))
In moderate overcommits, we can falsely exit from the ple handler even
when we have a preempted task of the same VM waiting on other cpus. To
reduce this problem, we try a few times before exiting.
The problem boils down to:
what is the probability that we exit the ple handler even when we have
more than 1 task on other cpus? The theoretical worst case should be
around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
the practical worst case may be around 2x, 3x overcommit, as indicated
by the results for the patch series.]

So if p is the probability of finding rq length one on a particular
cpu, and we do n tries, then the probability of exiting the ple
handler is:

  p^(n+1) [ because we would have come across one source with rq
length 1 and n target cpu rqs with length 1 ]

so
num tries: probability of aborting ple handler (1.5x overcommit)
  1 1/4
  2 1/8
  3 1/16

We can reduce this probability further with more tries, but the
problem is the overhead.
Also, if we have tried three times, that means we would have iterated
over 3 good eligible vcpus along with many non-eligible candidates. In
the worst case, if we iterate over all the vcpus, we reduce 1x
performance, and overcommit performance gets hit [ as in the results ].

I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
concluded 3 is enough.

In fact, I have also run kernbench and hackbench, which are giving
5-20% improvement.

[ As a side note, I also thought about having num_tries = f(n) =
ceil(log(num_online_cpus)/2), but I thought the calculation is too
much overhead, and there is probably no point in making it dependent
on online cpus. ]

Please let me know if you are happy with this rationale, or correct me
if you foresee some problem. (In fact, Avi's and Rik's concern about
false exiting made me arrive at the 'try' logic, which I did not have
earlier.)

I am currently trying out the 1.5x overcommit case and will post the
result.


Raghavendra

Makes sense to me. Thanks.



Marcelo,
Do you think this can be considered for the next merge window, or are you
expecting anything else on this patchset?



Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-12-04 Thread Raghavendra K T

On 12/04/2012 01:26 AM, Marcelo Tosatti wrote:

On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:

On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:


Don't understand the reasoning behind why 3 is a good choice.


Here is where I came from. (Explaining from scratch for
completeness, forgive me :))
In moderate overcommits, we can falsely exit from the ple handler even
when we have a preempted task of the same VM waiting on other cpus. To
reduce this problem, we try a few times before exiting.
The problem boils down to:
what is the probability that we exit the ple handler even when we have
more than 1 task on other cpus? The theoretical worst case should be
around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
the practical worst case may be around 2x, 3x overcommit, as indicated
by the results for the patch series.]

So if p is the probability of finding rq length one on a particular
cpu, and we do n tries, then the probability of exiting the ple
handler is:

  p^(n+1) [ because we would have come across one source with rq
length 1 and n target cpu rqs with length 1 ]

so
num tries: probability of aborting ple handler (1.5x overcommit)
  1 1/4
  2 1/8
  3 1/16

We can reduce this probability further with more tries, but the
problem is the overhead.
Also, if we have tried three times, that means we would have iterated
over 3 good eligible vcpus along with many non-eligible candidates. In
the worst case, if we iterate over all the vcpus, we reduce 1x
performance, and overcommit performance gets hit [ as in the results ].

I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
concluded 3 is enough.

In fact, I have also run kernbench and hackbench, which are giving
5-20% improvement.

[ As a side note, I also thought about having num_tries = f(n) =
ceil(log(num_online_cpus)/2), but I thought the calculation is too
much overhead, and there is probably no point in making it dependent
on online cpus. ]

Please let me know if you are happy with this rationale, or correct me
if you foresee some problem. (In fact, Avi's and Rik's concern about
false exiting made me arrive at the 'try' logic, which I did not have
earlier.)

I am currently trying out the 1.5x overcommit case and will post the
result.


Raghavendra

Makes sense to me. Thanks.



Hi Marcelo, thanks for looking into the patches.



Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-12-03 Thread Marcelo Tosatti
On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:
> On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:
> >
> >Don't understand the reasoning behind why 3 is a good choice.
> 
> Here is where I came from. (Explaining from scratch for
> completeness, forgive me :))
> In moderate overcommits, we can falsely exit from the ple handler even
> when we have a preempted task of the same VM waiting on other cpus. To
> reduce this problem, we try a few times before exiting.
> The problem boils down to:
> what is the probability that we exit the ple handler even when we have
> more than 1 task on other cpus? The theoretical worst case should be
> around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
> the practical worst case may be around 2x, 3x overcommit, as indicated
> by the results for the patch series.]
> 
> So if p is the probability of finding rq length one on a particular
> cpu, and we do n tries, then the probability of exiting the ple
> handler is:
> 
>  p^(n+1) [ because we would have come across one source with rq
> length 1 and n target cpu rqs with length 1 ]
> 
> so
> num tries: probability of aborting ple handler (1.5x overcommit)
>  1 1/4
>  2 1/8
>  3 1/16
> 
> We can reduce this probability further with more tries, but the
> problem is the overhead.
> Also, if we have tried three times, that means we would have iterated
> over 3 good eligible vcpus along with many non-eligible candidates. In
> the worst case, if we iterate over all the vcpus, we reduce 1x
> performance, and overcommit performance gets hit [ as in the results ].
> 
> I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
> concluded 3 is enough.
> 
> In fact, I have also run kernbench and hackbench, which are giving
> 5-20% improvement.
> 
> [ As a side note, I also thought about having num_tries = f(n) =
> ceil(log(num_online_cpus)/2), but I thought the calculation is too
> much overhead, and there is probably no point in making it dependent
> on online cpus. ]
> 
> Please let me know if you are happy with this rationale, or correct me
> if you foresee some problem. (In fact, Avi's and Rik's concern about
> false exiting made me arrive at the 'try' logic, which I did not have
> earlier.)
> 
> I am currently trying out the 1.5x overcommit case and will post the
> result.

Raghavendra

Makes sense to me. Thanks.


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-29 Thread Raghavendra K T

On 11/29/2012 05:46 PM, Gleb Natapov wrote:

On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:

On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:


Don't understand the reasoning behind why 3 is a good choice.


Here is where I came from. (Explaining from scratch for
completeness, forgive me :))
In moderate overcommits, we can falsely exit from the ple handler even
when we have a preempted task of the same VM waiting on other cpus. To
reduce this problem, we try a few times before exiting.
The problem boils down to:
what is the probability that we exit the ple handler even when we have
more than 1 task on other cpus? The theoretical worst case should be
around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
the practical worst case may be around 2x, 3x overcommit, as indicated
by the results for the patch series.]

So if p is the probability of finding rq length one on a particular
cpu, and we do n tries, then the probability of exiting the ple
handler is:

  p^(n+1) [ because we would have come across one source with rq
length 1 and n target cpu rqs with length 1 ]

so
num tries: probability of aborting ple handler (1.5x overcommit)
  1 1/4
  2 1/8
  3 1/16

We can reduce this probability further with more tries, but the
problem is the overhead.

IIRC Avi (again) had an idea to track vcpu preemption. When a vcpu thread
is preempted we do kvm->preempted_vcpus++, and when it runs again we do
kvm->preempted_vcpus--. The PLE handler can try harder if
kvm->preempted_vcpus is big, or not try at all if it is zero.


Thanks for the reply, Gleb.

Yes.. it was on my next TODO, as you know, and it makes sense to weigh
all these approaches (undercommit patches/throttled yield/preempt
notifier/pvspinlock and their combinations) to a good extent before
going further. I am happy if these patches are now in 'good shape to
compare' state. (Same reason I had posted the dynamic PLE approach too.)






Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-29 Thread Gleb Natapov
On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:
> On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:
> >
> >Don't understand the reasoning behind why 3 is a good choice.
> 
> Here is where I came from. (Explaining from scratch for
> completeness, forgive me :))
> In moderate overcommits, we can falsely exit from the ple handler even
> when we have a preempted task of the same VM waiting on other cpus. To
> reduce this problem, we try a few times before exiting.
> The problem boils down to:
> what is the probability that we exit the ple handler even when we have
> more than 1 task on other cpus? The theoretical worst case should be
> around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
> the practical worst case may be around 2x, 3x overcommit, as indicated
> by the results for the patch series.]
> 
> So if p is the probability of finding rq length one on a particular
> cpu, and we do n tries, then the probability of exiting the ple
> handler is:
> 
>  p^(n+1) [ because we would have come across one source with rq
> length 1 and n target cpu rqs with length 1 ]
> 
> so
> num tries: probability of aborting ple handler (1.5x overcommit)
>  1 1/4
>  2 1/8
>  3 1/16
> 
> We can reduce this probability further with more tries, but the
> problem is the overhead.
IIRC Avi (again) had an idea to track vcpu preemption. When a vcpu thread
is preempted we do kvm->preempted_vcpus++, and when it runs again we do
kvm->preempted_vcpus--. The PLE handler can try harder if
kvm->preempted_vcpus is big, or not try at all if it is zero.
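A minimal sketch of what that counter could look like, assuming a per-VM
atomic driven from the vcpu sched-in/sched-out hooks (the field name,
helper names, and hook points below are illustrative, not existing KVM
code):

struct kvm {
	/* ... existing fields ... */
	atomic_t preempted_vcpus;	/* vcpu threads currently scheduled out */
};

/* Preempt-notifier callback: the vcpu thread was scheduled out. */
static void kvm_vcpu_track_sched_out(struct kvm_vcpu *vcpu)
{
	atomic_inc(&vcpu->kvm->preempted_vcpus);
}

/* Preempt-notifier callback: the vcpu thread runs again. */
static void kvm_vcpu_track_sched_in(struct kvm_vcpu *vcpu)
{
	atomic_dec(&vcpu->kvm->preempted_vcpus);
}

/* In the PLE handler: spend effort only when some vcpu of this VM is
 * actually preempted, and scale the try budget by how many are. */
static int ple_try_budget(struct kvm *kvm)
{
	int preempted = atomic_read(&kvm->preempted_vcpus);

	return min(preempted, 3);	/* 0 tries when none are preempted */
}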

> Also, if we have tried three times, that means we would have iterated
> over 3 good eligible vcpus along with many non-eligible candidates. In
> the worst case, if we iterate over all the vcpus, we reduce 1x
> performance, and overcommit performance gets hit [ as in the results ].
> 
> I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
> concluded 3 is enough.
> 
> In fact, I have also run kernbench and hackbench, which are giving
> 5-20% improvement.
> 
> [ As a side note, I also thought about having num_tries = f(n) =
> ceil(log(num_online_cpus)/2), but I thought the calculation is too
> much overhead, and there is probably no point in making it dependent
> on online cpus. ]
> 
> Please let me know if you are happy with this rationale, or correct me
> if you foresee some problem. (In fact, Avi's and Rik's concern about
> false exiting made me arrive at the 'try' logic, which I did not have
> earlier.)
> 
> I am currently trying out the 1.5x overcommit case and will post the
> result.
> 
> >
> >On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
> >>From: Raghavendra K T 
> >>
> >>yield_to returns -ESRCH when the run queue length of both the source
> >>and target of yield_to is one. When we see three successive failures
> >>of yield_to, we assume we are in a potential undercommit case and
> >>abort from the PLE handler.
> >>The assumption is backed by the low probability of a wrong decision
> >>even for worst-case scenarios such as an average runqueue length
> >>between 1 and 2.
> >>
> >>Note that we do not update the last boosted vcpu in failure cases.
> >>Thanks to Avi for raising the question of aborting after the first
> >>failure of yield_to.
> >>
> >>Reviewed-by: Srikar Dronamraju 
> >>Signed-off-by: Raghavendra K T 
> [...]

--
Gleb.


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Raghavendra K T

On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:


Don't understand the reasoning behind why 3 is a good choice.


Here is where I came from. (Explaining from scratch for
completeness, forgive me :))
In moderate overcommits, we can falsely exit from the ple handler even
when we have a preempted task of the same VM waiting on other cpus. To
reduce this problem, we try a few times before exiting.
The problem boils down to:
what is the probability that we exit the ple handler even when we have
more than 1 task on other cpus? The theoretical worst case should be
around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
the practical worst case may be around 2x, 3x overcommit, as indicated
by the results for the patch series.]

So if p is the probability of finding rq length one on a particular
cpu, and we do n tries, then the probability of exiting the ple
handler is:

  p^(n+1) [ because we would have come across one source with rq
length 1 and n target cpu rqs with length 1 ]

so
num tries: probability of aborting ple handler (1.5x overcommit)
  1 1/4
  2 1/8
  3 1/16

We can reduce this probability further with more tries, but the
problem is the overhead.
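To spell out the arithmetic behind the table: assume p = 1/2 at 1.5x
overcommit, i.e. a given cpu's rq has length one with probability 1/2
(the value of p is not stated above, but it is the one that reproduces
the table):

  P(abort) = p^(n+1) = (1/2)^(n+1)
  n = 1:  (1/2)^2 = 1/4
  n = 2:  (1/2)^3 = 1/8
  n = 3:  (1/2)^4 = 1/16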
Also, if we have tried three times, that means we would have iterated
over 3 good eligible vcpus along with many non-eligible candidates. In
the worst case, if we iterate over all the vcpus, we reduce 1x
performance, and overcommit performance gets hit [ as in the results ].

I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
concluded 3 is enough.

In fact, I have also run kernbench and hackbench, which are giving
5-20% improvement.

[ As a side note, I also thought about having num_tries = f(n) =
ceil(log(num_online_cpus)/2), but I thought the calculation is too
much overhead, and there is probably no point in making it dependent
on online cpus. ]

Please let me know if you are happy with this rationale, or correct me
if you foresee some problem. (In fact, Avi's and Rik's concern about
false exiting made me arrive at the 'try' logic, which I did not have
earlier.)

I am currently trying out the 1.5x overcommit case and will post the
result.



On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:

From: Raghavendra K T 

yield_to returns -ESRCH when the run queue length of both the source
and target of yield_to is one. When we see three successive failures
of yield_to, we assume we are in a potential undercommit case and
abort from the PLE handler.
The assumption is backed by the low probability of a wrong decision
even for worst-case scenarios such as an average runqueue length
between 1 and 2.

Note that we do not update the last boosted vcpu in failure cases.
Thanks to Avi for raising the question of aborting after the first
failure of yield_to.

Reviewed-by: Srikar Dronamraju 
Signed-off-by: Raghavendra K T 

[...]



Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Marcelo Tosatti

Don't understand the reasoning behind why 3 is a good choice.

On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
> From: Raghavendra K T 
> 
> yield_to returns -ESRCH when the run queue length of both the source
> and target of yield_to is one. When we see three successive failures
> of yield_to, we assume we are in a potential undercommit case and
> abort from the PLE handler.
> The assumption is backed by the low probability of a wrong decision
> even for worst-case scenarios such as an average runqueue length
> between 1 and 2.
> 
> Note that we do not update the last boosted vcpu in failure cases.
> Thanks to Avi for raising the question of aborting after the first
> failure of yield_to.
> 
> Reviewed-by: Srikar Dronamraju 
> Signed-off-by: Raghavendra K T 
> ---
>  virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index be70035..053f494 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
>  {
>   struct pid *pid;
>   struct task_struct *task = NULL;
> + bool ret = false;
>  
>   rcu_read_lock();
>   pid = rcu_dereference(target->pid);
> @@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
>   task = get_pid_task(target->pid, PIDTYPE_PID);
>   rcu_read_unlock();
>   if (!task)
> - return false;
> + return ret;
>   if (task->flags & PF_VCPU) {
>   put_task_struct(task);
> - return false;
> - }
> - if (yield_to(task, 1)) {
> - put_task_struct(task);
> - return true;
> + return ret;
>   }
> + ret = yield_to(task, 1);
>   put_task_struct(task);
> - return false;
> +
> + return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
>  
> @@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct 
> kvm_vcpu *vcpu)
>   return eligible;
>  }
>  #endif
> +
>  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  {
>   struct kvm *kvm = me->kvm;
>   struct kvm_vcpu *vcpu;
>   int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>   int yielded = 0;
> + int try = 3;
>   int pass;
>   int i;
>  
> @@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>* VCPU is holding the lock that we need and will release it.
>* We approximate round-robin by starting at the last boosted VCPU.
>*/
> - for (pass = 0; pass < 2 && !yielded; pass++) {
> + for (pass = 0; pass < 2 && !yielded && try; pass++) {
>   kvm_for_each_vcpu(i, vcpu, kvm) {
>   if (!pass && i <= last_boosted_vcpu) {
>   i = last_boosted_vcpu;
> @@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>   continue;
>   if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>   continue;
> - if (kvm_vcpu_yield_to(vcpu)) {
> +
> + yielded = kvm_vcpu_yield_to(vcpu);
> + if (yielded > 0) {
>   kvm->last_boosted_vcpu = i;
> - yielded = 1;
>   break;
> + } else if (yielded < 0) {
> + try--;
> + if (!try)
> + break;
>   }
>   }
>   }
> 


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Andrew Jones
On Tue, Nov 27, 2012 at 03:57:25PM +0530, Raghavendra K T wrote:
> On 11/26/2012 07:13 PM, Andrew Jones wrote:
> >On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
> >>From: Raghavendra K T 
> >>
> >>yield_to returns -ESRCH when the run queue length of both the source
> >>and target of yield_to is one. When we see three successive failures
> >>of yield_to, we assume we are in a potential undercommit case and
> >>abort from the PLE handler.
> >>The assumption is backed by the low probability of a wrong decision
> >>even for worst-case scenarios such as an average runqueue length
> >>between 1 and 2.
> >>
> >>Note that we do not update the last boosted vcpu in failure cases.
> >>Thanks to Avi for raising the question of aborting after the first
> >>failure of yield_to.
> >>
> >>Reviewed-by: Srikar Dronamraju 
> >>Signed-off-by: Raghavendra K T 
> >>---
> >>  virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
> >>  1 file changed, 16 insertions(+), 10 deletions(-)
> >>
> >>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>index be70035..053f494 100644
> >>--- a/virt/kvm/kvm_main.c
> >>+++ b/virt/kvm/kvm_main.c
> >>@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
> >>  {
> >>struct pid *pid;
> >>struct task_struct *task = NULL;
> >>+   bool ret = false;
> >>
> >>rcu_read_lock();
> >>pid = rcu_dereference(target->pid);
> >>@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
> >>task = get_pid_task(target->pid, PIDTYPE_PID);
> >>rcu_read_unlock();
> >>if (!task)
> >>-   return false;
> >>+   return ret;
> >>if (task->flags & PF_VCPU) {
> >>put_task_struct(task);
> >>-   return false;
> >>-   }
> >>-   if (yield_to(task, 1)) {
> >>-   put_task_struct(task);
> >>-   return true;
> >>+   return ret;
> >>}
> >>+   ret = yield_to(task, 1);
> >>put_task_struct(task);
> >>-   return false;
> >>+
> >>+   return ret;
> >>  }
> >>  EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
> >>
> >>@@ -1697,12 +1696,14 @@ bool
> kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
> >>return eligible;
> >>  }
> >>  #endif
> >>+
> >>  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>  {
> >>struct kvm *kvm = me->kvm;
> >>struct kvm_vcpu *vcpu;
> >>int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
> >>int yielded = 0;
> >>+   int try = 3;
> >>int pass;
> >>int i;
> >>
> >>@@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >> * VCPU is holding the lock that we need and will release it.
> >> * We approximate round-robin by starting at the last boosted VCPU.
> >> */
> >>-   for (pass = 0; pass < 2 && !yielded; pass++) {
> >>+   for (pass = 0; pass < 2 && !yielded && try; pass++) {
> >>kvm_for_each_vcpu(i, vcpu, kvm) {
> >>if (!pass && i <= last_boosted_vcpu) {
> >>i = last_boosted_vcpu;
> >>@@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>continue;
> >>if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> >>continue;
> >>-   if (kvm_vcpu_yield_to(vcpu)) {
> >>+
> >>+   yielded = kvm_vcpu_yield_to(vcpu);
> >>+   if (yielded > 0) {
> >>kvm->last_boosted_vcpu = i;
> >>-   yielded = 1;
> >>break;
> >>+   } else if (yielded < 0) {
> >>+   try--;
> >>+   if (!try)
> >>+   break;
> >>}
> >>}
> >>}
> >>
> 
> Drew, thanks for reviewing this.
> >
> >The check done in patch 1/2 is done before the double_rq_lock, so it's
> >cheap. Now, this patch is to avoid doing too many get_pid_task calls. I
> >wonder if it would make more sense to change the vcpu state from tracking
> >the pid to tracking the task. If that was done, then I don't believe this
> >patch is necessary.
> 
> We would need logic not to break out upon the first failure of yield_to
> (which happens otherwise with patch 1 alone). Breaking out of the ple
> handler upon the first failure resulted in degradation in moderate
> overcommits, due to false exits even when we have more than one task in
> other cpu run queues.
> 
> But your suggestion triggered an idea: what would be the cost of
> iterating over all vcpus despite yield_to failures?
> 
> (Where we break out of the PLE handler only if we have a successful
> yield, i.e. yielded > 0.) With something like this:
> 
> -   for (pass = 0; pass < 2 && !yielded; pass++) {
> +   for (pass = 0; pass < 2 && yielded <=0 ; pass++) {
> kvm_for_each_vcpu(i, vcpu, kvm) {
> if (!pass && i <= last_boosted_vcpu) {
> i = last_boosted_vcpu;
> @@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  

Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Raghavendra K T

On 11/26/2012 07:13 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:

From: Raghavendra K T 

yield_to returns -ESRCH when the run queue length of both the source
and target of yield_to is one. When we see three successive failures
of yield_to, we assume we are in a potential undercommit case and
abort from the PLE handler.
The assumption is backed by the low probability of a wrong decision
even for worst-case scenarios such as an average runqueue length
between 1 and 2.

Note that we do not update the last boosted vcpu in failure cases.
Thanks to Avi for raising the question of aborting after the first
failure of yield_to.


Reviewed-by: Srikar Dronamraju 
Signed-off-by: Raghavendra K T 
---
  virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
  1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..053f494 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
  {
struct pid *pid;
struct task_struct *task = NULL;
+   bool ret = false;

rcu_read_lock();
pid = rcu_dereference(target->pid);
@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
task = get_pid_task(target->pid, PIDTYPE_PID);
rcu_read_unlock();
if (!task)
-   return false;
+   return ret;
if (task->flags & PF_VCPU) {
put_task_struct(task);
-   return false;
-   }
-   if (yield_to(task, 1)) {
-   put_task_struct(task);
-   return true;
+   return ret;
}
+   ret = yield_to(task, 1);
put_task_struct(task);
-   return false;
+
+   return ret;
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);

@@ -1697,12 +1696,14 @@ bool

kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)

return eligible;
  }
  #endif
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
struct kvm *kvm = me->kvm;
struct kvm_vcpu *vcpu;
int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
int yielded = 0;
+   int try = 3;
int pass;
int i;

@@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 * VCPU is holding the lock that we need and will release it.
 * We approximate round-robin by starting at the last boosted VCPU.
 */
-   for (pass = 0; pass < 2 && !yielded; pass++) {
+   for (pass = 0; pass < 2 && !yielded && try; pass++) {
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!pass && i <= last_boosted_vcpu) {
i = last_boosted_vcpu;
@@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+   yielded = kvm_vcpu_yield_to(vcpu);
+   if (yielded > 0) {
kvm->last_boosted_vcpu = i;
-   yielded = 1;
break;
+   } else if (yielded < 0) {
+   try--;
+   if (!try)
+   break;
}
}
}



Drew, thanks for reviewing this.


The check done in patch 1/2 is done before the double_rq_lock, so it's
cheap. Now, this patch is to avoid doing too many get_pid_task calls. I
wonder if it would make more sense to change the vcpu state from tracking
the pid to tracking the task. If that was done, then I don't believe this
patch is necessary.
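For reference, a sketch of the patch 1/2 check Drew refers to (that
patch is not quoted in this thread, so this reconstruction is
illustrative rather than the posted code): yield_to() can return -ESRCH
before taking the expensive double run queue lock when both run queues
hold a single task.

	/* In yield_to(), before double_rq_lock() (illustrative sketch): */
	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
		yielded = -ESRCH;	/* undercommit: nothing to yield to */
		goto out_irq;
	}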


We would need logic not to break out upon the first failure of yield_to
(which happens otherwise with patch 1 alone). Breaking out of the ple
handler upon the first failure resulted in degradation in moderate
overcommits, due to false exits even when we have more than one task in
other cpu run queues.

But your suggestion triggered an idea: what would be the cost of
iterating over all vcpus despite yield_to failures?

(Where we break out of the PLE handler only if we have a successful
yield, i.e. yielded > 0.) With something like this:


-   for (pass = 0; pass < 2 && !yielded; pass++) {
+   for (pass = 0; pass < 2 && yielded <=0 ; pass++) {
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!pass && i <= last_boosted_vcpu) {
i = last_boosted_vcpu;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+   yielded = kvm_vcpu_yield_to(vcpu);
+ 
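The fragment above is cut off in the archive; below is a minimal sketch
of the variant being discussed, reconstructed from the quoted patch.
The surrounding loop body is assumed to match the kvm_vcpu_on_spin() of
that era, so treat it as illustrative rather than the posted code.

	/* Stay in the handler across yield_to() failures; leave only
	 * after a successful directed yield. */
	for (pass = 0; pass < 2 && yielded <= 0; pass++) {
		kvm_for_each_vcpu(i, vcpu, kvm) {
			if (!pass && i <= last_boosted_vcpu) {
				i = last_boosted_vcpu;
				continue;
			} else if (pass && i > last_boosted_vcpu)
				break;
			if (vcpu == me)
				continue;
			if (waitqueue_active(&vcpu->wq))
				continue;
			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
				continue;

			yielded = kvm_vcpu_yield_to(vcpu);
			if (yielded > 0) {
				kvm->last_boosted_vcpu = i;
				break;
			}
		}
	}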

Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Raghavendra K T

On 11/26/2012 07:13 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

yield_to returns -ESRCH, When source and target of yield_to
run queue length is one. When we see three successive failures of
yield_to we assume we are in potential undercommit case and abort
from PLE handler.
The assumption is backed by low probability of wrong decision
for even worst case scenarios such as average runqueue length
between 1 and 2.

note that we do not update last boosted vcpu in failure cases.
Thank Avi for raising question on aborting after first fail from

yield_to.


Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
  virt/kvm/kvm_main.c |   26 --
  1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..053f494 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
  {
struct pid *pid;
struct task_struct *task = NULL;
+   bool ret = false;

rcu_read_lock();
pid = rcu_dereference(target-pid);
@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
task = get_pid_task(target-pid, PIDTYPE_PID);
rcu_read_unlock();
if (!task)
-   return false;
+   return ret;
if (task-flags  PF_VCPU) {
put_task_struct(task);
-   return false;
-   }
-   if (yield_to(task, 1)) {
-   put_task_struct(task);
-   return true;
+   return ret;
}
+   ret = yield_to(task, 1);
put_task_struct(task);
-   return false;
+
+   return ret;
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);

@@ -1697,12 +1696,14 @@ bool

kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)

return eligible;
  }
  #endif
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
struct kvm *kvm = me-kvm;
struct kvm_vcpu *vcpu;
int last_boosted_vcpu = me-kvm-last_boosted_vcpu;
int yielded = 0;
+   int try = 3;
int pass;
int i;

@@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 * VCPU is holding the lock that we need and will release it.
 * We approximate round-robin by starting at the last boosted VCPU.
 */
-   for (pass = 0; pass  2  !yielded; pass++) {
+   for (pass = 0; pass  2  !yielded  try; pass++) {
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!pass  i = last_boosted_vcpu) {
i = last_boosted_vcpu;
@@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+   yielded = kvm_vcpu_yield_to(vcpu);
+   if (yielded  0) {
kvm-last_boosted_vcpu = i;
-   yielded = 1;
break;
+   } else if (yielded  0) {
+   try--;
+   if (!try)
+   break;
}
}
}



Drew, Thanks for reviewing this.


The check done in patch 1/2 is done before the double_rq_lock, so it's
cheap. Now, this patch is to avoid doing too many get_pid_task calls.I
wonder if it would make more sense to change the vcpu state from tracking
the pid to tracking the task. If that was done, then I don't believe this
patch is necessary.


We would need a logic not to break upon first failure of yield_to. 
(which happens otherwise with patch1 alone). Breaking upon first

failure out of ple handler resulted in degradation in moderate
overcommits due to false exits even when we have more than one task in
other cpu run queues.

But your suggestion triggered an idea to me, what would be the cost of
iterating over all vcpus despite of yield_to failure?

(Where we breakout of PLE handler only if we have successful yield i.e 
yielded  0) with something like this:


-   for (pass = 0; pass  2  !yielded; pass++) {
+   for (pass = 0; pass  2  yielded =0 ; pass++) {
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!pass  i = last_boosted_vcpu) {
i = last_boosted_vcpu;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+ 

Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Andrew Jones
On Tue, Nov 27, 2012 at 03:57:25PM +0530, Raghavendra K T wrote:
 On 11/26/2012 07:13 PM, Andrew Jones wrote:
 On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 yield_to returns -ESRCH, When source and target of yield_to
 run queue length is one. When we see three successive failures of
 yield_to we assume we are in potential undercommit case and abort
 from PLE handler.
 The assumption is backed by low probability of wrong decision
 for even worst case scenarios such as average runqueue length
 between 1 and 2.
 
 note that we do not update last boosted vcpu in failure cases.
 Thank Avi for raising question on aborting after first fail from
 yield_to.
 
 Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
   virt/kvm/kvm_main.c |   26 --
   1 file changed, 16 insertions(+), 10 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index be70035..053f494 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
   {
 struct pid *pid;
 struct task_struct *task = NULL;
 +   bool ret = false;
 
 rcu_read_lock();
 pid = rcu_dereference(target-pid);
 @@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 task = get_pid_task(target-pid, PIDTYPE_PID);
 rcu_read_unlock();
 if (!task)
 -   return false;
 +   return ret;
 if (task-flags  PF_VCPU) {
 put_task_struct(task);
 -   return false;
 -   }
 -   if (yield_to(task, 1)) {
 -   put_task_struct(task);
 -   return true;
 +   return ret;
 }
 +   ret = yield_to(task, 1);
 put_task_struct(task);
 -   return false;
 +
 +   return ret;
   }
   EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
 @@ -1697,12 +1696,14 @@ bool
 kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 return eligible;
   }
   #endif
 +
   void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   {
 struct kvm *kvm = me-kvm;
 struct kvm_vcpu *vcpu;
 int last_boosted_vcpu = me-kvm-last_boosted_vcpu;
 int yielded = 0;
 +   int try = 3;
 int pass;
 int i;
 
 @@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  * VCPU is holding the lock that we need and will release it.
  * We approximate round-robin by starting at the last boosted VCPU.
  */
 -   for (pass = 0; pass  2  !yielded; pass++) {
 +   for (pass = 0; pass  2  !yielded  try; pass++) {
 kvm_for_each_vcpu(i, vcpu, kvm) {
 if (!pass  i = last_boosted_vcpu) {
 i = last_boosted_vcpu;
 @@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 continue;
 if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 continue;
 -   if (kvm_vcpu_yield_to(vcpu)) {
 +
 +   yielded = kvm_vcpu_yield_to(vcpu);
 +   if (yielded  0) {
 kvm-last_boosted_vcpu = i;
 -   yielded = 1;
 break;
 +   } else if (yielded  0) {
 +   try--;
 +   if (!try)
 +   break;
 }
 }
 }
 
 
 Drew, Thanks for reviewing this.
 
 The check done in patch 1/2 is done before the double_rq_lock, so it's
 cheap. Now, this patch is to avoid doing too many get_pid_task calls.I
 wonder if it would make more sense to change the vcpu state from tracking
 the pid to tracking the task. If that was done, then I don't believe this
 patch is necessary.
 
 We would need logic not to break out upon the first failure of yield_to
 (which happens otherwise with patch 1 alone). Breaking out of the PLE
 handler upon the first failure resulted in degradation in moderate
 overcommits due to false exits, even when we have more than one task in
 other cpus' run queues.
 
 But your suggestion triggered an idea for me: what would be the cost of
 iterating over all vcpus despite yield_to failures?
 
 (Here we would break out of the PLE handler only if we have a
 successful yield, i.e. yielded > 0) with something like this:
 
-	for (pass = 0; pass < 2 && !yielded; pass++) {
+	for (pass = 0; pass < 2 && yielded <= 0; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
-			if (kvm_vcpu_yield_to(vcpu)) {
+
+			yielded = kvm_vcpu_yield_to(vcpu);
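
The archived message is truncated at this point. Judging from the
surrounding discussion and the final form of the patch, the snippet
would presumably have continued along these lines (a speculative
reconstruction, not the original text):

+			if (yielded > 0) {
 				kvm->last_boosted_vcpu = i;
-				yielded = 1;
 				break;
 			}

i.e. keep scanning on failure and break out of both passes only on a
successful yield.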

Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Marcelo Tosatti

Don't understand the reasoning behind why 3 is a good choice.

On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 yield_to returns -ESRCH when the run queue length of both the source
 and the target of yield_to is one. When we see three successive
 failures of yield_to, we assume we are in a potential undercommit case
 and abort from the PLE handler.
 The assumption is backed by the low probability of a wrong decision
 even for worst-case scenarios such as an average runqueue length
 between 1 and 2.
 
 Note that we do not update the last boosted vcpu in failure cases.
 Thanks to Avi for raising the question of aborting after the first
 failure of yield_to.
 
 Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..053f494 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 {
 	struct pid *pid;
 	struct task_struct *task = NULL;
+	bool ret = false;
 
 	rcu_read_lock();
 	pid = rcu_dereference(target->pid);
@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 	task = get_pid_task(target->pid, PIDTYPE_PID);
 	rcu_read_unlock();
 	if (!task)
-		return false;
+		return ret;
 	if (task->flags & PF_VCPU) {
 		put_task_struct(task);
-		return false;
-	}
-	if (yield_to(task, 1)) {
-		put_task_struct(task);
-		return true;
+		return ret;
 	}
+	ret = yield_to(task, 1);
 	put_task_struct(task);
-	return false;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
@@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 	return eligible;
 }
 #endif
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
 	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
 	int yielded = 0;
+	int try = 3;
 	int pass;
 	int i;
 
@@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 * VCPU is holding the lock that we need and will release it.
 	 * We approximate round-robin by starting at the last boosted VCPU.
 	 */
-	for (pass = 0; pass < 2 && !yielded; pass++) {
+	for (pass = 0; pass < 2 && !yielded && try; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
@@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
-			if (kvm_vcpu_yield_to(vcpu)) {
+
+			yielded = kvm_vcpu_yield_to(vcpu);
+			if (yielded > 0) {
 				kvm->last_boosted_vcpu = i;
-				yielded = 1;
 				break;
+			} else if (yielded < 0) {
+				try--;
+				if (!try)
+					break;
 			}
 		}
 	}
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-27 Thread Raghavendra K T

On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:


Don't understand the reasoning behind why 3 is a good choice.


Here is where I came from. (explaining from scratch for completeness, 
forgive me :))

In moderate overcommits, we can falsely exit from the ple handler even
when we have a preempted task of the same VM waiting on other cpus. To
reduce this problem, we try a few times before exiting.
The problem boils down to:
what is the probability that we exit the ple handler even when we have
more than one task on other cpus? The theoretical worst case should be
around 1.5x overcommit (as also pointed out by Andrew Theurer). [But
the practical worst case may be around 2x-3x overcommits, as indicated
by the results for the patch series.]

So if p is the probability of finding rq length one on a particular
cpu, and if we do n tries, then the probability of exiting the ple
handler is:

 p^(n+1) [ because we would have come across one source with rq length
 1 and n target cpu rqs with length 1 ]

so
num tries: probability of aborting ple handler (1.5x overcommit)
 1 1/4
 2 1/8
 3 1/16

We can increase this probability with more tries, but the problem is
the overhead.
Also, if we have tried three times, that means we would have iterated
over 3 good eligible vcpus along with many non-eligible candidates. In
the worst case, if we iterate over all the vcpus, we reduce 1x
performance and overcommit performance gets hit. [ as in results ]

I have tried num_tries = 1, 2, 3 and n already (not 4 yet), so I
concluded 3 is enough.
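
To make the arithmetic concrete, the table above can be reproduced with
a minimal userspace sketch, assuming p = 1/2 at 1.5x overcommit as in
the table (illustrative only, not kernel code; compile with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* Assumption: at 1.5x overcommit, the chance that a given cpu's
	 * runqueue has length one is about p = 1/2.  The PLE handler then
	 * aborts wrongly with probability p^(n+1) after n failed tries:
	 * one source runqueue of length one plus n target runqueues.
	 */
	double p = 0.5;

	for (int n = 1; n <= 3; n++)
		printf("tries = %d:  P(abort) = 1/%.0f\n",
		       n, 1.0 / pow(p, n + 1));

	return 0;
}

This prints 1/4, 1/8, and 1/16 for n = 1, 2, 3, matching the table.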

In fact, I have also run kernbench and hackbench, which are giving
5-20% improvement.

[ As a side note, I also thought about having num_tries = f(n) =
ceil(log(num_online_cpus)/2), but I thought the calculation is too much
overhead, and there is probably no point in making it dependent on
online cpus ]

Please let me know if you are happy with this rationale, or correct me
if you foresee some problem. (In fact, Avi's and Rik's concern about
false exiting made me arrive at the 'try' logic, which I did not have
earlier.)

I am currently trying out the result for 1.5x overcommit and will post
the result.



On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

yield_to returns -ESRCH when the run queue length of both the source
and the target of yield_to is one. When we see three successive
failures of yield_to, we assume we are in a potential undercommit case
and abort from the PLE handler.
The assumption is backed by the low probability of a wrong decision
even for worst-case scenarios such as an average runqueue length
between 1 and 2.

Note that we do not update the last boosted vcpu in failure cases.
Thanks to Avi for raising the question of aborting after the first
failure of yield_to.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

[...]

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-26 Thread Andrew Jones
On Mon, Nov 26, 2012 at 02:43:02PM +0100, Andrew Jones wrote:
> On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
> > From: Raghavendra K T 
> > 
> > yield_to returns -ESRCH when the run queue length of both the source
> > and the target of yield_to is one. When we see three successive
> > failures of yield_to, we assume we are in a potential undercommit case
> > and abort from the PLE handler.
> > The assumption is backed by the low probability of a wrong decision
> > even for worst-case scenarios such as an average runqueue length
> > between 1 and 2.
> > 
> > Note that we do not update the last boosted vcpu in failure cases.
> > Thanks to Avi for raising the question of aborting after the first
> > failure of yield_to.
> > 
> > Reviewed-by: Srikar Dronamraju 
> > Signed-off-by: Raghavendra K T 
> > ---
> >  virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index be70035..053f494 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
> >  {
> > struct pid *pid;
> > struct task_struct *task = NULL;
> > +   bool ret = false;
> >  
> > rcu_read_lock();
> > pid = rcu_dereference(target->pid);
> > @@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
> > task = get_pid_task(target->pid, PIDTYPE_PID);
> > rcu_read_unlock();
> > if (!task)
> > -   return false;
> > +   return ret;
> > if (task->flags & PF_VCPU) {
> > put_task_struct(task);
> > -   return false;
> > -   }
> > -   if (yield_to(task, 1)) {
> > -   put_task_struct(task);
> > -   return true;
> > +   return ret;
> > }
> > +   ret = yield_to(task, 1);
> > put_task_struct(task);
> > -   return false;
> > +
> > +   return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
> >  
> > @@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
> > return eligible;
> >  }
> >  #endif
> > +
> >  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >  {
> > struct kvm *kvm = me->kvm;
> > struct kvm_vcpu *vcpu;
> > int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
> > int yielded = 0;
> > +   int try = 3;
> > int pass;
> > int i;
> >  
> > @@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >  * VCPU is holding the lock that we need and will release it.
> >  * We approximate round-robin by starting at the last boosted VCPU.
> >  */
> > -   for (pass = 0; pass < 2 && !yielded; pass++) {
> > +   for (pass = 0; pass < 2 && !yielded && try; pass++) {
> > kvm_for_each_vcpu(i, vcpu, kvm) {
> > if (!pass && i <= last_boosted_vcpu) {
> > i = last_boosted_vcpu;
> > @@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> > continue;
> > if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> > continue;
> > -   if (kvm_vcpu_yield_to(vcpu)) {
> > +
> > +   yielded = kvm_vcpu_yield_to(vcpu);
> > +   if (yielded > 0) {
> > kvm->last_boosted_vcpu = i;
> > -   yielded = 1;
> > break;
> > +   } else if (yielded < 0) {
> > +   try--;
> > +   if (!try)
> > +   break;
> > }
> > }
> > }
> > 
> 
> The check done in patch 1/2 is done before the double_rq_lock, so it's
> cheap. Now, this patch is to avoid doing too many get_pid_task calls. I
> wonder if it would make more sense to change the vcpu state from tracking
> the pid to tracking the task. If that was done, then I don't believe this
> patch is necessary.
> 
> Rik,
> for 34bb10b79de7 was there a reason pid was used instead of task?

Never mind, I guess there's no way to validate the task pointer without
checking the pid, since, as your git commit says, there is no guarantee
that the same task always keeps the same vcpu. We'd only know it's valid
if it's running, and if it's running, it's of no interest.
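
For reference, the pid-based pattern being discussed (introduced by
34bb10b79de7 and visible in kvm_vcpu_yield_to above) resolves the
vcpu's struct pid to a task_struct on each use. A rough sketch of that
lookup, with a hypothetical helper name (kernel-context sketch, not an
upstream function):

/* Hypothetical helper illustrating the pid-based lookup: the vcpu keeps
 * a struct pid rather than a task_struct, so each caller resolves and
 * revalidates it here instead of trusting a possibly stale task pointer.
 */
static struct task_struct *vcpu_lookup_task(struct kvm_vcpu *target)
{
	struct task_struct *task = NULL;
	struct pid *pid;

	rcu_read_lock();
	pid = rcu_dereference(target->pid);
	if (pid)
		task = get_pid_task(pid, PIDTYPE_PID);	/* takes a task reference */
	rcu_read_unlock();

	return task;	/* caller must put_task_struct() when done */
}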

> 
> Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-26 Thread Andrew Jones
On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
> From: Raghavendra K T 
> 
> yield_to returns -ESRCH when the run queue length of both the source
> and the target of yield_to is one. When we see three successive
> failures of yield_to, we assume we are in a potential undercommit case
> and abort from the PLE handler.
> The assumption is backed by the low probability of a wrong decision
> even for worst-case scenarios such as an average runqueue length
> between 1 and 2.
> 
> Note that we do not update the last boosted vcpu in failure cases.
> Thanks to Avi for raising the question of aborting after the first
> failure of yield_to.
> 
> Reviewed-by: Srikar Dronamraju 
> Signed-off-by: Raghavendra K T 
> ---
>  virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index be70035..053f494 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
>  {
>   struct pid *pid;
>   struct task_struct *task = NULL;
> + bool ret = false;
>  
>   rcu_read_lock();
>   pid = rcu_dereference(target->pid);
> @@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
>   task = get_pid_task(target->pid, PIDTYPE_PID);
>   rcu_read_unlock();
>   if (!task)
> - return false;
> + return ret;
>   if (task->flags & PF_VCPU) {
>   put_task_struct(task);
> - return false;
> - }
> - if (yield_to(task, 1)) {
> - put_task_struct(task);
> - return true;
> + return ret;
>   }
> + ret = yield_to(task, 1);
>   put_task_struct(task);
> - return false;
> +
> + return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
>  
> @@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
>   return eligible;
>  }
>  #endif
> +
>  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  {
>   struct kvm *kvm = me->kvm;
>   struct kvm_vcpu *vcpu;
>   int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>   int yielded = 0;
> + int try = 3;
>   int pass;
>   int i;
>  
> @@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>* VCPU is holding the lock that we need and will release it.
>* We approximate round-robin by starting at the last boosted VCPU.
>*/
> - for (pass = 0; pass < 2 && !yielded; pass++) {
> + for (pass = 0; pass < 2 && !yielded && try; pass++) {
>   kvm_for_each_vcpu(i, vcpu, kvm) {
>   if (!pass && i <= last_boosted_vcpu) {
>   i = last_boosted_vcpu;
> @@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>   continue;
>   if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>   continue;
> - if (kvm_vcpu_yield_to(vcpu)) {
> +
> + yielded = kvm_vcpu_yield_to(vcpu);
> + if (yielded > 0) {
>   kvm->last_boosted_vcpu = i;
> - yielded = 1;
>   break;
> + } else if (yielded < 0) {
> + try--;
> + if (!try)
> + break;
>   }
>   }
>   }
> 

The check done in patch 1/2 is done before the double_rq_lock, so it's
cheap. Now, this patch is to avoid doing too many get_pid_task calls. I
wonder if it would make more sense to change the vcpu state from tracking
the pid to tracking the task. If that was done, then I don't believe this
patch is necessary.

Rik,
for 34bb10b79de7 was there a reason pid was used instead of task?

Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-26 Thread Raghavendra K T
From: Raghavendra K T 

yield_to returns -ESRCH when the run queue length of both the source
and the target of yield_to is one. When we see three successive
failures of yield_to, we assume we are in a potential undercommit case
and abort from the PLE handler.
The assumption is backed by the low probability of a wrong decision
even for worst-case scenarios such as an average runqueue length
between 1 and 2.

Note that we do not update the last boosted vcpu in failure cases.
Thanks to Avi for raising the question of aborting after the first
failure of yield_to.

Reviewed-by: Srikar Dronamraju 
Signed-off-by: Raghavendra K T 
---
 virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..053f494 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 {
struct pid *pid;
struct task_struct *task = NULL;
+   bool ret = false;
 
rcu_read_lock();
pid = rcu_dereference(target->pid);
@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
task = get_pid_task(target->pid, PIDTYPE_PID);
rcu_read_unlock();
if (!task)
-   return false;
+   return ret;
if (task->flags & PF_VCPU) {
put_task_struct(task);
-   return false;
-   }
-   if (yield_to(task, 1)) {
-   put_task_struct(task);
-   return true;
+   return ret;
}
+   ret = yield_to(task, 1);
put_task_struct(task);
-   return false;
+
+   return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
@@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
return eligible;
 }
 #endif
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
struct kvm *kvm = me->kvm;
struct kvm_vcpu *vcpu;
int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
int yielded = 0;
+   int try = 3;
int pass;
int i;
 
@@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 * VCPU is holding the lock that we need and will release it.
 * We approximate round-robin by starting at the last boosted VCPU.
 */
-   for (pass = 0; pass < 2 && !yielded; pass++) {
+   for (pass = 0; pass < 2 && !yielded && try; pass++) {
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!pass && i <= last_boosted_vcpu) {
i = last_boosted_vcpu;
@@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+   yielded = kvm_vcpu_yield_to(vcpu);
+   if (yielded > 0) {
kvm->last_boosted_vcpu = i;
-   yielded = 1;
break;
+   } else if (yielded < 0) {
+   try--;
+   if (!try)
+   break;
}
}
}
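
For readers skimming the diff: the loop treats the return value of
kvm_vcpu_yield_to() as a tristate. The decision logic, pulled out as a
hypothetical standalone helper (an illustrative sketch, not part of the
patch):

#include <stdbool.h>

/* Hypothetical helper mirroring the loop body above: returns true when
 * the PLE handler should stop scanning vcpus.
 */
static bool ple_scan_done(int yielded, int *try)
{
	if (yielded > 0)
		return true;		/* successfully boosted a vcpu */
	if (yielded < 0 && --(*try) == 0)
		return true;		/* repeated -ESRCH: likely undercommitted */
	return false;			/* yielded == 0: candidate busy, keep scanning */
}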

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

