Re: [Xen-devel] [PATCH] xen: sched: avoid races on time values read from NOW()

2016-05-24 Thread Dario Faggioli
On Tue, 2016-05-24 at 04:08 -0600, Jan Beulich wrote:
> While backporting this for 4.6 (which required substantial
> adjustment to the sched_rt.c part) 
>
Yep, I figure it did. :-(

It sounds like you've pretty much done with it, but if not, and if you
want me or Meng to provide the backport, just ask!

> I noticed that there's another
> of the cases mentioned in the description in rt_vcpu_insert(). The
> commit message doesn't mention why this was left unchanged, so
> was not fixing this perhaps just an oversight?
> 
It is indeed. I'll send a patch.

Sorry,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: avoid races on time values read from NOW()

2016-05-24 Thread Jan Beulich
>>> On 19.05.16 at 10:11,  wrote:
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1198,7 +1198,7 @@ static void
>  rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>  {
>  struct rt_vcpu * const svc = rt_vcpu(vc);
> -s_time_t now = NOW();
> +s_time_t now;
>  bool_t missed;
>  
>  BUG_ON( is_idle_vcpu(vc) );
> @@ -1225,6 +1225,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu 
> *vc)
>   * If a deadline passed while svc was asleep/blocked, we need new
>   * scheduling parameters (a new deadline and full budget).
>   */
> +now = NOW();
>  
>  missed = ( now >= svc->cur_deadline );
>  if ( missed )
> @@ -1394,7 +1395,7 @@ rt_dom_cntl(
>   * from the replq and does the actual replenishment.
>   */
>  static void repl_timer_handler(void *data){
> -s_time_t now = NOW();
> +s_time_t now;
>  struct scheduler *ops = data;
>  struct rt_private *prv = rt_priv(ops);
>  struct list_head *replq = rt_replq(ops);
> @@ -1406,6 +1407,8 @@ static void repl_timer_handler(void *data){
>  
>  spin_lock_irq(>lock);
>  
> +now = NOW();
> +
>  /*
>   * Do the replenishment and move replenished vcpus
>   * to the temporary list to tickle.

While backporting this for 4.6 (which required substantial
adjustment to the sched_rt.c part) I noticed that there's another
of the cases mentioned in the description in rt_vcpu_insert(). The
commit message doesn't mention why this was left unchanged, so
was not fixing this perhaps just an oversight?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: avoid races on time values read from NOW()

2016-05-19 Thread Meng Xu
On Thu, May 19, 2016 at 4:11 AM, Dario Faggioli
 wrote:
> or (even in cases where there is no race, e.g., outside
> of Credit2) avoid using a time sample which may be rather
> old, and hence stale.
>
> In fact, we should only sample NOW() from _inside_
> the critical region within which the value we read is
> used. If we don't, in case we have to spin for a while
> before entering the region, when actually using it:
>
>  1) we will use something that, at the veryy least, is
> not really "now", because of the spinning,
>
>  2) if someone else sampled NOW() during a critical
> region protected by the lock we are spinning on,
> and if we compare the two samples when we get
> inside our region, our one will be 'earlier',
> even if we actually arrived later, which is a
> race.
>
> In Credit2, we see an instance of 2), in runq_tickle(),
> when it is called by csched2_context_saved() as it samples
> NOW() before acquiring the runq lock. This makes things
> look like the time went backwards, and it confuses the
> algorithm (there's even a d2printk() about it, which would
> trigger all the time, if enabled).
>
> In RTDS, something similar happens in repl_timer_handler(),
> and there's another instance in schedule() (in generic code),
> so fix these cases too.
>
> While there, improve csched2_vcpu_wake() and and rt_vcpu_wake()
> a little as well (removing a pointless initialization, and
> moving the sampling a bit closer to its use). These two hunks
> entail no further functional changes.
>
> Signed-off-by: Dario Faggioli 
> ---
> Cc: George Dunlap 
> Cc: Meng Xu 
> Cc: Wei Liu 
> ---

Reviewed-by: Meng Xu 

The bug will cause incorrect budget accounting for one VCPU when the
race occurs.

Best Regards,

Meng
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: avoid races on time values read from NOW()

2016-05-19 Thread Wei Liu
On Thu, May 19, 2016 at 10:11:35AM +0200, Dario Faggioli wrote:
> or (even in cases where there is no race, e.g., outside
> of Credit2) avoid using a time sample which may be rather
> old, and hence stale.
> 
> In fact, we should only sample NOW() from _inside_
> the critical region within which the value we read is
> used. If we don't, in case we have to spin for a while
> before entering the region, when actually using it:
> 
>  1) we will use something that, at the veryy least, is
> not really "now", because of the spinning,
> 
>  2) if someone else sampled NOW() during a critical
> region protected by the lock we are spinning on,
> and if we compare the two samples when we get
> inside our region, our one will be 'earlier',
> even if we actually arrived later, which is a
> race.
> 
> In Credit2, we see an instance of 2), in runq_tickle(),
> when it is called by csched2_context_saved() as it samples
> NOW() before acquiring the runq lock. This makes things
> look like the time went backwards, and it confuses the
> algorithm (there's even a d2printk() about it, which would
> trigger all the time, if enabled).
> 
> In RTDS, something similar happens in repl_timer_handler(),
> and there's another instance in schedule() (in generic code),
> so fix these cases too.
> 
> While there, improve csched2_vcpu_wake() and and rt_vcpu_wake()
> a little as well (removing a pointless initialization, and
> moving the sampling a bit closer to its use). These two hunks
> entail no further functional changes.
> 
> Signed-off-by: Dario Faggioli 
> ---
> Cc: George Dunlap 
> Cc: Meng Xu 
> Cc: Wei Liu 


Subject to review from Meng and George:

Release-acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: avoid races on time values read from NOW()

2016-05-19 Thread George Dunlap
On 19/05/16 09:11, Dario Faggioli wrote:
> or (even in cases where there is no race, e.g., outside
> of Credit2) avoid using a time sample which may be rather
> old, and hence stale.
> 
> In fact, we should only sample NOW() from _inside_
> the critical region within which the value we read is
> used. If we don't, in case we have to spin for a while
> before entering the region, when actually using it:
> 
>  1) we will use something that, at the veryy least, is
> not really "now", because of the spinning,
> 
>  2) if someone else sampled NOW() during a critical
> region protected by the lock we are spinning on,
> and if we compare the two samples when we get
> inside our region, our one will be 'earlier',
> even if we actually arrived later, which is a
> race.
> 
> In Credit2, we see an instance of 2), in runq_tickle(),
> when it is called by csched2_context_saved() as it samples
> NOW() before acquiring the runq lock. This makes things
> look like the time went backwards, and it confuses the
> algorithm (there's even a d2printk() about it, which would
> trigger all the time, if enabled).
> 
> In RTDS, something similar happens in repl_timer_handler(),
> and there's another instance in schedule() (in generic code),
> so fix these cases too.
> 
> While there, improve csched2_vcpu_wake() and and rt_vcpu_wake()
> a little as well (removing a pointless initialization, and
> moving the sampling a bit closer to its use). These two hunks
> entail no further functional changes.
> 
> Signed-off-by: Dario Faggioli 

Reviewed-by: George Dunlap 

I agree this is a fairly low-risk bugfix that should be considered for 4.7.

 -George

> ---
> Cc: George Dunlap 
> Cc: Meng Xu 
> Cc: Wei Liu 
> ---
>  xen/common/sched_credit2.c |4 ++--
>  xen/common/sched_rt.c  |7 +--
>  xen/common/schedule.c  |4 +++-
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index f95e509..1933ff1 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1028,7 +1028,7 @@ static void
>  csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>  {
>  struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
> -s_time_t now = 0;
> +s_time_t now;
>  
>  /* Schedule lock should be held at this point. */
>  
> @@ -1085,8 +1085,8 @@ static void
>  csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
>  {
>  struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
> -s_time_t now = NOW();
>  spinlock_t *lock = vcpu_schedule_lock_irq(vc);
> +s_time_t now = NOW();
>  
>  BUG_ON( !is_idle_vcpu(vc) && svc->rqd != RQD(ops, vc->processor));
>  
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index aa3ffd2..0946101 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1198,7 +1198,7 @@ static void
>  rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>  {
>  struct rt_vcpu * const svc = rt_vcpu(vc);
> -s_time_t now = NOW();
> +s_time_t now;
>  bool_t missed;
>  
>  BUG_ON( is_idle_vcpu(vc) );
> @@ -1225,6 +1225,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu 
> *vc)
>   * If a deadline passed while svc was asleep/blocked, we need new
>   * scheduling parameters (a new deadline and full budget).
>   */
> +now = NOW();
>  
>  missed = ( now >= svc->cur_deadline );
>  if ( missed )
> @@ -1394,7 +1395,7 @@ rt_dom_cntl(
>   * from the replq and does the actual replenishment.
>   */
>  static void repl_timer_handler(void *data){
> -s_time_t now = NOW();
> +s_time_t now;
>  struct scheduler *ops = data;
>  struct rt_private *prv = rt_priv(ops);
>  struct list_head *replq = rt_replq(ops);
> @@ -1406,6 +1407,8 @@ static void repl_timer_handler(void *data){
>  
>  spin_lock_irq(>lock);
>  
> +now = NOW();
> +
>  /*
>   * Do the replenishment and move replenished vcpus
>   * to the temporary list to tickle.
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 80fea39..5e35310 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1320,7 +1320,7 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
>  static void schedule(void)
>  {
>  struct vcpu  *prev = current, *next = NULL;
> -s_time_t  now = NOW();
> +s_time_t  now;
>  struct scheduler *sched;
>  unsigned long*tasklet_work = _cpu(tasklet_work_to_do);
>  bool_ttasklet_work_scheduled = 0;
> @@ -1355,6 +1355,8 @@ static void schedule(void)
>  
>  lock = pcpu_schedule_lock_irq(cpu);
>  
> +now = NOW();
> +
>  stop_timer(>s_timer);
>  
>  /* get policy-specific decision on scheduling... */
> 



[Xen-devel] [PATCH] xen: sched: avoid races on time values read from NOW()

2016-05-19 Thread Dario Faggioli
or (even in cases where there is no race, e.g., outside
of Credit2) avoid using a time sample which may be rather
old, and hence stale.

In fact, we should only sample NOW() from _inside_
the critical region within which the value we read is
used. If we don't, in case we have to spin for a while
before entering the region, when actually using it:

 1) we will use something that, at the veryy least, is
not really "now", because of the spinning,

 2) if someone else sampled NOW() during a critical
region protected by the lock we are spinning on,
and if we compare the two samples when we get
inside our region, our one will be 'earlier',
even if we actually arrived later, which is a
race.

In Credit2, we see an instance of 2), in runq_tickle(),
when it is called by csched2_context_saved() as it samples
NOW() before acquiring the runq lock. This makes things
look like the time went backwards, and it confuses the
algorithm (there's even a d2printk() about it, which would
trigger all the time, if enabled).

In RTDS, something similar happens in repl_timer_handler(),
and there's another instance in schedule() (in generic code),
so fix these cases too.

While there, improve csched2_vcpu_wake() and and rt_vcpu_wake()
a little as well (removing a pointless initialization, and
moving the sampling a bit closer to its use). These two hunks
entail no further functional changes.

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
Cc: Meng Xu 
Cc: Wei Liu 
---
 xen/common/sched_credit2.c |4 ++--
 xen/common/sched_rt.c  |7 +--
 xen/common/schedule.c  |4 +++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index f95e509..1933ff1 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1028,7 +1028,7 @@ static void
 csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 {
 struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
-s_time_t now = 0;
+s_time_t now;
 
 /* Schedule lock should be held at this point. */
 
@@ -1085,8 +1085,8 @@ static void
 csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
 {
 struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
-s_time_t now = NOW();
 spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+s_time_t now = NOW();
 
 BUG_ON( !is_idle_vcpu(vc) && svc->rqd != RQD(ops, vc->processor));
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index aa3ffd2..0946101 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1198,7 +1198,7 @@ static void
 rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 {
 struct rt_vcpu * const svc = rt_vcpu(vc);
-s_time_t now = NOW();
+s_time_t now;
 bool_t missed;
 
 BUG_ON( is_idle_vcpu(vc) );
@@ -1225,6 +1225,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
  * If a deadline passed while svc was asleep/blocked, we need new
  * scheduling parameters (a new deadline and full budget).
  */
+now = NOW();
 
 missed = ( now >= svc->cur_deadline );
 if ( missed )
@@ -1394,7 +1395,7 @@ rt_dom_cntl(
  * from the replq and does the actual replenishment.
  */
 static void repl_timer_handler(void *data){
-s_time_t now = NOW();
+s_time_t now;
 struct scheduler *ops = data;
 struct rt_private *prv = rt_priv(ops);
 struct list_head *replq = rt_replq(ops);
@@ -1406,6 +1407,8 @@ static void repl_timer_handler(void *data){
 
 spin_lock_irq(>lock);
 
+now = NOW();
+
 /*
  * Do the replenishment and move replenished vcpus
  * to the temporary list to tickle.
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 80fea39..5e35310 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1320,7 +1320,7 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
 static void schedule(void)
 {
 struct vcpu  *prev = current, *next = NULL;
-s_time_t  now = NOW();
+s_time_t  now;
 struct scheduler *sched;
 unsigned long*tasklet_work = _cpu(tasklet_work_to_do);
 bool_ttasklet_work_scheduled = 0;
@@ -1355,6 +1355,8 @@ static void schedule(void)
 
 lock = pcpu_schedule_lock_irq(cpu);
 
+now = NOW();
+
 stop_timer(>s_timer);
 
 /* get policy-specific decision on scheduling... */


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel