Re: [patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity

2014-06-23 Thread Mike Galbraith
On Mon, 2014-06-23 at 19:07 +0200, Mike Galbraith wrote: 
> On Mon, 2014-06-23 at 17:14 +0200, Peter Zijlstra wrote: 
> > On Mon, Jun 16, 2014 at 11:16:52AM +0200, Mike Galbraith wrote:
> > > (disregard patch of same name from that enterprise weenie;)
> > > 
> > > If a task has been dequeued, it has been accounted.  Do not project
> > > cycles that may or may not ever be accounted to a dequeued task, as
> > > that may make clock_gettime() both inaccurate and non-monotonic.
> > > 
> > > Protect update_rq_clock() from slight TSC skew while at it.
> > > 
> > > Signed-off-by: Mike Galbraith 
> > > ---
> > >  kernel/sched/core.c |   13 +++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
> > >   return;
> > >  
> > >   delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > > + if (delta < 0)
> > > + return;
> > >   rq->clock += delta;
> > >   update_rq_clock_task(rq, delta);
> > >  }
> > 
> > Have you actually observed this? If TSC is stable this should not
> > happen, if TSC is not stable we should be using kernel/sched/clock.c
> > which should also avoid this, because while sched_clock_cpu(x) -
> > sched_clock_cpu(y) < 0 is possible, sched_clock_cpu(x) -
> > sched_clock_cpu(x) should always be >= 0.
> > 
> > I suppose it can happen when the TSC gets screwed and we haven't
> > switched to the slow path yet.
> 
> Seen, no.

s/no/yes.. 8 socket 64 core box:

vogelweide:/sys/kernel/debug/tracing/:[0]# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 7/7   #P:64
#
#                                      _-------=> irqs-off
#                                     / _------=> need-resched
#                                    | / _-----=> need-resched_lazy
#                                    || / _----=> hardirq/softirq
#                                    ||| / _---=> preempt-depth
#                                    |||| / _--=> preempt-lazy-depth
#                                    ||||| / _-=> migrate-disable
#                                    |||||| /     delay
#           TASK-PID   CPU#  |||||||    TIMESTAMP  FUNCTION
#              | |       |   |||||||       |         |
    modprobe-1656  [051] d...313    12.711729: update_rq_clock: CPU25 negative delta -38
  tbench_srv-8091  [054] d...314   129.845174: update_rq_clock: CPU30 negative delta -12
  tbench_srv-8048  [062] d...314   129.861227: update_rq_clock: CPU34 negative delta -74
  tbench_srv-8054  [046] d...316   132.713941: update_rq_clock: CPU58 negative delta -187
  tbench_srv-8077  [047] d...316   133.779680: update_rq_clock: CPU0 negative delta -54
  tbench_srv-8048  [058] d...316   137.490513: update_rq_clock: CPU19 negative delta -19
      tbench-8087  [004] d...313   139.001686: update_rq_clock: CPU5 negative delta -176
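
(The trace above came from throwaway debug instrumentation that isn't shown
in the thread; a hunk along these lines would produce output in that format,
since trace_printk() prefixes its message with the calling function, giving
the "update_rq_clock: CPUnn negative delta -nn" lines.  Reconstruction, not
the exact patch used:)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -144,6 +144,9 @@ void update_rq_clock(struct rq *rq)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+	if (delta < 0)
+		/* log offender via the ftrace ring buffer */
+		trace_printk("CPU%d negative delta %lld\n",
+			     cpu_of(rq), (long long)delta);
 	rq->clock += delta;
 	update_rq_clock_task(rq, delta);
 }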


> Damn, I had a reason for leaving that, but it seems there's none now.

As expected, testing verified that my imagination works great :)

sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity

If a task has been dequeued, it has been accounted.  Do not project
cycles that may or may not ever be accounted to a dequeued task, as
that may make clock_gettime() both inaccurate and non-monotonic.

Protect update_rq_clock() from slight TSC skew while at it.

Signed-off-by:  Mike Galbraith 
---
 kernel/sched/core.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
return;
 
delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+   if (delta < 0)
+   return;
rq->clock += delta;
update_rq_clock_task(rq, delta);
 }
@@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
 {
u64 ns = 0;
 
-   if (task_current(rq, p)) {
+   /*
+* Must be ->curr _and_ ->on_rq.  If dequeued, we would
+* project cycles that may never be accounted to this
+* thread, breaking clock_gettime().
+*/
+   if (task_current(rq, p) && p->on_rq) {
update_rq_clock(rq);
ns = rq_clock_task(rq) - p->se.exec_start;
if ((s64)ns < 0)
@@ -2576,8 +2583,10 @@ unsigned long long task_sched_runtime(st
 * If we race with it leaving cpu, we'll take a lock. So we're correct.
 * If we race with it entering cpu, unaccounted time is 0. This is
 * indistinguishable from the read occurring a few cycles earlier.
+* If we see ->on_cpu without ->on_rq, the task is leaving, and has
+* been accounted, so we're correct here as well.
 */
-   if (!p->on_cpu)
+   if (!p->on_cpu || !p->on_rq)
return p->se.sum_exec_runtime;
 #endif
 
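(Not part of the patch: a quick userspace probe for the symptom being fixed.
This is an illustrative sketch, not a program from the thread — names and loop
counts are made up.  It watches a peer thread's CPU clock, which is where the
dequeue-projection non-monotonicity is visible, since a self-read hits the
task_current() fast path:)

/* Illustrative: watch a peer thread's CPU-time clock for backward steps.
 * Build (assumption): gcc -O2 -pthread cputime-mono.c -o cputime-mono
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static volatile int stop;

/* Worker that sleeps briefly so it is repeatedly dequeued/enqueued. */
static void *worker(void *arg)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000 };

	while (!stop)
		nanosleep(&ts, NULL);
	return NULL;
}

int main(void)
{
	pthread_t th;
	clockid_t cid;
	struct timespec ts;
	uint64_t prev = 0, now;
	long i;

	pthread_create(&th, NULL, worker, NULL);
	pthread_getcpuclockid(th, &cid);	/* the peer's CPU clock */

	for (i = 0; i < 50000000L; i++) {
		clock_gettime(cid, &ts);
		now = (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
		if (now < prev)			/* must never happen */
			printf("went backwards: %llu -> %llu\n",
			       (unsigned long long)prev,
			       (unsigned long long)now);
		prev = now;
	}
	stop = 1;
	pthread_join(th, NULL);
	return 0;
}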






Re: [patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity

2014-06-23 Thread Mike Galbraith
On Mon, 2014-06-23 at 17:14 +0200, Peter Zijlstra wrote: 
> On Mon, Jun 16, 2014 at 11:16:52AM +0200, Mike Galbraith wrote:
> > (disregard patch of same name from that enterprise weenie;)
> > 
> > If a task has been dequeued, it has been accounted.  Do not project
> > cycles that may or may not ever be accounted to a dequeued task, as
> > that may make clock_gettime() both inaccurate and non-monotonic.
> > 
> > Protect update_rq_clock() from slight TSC skew while at it.
> > 
> > Signed-off-by:  Mike Galbraith 
> > ---
> >  kernel/sched/core.c |   13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
> > return;
> >  
> > delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > +   if (delta < 0)
> > +   return;
> > rq->clock += delta;
> > update_rq_clock_task(rq, delta);
> >  }
> 
> Have you actually observed this? If TSC is stable this should not
> happen, if TSC is not stable we should be using kernel/sched/clock.c
> which should also avoid this, because while sched_clock_cpu(x) -
> sched_clock_cpu(y) < 0 is possible, sched_clock_cpu(x) -
> sched_clock_cpu(x) should always be >= 0.
> 
> I suppose it can happen when the TSC gets screwed and we haven't
> switched to the slow path yet.

Seen, no.   I spotted the possibility while eyeballing the code, and
thought making it impossible for a dinky offset to create a negative
delta would be a better plan than hoping such offsets don't exist in
the wild.

> > @@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
> >  {
> > u64 ns = 0;
> >  
> > -   if (task_current(rq, p)) {
> > +   /*
> > +* Must be ->curr, ->on_cpu _and_ ->on_rq.  If dequeued, we
> > +* would project cycles that may never be accounted to this
> > +* thread, breaking clock_gettime().
> > +*/
> > +   if (task_current(rq, p) && p->on_cpu && p->on_rq) {
> 
> do we still need ->on_cpu in this case?

Damn, I had a reason for leaving that, but it seems there's none now.

I don't see why we need to play the accounting projection game at all
really.  If what's accounted at call time isn't good enough, we should
be doing real accounting, not prognostication.

Anyway, I'll retest with the on_cpu dropped, which should either make me
remember why I felt it should stay, or give me cause to make it gone :)

-Mike



Re: [patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity

2014-06-23 Thread Peter Zijlstra
On Mon, Jun 16, 2014 at 11:16:52AM +0200, Mike Galbraith wrote:
> (disregard patch of same name from that enterprise weenie;)
> 
> If a task has been dequeued, it has been accounted.  Do not project
> cycles that may or may not ever be accounted to a dequeued task, as
> that may make clock_gettime() both inaccurate and non-monotonic.
> 
> Protect update_rq_clock() from slight TSC skew while at it.
> 
> Signed-off-by: Mike Galbraith 
> ---
>  kernel/sched/core.c |   13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
>   return;
>  
>   delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> + if (delta < 0)
> + return;
>   rq->clock += delta;
>   update_rq_clock_task(rq, delta);
>  }

Have you actually observed this? If TSC is stable this should not
happen, if TSC is not stable we should be using kernel/sched/clock.c
which should also avoid this, because while sched_clock_cpu(x) -
sched_clock_cpu(y) < 0 is possible, sched_clock_cpu(x) -
sched_clock_cpu(x) should always be >= 0.

I suppose it can happen when the TSC gets screwed and we haven't
switched to the slow path yet.
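
(A simplified sketch of the per-CPU guarantee referenced above: on the
unstable-TSC slow path, kernel/sched/clock.c's sched_clock_local() keeps
per-CPU state in struct sched_clock_data and never lets a CPU's clock go
backwards relative to its own last reading.  Names here are illustrative,
not the kernel's; the real code updates the state with a cmpxchg64 loop:)

/* Per-CPU monotonicity filter, simplified for illustration. */
static u64 last_clock;			/* per-CPU in the real code */

static u64 sched_clock_cpu_sketch(u64 raw)
{
	u64 last = last_clock;

	/* Raw clock stepped backwards: hold the line instead. */
	if ((s64)(raw - last) < 0)
		return last;

	last_clock = raw;		/* real code: cmpxchg64 loop */
	return raw;
}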

> @@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
>  {
>   u64 ns = 0;
>  
> - if (task_current(rq, p)) {
> + /*
> +  * Must be ->curr, ->on_cpu _and_ ->on_rq.  If dequeued, we
> +  * would project cycles that may never be accounted to this
> +  * thread, breaking clock_gettime().
> +  */
> + if (task_current(rq, p) && p->on_cpu && p->on_rq) {

do we still need ->on_cpu in this case?


[patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity

2014-06-16 Thread Mike Galbraith
(disregard patch of same name from that enterprise weenie;)

If a task has been dequeued, it has been accounted.  Do not project
cycles that may or may not ever be accounted to a dequeued task, as
that may make clock_gettime() both inaccurate and non-monotonic.

Protect update_rq_clock() from slight TSC skew while at it.

Signed-off-by:  Mike Galbraith 
---
 kernel/sched/core.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
return;
 
delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+   if (delta < 0)
+   return;
rq->clock += delta;
update_rq_clock_task(rq, delta);
 }
@@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
 {
u64 ns = 0;
 
-   if (task_current(rq, p)) {
+   /*
+* Must be ->curr, ->on_cpu _and_ ->on_rq.  If dequeued, we
+* would project cycles that may never be accounted to this
+* thread, breaking clock_gettime().
+*/
+   if (task_current(rq, p) && p->on_cpu && p->on_rq) {
update_rq_clock(rq);
ns = rq_clock_task(rq) - p->se.exec_start;
if ((s64)ns < 0)
@@ -2576,8 +2583,10 @@ unsigned long long task_sched_runtime(st
 * If we race with it leaving cpu, we'll take a lock. So we're correct.
 * If we race with it entering cpu, unaccounted time is 0. This is
 * indistinguishable from the read occurring a few cycles earlier.
+* If we see ->on_cpu without ->on_rq, the task is leaving, and has
+* been accounted, so we're correct here as well.
 */
-   if (!p->on_cpu)
+   if (!p->on_cpu || !p->on_rq)
return p->se.sum_exec_runtime;
 #endif
 





