Re: [patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity
On Mon, 2014-06-23 at 19:07 +0200, Mike Galbraith wrote:
> On Mon, 2014-06-23 at 17:14 +0200, Peter Zijlstra wrote:
> > On Mon, Jun 16, 2014 at 11:16:52AM +0200, Mike Galbraith wrote:
> > > (disregard patch of same name from that enterprise weenie;)
> > >
> > > If a task has been dequeued, it has been accounted. Do not project
> > > cycles that may or may not ever be accounted to a dequeued task, as
> > > that may make clock_gettime() both inaccurate and non-monotonic.
> > >
> > > Protect update_rq_clock() from slight TSC skew while at it.
> > >
> > > Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
> > > ---
> > >  kernel/sched/core.c |   13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
> > >  		return;
> > >
> > >  	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > > +	if (delta < 0)
> > > +		return;
> > >  	rq->clock += delta;
> > >  	update_rq_clock_task(rq, delta);
> > >  }
> >
> > Have you actually observed this? If TSC is stable this should not
> > happen, if TSC is not stable we should be using kernel/sched/clock.c
> > which should also avoid this, because while sched_clock_cpu(x) -
> > sched_clock_cpu(y) < 0 is possible, sched_clock_cpu(x) -
> > sched_clock_cpu(x) should always be >= 0.
> >
> > I suppose it can happen when the TSC gets screwed and we haven't
> > switched to the slow path yet.
>
> Seen, no.

s/no/yes..  8 socket 64 core box:

vogelweide:/sys/kernel/debug/tracing/:[0]# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 7/7   #P:64
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> need-resched_lazy
#                            || / _--=> hardirq/softirq
#                            ||| / _-=> preempt-depth
#                            |||| / _-=> preempt-lazy-depth
#                            ||||| / _-=> migrate-disable
#                            |||||| /     delay
#           TASK-PID   CPU#  |||||||  TIMESTAMP  FUNCTION
#              | |       |   |||||||     |         |
        modprobe-1656  [051] d...313   12.711729: update_rq_clock: CPU25 negative delta -38
      tbench_srv-8091  [054] d...314  129.845174: update_rq_clock: CPU30 negative delta -12
      tbench_srv-8048  [062] d...314  129.861227: update_rq_clock: CPU34 negative delta -74
      tbench_srv-8054  [046] d...316  132.713941: update_rq_clock: CPU58 negative delta -187
      tbench_srv-8077  [047] d...316  133.779680: update_rq_clock: CPU0 negative delta -54
      tbench_srv-8048  [058] d...316  137.490513: update_rq_clock: CPU19 negative delta -19
          tbench-8087  [004] d...313  139.001686: update_rq_clock: CPU5 negative delta -176

> Damn, I had a reason for leaving that, but seems no now.

As expected, testing verified that my imagination works great :)

sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity

If a task has been dequeued, it has been accounted. Do not project
cycles that may or may not ever be accounted to a dequeued task, as
that may make clock_gettime() both inaccurate and non-monotonic.

Protect update_rq_clock() from slight TSC skew while at it.

Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
---
 kernel/sched/core.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+	if (delta < 0)
+		return;
 	rq->clock += delta;
 	update_rq_clock_task(rq, delta);
 }
@@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
 {
 	u64 ns = 0;
 
-	if (task_current(rq, p)) {
+	/*
+	 * Must be ->curr _and_ ->on_rq. If dequeued, we would
+	 * project cycles that may never be accounted to this
+	 * thread, breaking clock_gettime().
+	 */
+	if (task_current(rq, p) && p->on_rq) {
 		update_rq_clock(rq);
 		ns = rq_clock_task(rq) - p->se.exec_start;
 		if ((s64)ns < 0)
@@ -2576,8 +2583,10 @@ unsigned long long task_sched_runtime(st
 	 * If we race with it leaving cpu, we'll take a lock. So we're correct.
 	 * If we race with it entering cpu, unaccounted time is 0. This is
 	 * indistinguishable from the read occurring a few cycles earlier.
+	 * If we see ->on_cpu without ->on_rq, the task is leaving, and has
+	 * been accounted, so we're correct here as well.
 	 */
-	if (!p->on_cpu)
+	if (!p->on_cpu || !p->on_rq)
 		return p->se.sum_exec_runtime;
 #endif
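The per-CPU "negative delta" lines in the trace above come from debug
instrumentation rather than from the patch itself. A minimal sketch of the
kind of hook that could produce them, placed at the point where the patch
clamps (my assumption about the debug code, not necessarily what Mike ran):

	/* debug only: report the skew instead of silently clamping */
	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
	if (delta < 0) {
		trace_printk("CPU%d negative delta %lld\n",
			     cpu_of(rq), (long long)delta);
		return;
	}

trace_printk() writes into the ftrace ring buffer, which is consistent with
the "nop" tracer output shown above.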
Re: [patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity
On Mon, 2014-06-23 at 17:14 +0200, Peter Zijlstra wrote:
> On Mon, Jun 16, 2014 at 11:16:52AM +0200, Mike Galbraith wrote:
> > (disregard patch of same name from that enterprise weenie;)
> >
> > If a task has been dequeued, it has been accounted. Do not project
> > cycles that may or may not ever be accounted to a dequeued task, as
> > that may make clock_gettime() both inaccurate and non-monotonic.
> >
> > Protect update_rq_clock() from slight TSC skew while at it.
> >
> > Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
> > ---
> >  kernel/sched/core.c |   13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
> >  		return;
> >
> >  	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > +	if (delta < 0)
> > +		return;
> >  	rq->clock += delta;
> >  	update_rq_clock_task(rq, delta);
> >  }
>
> Have you actually observed this? If TSC is stable this should not
> happen, if TSC is not stable we should be using kernel/sched/clock.c
> which should also avoid this, because while sched_clock_cpu(x) -
> sched_clock_cpu(y) < 0 is possible, sched_clock_cpu(x) -
> sched_clock_cpu(x) should always be >= 0.
>
> I suppose it can happen when the TSC gets screwed and we haven't
> switched to the slow path yet.

Seen, no. I saw it while eyeballing, and thought making it impossible
for a dinky offset to create a negative delta would be a better plan
than hoping such don't exist in the wild.

> > @@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
> >  {
> >  	u64 ns = 0;
> >
> > -	if (task_current(rq, p)) {
> > +	/*
> > +	 * Must be ->curr, ->on_cpu _and_ ->on_rq. If dequeued, we
> > +	 * would project cycles that may never be accounted to this
> > +	 * thread, breaking clock_gettime().
> > +	 */
> > +	if (task_current(rq, p) && p->on_cpu && p->on_rq) {
>
> do we still need ->on_cpu in this case?

Damn, I had a reason for leaving that, but seems no now.

I don't see why we need to play the accounting projection game at all
really. If what's accounted at call time isn't good enough, we should
be doing real accounting, not prognostication.

Anyway, I'll retest with the on_cpu dropped, which should either make
me remember why I felt it should stay, or give me cause to make it
gone :)

	-Mike
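To make the projection question concrete, here is a toy userspace model of
what do_task_delta_exec() computes and why the ->on_rq test matters. This
is an illustration of the reasoning above, not kernel code; all names in
it are made up for the example:

	#include <stdint.h>
	#include <stdio.h>

	typedef uint64_t u64;
	typedef int64_t s64;

	struct task {
		int on_rq;		/* still queued on its runqueue */
		u64 exec_start;		/* rq clock when it went on cpu */
		u64 sum_exec_runtime;	/* time already accounted to it */
	};

	/*
	 * While the task is current *and* queued, the window since
	 * exec_start will eventually be folded into sum_exec_runtime,
	 * so adding it to the reading is safe. Once the task has been
	 * dequeued, that window has already been accounted; projecting
	 * it again can report time the task never owns, so a later
	 * read can come out smaller, i.e. the clock goes backwards.
	 */
	static u64 task_runtime(struct task *p, int is_current, u64 rq_clock)
	{
		u64 ns = 0;

		if (is_current && p->on_rq) {
			s64 delta = rq_clock - p->exec_start;
			if (delta > 0)
				ns = delta;
		}
		return p->sum_exec_runtime + ns;
	}

	int main(void)
	{
		struct task t = { .on_rq = 0, .exec_start = 1000,
				  .sum_exec_runtime = 500 };

		/* Dequeued: report only what has been accounted (500),
		 * not a projected 'rq_clock - exec_start' window. */
		printf("%llu\n", (unsigned long long)task_runtime(&t, 1, 1100));
		return 0;
	}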
Re: [patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity
On Mon, Jun 16, 2014 at 11:16:52AM +0200, Mike Galbraith wrote:
> (disregard patch of same name from that enterprise weenie;)
>
> If a task has been dequeued, it has been accounted. Do not project
> cycles that may or may not ever be accounted to a dequeued task, as
> that may make clock_gettime() both inaccurate and non-monotonic.
>
> Protect update_rq_clock() from slight TSC skew while at it.
>
> Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
> ---
>  kernel/sched/core.c |   13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
>  		return;
>
>  	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> +	if (delta < 0)
> +		return;
>  	rq->clock += delta;
>  	update_rq_clock_task(rq, delta);
>  }

Have you actually observed this? If TSC is stable this should not
happen, if TSC is not stable we should be using kernel/sched/clock.c
which should also avoid this, because while sched_clock_cpu(x) -
sched_clock_cpu(y) < 0 is possible, sched_clock_cpu(x) -
sched_clock_cpu(x) should always be >= 0.

I suppose it can happen when the TSC gets screwed and we haven't
switched to the slow path yet.

> @@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
>  {
>  	u64 ns = 0;
>
> -	if (task_current(rq, p)) {
> +	/*
> +	 * Must be ->curr, ->on_cpu _and_ ->on_rq. If dequeued, we
> +	 * would project cycles that may never be accounted to this
> +	 * thread, breaking clock_gettime().
> +	 */
> +	if (task_current(rq, p) && p->on_cpu && p->on_rq) {

do we still need ->on_cpu in this case?
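Peter's monotonicity claim refers to the filtering kernel/sched/clock.c
applies when the TSC is not stable. A grossly simplified model of that
guarantee follows; this is an illustrative sketch only, not the real
implementation (which also deals with wraparound, NMIs, and cmpxchg
races), and every name in it is invented for the example:

	#include <stdint.h>
	#include <stdio.h>

	typedef uint64_t u64;

	#define NR_CPUS 2

	static u64 fake_tsc = 1000;		/* shared "hardware" counter */
	static const long long skew[NR_CPUS] = { 0, -50 }; /* CPU1 lags CPU0 */
	static u64 last_seen[NR_CPUS];

	/* Stand-in for a raw, per-CPU-skewed clock source. */
	static u64 raw_clock(int cpu)
	{
		fake_tsc += 10;			/* time marches on */
		return fake_tsc + skew[cpu];
	}

	/*
	 * Filter each CPU's readings so repeated reads on one CPU never
	 * go backwards, even though a read on CPU x minus a read on
	 * CPU y may still be negative.
	 */
	static u64 sched_clock_cpu_sketch(int cpu)
	{
		u64 now = raw_clock(cpu);

		if (now < last_seen[cpu])
			now = last_seen[cpu];
		last_seen[cpu] = now;
		return now;
	}

	int main(void)
	{
		u64 a = sched_clock_cpu_sketch(0);	/* CPU0: 1010 */
		u64 b = sched_clock_cpu_sketch(1);	/* CPU1: 970, behind a */
		u64 c = sched_clock_cpu_sketch(1);	/* CPU1: 980, ahead of b */

		printf("cross-CPU delta b-a: %lld\n", (long long)(b - a)); /* -40 */
		printf("same-CPU delta c-b: %lld\n", (long long)(c - b));  /*  10 */
		return 0;
	}

This is why update_rq_clock(), which always subtracts two same-CPU
readings, should not normally see a negative delta, and why the trace
above was surprising.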
[patch] sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity
(disregard patch of same name from that enterprise weenie;)

If a task has been dequeued, it has been accounted. Do not project
cycles that may or may not ever be accounted to a dequeued task, as
that may make clock_gettime() both inaccurate and non-monotonic.

Protect update_rq_clock() from slight TSC skew while at it.

Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
---
 kernel/sched/core.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -144,6 +144,8 @@ void update_rq_clock(struct rq *rq)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+	if (delta < 0)
+		return;
 	rq->clock += delta;
 	update_rq_clock_task(rq, delta);
 }
@@ -2533,7 +2535,12 @@ static u64 do_task_delta_exec(struct tas
 {
 	u64 ns = 0;
 
-	if (task_current(rq, p)) {
+	/*
+	 * Must be ->curr, ->on_cpu _and_ ->on_rq. If dequeued, we
+	 * would project cycles that may never be accounted to this
+	 * thread, breaking clock_gettime().
+	 */
+	if (task_current(rq, p) && p->on_cpu && p->on_rq) {
 		update_rq_clock(rq);
 		ns = rq_clock_task(rq) - p->se.exec_start;
 		if ((s64)ns < 0)
@@ -2576,8 +2583,10 @@ unsigned long long task_sched_runtime(st
 	 * If we race with it leaving cpu, we'll take a lock. So we're correct.
 	 * If we race with it entering cpu, unaccounted time is 0. This is
 	 * indistinguishable from the read occurring a few cycles earlier.
+	 * If we see ->on_cpu without ->on_rq, the task is leaving, and has
+	 * been accounted, so we're correct here as well.
 	 */
-	if (!p->on_cpu)
+	if (!p->on_cpu || !p->on_rq)
 		return p->se.sum_exec_runtime;
 #endif
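The user-visible symptom this fixes can be probed from userspace. A minimal
checker along these lines (my sketch, not part of the thread) exercises the
task_sched_runtime() path by having one thread repeatedly sample a busy
sibling's CPU-time clock and flag any backwards step; build with
gcc -O2 -pthread:

	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	/* Burn CPU so the sampled thread is frequently on and off the rq. */
	static void *burn(void *arg)
	{
		(void)arg;
		for (;;)
			;
		return NULL;
	}

	int main(void)
	{
		pthread_t spinner;
		clockid_t cid;
		struct timespec ts;
		long long prev = 0, now;
		long i;

		pthread_create(&spinner, NULL, burn, NULL);
		pthread_getcpuclockid(spinner, &cid);

		for (i = 0; i < 50000000L; i++) {
			clock_gettime(cid, &ts);
			now = ts.tv_sec * 1000000000LL + ts.tv_nsec;
			if (now < prev) {
				fprintf(stderr, "clock went backwards by %lld ns\n",
					prev - now);
				return 1;
			}
			prev = now;
		}
		puts("monotonic");
		return 0;
	}

Before the fix, a sample taken while the spinner was being dequeued could
include projected cycles that are never actually accounted to it, so a
later sample could come out smaller.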