Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-09 Thread Xunlei Pang
Hi Peter,

On 7/9/18 6:48 PM, Peter Zijlstra wrote:
> On Mon, Jul 09, 2018 at 01:52:38PM +0800, Xunlei Pang wrote:
>> Please see the enclosure for the reproducer cputime_adjust.tgz
> 
> No, I'm not going to reverse engineer something if you cannot even
> explain what the problem is.
> 

I rewrote the whole changelog and sent v2, please have a look.

Thanks,
Xunlei


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-09 Thread Xunlei Pang
Hi Peter,

On 7/9/18 6:48 PM, Peter Zijlstra wrote:
> On Mon, Jul 09, 2018 at 01:52:38PM +0800, Xunlei Pang wrote:
>> Please see the enclosure for the reproducer cputime_adjust.tgz
> 
> No, I'm not going to reverse engineer something if you cannot even
> explain what the problem is.
> 

I rewrote the whole changelog and sent v2, please have a look.

Thanks,
Xunlei


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-09 Thread Peter Zijlstra
On Mon, Jul 09, 2018 at 01:52:38PM +0800, Xunlei Pang wrote:
> Please see the enclosure for the reproducer cputime_adjust.tgz

No, I'm not going to reverse engineer something if you cannot even
explain what the problem is.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-09 Thread Peter Zijlstra
On Mon, Jul 09, 2018 at 01:52:38PM +0800, Xunlei Pang wrote:
> Please see the enclosure for the reproducer cputime_adjust.tgz

No, I'm not going to reverse engineer something if you cannot even
explain what the problem is.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-09 Thread Peter Zijlstra
On Thu, Jul 05, 2018 at 09:58:50PM +0800, xunlei wrote:
> It is rtime < utime + stime, that is the imprecise tick-based run time
> may be larger than precise sum_exec_runtime scheduler-based run time, it
> can happen with some frequent run-sleep patterns.

You're still comparing two values that should not be compared. utime +
stime is not a run-time. You _CANNOT_ compare to rtime and doing so
means you're still not getting it.



Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-09 Thread Peter Zijlstra
On Thu, Jul 05, 2018 at 09:58:50PM +0800, xunlei wrote:
> It is rtime < utime + stime, that is the imprecise tick-based run time
> may be larger than precise sum_exec_runtime scheduler-based run time, it
> can happen with some frequent run-sleep patterns.

You're still comparing two values that should not be compared. utime +
stime is not a run-time. You _CANNOT_ compare to rtime and doing so
means you're still not getting it.



Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-08 Thread Xunlei Pang
Hi Peter,

On 7/5/18 9:21 PM, Xunlei Pang wrote:
> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
>> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>>> tick-based whole utime is utime_0, tick-based whole stime
>>> is stime_0, scheduler time is rtime_0. 
>>
>>> For a long time, the process runs mainly in userspace with
>>> run-sleep patterns, and because two different clocks, it
>>> is possible to have the following condition:
>>>   rtime_0 < utime_0 (as with little stime_0)
>>
>> I don't follow... what?
>>
>> Why are you, and why do you think it makes sense to, compare rtime_0
>> against utime_0 ?
>>
>> The [us]time_0 are, per your earlier definition, ticks. They're not an
>> actual measure of time. Do not compare the two, that makes no bloody
>> sense.
>>
> 
> [us]time_0 is task_struct:utime{stime}, I cited directly from
> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
> here to simple the following proof to help explain the problem we met.
> 

Please see the enclosure for the reproducer cputime_adjust.tgz
(process_top.sh, usr_sys.c):
gcc usr_sys.c -o usr_sys
Firstly, the function consume_sys() in usr_sys.c yields 100% sys
which can be verified as follows:
$ taskset -c 0 ./usr_sys 1
$ ./process_top.sh $(pidof usr_sys)
0.0 usr,  100.0 sys
0.0 usr,  100.0 sys


Tested on my local box on 4.17.0 by executing "taskset -c 0 ./usr_sys",
then executing "./process_top.sh $(pidof usr_sys)" to watch.
1) Before this patch
50.0 usr,  0.0 sys
50.0 usr,  1.0 sys
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
49.0 usr,  4.0 sys //switch to consume 100% sys, ignore this line
12.0 usr,  88.0 sys
11.0 usr,  89.0 sys
10.0 usr,  90.0 sys
10.0 usr,  90.0 sys
9.0 usr,  91.0 sys
8.0 usr,  91.0 sys

Obviously there were around 10% sys wrongly goes to usr

2) After this patch
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
11.0 usr,  76.0 sys //switch to consume 100% sys, ignore this line
1.0 usr,  100.0 sys
0.0 usr,  100.0 sys
1.0 usr,  100.0 sys
0.0 usr,  100.0 sys
0.0 usr,  100.0 sys

So it displayed the correct result as we expected after this patch.

Thanks,
Xunlei


cputime_adjust.tgz
Description: GNU Zip compressed data


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-08 Thread Xunlei Pang
Hi Peter,

On 7/5/18 9:21 PM, Xunlei Pang wrote:
> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
>> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>>> tick-based whole utime is utime_0, tick-based whole stime
>>> is stime_0, scheduler time is rtime_0. 
>>
>>> For a long time, the process runs mainly in userspace with
>>> run-sleep patterns, and because two different clocks, it
>>> is possible to have the following condition:
>>>   rtime_0 < utime_0 (as with little stime_0)
>>
>> I don't follow... what?
>>
>> Why are you, and why do you think it makes sense to, compare rtime_0
>> against utime_0 ?
>>
>> The [us]time_0 are, per your earlier definition, ticks. They're not an
>> actual measure of time. Do not compare the two, that makes no bloody
>> sense.
>>
> 
> [us]time_0 is task_struct:utime{stime}, I cited directly from
> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
> here to simple the following proof to help explain the problem we met.
> 

Please see the enclosure for the reproducer cputime_adjust.tgz
(process_top.sh, usr_sys.c):
gcc usr_sys.c -o usr_sys
Firstly, the function consume_sys() in usr_sys.c yields 100% sys
which can be verified as follows:
$ taskset -c 0 ./usr_sys 1
$ ./process_top.sh $(pidof usr_sys)
0.0 usr,  100.0 sys
0.0 usr,  100.0 sys


Tested on my local box on 4.17.0 by executing "taskset -c 0 ./usr_sys",
then executing "./process_top.sh $(pidof usr_sys)" to watch.
1) Before this patch
50.0 usr,  0.0 sys
50.0 usr,  1.0 sys
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
49.0 usr,  4.0 sys //switch to consume 100% sys, ignore this line
12.0 usr,  88.0 sys
11.0 usr,  89.0 sys
10.0 usr,  90.0 sys
10.0 usr,  90.0 sys
9.0 usr,  91.0 sys
8.0 usr,  91.0 sys

Obviously there were around 10% sys wrongly goes to usr

2) After this patch
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
50.0 usr,  0.0 sys
11.0 usr,  76.0 sys //switch to consume 100% sys, ignore this line
1.0 usr,  100.0 sys
0.0 usr,  100.0 sys
1.0 usr,  100.0 sys
0.0 usr,  100.0 sys
0.0 usr,  100.0 sys

So it displayed the correct result as we expected after this patch.

Thanks,
Xunlei


cputime_adjust.tgz
Description: GNU Zip compressed data


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread xunlei
On 7/5/18 9:42 PM, Peter Zijlstra wrote:
> On Thu, Jul 05, 2018 at 09:21:15PM +0800, Xunlei Pang wrote:
>> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
>>> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
 tick-based whole utime is utime_0, tick-based whole stime
 is stime_0, scheduler time is rtime_0. 
>>>
 For a long time, the process runs mainly in userspace with
 run-sleep patterns, and because two different clocks, it
 is possible to have the following condition:
   rtime_0 < utime_0 (as with little stime_0)
>>>
>>> I don't follow... what?
>>>
>>> Why are you, and why do you think it makes sense to, compare rtime_0
>>> against utime_0 ?
>>>
>>> The [us]time_0 are, per your earlier definition, ticks. They're not an
>>> actual measure of time. Do not compare the two, that makes no bloody
>>> sense.
>>>
>>
>> [us]time_0 is task_struct:utime{stime}, I cited directly from
>> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
>> here to simple the following proof to help explain the problem we met.
> 
> In the !VIRT_CPU_ACCOUNTING case they (task_struct::[us]time) are not
> actual durations. Yes, the happen to be accounted in multiples of
> TICK_NSEC and thereby happen to carry a [ns] unit, but they are not
> durations, they are samples.
> 
> (we just happen to store them in a [ns] unit because for
> VIRT_CPU_ACCOUNTING they are in fact durations)
> 
> If 'rtime < utime' is not a valid assumption to build a problem on for
> !VIRT_CPU_ACCOUNTING.
> 

It is rtime < utime + stime, that is the imprecise tick-based run time
may be larger than precise sum_exec_runtime scheduler-based run time, it
can happen with some frequent run-sleep patterns.

Because stime is usually very small, so it is possible to have rtime <
utime.

> 
> So please try again, so far you're not making any sense.
> 

I also had a reproducer to verify this patch, can attach it tomorrow.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread xunlei
On 7/5/18 9:42 PM, Peter Zijlstra wrote:
> On Thu, Jul 05, 2018 at 09:21:15PM +0800, Xunlei Pang wrote:
>> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
>>> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
 tick-based whole utime is utime_0, tick-based whole stime
 is stime_0, scheduler time is rtime_0. 
>>>
 For a long time, the process runs mainly in userspace with
 run-sleep patterns, and because two different clocks, it
 is possible to have the following condition:
   rtime_0 < utime_0 (as with little stime_0)
>>>
>>> I don't follow... what?
>>>
>>> Why are you, and why do you think it makes sense to, compare rtime_0
>>> against utime_0 ?
>>>
>>> The [us]time_0 are, per your earlier definition, ticks. They're not an
>>> actual measure of time. Do not compare the two, that makes no bloody
>>> sense.
>>>
>>
>> [us]time_0 is task_struct:utime{stime}, I cited directly from
>> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
>> here to simple the following proof to help explain the problem we met.
> 
> In the !VIRT_CPU_ACCOUNTING case they (task_struct::[us]time) are not
> actual durations. Yes, the happen to be accounted in multiples of
> TICK_NSEC and thereby happen to carry a [ns] unit, but they are not
> durations, they are samples.
> 
> (we just happen to store them in a [ns] unit because for
> VIRT_CPU_ACCOUNTING they are in fact durations)
> 
> If 'rtime < utime' is not a valid assumption to build a problem on for
> !VIRT_CPU_ACCOUNTING.
> 

It is rtime < utime + stime, that is the imprecise tick-based run time
may be larger than precise sum_exec_runtime scheduler-based run time, it
can happen with some frequent run-sleep patterns.

Because stime is usually very small, so it is possible to have rtime <
utime.

> 
> So please try again, so far you're not making any sense.
> 

I also had a reproducer to verify this patch, can attach it tomorrow.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread Peter Zijlstra
On Thu, Jul 05, 2018 at 09:21:15PM +0800, Xunlei Pang wrote:
> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
> > On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
> >> tick-based whole utime is utime_0, tick-based whole stime
> >> is stime_0, scheduler time is rtime_0. 
> > 
> >> For a long time, the process runs mainly in userspace with
> >> run-sleep patterns, and because two different clocks, it
> >> is possible to have the following condition:
> >>   rtime_0 < utime_0 (as with little stime_0)
> > 
> > I don't follow... what?
> > 
> > Why are you, and why do you think it makes sense to, compare rtime_0
> > against utime_0 ?
> > 
> > The [us]time_0 are, per your earlier definition, ticks. They're not an
> > actual measure of time. Do not compare the two, that makes no bloody
> > sense.
> > 
> 
> [us]time_0 is task_struct:utime{stime}, I cited directly from
> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
> here to simple the following proof to help explain the problem we met.

In the !VIRT_CPU_ACCOUNTING case they (task_struct::[us]time) are not
actual durations. Yes, the happen to be accounted in multiples of
TICK_NSEC and thereby happen to carry a [ns] unit, but they are not
durations, they are samples.

(we just happen to store them in a [ns] unit because for
VIRT_CPU_ACCOUNTING they are in fact durations)

If 'rtime < utime' is not a valid assumption to build a problem on for
!VIRT_CPU_ACCOUNTING.


So please try again, so far you're not making any sense.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread Peter Zijlstra
On Thu, Jul 05, 2018 at 09:21:15PM +0800, Xunlei Pang wrote:
> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
> > On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
> >> tick-based whole utime is utime_0, tick-based whole stime
> >> is stime_0, scheduler time is rtime_0. 
> > 
> >> For a long time, the process runs mainly in userspace with
> >> run-sleep patterns, and because two different clocks, it
> >> is possible to have the following condition:
> >>   rtime_0 < utime_0 (as with little stime_0)
> > 
> > I don't follow... what?
> > 
> > Why are you, and why do you think it makes sense to, compare rtime_0
> > against utime_0 ?
> > 
> > The [us]time_0 are, per your earlier definition, ticks. They're not an
> > actual measure of time. Do not compare the two, that makes no bloody
> > sense.
> > 
> 
> [us]time_0 is task_struct:utime{stime}, I cited directly from
> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
> here to simple the following proof to help explain the problem we met.

In the !VIRT_CPU_ACCOUNTING case they (task_struct::[us]time) are not
actual durations. Yes, the happen to be accounted in multiples of
TICK_NSEC and thereby happen to carry a [ns] unit, but they are not
durations, they are samples.

(we just happen to store them in a [ns] unit because for
VIRT_CPU_ACCOUNTING they are in fact durations)

If 'rtime < utime' is not a valid assumption to build a problem on for
!VIRT_CPU_ACCOUNTING.


So please try again, so far you're not making any sense.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread Xunlei Pang
On 7/5/18 6:46 PM, Peter Zijlstra wrote:
> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>> tick-based whole utime is utime_0, tick-based whole stime
>> is stime_0, scheduler time is rtime_0. 
> 
>> For a long time, the process runs mainly in userspace with
>> run-sleep patterns, and because two different clocks, it
>> is possible to have the following condition:
>>   rtime_0 < utime_0 (as with little stime_0)
> 
> I don't follow... what?
> 
> Why are you, and why do you think it makes sense to, compare rtime_0
> against utime_0 ?
> 
> The [us]time_0 are, per your earlier definition, ticks. They're not an
> actual measure of time. Do not compare the two, that makes no bloody
> sense.
> 

[us]time_0 is task_struct:utime{stime}, I cited directly from
cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
here to simple the following proof to help explain the problem we met.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread Xunlei Pang
On 7/5/18 6:46 PM, Peter Zijlstra wrote:
> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>> tick-based whole utime is utime_0, tick-based whole stime
>> is stime_0, scheduler time is rtime_0. 
> 
>> For a long time, the process runs mainly in userspace with
>> run-sleep patterns, and because two different clocks, it
>> is possible to have the following condition:
>>   rtime_0 < utime_0 (as with little stime_0)
> 
> I don't follow... what?
> 
> Why are you, and why do you think it makes sense to, compare rtime_0
> against utime_0 ?
> 
> The [us]time_0 are, per your earlier definition, ticks. They're not an
> actual measure of time. Do not compare the two, that makes no bloody
> sense.
> 

[us]time_0 is task_struct:utime{stime}, I cited directly from
cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
here to simple the following proof to help explain the problem we met.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread Peter Zijlstra
On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
> tick-based whole utime is utime_0, tick-based whole stime
> is stime_0, scheduler time is rtime_0. 

> For a long time, the process runs mainly in userspace with
> run-sleep patterns, and because two different clocks, it
> is possible to have the following condition:
>   rtime_0 < utime_0 (as with little stime_0)

I don't follow... what?

Why are you, and why do you think it makes sense to, compare rtime_0
against utime_0 ?

The [us]time_0 are, per your earlier definition, ticks. They're not an
actual measure of time. Do not compare the two, that makes no bloody
sense.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-05 Thread Peter Zijlstra
On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
> tick-based whole utime is utime_0, tick-based whole stime
> is stime_0, scheduler time is rtime_0. 

> For a long time, the process runs mainly in userspace with
> run-sleep patterns, and because two different clocks, it
> is possible to have the following condition:
>   rtime_0 < utime_0 (as with little stime_0)

I don't follow... what?

Why are you, and why do you think it makes sense to, compare rtime_0
against utime_0 ?

The [us]time_0 are, per your earlier definition, ticks. They're not an
actual measure of time. Do not compare the two, that makes no bloody
sense.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-04 Thread Xunlei Pang
On 7/2/18 11:21 PM, Tejun Heo wrote:
> Hello, Peter.
> 
> On Tue, Jun 26, 2018 at 05:49:08PM +0200, Peter Zijlstra wrote:
>> Well, no, because the Changelog is incomprehensible and the patch
>> doesn't really have useful comments, so I'll have to reverse engineer
>> the entire thing, and I've just not had time for that.
> 
> Just as an additional data point, we also sometimes see artifacts from
> cpu_adjust_time() in the form of per-task user or sys time getting
> stuck for some period (in extreme cases for over a minute) while the
> application isn't doing anything differently.  We're telling the users
> that it's an inherent sampling artifact but it'd be nice to improve it
> if possible without adding noticeable overhead.  No idea whether this
> patch's approach is a good one tho.

The patch has no noticeable overhead except the extra cputime fileds
added into task_struct. We've been running this patch on our servers
for months, looks good till now.

Thanks!


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-04 Thread Xunlei Pang
On 7/2/18 11:21 PM, Tejun Heo wrote:
> Hello, Peter.
> 
> On Tue, Jun 26, 2018 at 05:49:08PM +0200, Peter Zijlstra wrote:
>> Well, no, because the Changelog is incomprehensible and the patch
>> doesn't really have useful comments, so I'll have to reverse engineer
>> the entire thing, and I've just not had time for that.
> 
> Just as an additional data point, we also sometimes see artifacts from
> cpu_adjust_time() in the form of per-task user or sys time getting
> stuck for some period (in extreme cases for over a minute) while the
> application isn't doing anything differently.  We're telling the users
> that it's an inherent sampling artifact but it'd be nice to improve it
> if possible without adding noticeable overhead.  No idea whether this
> patch's approach is a good one tho.

The patch has no noticeable overhead except the extra cputime fileds
added into task_struct. We've been running this patch on our servers
for months, looks good till now.

Thanks!


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-02 Thread Tejun Heo
Hello, Peter.

On Tue, Jun 26, 2018 at 05:49:08PM +0200, Peter Zijlstra wrote:
> Well, no, because the Changelog is incomprehensible and the patch
> doesn't really have useful comments, so I'll have to reverse engineer
> the entire thing, and I've just not had time for that.

Just as an additional data point, we also sometimes see artifacts from
cpu_adjust_time() in the form of per-task user or sys time getting
stuck for some period (in extreme cases for over a minute) while the
application isn't doing anything differently.  We're telling the users
that it's an inherent sampling artifact but it'd be nice to improve it
if possible without adding noticeable overhead.  No idea whether this
patch's approach is a good one tho.

Thanks.

-- 
tejun


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-07-02 Thread Tejun Heo
Hello, Peter.

On Tue, Jun 26, 2018 at 05:49:08PM +0200, Peter Zijlstra wrote:
> Well, no, because the Changelog is incomprehensible and the patch
> doesn't really have useful comments, so I'll have to reverse engineer
> the entire thing, and I've just not had time for that.

Just as an additional data point, we also sometimes see artifacts from
cpu_adjust_time() in the form of per-task user or sys time getting
stuck for some period (in extreme cases for over a minute) while the
application isn't doing anything differently.  We're telling the users
that it's an inherent sampling artifact but it'd be nice to improve it
if possible without adding noticeable overhead.  No idea whether this
patch's approach is a good one tho.

Thanks.

-- 
tejun


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-27 Thread Xunlei Pang
On 6/26/18 11:49 PM, Peter Zijlstra wrote:
> On Tue, Jun 26, 2018 at 08:19:49PM +0800, Xunlei Pang wrote:
>> On 6/22/18 3:15 PM, Xunlei Pang wrote:
>>> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
>>> and encountered a problem that user and sys usages are wrongly
>>> split sometimes.
>>>
>>> Run tasks with some random run-sleep pattern for a long time, and
>>> when tick-based time and scheduler sum_exec_runtime hugely drifts
>>> apart(scheduler sum_exec_runtime is less than tick-based time),
>>> the current implementation of cputime_adjust() will produce less
>>> sys usage than the actual use after changing to run a different
>>> workload pattern with high sys. This is because total tick-based
>>> utime and stime are used to split the total sum_exec_runtime.
>>>
>>> Same problem exists on utime and stime from "/proc//stat".
>>>
>>> [Example]
>>> Run some random run-sleep patterns for minutes, then change to run
>>> high sys pattern, and watch.
>>> 1) standard "top"(which is the correct one):
>>>4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> 2) our tool parsing utime and stime from "/proc//stat":
>>>20.5 usr, 78.4 sys
>>> We can see "20.5 usr" displayed in 2) was incorrect, it recovers
>>> gradually with time: 9.7 usr, 89.5 sys
>>>
>> High sys probably means there's something abnormal on the kernel
>> path, it may hide issues, so we should make it fairly reliable.
>> It can easily hit this problem with our per-cgroup statistics.
>>
>> Hi Peter, any comment on this patch?
> 
> Well, no, because the Changelog is incomprehensible and the patch
> doesn't really have useful comments, so I'll have to reverse engineer
> the entire thing, and I've just not had time for that.
> 

Let me try the best to describe it again.

There are two types of run time for a process:
1) task_struct::utime, task_struct::stime in ticks.
2) scheduler task_struct::se.sum_exec_runtime(rtime) in ns.

In case of no vtime accounting, the utime/stime fileds of
/proc/pid/stat are calculated by cputime_adjust(), which
splits the precise rtime in the proportion of tick-based
utime and stime.

However cputime_adjust() always does the split using the
total utime/stime of the process, this may cause wrong
splitting in some cases, e.g.

A typical statistic collector accesses "/proc/pid/stat".
1) moment t0
After accessed /proc/pid/stat in t0:
tick-based whole utime is utime_0, tick-based whole stime
is stime_0, scheduler time is rtime_0. The ajusted utime
caculated by cputime_adjust() is autime_0, ajusted stime
is astime_0, so astime_0=rtime_0*stime_0/(utime_0+stime_0).

For a long time, the process runs mainly in userspace with
run-sleep patterns, and because two different clocks, it
is possible to have the following condition:
  rtime_0 < utime_0 (as with little stime_0)

2) moment t1(after dt, i.e. t0+dt)
Then the process suddenly runs 100% sys workload afterwards
lasting "dt", when accessing /proc/pid/stat at t1="t0+dt",
both rtime_0 and stime_0 increase "dt", thus cputime_ajust()
does the calculation for new adjusted astime_1 as follows:
  (rtime_0+dt)*(stime_0+dt)/(utime_0+stime_0+dt)
= (rtime_0*stime_0+rtime_0*dt+stime_0*dt+dt*dt)/(utime_0+stime_0+dt)
= (rtime_0*stime_0+rtime_0*dt-utime_0*dt)/(utime_0+stime_0+dt) + dt
< rtime_0*stime_0/(utime_0+stime_0+dt) + dt (for rtime_0 < utime_0)
< rtime_0*stime_0/(utime_0+stime_0) + dt
< astime_0+dt

The actual astime_1 should be "astime_0+dt"(as it runs 100%
sys during dt), but the caculated figure by cputime_adjust()
becomes much smaller, as a result the statistics collector
shows less cpu sys usage than the actual one.

That's why we occasionally observed the incorrect cpu usage
described in the changelog:
[Example]
Run some random run-sleep patterns for minutes, then change
to run high sys pattern, and watch.
1) standard "top"(which is the correct one):
   4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
2) Parsing utime and stime from "/proc//stat":
   20.5 usr, 78.4 sys
We can see "20.5 usr" displayed in 2) was incorrect, it recovers
gradually with time: 9.7 usr, 89.5 sys


Thanks,
Xunlei


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-27 Thread Xunlei Pang
On 6/26/18 11:49 PM, Peter Zijlstra wrote:
> On Tue, Jun 26, 2018 at 08:19:49PM +0800, Xunlei Pang wrote:
>> On 6/22/18 3:15 PM, Xunlei Pang wrote:
>>> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
>>> and encountered a problem that user and sys usages are wrongly
>>> split sometimes.
>>>
>>> Run tasks with some random run-sleep pattern for a long time, and
>>> when tick-based time and scheduler sum_exec_runtime hugely drifts
>>> apart(scheduler sum_exec_runtime is less than tick-based time),
>>> the current implementation of cputime_adjust() will produce less
>>> sys usage than the actual use after changing to run a different
>>> workload pattern with high sys. This is because total tick-based
>>> utime and stime are used to split the total sum_exec_runtime.
>>>
>>> Same problem exists on utime and stime from "/proc//stat".
>>>
>>> [Example]
>>> Run some random run-sleep patterns for minutes, then change to run
>>> high sys pattern, and watch.
>>> 1) standard "top"(which is the correct one):
>>>4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> 2) our tool parsing utime and stime from "/proc//stat":
>>>20.5 usr, 78.4 sys
>>> We can see "20.5 usr" displayed in 2) was incorrect, it recovers
>>> gradually with time: 9.7 usr, 89.5 sys
>>>
>> High sys probably means there's something abnormal on the kernel
>> path, it may hide issues, so we should make it fairly reliable.
>> It can easily hit this problem with our per-cgroup statistics.
>>
>> Hi Peter, any comment on this patch?
> 
> Well, no, because the Changelog is incomprehensible and the patch
> doesn't really have useful comments, so I'll have to reverse engineer
> the entire thing, and I've just not had time for that.
> 

Let me try the best to describe it again.

There are two types of run time for a process:
1) task_struct::utime, task_struct::stime in ticks.
2) scheduler task_struct::se.sum_exec_runtime(rtime) in ns.

In case of no vtime accounting, the utime/stime fileds of
/proc/pid/stat are calculated by cputime_adjust(), which
splits the precise rtime in the proportion of tick-based
utime and stime.

However cputime_adjust() always does the split using the
total utime/stime of the process, this may cause wrong
splitting in some cases, e.g.

A typical statistic collector accesses "/proc/pid/stat".
1) moment t0
After accessed /proc/pid/stat in t0:
tick-based whole utime is utime_0, tick-based whole stime
is stime_0, scheduler time is rtime_0. The ajusted utime
caculated by cputime_adjust() is autime_0, ajusted stime
is astime_0, so astime_0=rtime_0*stime_0/(utime_0+stime_0).

For a long time, the process runs mainly in userspace with
run-sleep patterns, and because two different clocks, it
is possible to have the following condition:
  rtime_0 < utime_0 (as with little stime_0)

2) moment t1(after dt, i.e. t0+dt)
Then the process suddenly runs 100% sys workload afterwards
lasting "dt", when accessing /proc/pid/stat at t1="t0+dt",
both rtime_0 and stime_0 increase "dt", thus cputime_ajust()
does the calculation for new adjusted astime_1 as follows:
  (rtime_0+dt)*(stime_0+dt)/(utime_0+stime_0+dt)
= (rtime_0*stime_0+rtime_0*dt+stime_0*dt+dt*dt)/(utime_0+stime_0+dt)
= (rtime_0*stime_0+rtime_0*dt-utime_0*dt)/(utime_0+stime_0+dt) + dt
< rtime_0*stime_0/(utime_0+stime_0+dt) + dt (for rtime_0 < utime_0)
< rtime_0*stime_0/(utime_0+stime_0) + dt
< astime_0+dt

The actual astime_1 should be "astime_0+dt"(as it runs 100%
sys during dt), but the caculated figure by cputime_adjust()
becomes much smaller, as a result the statistics collector
shows less cpu sys usage than the actual one.

That's why we occasionally observed the incorrect cpu usage
described in the changelog:
[Example]
Run some random run-sleep patterns for minutes, then change
to run high sys pattern, and watch.
1) standard "top"(which is the correct one):
   4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
2) Parsing utime and stime from "/proc//stat":
   20.5 usr, 78.4 sys
We can see "20.5 usr" displayed in 2) was incorrect, it recovers
gradually with time: 9.7 usr, 89.5 sys


Thanks,
Xunlei


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-26 Thread Peter Zijlstra
On Tue, Jun 26, 2018 at 08:19:49PM +0800, Xunlei Pang wrote:
> On 6/22/18 3:15 PM, Xunlei Pang wrote:
> > We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> > and encountered a problem that user and sys usages are wrongly
> > split sometimes.
> > 
> > Run tasks with some random run-sleep pattern for a long time, and
> > when tick-based time and scheduler sum_exec_runtime hugely drifts
> > apart(scheduler sum_exec_runtime is less than tick-based time),
> > the current implementation of cputime_adjust() will produce less
> > sys usage than the actual use after changing to run a different
> > workload pattern with high sys. This is because total tick-based
> > utime and stime are used to split the total sum_exec_runtime.
> > 
> > Same problem exists on utime and stime from "/proc//stat".
> > 
> > [Example]
> > Run some random run-sleep patterns for minutes, then change to run
> > high sys pattern, and watch.
> > 1) standard "top"(which is the correct one):
> >4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > 2) our tool parsing utime and stime from "/proc//stat":
> >20.5 usr, 78.4 sys
> > We can see "20.5 usr" displayed in 2) was incorrect, it recovers
> > gradually with time: 9.7 usr, 89.5 sys
> > 
> High sys probably means there's something abnormal on the kernel
> path, it may hide issues, so we should make it fairly reliable.
> It can easily hit this problem with our per-cgroup statistics.
> 
> Hi Peter, any comment on this patch?

Well, no, because the Changelog is incomprehensible and the patch
doesn't really have useful comments, so I'll have to reverse engineer
the entire thing, and I've just not had time for that.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-26 Thread Peter Zijlstra
On Tue, Jun 26, 2018 at 08:19:49PM +0800, Xunlei Pang wrote:
> On 6/22/18 3:15 PM, Xunlei Pang wrote:
> > We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> > and encountered a problem that user and sys usages are wrongly
> > split sometimes.
> > 
> > Run tasks with some random run-sleep pattern for a long time, and
> > when tick-based time and scheduler sum_exec_runtime hugely drifts
> > apart(scheduler sum_exec_runtime is less than tick-based time),
> > the current implementation of cputime_adjust() will produce less
> > sys usage than the actual use after changing to run a different
> > workload pattern with high sys. This is because total tick-based
> > utime and stime are used to split the total sum_exec_runtime.
> > 
> > Same problem exists on utime and stime from "/proc//stat".
> > 
> > [Example]
> > Run some random run-sleep patterns for minutes, then change to run
> > high sys pattern, and watch.
> > 1) standard "top"(which is the correct one):
> >4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > 2) our tool parsing utime and stime from "/proc//stat":
> >20.5 usr, 78.4 sys
> > We can see "20.5 usr" displayed in 2) was incorrect, it recovers
> > gradually with time: 9.7 usr, 89.5 sys
> > 
> High sys probably means there's something abnormal on the kernel
> path, it may hide issues, so we should make it fairly reliable.
> It can easily hit this problem with our per-cgroup statistics.
> 
> Hi Peter, any comment on this patch?

Well, no, because the Changelog is incomprehensible and the patch
doesn't really have useful comments, so I'll have to reverse engineer
the entire thing, and I've just not had time for that.


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-26 Thread Xunlei Pang
On 6/22/18 3:15 PM, Xunlei Pang wrote:
> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> and encountered a problem that user and sys usages are wrongly
> split sometimes.
> 
> Run tasks with some random run-sleep pattern for a long time, and
> when tick-based time and scheduler sum_exec_runtime hugely drifts
> apart(scheduler sum_exec_runtime is less than tick-based time),
> the current implementation of cputime_adjust() will produce less
> sys usage than the actual use after changing to run a different
> workload pattern with high sys. This is because total tick-based
> utime and stime are used to split the total sum_exec_runtime.
> 
> Same problem exists on utime and stime from "/proc//stat".
> 
> [Example]
> Run some random run-sleep patterns for minutes, then change to run
> high sys pattern, and watch.
> 1) standard "top"(which is the correct one):
>4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> 2) our tool parsing utime and stime from "/proc//stat":
>20.5 usr, 78.4 sys
> We can see "20.5 usr" displayed in 2) was incorrect, it recovers
> gradually with time: 9.7 usr, 89.5 sys
> 
High sys probably means there's something abnormal on the kernel
path, it may hide issues, so we should make it fairly reliable.
It can easily hit this problem with our per-cgroup statistics.

Hi Peter, any comment on this patch?


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-26 Thread Xunlei Pang
On 6/22/18 3:15 PM, Xunlei Pang wrote:
> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> and encountered a problem that user and sys usages are wrongly
> split sometimes.
> 
> Run tasks with some random run-sleep pattern for a long time, and
> when tick-based time and scheduler sum_exec_runtime hugely drifts
> apart(scheduler sum_exec_runtime is less than tick-based time),
> the current implementation of cputime_adjust() will produce less
> sys usage than the actual use after changing to run a different
> workload pattern with high sys. This is because total tick-based
> utime and stime are used to split the total sum_exec_runtime.
> 
> Same problem exists on utime and stime from "/proc//stat".
> 
> [Example]
> Run some random run-sleep patterns for minutes, then change to run
> high sys pattern, and watch.
> 1) standard "top"(which is the correct one):
>4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> 2) our tool parsing utime and stime from "/proc//stat":
>20.5 usr, 78.4 sys
> We can see "20.5 usr" displayed in 2) was incorrect, it recovers
> gradually with time: 9.7 usr, 89.5 sys
> 
High sys probably means there's something abnormal on the kernel
path, it may hide issues, so we should make it fairly reliable.
It can easily hit this problem with our per-cgroup statistics.

Hi Peter, any comment on this patch?


Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-25 Thread Xunlei Pang
On 6/22/18 6:35 PM, kbuild test robot wrote:
> Hi Xunlei,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on v4.18-rc1 next-20180622]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Xunlei-Pang/sched-cputime-Ensure-correct-utime-and-stime-proportion/20180622-153720
> reproduce: make htmldocs
> 
> All warnings (new ones prefixed by >>):
> 
>WARNING: convert(1) not found, for SVG to PDF conversion install 
> ImageMagick (https://www.imagemagick.org)
>mm/mempool.c:228: warning: Function parameter or member 'pool' not 
> described in 'mempool_init'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ibss' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.connect' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.keys' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ie' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ie_len' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.bssid' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ssid' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.default_key' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.default_mgmt_key' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.prev_bssid_valid' not described in 'wireless_dev'
>include/net/mac80211.h:2282: warning: Function parameter or member 
> 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
>include/net/mac80211.h:2282: warning: Function parameter or member 
> 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.rates' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.use_rts' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.use_cts_prot' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.short_preamble' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.skip_table' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.jiffies' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.vif' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.hw_key' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.flags' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.enqueue_time' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 'ack' 
> not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'ack.cookie' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.rates' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.ack_signal' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.ampdu_len' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.antenna' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.tx_time' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.status_driver_data' not described in 'ieee80211_tx_info'
>  

Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-25 Thread Xunlei Pang
On 6/22/18 6:35 PM, kbuild test robot wrote:
> Hi Xunlei,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on v4.18-rc1 next-20180622]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Xunlei-Pang/sched-cputime-Ensure-correct-utime-and-stime-proportion/20180622-153720
> reproduce: make htmldocs
> 
> All warnings (new ones prefixed by >>):
> 
>WARNING: convert(1) not found, for SVG to PDF conversion install 
> ImageMagick (https://www.imagemagick.org)
>mm/mempool.c:228: warning: Function parameter or member 'pool' not 
> described in 'mempool_init'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ibss' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.connect' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.keys' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ie' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ie_len' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.bssid' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.ssid' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.default_key' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.default_mgmt_key' not described in 'wireless_dev'
>include/net/cfg80211.h:4279: warning: Function parameter or member 
> 'wext.prev_bssid_valid' not described in 'wireless_dev'
>include/net/mac80211.h:2282: warning: Function parameter or member 
> 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
>include/net/mac80211.h:2282: warning: Function parameter or member 
> 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.rates' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.use_rts' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.use_cts_prot' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.short_preamble' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.skip_table' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.jiffies' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.vif' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.hw_key' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.flags' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'control.enqueue_time' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 'ack' 
> not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'ack.cookie' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.rates' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.ack_signal' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.ampdu_len' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.antenna' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.tx_time' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
>include/net/mac80211.h:955: warning: Function parameter or member 
> 'status.status_driver_data' not described in 'ieee80211_tx_info'
>  

Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-22 Thread kbuild test robot
Hi Xunlei,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.18-rc1 next-20180622]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Xunlei-Pang/sched-cputime-Ensure-correct-utime-and-stime-proportion/20180622-153720
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick 
(https://www.imagemagick.org)
   mm/mempool.c:228: warning: Function parameter or member 'pool' not described 
in 'mempool_init'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.ibss' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.connect' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.keys' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie' 
not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.ie_len' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.bssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.ssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.default_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.default_mgmt_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.prev_bssid_valid' not described in 'wireless_dev'
   include/net/mac80211.h:2282: warning: Function parameter or member 
'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
   include/net/mac80211.h:2282: warning: Function parameter or member 
'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.use_rts' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.use_cts_prot' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.short_preamble' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.skip_table' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.jiffies' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.vif' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.hw_key' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.flags' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.enqueue_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'ack' not 
described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'ack.cookie' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.status_driver_data' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'driver_rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'pad' not 
described in 

Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-22 Thread kbuild test robot
Hi Xunlei,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.18-rc1 next-20180622]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Xunlei-Pang/sched-cputime-Ensure-correct-utime-and-stime-proportion/20180622-153720
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick 
(https://www.imagemagick.org)
   mm/mempool.c:228: warning: Function parameter or member 'pool' not described 
in 'mempool_init'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.ibss' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.connect' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.keys' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie' 
not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.ie_len' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.bssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.ssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.default_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.default_mgmt_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 
'wext.prev_bssid_valid' not described in 'wireless_dev'
   include/net/mac80211.h:2282: warning: Function parameter or member 
'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
   include/net/mac80211.h:2282: warning: Function parameter or member 
'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.use_rts' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.use_cts_prot' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.short_preamble' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.skip_table' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.jiffies' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.vif' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.hw_key' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.flags' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'control.enqueue_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'ack' not 
described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'ack.cookie' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'status.status_driver_data' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 
'driver_rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'pad' not 
described in 

[PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-22 Thread Xunlei Pang
We use per-cgroup cpu usage statistics similar to "cgroup rstat",
and encountered a problem that user and sys usages are wrongly
split sometimes.

Run tasks with some random run-sleep pattern for a long time, and
when tick-based time and scheduler sum_exec_runtime hugely drifts
apart(scheduler sum_exec_runtime is less than tick-based time),
the current implementation of cputime_adjust() will produce less
sys usage than the actual use after changing to run a different
workload pattern with high sys. This is because total tick-based
utime and stime are used to split the total sum_exec_runtime.

Same problem exists on utime and stime from "/proc//stat".

[Example]
Run some random run-sleep patterns for minutes, then change to run
high sys pattern, and watch.
1) standard "top"(which is the correct one):
   4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
2) our tool parsing utime and stime from "/proc//stat":
   20.5 usr, 78.4 sys
We can see "20.5 usr" displayed in 2) was incorrect, it recovers
gradually with time: 9.7 usr, 89.5 sys

This patch fixes the issue by calculating using all kinds of time
elapsed since last parse in cputime_adjust(), and accumulate the
corresponding results calculated into prev_cputime. A new field
of task_cputime type is added in structure prev_cputime to record
previous task_cputime so that we can get the elapsed time deltas.

Signed-off-by: Xunlei Pang 
---
 include/linux/sched.h | 33 +++
 include/linux/sched/cputime.h | 12 -
 kernel/sched/cputime.c| 61 ---
 3 files changed, 51 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 87bf02d93a27..5108ac8414e0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -223,6 +223,22 @@ extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 
+/**
+ * struct task_cputime - collected CPU time counts
+ * @utime: time spent in user mode, in nanoseconds
+ * @stime: time spent in kernel mode, in nanoseconds
+ * @sum_exec_runtime:  total time spent on the CPU, in nanoseconds
+ *
+ * This structure groups together three kinds of CPU time that are tracked for
+ * threads and thread groups.  Most things considering CPU time want to group
+ * these counts together and treat all three of them in parallel.
+ */
+struct task_cputime {
+   u64 utime;
+   u64 stime;
+   unsigned long long  sum_exec_runtime;
+};
+
 /**
  * struct prev_cputime - snapshot of system and user cputime
  * @utime: time spent in user mode
@@ -236,26 +252,11 @@ struct prev_cputime {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
u64 utime;
u64 stime;
+   struct task_cputime cputime;
raw_spinlock_t  lock;
 #endif
 };
 
-/**
- * struct task_cputime - collected CPU time counts
- * @utime: time spent in user mode, in nanoseconds
- * @stime: time spent in kernel mode, in nanoseconds
- * @sum_exec_runtime:  total time spent on the CPU, in nanoseconds
- *
- * This structure groups together three kinds of CPU time that are tracked for
- * threads and thread groups.  Most things considering CPU time want to group
- * these counts together and treat all three of them in parallel.
- */
-struct task_cputime {
-   u64 utime;
-   u64 stime;
-   unsigned long long  sum_exec_runtime;
-};
-
 /* Alternate field names when used on cache expirations: */
 #define virt_exp   utime
 #define prof_exp   stime
diff --git a/include/linux/sched/cputime.h b/include/linux/sched/cputime.h
index 53f883f5a2fd..49f8fd2564ed 100644
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -175,10 +175,20 @@ static inline void account_group_exec_runtime(struct 
task_struct *tsk,
atomic64_add(ns, >cputime_atomic.sum_exec_runtime);
 }
 
-static inline void prev_cputime_init(struct prev_cputime *prev)
+static inline void prev_cputime_clear(struct prev_cputime *prev)
 {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
prev->utime = prev->stime = 0;
+   prev->cputime.utime = 0;
+   prev->cputime.stime = 0;
+   prev->cputime.sum_exec_runtime = 0;
+#endif
+}
+
+static inline void prev_cputime_init(struct prev_cputime *prev)
+{
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+   prev_cputime_clear(prev);
raw_spin_lock_init(>lock);
 #endif
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0796f938c4f0..a68483ee3ad7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -590,69 +590,54 @@ static u64 scale_stime(u64 stime, u64 rtime, u64 total)
 void cputime_adjust(struct 

[PATCH] sched/cputime: Ensure correct utime and stime proportion

2018-06-22 Thread Xunlei Pang
We use per-cgroup cpu usage statistics similar to "cgroup rstat",
and encountered a problem that user and sys usages are wrongly
split sometimes.

Run tasks with some random run-sleep pattern for a long time, and
when tick-based time and scheduler sum_exec_runtime hugely drifts
apart(scheduler sum_exec_runtime is less than tick-based time),
the current implementation of cputime_adjust() will produce less
sys usage than the actual use after changing to run a different
workload pattern with high sys. This is because total tick-based
utime and stime are used to split the total sum_exec_runtime.

Same problem exists on utime and stime from "/proc//stat".

[Example]
Run some random run-sleep patterns for minutes, then change to run
high sys pattern, and watch.
1) standard "top"(which is the correct one):
   4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
2) our tool parsing utime and stime from "/proc//stat":
   20.5 usr, 78.4 sys
We can see "20.5 usr" displayed in 2) was incorrect, it recovers
gradually with time: 9.7 usr, 89.5 sys

This patch fixes the issue by calculating using all kinds of time
elapsed since last parse in cputime_adjust(), and accumulate the
corresponding results calculated into prev_cputime. A new field
of task_cputime type is added in structure prev_cputime to record
previous task_cputime so that we can get the elapsed time deltas.

Signed-off-by: Xunlei Pang 
---
 include/linux/sched.h | 33 +++
 include/linux/sched/cputime.h | 12 -
 kernel/sched/cputime.c| 61 ---
 3 files changed, 51 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 87bf02d93a27..5108ac8414e0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -223,6 +223,22 @@ extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 
+/**
+ * struct task_cputime - collected CPU time counts
+ * @utime: time spent in user mode, in nanoseconds
+ * @stime: time spent in kernel mode, in nanoseconds
+ * @sum_exec_runtime:  total time spent on the CPU, in nanoseconds
+ *
+ * This structure groups together three kinds of CPU time that are tracked for
+ * threads and thread groups.  Most things considering CPU time want to group
+ * these counts together and treat all three of them in parallel.
+ */
+struct task_cputime {
+   u64 utime;
+   u64 stime;
+   unsigned long long  sum_exec_runtime;
+};
+
 /**
  * struct prev_cputime - snapshot of system and user cputime
  * @utime: time spent in user mode
@@ -236,26 +252,11 @@ struct prev_cputime {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
u64 utime;
u64 stime;
+   struct task_cputime cputime;
raw_spinlock_t  lock;
 #endif
 };
 
-/**
- * struct task_cputime - collected CPU time counts
- * @utime: time spent in user mode, in nanoseconds
- * @stime: time spent in kernel mode, in nanoseconds
- * @sum_exec_runtime:  total time spent on the CPU, in nanoseconds
- *
- * This structure groups together three kinds of CPU time that are tracked for
- * threads and thread groups.  Most things considering CPU time want to group
- * these counts together and treat all three of them in parallel.
- */
-struct task_cputime {
-   u64 utime;
-   u64 stime;
-   unsigned long long  sum_exec_runtime;
-};
-
 /* Alternate field names when used on cache expirations: */
 #define virt_exp   utime
 #define prof_exp   stime
diff --git a/include/linux/sched/cputime.h b/include/linux/sched/cputime.h
index 53f883f5a2fd..49f8fd2564ed 100644
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -175,10 +175,20 @@ static inline void account_group_exec_runtime(struct 
task_struct *tsk,
atomic64_add(ns, >cputime_atomic.sum_exec_runtime);
 }
 
-static inline void prev_cputime_init(struct prev_cputime *prev)
+static inline void prev_cputime_clear(struct prev_cputime *prev)
 {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
prev->utime = prev->stime = 0;
+   prev->cputime.utime = 0;
+   prev->cputime.stime = 0;
+   prev->cputime.sum_exec_runtime = 0;
+#endif
+}
+
+static inline void prev_cputime_init(struct prev_cputime *prev)
+{
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+   prev_cputime_clear(prev);
raw_spin_lock_init(>lock);
 #endif
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0796f938c4f0..a68483ee3ad7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -590,69 +590,54 @@ static u64 scale_stime(u64 stime, u64 rtime, u64 total)
 void cputime_adjust(struct