Re: [patch] CFS scheduler, -v14

2007-06-07 Thread S.Çağlar Onur
Hi;

On Friday, 01 June 2007, Linus Torvalds wrote:
> Has it been hot where you are lately? Is your fan working?

First of all, sorry for the late reply.

İstanbul has not really been hot lately [~26 C] :) and yes, the fans are/seem 
to be working without a problem.

> Hardware that acts up under load is quite often thermal-related,
> especially if it starts happening during summer and didn't happen before
> that... ESPECIALLY the kinds of behaviours you see: the "sudden power-off"
> is the normal behaviour for a CPU that trips a critical overheating point,
> and the slowdown is also one normal response to overheating (CPU
> throttling).

According to the ACPI output:

[EMAIL PROTECTED]> cat /proc/acpi/thermal_zone/THRM/*
<setting not supported>
cooling mode:   passive
<polling disabled>
state:   ok
temperature: 56 C
critical (S5):   105 C
passive: 95 C: tc1=1 tc2=5 tsp=10 devices=0xc20deec8

105 C is the critical temperature for that CPU. For a while (this is why I reply 
late) I have been constantly monitoring the temperature under low and high load.

It stays in the 50-70 C range in normal usage/idle and in the 80-100 C range 
under high load (compiling some applications, using cpuburn to test, etc.), so 
it seems it can handle overheating issues.

But digging through kern.log also shows some strange values:

May 24 10:39:23 localhost kernel: [0.00] Detected 897.748 MHz 
processor. <--- 2.6.21.2-CFS-v14
...
May 30 00:59:11 localhost kernel: [0.00] Detected 898.726 MHz 
processor. <--- 2.6.21.2-CFS-v15
...
Jun  1 02:09:44 localhost kernel: [0.00] Detected 897.591 MHz 
processor. <--- 2.6.21.3-CFS-v15
...

And according to the same log, these slowdowns occurred after I compiled/installed 
these kernel versions on the system (since these are the first appearances of 
these versions in kern.log). So as you said, it does seem to be an overheating 
issue. I'll continue to test/monitor and report back if I find anything. 
Thanks!

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!


signature.asc
Description: This is a digitally signed message part.


Re: [patch] CFS scheduler, -v14

2007-06-06 Thread Li Yu

Hi, Ingo:

   I am sorry for disturbing you again. I am interested in CFS; however, I am 
really confused about the fairness implementation of CFS.


   After reviewing past LKML mails, I learned that the virtual clock is used as 
the fairness measuring scale, which is an excellent idea, and that CFS uses 
wait_runtime (the total time to run) to simulate the virtual clock of the task. 
However, from my experiment, it seems we cannot get a good fairness effect if we 
only take care of task->wait_runtime. But CFS is well known to work fine ;-)


   Here are the details of my experiment:

   Suppose a UP system with three 100% cpuhog tasks on that processor. They 
have weights 1, 2 and 3 respectively, so they should get 1, 2 and 3 seconds of 
CPU time respectively in a 6-second interval.


   The clock tick interval is 1 sec, so the step of virtual time (VT) is 
0.17 (1/6) per wall time (WT) tick. I use the following convention to describe 
runtime information:


   VTR0.17: runqueue virtual time of 0.17
   VTT11.0: task virtual time of 11.0
   WT2: wall time 2.
   TASK_1/123.00: the task named TASK_1, which has a wait_runtime of 123.00 at 
that time.


for example:

   WT1/VTR0.17 [ VTT0.00:[TASK_2/1.00, TASK_3/1.00], 
VTT1.00:[TASK_1/-0.83] ] current: TASK_2/1.00  

This means we pick TASK_2 as the next task at wall time 1 / runqueue virtual 
time 0.17, and the ready task list has three tasks:


TASK_2:  the virtual time of it is 0.00, the wait_runtime of it is 1.00
TASK_3:  the virtual time of it is 0.00, the wait_runtime of it is 1.00
TASK_1:  the virtual time of it is 1.00, the wait_runtime of it is -0.83

It seems the result of picking the next task by least VTT or by largest 
wait_runtime is the same at this point; lucky.


Following is the complete result of running my script to simulate 6 clock ticks:

-
WT0/VTR0.00 [ VTT0.00:[TASK_1/0.00, TASK_2/0.00, TASK_3/0.00] ] 
current: TASK_1/0.00

Before WT1 :
TASK_2/1.00 wait 1.00 sec
TASK_3/1.00 wait 1.00 sec
TASK_1/0.00 spent - 0.83 sec (delta_mine-delta_exec, delta_exec always 
is 1.0)
WT1/VTR0.17 [ VTT0.00:[TASK_2/1.00, TASK_3/1.00], 
VTT1.00:[TASK_1/-0.83] ] current: TASK_2/1.00

Before WT2 :
TASK_3/2.00 wait 1.00 sec
TASK_1/0.17 wait 1.00 sec
TASK_2/1.00 spent - 0.67 sec (delta_mine-delta_exec, delta_exec always 
is 1.0)
WT2/VTR0.33 [ VTT0.00:[TASK_3/2.00], VTT0.50:[TASK_2/0.33], 
VTT1.00:[TASK_1/0.17] ] current: TASK_3/2.00

Before WT3 :
TASK_1/1.17 wait 1.00 sec
TASK_2/1.33 wait 1.00 sec
TASK_3/2.00 spent - 0.50 sec (delta_mine-delta_exec, delta_exec always 
is 1.0)
WT3/VTR0.50 [ VTT0.33:[TASK_3/1.50], VTT0.50:[TASK_2/1.33], 
VTT1.00:[TASK_1/1.17] ] current: TASK_3/1.50

Before WT4 :
TASK_1/2.17 wait 1.00 sec
TASK_2/2.33 wait 1.00 sec
TASK_3/1.50 spent - 0.50 sec (delta_mine-delta_exec, delta_exec always 
is 1.0)
WT4/VTR0.67 [ VTT0.50:[TASK_2/2.33], VTT0.67:[TASK_3/1.00], 
VTT1.00:[TASK_1/2.17] ] current: TASK_2/2.33

Before WT5 :
TASK_1/3.17 wait 1.00 sec
TASK_3/2.00 wait 1.00 sec
TASK_2/2.33 spent - 0.67 sec (delta_mine-delta_exec, delta_exec always 
is 1.0)
WT5/VTR0.83 [ VTT0.67:[TASK_3/2.00], VTT1.00:[TASK_1/3.17, 
TASK_2/1.67] ] current: TASK_3/2.00

-
TASK_1/3.17 run 1.00 sec
TASK_2/1.67 run 1.00 sec
TASK_3/2.00 run 2.00 sec
TASK_2/1.67 run 1.00 sec
TASK_3/2.00 run 1.00 sec
==
TASK_1 / 1.0 total run: 1.0 sec
TASK_2 / 2.0 total run: 2.0 sec
TASK_3 / 3.0 total run: 3.0 sec
==

If we pick the next task by the least VTT (as shown above), we get the properly 
fair result; the scheduling sequence is:


TASK_1 -> TASK_2 -> TASK_3 -> TASK_2 -> TASK_3

However, if we pick the next task by the largest wait_runtime, we get a 
different scheduling sequence:


TASK_1 -> TASK_2 -> TASK_3 -> TASK_2 -> TASK_1

In this case, it is not fair anymore: every task got the same processor time!


If we run the latter for longer, for example simulating 6000 clock ticks, the 
result is:


==
TASK_1 / 1.0  total run : 1806.0 sec
TASK_2 / 2.0  total run : 1987.0 sec
TASK_3 / 3.0  total run : 2207.0 sec
==

No vindication is needed; I really trust that CFS works fine (it works fine on 
my desktop ;), that is a fact.


So I think there must be something wrong in my experiment above, but apparently 
it holds. What is the real gap between VTT and wait_runtime, and how do you 
bridge it in CFS? It seems I should give TASK_3 some extra credit in some way.
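
(For reference, a minimal standalone sketch of the least-task-virtual-clock 
rule, as a toy model rather than the real CFS code; with weights 1, 2, 3 and 6 
ticks it reproduces the 1:2:3 split above:)

# Toy model: always run the task whose own virtual clock is smallest;
# each tick the chosen task's clock advances by 1/load_weight, so a
# heavier task is picked proportionally more often.
def simulate(weights, ticks):
    vclock = [0.0] * len(weights)   # per-task virtual clocks
    got = [0] * len(weights)        # ticks of CPU each task received
    for _ in range(ticks):
        nxt = min(range(len(weights)), key=lambda i: vclock[i])
        vclock[nxt] += 1.0 / weights[nxt]
        got[nxt] += 1
    return got

print simulate([1.0, 2.0, 3.0], 6)      # -> [1, 2, 3]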


Sorry for such a long mail and such bad English.

Good luck.

- Li Yu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-05 Thread Li Yu

Ingo Molnar wrote:

* Li Yu <[EMAIL PROTECTED]> wrote:

  
Eh, I wrong again~ I even took an experiment in last week end, this 
idea is really bad! ;(


I think the most inner of source of my wrong again and again is 
misunderstanding virtual time. For more better understanding this, I 
try to write one python script to simulate CFS behavior. However, It 
can not implement the fairness as I want. I really confuse here.


Would you like help me point out what's wrong in it? Any suggestion is 
welcome. Thanks in advanced.



sorry, my python-fu is really, really weak. All i can give you at the 
moment is the in-kernel implementation of CFS :-)


  

:~)

I changed that script to check my understanding of the virtual clock. I found 
out we really do get fairness if we allocate the resource by selecting the 
earliest task virtual clock! This eliminates much of my doubt about the virtual 
clock. For example:


./htucfs.py 60

==
TASK_1/C10.00 / 1.0 : 11.0 sec
TASK_2/C10.00 / 2.0 : 20.0 sec
TASK_3/C10.00 / 3.0 : 30.0 sec
==

It seems my halting English served me well enough when I read the introduction 
of the virtual clock ;-)


The next step is to find out why wait_runtime cannot work properly in my 
script.


Thanks for your quick reply.

Good luck.

- Li Yu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-05 Thread Ingo Molnar

* Li Yu <[EMAIL PROTECTED]> wrote:

> Eh, I wrong again~ I even took an experiment in last week end, this 
> idea is really bad! ;(
> 
> I think the most inner of source of my wrong again and again is 
> misunderstanding virtual time. For more better understanding this, I 
> try to write one python script to simulate CFS behavior. However, It 
> can not implement the fairness as I want. I really confuse here.
> 
> Would you like help me point out what's wrong in it? Any suggestion is 
> welcome. Thanks in advanced.

sorry, my python-fu is really, really weak. All i can give you at the 
moment is the in-kernel implementation of CFS :-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-05 Thread Ingo Molnar

* Balbir Singh <[EMAIL PROTECTED]> wrote:

> + /*
> +  * Split up sched_exec_time according to the utime and
> +  * stime ratio. At this point utime contains the summed
> +  * sched_exec_runtime and stime is zero
> +  */
> + if (sum_us_time) {
> + utime = ((tu_time * total_time) / sum_us_time);
> + stime = ((ts_time * total_time) / sum_us_time);
> + }
> + }

hm, Dmitry Adamushko found out that this will cause rounding problems 
and might confuse 'top' - because total_time is a 10 msecs granular 
value, so under the above calculation the total of 'utime+stime' can 
shrink a bit as time goes forward. The symptom is that top will display 
a '99.9%' entry for tasks, sporadically.
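
(A toy sketch of the splitup idea, in illustrative Python rather than kernel 
code: scale the precise sum_exec_runtime by the tick-based utime/(utime+stime) 
ratio to get utime, then derive stime as the remainder so the utime+stime total 
seen by userspace never shrinks. The real change is the fs/proc/array.c hunk in 
the patch below.)

# Toy model of the splitup (illustrative names, not kernel code):
# sum_exec is CFS's precise runtime; utime_ticks/stime_ticks are the
# coarse tick-based samples. stime is derived as a remainder, so
# utime + stime == sum_exec and the total grows monotonically.
def split_times(sum_exec, utime_ticks, stime_ticks):
    total_ticks = utime_ticks + stime_ticks
    utime = sum_exec
    if total_ticks:
        utime = sum_exec * utime_ticks // total_ticks
    stime = sum_exec - utime
    return utime, stime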

I've attached below my current delta (on top of -v15) which does the 
stime/utime splitup correctly and which includes some more enhancements 
from Dmitry - could you please take a look at this and add any deltas 
you might have on top of it?

Ingo

---
 Makefile  |2 +-
 fs/proc/array.c   |   33 -
 include/linux/sched.h |3 +--
 kernel/posix-cpu-timers.c |2 +-
 kernel/sched.c|   17 ++---
 kernel/sched_debug.c  |   16 +++-
 kernel/sched_fair.c   |2 +-
 kernel/sched_rt.c |   12 
 8 files changed, 61 insertions(+), 26 deletions(-)

Index: linux/Makefile
===
--- linux.orig/Makefile
+++ linux/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 21
-EXTRAVERSION = .3-cfs-v15
+EXTRAVERSION = .3-cfs-v16
 NAME = Nocturnal Monster Puppy
 
 # *DOCUMENTATION*
Index: linux/fs/proc/array.c
===
--- linux.orig/fs/proc/array.c
+++ linux/fs/proc/array.c
@@ -172,8 +172,8 @@ static inline char * task_state(struct t
"Uid:\t%d\t%d\t%d\t%d\n"
"Gid:\t%d\t%d\t%d\t%d\n",
get_task_state(p),
-   p->tgid, p->pid,
-   pid_alive(p) ? rcu_dereference(p->real_parent)->tgid : 0,
+   p->tgid, p->pid,
+   pid_alive(p) ? rcu_dereference(p->real_parent)->tgid : 0,
pid_alive(p) && p->ptrace ? rcu_dereference(p->parent)->pid : 0,
p->uid, p->euid, p->suid, p->fsuid,
p->gid, p->egid, p->sgid, p->fsgid);
@@ -312,24 +312,39 @@ int proc_pid_status(struct task_struct *
 
 static clock_t task_utime(struct task_struct *p)
 {
+   clock_t utime = cputime_to_clock_t(p->utime),
+   total = utime + cputime_to_clock_t(p->stime);
+
/*
 * Use CFS's precise accounting, if available:
 */
-   if (!has_rt_policy(p) && !(sysctl_sched_load_smoothing & 128))
-   return nsec_to_clock_t(p->sum_exec_runtime);
+   if (!(sysctl_sched_load_smoothing & 128)) {
+   u64 temp = (u64)nsec_to_clock_t(p->sum_exec_runtime);
+
+   if (total) {
+   temp *= utime;
+   do_div(temp, total);
+   }
+   utime = (clock_t)temp;
+   }
 
-   return cputime_to_clock_t(p->utime);
+   return utime;
 }
 
 static clock_t task_stime(struct task_struct *p)
 {
+   clock_t stime = cputime_to_clock_t(p->stime),
+   total = stime + cputime_to_clock_t(p->utime);
+
/*
-* Use CFS's precise accounting, if available:
+* Use CFS's precise accounting, if available (we subtract
+* utime from the total, to make sure the total observed
+* by userspace grows monotonically - apps rely on that):
 */
-   if (!has_rt_policy(p) && !(sysctl_sched_load_smoothing & 128))
-   return 0;
+   if (!(sysctl_sched_load_smoothing & 128))
+   stime = nsec_to_clock_t(p->sum_exec_runtime) - task_utime(p);
 
-   return cputime_to_clock_t(p->stime);
+   return stime;
 }
 
 
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -852,7 +852,6 @@ struct task_struct {
u64 block_max;
u64 exec_max;
u64 wait_max;
-   u64 last_ran;
 
s64 wait_runtime;
u64 sum_exec_runtime;
@@ -1235,7 +1234,7 @@ static inline int set_cpus_allowed(struc
 extern unsigned long long sched_clock(void);
 extern void sched_clock_unstable_event(void);
 extern unsigned long long
-current_sched_runtime(const struct task_struct *current_task);
+task_sched_runtime(struct task_struct *task);
 
 /* sched_exec is called by processes performing an exec */
 #ifdef CONFIG_SMP
Index: linux/kernel/posix-cpu-timers.c
===
--- linux.orig/kernel/posix-cpu-timers.c
+++ linux/kernel/posix-cpu-timers.c
@@ -161,7 +161,7 @@ static inline 


Re: [patch] CFS scheduler, -v14

2007-06-04 Thread Li Yu


Ingo Molnar wrote:

* Li Yu <[EMAIL PROTECTED]> wrote:

  

Also, I have want to know what's real meaning of

   add_wait_runtime(rq, curr, delta_mine - delta_exec);

in update_curr(), IMHO, it should be

   add_wait_runtime(rq, curr, delta_mine - delta_fair);

Is this just another heuristics? or my opinion is wrong again? :-)



well, ->wait_runtime is in real time units. If a task executes 
delta_exec time on the CPU, we deduct "-delta_exec" 1:1. But during that 
time the task also got entitled to a bit more CPU time, that is 
+delta_mine. The calculation above expresses this. I'm not sure what 
sense '-delta_fair' would make - "delta_fair" is the amount of time a 
nice-0 task would be entitled to - but this task might not be a nice-0 
task. Furthermore, even for a nice-0 task why deduct -delta_fair - it 
spent delta_exec on the CPU.




Eh, I wrong again~ I even took an experiment in last week end, this idea 

is really bad! ;(


I think the most inner of source of my wrong again and again is
misunderstanding virtual time. For more better understanding this, I try 
to write one python script to simulate CFS behavior. However, It can not 
implement the fairness as I want. I really confuse here.


Would you like help me point out what's wrong in it? Any suggestion is 
welcome. Thanks in advanced.






I think using wait_runtime is clearer, so I modified this script.


#! /usr/bin/python

# htucfs.py - Hard-To-Understand-CFS.py ;)
# Written by Li Yu / 20070604

#
# only supports static load / UP.
#


# Usage:
#   ./htucfs.py nr_clock_ticks_to_run
#

import sys

class task_struct:
    def __init__(self, name, load_weight):
        self.name = name
        self.wait_runtime = 0
        self.fair_clock = 0
        self.load_weight = float(load_weight)
    def __repr__(self):
        return "%s/C%.2f" % (self.name, self.fair_clock)

idle_task = task_struct("idle", 0)

class run_queue:
    def __init__(self):
        self.raw_weighted_load = 0
        self.wall_clock = 0
        self.fair_clock = 0
        self.ready_queue = {}
        self.run_history = []
        self.task_list = []
        self.curr = None
        self.debug = 0

    def snapshot(self):
        if self.debug:
            print "%.2f" % self.fair_clock, self.ready_queue, self.curr

    def enqueue(self, task):
        if not self.ready_queue.get(task.wait_runtime):
            self.ready_queue[task.wait_runtime] = [task]
        else:
            # keep FIFO for same wait_runtime tasks.
            self.ready_queue[task.wait_runtime].append(task)
        self.raw_weighted_load += task.load_weight
        self.task_list.append(task)

    def dequeue(self, task):
        self.raw_weighted_load -= task.load_weight
        self.ready_queue[task.wait_runtime].remove(task)
        if not self.ready_queue[task.wait_runtime]:
            del self.ready_queue[task.wait_runtime]
        self.task_list.remove(task)

    def other_wait_runtime(self):
        task_list = self.task_list[:]
        for task in task_list:
            if task == self.curr:
                continue
            self.dequeue(task)
            task.wait_runtime += 1
            print task, "wait 1 sec"
            self.enqueue(task)

    def clock_tick(self):
        # clock_tick = 1.0
        self.fair_clock += 1.0/self.raw_weighted_load
        # delta_exec = 1.0
        delta_mine = self.curr.load_weight / self.raw_weighted_load
        self.dequeue(self.curr)
        self.other_wait_runtime()
        print self.curr, "run %.2f" % (delta_mine-1.0)
        self.curr.wait_runtime += (delta_mine-1.0)
        self.curr.fair_clock += 1.0/self.curr.load_weight
        self.enqueue(self.curr)
        self.pick_next_task()

    def pick_next_task(self):
        key_seq = self.ready_queue.keys()
        if key_seq:
            key_seq.sort()
            self.curr = self.ready_queue[key_seq[-1]][0]
        else:
            self.curr = idle_task
        self.snapshot()
        self.record_run_history()

    def record_run_history(self):
        task = self.curr
        if not self.run_history:
            self.run_history.append([task, 1])
            return
        curr = self.run_history[-1]
        if curr[0] != task:
            self.run_history.append([task, 1])
        else:
            curr[1] += 1

Re: [patch] CFS scheduler, -v14

2007-06-04 Thread Li Yu


Ingo Molnar wrote:

* Li Yu <[EMAIL PROTECTED]> wrote:

  

Also, I have want to know what's real meaning of

   add_wait_runtime(rq, curr, delta_mine - delta_exec);

in update_curr(), IMHO, it should be

   add_wait_runtime(rq, curr, delta_mine - delta_fair);

Is this just another heuristics? or my opinion is wrong again? :-)



well, ->wait_runtime is in real time units. If a task executes 
delta_exec time on the CPU, we deduct "-delta_exec" 1:1. But during that 
time the task also got entitled to a bit more CPU time, that is 
+delta_mine. The calculation above expresses this. I'm not sure what 
sense '-delta_fair' would make - "delta_fair" is the amount of time a 
nice-0 task would be entitled to - but this task might not be a nice-0 
task. Furthermore, even for a nice-0 task why deduct -delta_fair - it 
spent delta_exec on the CPU.




Eh, I was wrong again~ I even ran an experiment last weekend; this idea is 
really bad! ;(


I think the innermost source of my being wrong again and again is a 
misunderstanding of virtual time. To understand this better, I tried to write a 
python script to simulate CFS behavior. However, it cannot implement the 
fairness I want. I am really confused here.


Would you help me point out what's wrong in it? Any suggestions are welcome. 
Thanks in advance.



#! /usr/bin/python

# htucfs.py - Hard-To-Understand-CFS.py ;)
# Written by Li Yu / 20070604

#
# only supports static load on UP.
#


# Usage:
#./htucfs.py nr_clock_ticks_to_run
#

import sys

class task_struct:
    def __init__(self, name, load_weight):
        self.name = name
        self.wait_runtime = 0
        self.fair_clock = 0
        self.fair_key = 0
        self.load_weight = float(load_weight)
    def __repr__(self):
        return "%s/C%.2f" % (self.name, self.fair_clock)

idle_task = task_struct("idle", 0)

class run_queue:
    def __init__(self):
        self.raw_weighted_load = 0
        self.wall_clock = 0
        self.fair_clock = 0
        self.ready_queue = {}
        self.run_history = []
        self.task_list = []
        self.curr = None
        self.debug = 0

    def snapshot(self):
        if self.debug:
            print "%.2f" % self.fair_clock, self.ready_queue, self.curr

    def enqueue(self, task):
        task.fair_key = self.fair_clock-task.wait_runtime
        task.fair_key = int(100 * task.fair_key)
        if not self.ready_queue.get(task.fair_key):
            self.ready_queue[task.fair_key] = [task]
        else:
            # keep FIFO for same fair_key tasks.
            self.ready_queue[task.fair_key].append(task)
        self.raw_weighted_load += task.load_weight
        self.task_list.append(task)

    def dequeue(self, task):
        self.raw_weighted_load -= task.load_weight
        self.ready_queue[task.fair_key].remove(task)
        if not self.ready_queue[task.fair_key]:
            del self.ready_queue[task.fair_key]
        self.task_list.remove(task)

    def other_wait_runtime(self):
        for task in self.task_list:
            self.dequeue(task)
            task.wait_runtime += 1
            self.enqueue(task)

    def clock_tick(self):
        # clock_tick = 1.0
        self.fair_clock += 1.0/self.raw_weighted_load
        # delta_exec = 1.0
        delta_mine = self.curr.load_weight / self.raw_weighted_load
        self.curr.wait_runtime += (delta_mine-1.0)
        self.curr.fair_clock += 1.0/self.curr.load_weight
        self.dequeue(self.curr)
        self.other_wait_runtime()
        self.enqueue(self.curr)
        self.pick_next_task()

    def pick_next_task(self):
        key_seq = self.ready_queue.keys()
        if key_seq:
            key_seq.sort()
            self.curr = self.ready_queue[key_seq[0]][0]
        else:
            self.curr = idle_task
        self.snapshot()
        self.record_run_history()

    def record_run_history(self):
        task = self.curr
        if not self.run_history:
            self.run_history.append([task, 1])
            return
        curr = self.run_history[-1]
        if curr[0] != task:
            self.run_history.append([task, 1])
        else:
            curr[1] += 1

    def show_history(self):
        stat = {}
        for entry in self.run_history:
            task = entry[0]
            nsec = entry[1]
            print "%s run %d sec" % (task, nsec)
            if task not in stat.keys():
                stat[task] = nsec
            else:
                stat[task] += nsec
        print "=="
        tasks = stat.keys()
        tasks.sort()
        for task in tasks:
            print task, "/", task.load_weight, ":", stat[task], "sec"
        print "=="

    def run(self, delta=0, debug=0):
        self.debug = debug
        until = self.wall_clock + delta
        print "-"
        self.pick_next_task()
        while self.wall_clock < until:
            self.wall_clock += 1
            self.clock_tick()
        print "-"

#
# To turn this on, display verbose runtime information.
#
debug = True


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Ingo Molnar

* Li Yu <[EMAIL PROTECTED]> wrote:

> Also, I have want to know what's real meaning of
> 
>add_wait_runtime(rq, curr, delta_mine - delta_exec);
> 
> in update_curr(), IMHO, it should be
> 
>add_wait_runtime(rq, curr, delta_mine - delta_fair);
> 
> Is this just another heuristics? or my opinion is wrong again? :-)

well, ->wait_runtime is in real time units. If a task executes 
delta_exec time on the CPU, we deduct "-delta_exec" 1:1. But during that 
time the task also got entitled to a bit more CPU time, that is 
+delta_mine. The calculation above expresses this. I'm not sure what 
sense '-delta_fair' would make - "delta_fair" is the amount of time a 
nice-0 task would be entitled to - but this task might not be a nice-0 
task. Furthermore, even for a nice-0 task why deduct -delta_fair - it 
spent delta_exec on the CPU.
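
(A toy model of that bookkeeping, assuming, as in Li Yu's script, that 
delta_mine is the weight-proportional share of delta_exec; illustrative Python, 
not the kernel code:)

# p ran for delta_exec on the CPU; during that time it "earned"
# delta_mine, its weight-proportional share of delta_exec.
# p.load_weight, rq.raw_weighted_load and p.wait_runtime are
# illustrative attributes mirroring the names used in this thread.
def charge_current(p, rq, delta_exec):
    delta_mine = delta_exec * p.load_weight / rq.raw_weighted_load
    # deduct the real execution time 1:1, credit back the entitlement:
    p.wait_runtime += delta_mine - delta_exec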

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[OT] Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Andreas Mohr
[OT, thus removed private addresses]

Hi,

On Fri, Jun 01, 2007 at 04:35:02PM +0300, S.Çağlar Onur wrote:
> Seems like this piece of hardware is dieing [For a while my laptop starts to 
> poweroff suddenly without any log/error etc] and i think all these problems 
> caused by this. Or at least/ for me/ this laptop (sony vaio fs-215b) is not a 
> stable test bed for this kind of human involved testings.

Socketed CPU?

It *might* be an idea to reseat it, maybe it's simply insufficient seating
of the CPU due to rougher travel handling than with a desktop.
(CPU socket issues can easily be the case on some notebooks AFAIK,
and it was on mine: Inspiron 8000 - bought it as completely dead until I simply
fiddled with CPU socket...).

Andreas Mohr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, S.Çağlar Onur wrote:
> 
> Seems like this piece of hardware is dieing [For a while my laptop starts to 
> poweroff suddenly without any log/error etc] and i think all these problems 
> caused by this. Or at least/ for me/ this laptop (sony vaio fs-215b) is not a 
> stable test bed for this kind of human involved testings.

Has it been hot where you are lately? Is your fan working? 

Hardware that acts up under load is quite often thermal-related, 
especially if it starts happening during summer and didn't happen before 
that... ESPECIALLY the kinds of behaviours you see: the "sudden power-off" 
is the normal behaviour for a CPU that trips a critical overheating point, 
and the slowdown is also one normal response to overheating (CPU 
throttling).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread S.Çağlar Onur
Hi;

On Saturday, 26 May 2007, S.Çağlar Onur wrote:
> Under load (compiling any Qt app. or kernel with -j1 or -j2) audio always
> goes sync with time (and i'm sure it never skips) but video starts slowdown
> and loses its sync with audio (like for the 10th sec. of a movie, audio is
> at 10th sec. also, but the shown video is from 7th sec.).
>
> After some time video suddenly wants to sync with audio and starts to play
> really fast (like fast-forward) and syncs with audio. But it will lose its
> audio/video sync after a while and loop continues like that.

After a lot of private mail traffic and debugging efforts with Ingo, yesterday 
I simply asked that this problem be ignored (at least until I can reproduce the 
same thing on different machines).

Yesterday I went back to vanilla 2.6.18.8 to see the situation with it, and I 
reproduced the problem even under lower loads.

Seems like this piece of hardware is dying [for a while now my laptop has been 
powering off suddenly without any log/error etc.], and I think all these 
problems are caused by that. Or at least, for me, this laptop (Sony Vaio 
FS-215B) is not a stable test bed for this kind of human-involved testing.

Ingo cannot reproduce the same audio/video out-of-sync problems with his setups, 
and currently I'm the only person dealing with this problem.

And for some boots the kernel reports a wrong frequency for my CPU (note the 
timing difference as well); [this may be an overheating problem, but I'll also 
try disabling CONFIG_NO_HZ as Ingo suggested]

...
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 16384 bytes)
[0.00] Detected 897.591 MHz processor.
[   13.142654] Console: colour dummy device 80x25
[   13.143609] Dentry cache hash table entries: 131072 (order: 7, 524288 
 bytes)
[   13.144530] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
...

...
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 16384 bytes)
[0.00] Detected 1729.292 MHz processor.
[8.286228] Console: colour dummy device 80x25
[8.286650] Dentry cache hash table entries: 131072 (order: 7, 524288 
bytes)
[8.287058] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
...

As a result, please ignore this problem until I can reproduce it on different 
machines or someone else reports the same problem :)

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!


signature.asc
Description: This is a digitally signed message part.


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Li Yu

Ingo Molnar wrote:

* Li Yu <[EMAIL PROTECTED]> wrote:

  

static void distribute_fair_add(struct rq *rq, s64 delta)
{
   struct task_struct *curr = rq->curr;
   s64 delta_fair = 0;

   if (!(sysctl_sched_load_smoothing & 32))
   return;

   if (rq->nr_running) {
   delta_fair = div64_s(delta, rq->nr_running);
   /*
* The currently running task's next wait_runtime value does
* not depend on the fair_clock, so fix it up explicitly:
*/
   add_wait_runtime(rq, curr, -delta_fair);
   rq->fair_clock -= delta_fair;
   }
}

See this line:

   delta_fair = div64_s(delta, rq->nr_running);

Ingo, should we be replace "rq->nr_running" with "rq->raw_load_weight" 
here?



that would break the code. The handling of sleep periods is basically 
heuristics and using nr_running here appears to be 'good enough' in 
practice.


  
Thanks. I was wrong in thinking the delta variable is in virtual time units; if 
the code did as I said, delta_fair might be too small to be meaningful.


Also, I want to know the real meaning of


   add_wait_runtime(rq, curr, delta_mine - delta_exec);

in update_curr(). IMHO, it should be

   add_wait_runtime(rq, curr, delta_mine - delta_fair);

Is this just another heuristic, or is my opinion wrong again? :-)

Good luck.

- Li Yu





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Li Yu

Ingo Molnar wrote:

* Li Yu [EMAIL PROTECTED] wrote:

  

static void distribute_fair_add(struct rq *rq, s64 delta)
{
   struct task_struct *curr = rq-curr;
   s64 delta_fair = 0;

   if (!(sysctl_sched_load_smoothing  32))
   return;

   if (rq-nr_running) {
   delta_fair = div64_s(delta, rq-nr_running);
   /*
* The currently running task's next wait_runtime value does
* not depend on the fair_clock, so fix it up explicitly:
*/
   add_wait_runtime(rq, curr, -delta_fair);
   rq-fair_clock -= delta_fair;
   }
}

See this line:

   delta_fair = div64_s(delta, rq-nr_running);

Ingo, should we be replace rq-nr_running with rq-raw_load_weight 
here?



that would break the code. The handling of sleep periods is basically 
heuristics and using nr_running here appears to be 'good enough' in 
practice.


  
Thanks,  I am wrong at seeing the delta variable is represented by 
virtual time unit. if the code does as I said, the delta_fair may be too 
small to meanless.


Also, I have want to know what's real meaning of 


   add_wait_runtime(rq, curr, delta_mine - delta_exec);

in update_curr(), IMHO, it should be

   add_wait_runtime(rq, curr, delta_mine - delta_fair);

Is this just another heuristics? or my opinion is wrong again? :-)

Good luck.

- Li Yu





-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread S.Çağlar Onur
Hi;

26 May 2007 Cts tarihinde, S.Çağlar Onur şunları yazmıştı: 
 Under load (compiling any Qt app. or kernel with -j1 or -j2) audio always
 goes sync with time (and i'm sure it never skips) but video starts slowdown
 and loses its sync with audio (like for the 10th sec. of a movie, audio is
 at 10th sec. also, but the shown video is from 7th sec.).

 After some time video suddenly wants to sync with audio and starts to play
 really fast (like fast-forward) and syncs with audio. But it will lose its
 audio/video sync after a while and loop continues like that.

After a lots of private mail traffic and debuggin efforts with Ingo, yesterday 
i simply requested to ignore that problem (at least until i can reproduce 
same with different machines). 

Yesterday i turn back to vanilla 2.6.18.8 to see the situation with it and i 
reproduce the problem even with lower loads.

Seems like this piece of hardware is dieing [For a while my laptop starts to 
poweroff suddenly without any log/error etc] and i think all these problems 
caused by this. Or at least/ for me/ this laptop (sony vaio fs-215b) is not a 
stable test bed for this kind of human involved testings.

Ingo cannot reproduce same audio/video out-of-sync problems with his setups, 
and currently i'm the only person deals with that problem. 

And for some boots kernel reports wrong frequency for my cpu (and notice the 
timing difference), [this maybe a overheat problem but aslo i'll try 
disabling CONFIG_NO_HZ as Ingo suggested]

...
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 16384 bytes)
[0.00] Detected 897.591 MHz processor.
[   13.142654] Console: colour dummy device 80x25
[   13.143609] Dentry cache hash table entries: 131072 (order: 7, 524288 
 bytes)
[   13.144530] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
...

...
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 16384 bytes)
[0.00] Detected 1729.292 MHz processor.
[8.286228] Console: colour dummy device 80x25
[8.286650] Dentry cache hash table entries: 131072 (order: 7, 524288 
bytes)
[8.287058] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
...

As a result, please ignore this problem until i can reproduce on different 
machines or someone else reports the same problem :)

Cheers
-- 
S.Çağlar Onur [EMAIL PROTECTED]
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!


signature.asc
Description: This is a digitally signed message part.


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, S.?a?lar Onur wrote:
 
 Seems like this piece of hardware is dieing [For a while my laptop starts to 
 poweroff suddenly without any log/error etc] and i think all these problems 
 caused by this. Or at least/ for me/ this laptop (sony vaio fs-215b) is not a 
 stable test bed for this kind of human involved testings.

Has it been hot where you are lately? Is your fan working? 

Hardware that acts up under load is quite often thermal-related, 
especially if it starts happening during summer and didn't happen before 
that... ESPECIALLY the kinds of behaviours you see: the sudden power-off 
is the normal behaviour for a CPU that trips a critial overheating point, 
and the slowdown is also one normal response to overheating (CPU 
throttling).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[OT] Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Andreas Mohr
[OT, thus removed private addresses]

Hi,

On Fri, Jun 01, 2007 at 04:35:02PM +0300, S.Ça??lar Onur wrote:
 Seems like this piece of hardware is dieing [For a while my laptop starts to 
 poweroff suddenly without any log/error etc] and i think all these problems 
 caused by this. Or at least/ for me/ this laptop (sony vaio fs-215b) is not a 
 stable test bed for this kind of human involved testings.

Socketed CPU?

It *might* be an idea to reseat it, maybe it's simply insufficient seating
of the CPU due to rougher travel handling than with a desktop.
(CPU socket issues can easily be the case on some notebooks AFAIK,
and it was on mine: Inspiron 8000 - bought it as completely dead until I simply
fiddled with CPU socket...).

Andreas Mohr
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-06-01 Thread Ingo Molnar

* Li Yu <[EMAIL PROTECTED]> wrote:

> Also, I want to know the real meaning of
> 
>    add_wait_runtime(rq, curr, delta_mine - delta_exec);
> 
> in update_curr(). IMHO, it should be
> 
>    add_wait_runtime(rq, curr, delta_mine - delta_fair);
> 
> Is this just another heuristic? Or is my opinion wrong again? :-)

well, ->wait_runtime is in real time units. If a task executes 
delta_exec time on the CPU, we deduct delta_exec from it 1:1. But during that 
time the task also got entitled to a bit more CPU time, that is 
+delta_mine. The calculation above expresses this. I'm not sure what 
sense '-delta_fair' would make - delta_fair is the amount of time a 
nice-0 task would be entitled to - but this task might not be a nice-0 
task. Furthermore, even for a nice-0 task, why deduct delta_fair - it 
spent delta_exec on the CPU.
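
To make the bookkeeping concrete, here is a minimal illustrative sketch (not taken 
from the CFS patch; the helper name is made up and all three values are assumed to 
be in nanoseconds):

	/*
	 * Hypothetical helper mirroring the update_curr() bookkeeping described
	 * above: the task consumed delta_exec of CPU time but became entitled to
	 * delta_mine during that window, so its wait_runtime moves by the
	 * difference (it goes down whenever the task ran more than its share).
	 */
	static inline void illustrate_wait_runtime_update(long long *wait_runtime,
							  long long delta_exec,
							  long long delta_mine)
	{
		*wait_runtime += delta_mine - delta_exec;
	}

So a task that ran for 4 ms while only 1 ms of entitlement accrued ends up with its 
wait_runtime lowered by 3 ms, which is what lets the other runnable tasks catch up.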

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-31 Thread Ingo Molnar

* Li Yu <[EMAIL PROTECTED]> wrote:

> static void distribute_fair_add(struct rq *rq, s64 delta)
> {
>struct task_struct *curr = rq->curr;
>s64 delta_fair = 0;
> 
>if (!(sysctl_sched_load_smoothing & 32))
>return;
> 
>if (rq->nr_running) {
>delta_fair = div64_s(delta, rq->nr_running);
>/*
> * The currently running task's next wait_runtime value does
> * not depend on the fair_clock, so fix it up explicitly:
> */
>add_wait_runtime(rq, curr, -delta_fair);
>rq->fair_clock -= delta_fair;
>}
> }
> 
> See this line:
> 
>delta_fair = div64_s(delta, rq->nr_running);
> 
> Ingo, should we replace "rq->nr_running" with "rq->raw_load_weight" 
> here?

that would break the code. The handling of sleep periods is basically 
heuristics and using nr_running here appears to be 'good enough' in 
practice.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-31 Thread Li Yu


static void distribute_fair_add(struct rq *rq, s64 delta)
{
   struct task_struct *curr = rq->curr;
   s64 delta_fair = 0;

   if (!(sysctl_sched_load_smoothing & 32))
   return;

   if (rq->nr_running) {
   delta_fair = div64_s(delta, rq->nr_running);
   /*
* The currently running task's next wait_runtime value does
* not depend on the fair_clock, so fix it up explicitly:
*/
   add_wait_runtime(rq, curr, -delta_fair);
   rq->fair_clock -= delta_fair;
   }
}

See this line:

   delta_fair = div64_s(delta, rq->nr_running);

Ingo, should we replace "rq->nr_running" with "rq->raw_load_weight" here?

Good luck
- Li Yu




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-29 Thread Balbir Singh
On Mon, May 28, 2007 at 01:07:48PM +0200, Ingo Molnar wrote:
> 
> * Balbir Singh <[EMAIL PROTECTED]> wrote:
> 
> > Ingo Molnar wrote:
> > > i found an accounting bug in this: it didnt sum up threads correctly. 
> > > The patch below fixes this. The stime == 0 problem is still there 
> > > though.
> > > 
> > >   Ingo
> > > 
> > 
> > Thanks! I'll test the code on Monday. I do not understand the 
> > sysctl_sched_smoothing functionality, so I do not understand its 
> > impact on accounting. I'll take a look more closely
> 
> basically sysctl_sched_smoothing is more of a 'experimental features 
> flag' kind of thing. I'll remove it soon, you should only need to 
> concentrate on the functionality that it enables by default.
> 
>   Ingo

Hi, Ingo,

I hope this patch addresses the stime == 0 problem.

This patch improves accounting in the CFS scheduler. We have the executed
run time in the sum_exec_runtime field of the task. This patch splits
sum_exec_runtime in the ratio of task->utime and task->stime to obtain
the user and system time of the task. 

TODO's:

1. Migrate getrusage() to use sum_exec_runtime so that the output in /proc
   is consistent with the data reported by time(1).


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 fs/proc/array.c |   27 ---
 linux/sched.h   |0 
 2 files changed, 24 insertions(+), 3 deletions(-)

diff -puN fs/proc/array.c~cfs-distribute-accounting fs/proc/array.c
--- linux-2.6.22-rc2/fs/proc/array.c~cfs-distribute-accounting  2007-05-29 
13:47:47.0 +0530
+++ linux-2.6.22-rc2-balbir/fs/proc/array.c 2007-05-29 15:35:22.0 
+0530
@@ -332,7 +332,6 @@ static clock_t task_stime(struct task_st
return cputime_to_clock_t(p->stime);
 }
 
-
 static int do_task_stat(struct task_struct *task, char * buffer, int whole)
 {
unsigned long vsize, eip, esp, wchan = ~0UL;
@@ -400,8 +399,13 @@ static int do_task_stat(struct task_stru
 
min_flt += sig->min_flt;
maj_flt += sig->maj_flt;
-   utime += cputime_to_clock_t(sig->utime);
-   stime += cputime_to_clock_t(sig->stime);
+   if (!has_rt_policy(t))
+   utime += nsec_to_clock_t(
+   sig->sum_sched_runtime);
+   else {
+   utime += cputime_to_clock_t(sig->utime);
+   stime += cputime_to_clock_t(sig->stime);
+   }
}
 
sid = signal_session(sig);
@@ -421,6 +425,23 @@ static int do_task_stat(struct task_stru
stime = task_stime(task);
}
 
+   if (!has_rt_policy(task)) {
+   clock_t sum_us_time = utime + stime;
+   clock_t tu_time = cputime_to_clock_t(task->utime);
+   clock_t ts_time = cputime_to_clock_t(task->stime);
+   clock_t total_time = utime;
+
+   /*
+* Split up sched_exec_time according to the utime and
+* stime ratio. At this point utime contains the summed
+* sched_exec_runtime and stime is zero
+*/
+   if (sum_us_time) {
+   utime = ((tu_time * total_time) / sum_us_time);
+   stime = ((ts_time * total_time) / sum_us_time);
+   }
+   }
+
/* scale priority and nice values from timeslices to -20..20 */
/* to make it look like a "normal" Unix priority/nice value  */
priority = task_prio(task);
diff -puN kernel/sys.c~cfs-distribute-accounting kernel/sys.c
diff -puN include/linux/sched.h~cfs-distribute-accounting include/linux/sched.h
_

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-29 Thread Balbir Singh
Hi, Ingo,

> +static clock_t task_utime(struct task_struct *p)
> +{
> + /*
> +  * Use CFS's precise accounting, if available:
> +  */
> + if (!has_rt_policy(p) && !(sysctl_sched_load_smoothing & 128))
> + return nsec_to_clock_t(p->sum_exec_runtime);


I wonder if this leads to data truncation; p->sum_exec_runtime is
unsigned long long and clock_t is long (on all architectures, from what
my cscope shows me). I have my other patch ready on top of this. I'll
post it out soon.
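
As a quick user-space sketch of the concern (illustrative only, not kernel code; it 
assumes a 32-bit clock_t and USER_HZ=100):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* roughly 5 years of CPU time, expressed in nanoseconds */
		unsigned long long runtime_ns = 5ULL * 365 * 24 * 3600 * 1000000000ULL;
		/* convert nanoseconds to USER_HZ (100) ticks: 10,000,000 ns per tick */
		unsigned long long ticks64 = runtime_ns / 10000000ULL;
		/* what a 32-bit clock_t would end up holding */
		int32_t truncated = (int32_t)ticks64;

		printf("64-bit ticks: %llu, stored in a 32-bit clock_t: %d\n",
		       ticks64, truncated);
		return 0;
	}

The 64-bit tick count here (~1.5e10) does not fit in 31 bits, so the stored value wraps.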

> +
> + return cputime_to_clock_t(p->utime);
> +}
> +

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-29 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > [...] in update_stats_enqueue(), it seems that these statements in 
> > the two branches of "if (p->load_weight > NICE_0_LOAD)" are the same, is it 
> > on purpose?
> 
> what do you mean?

you are right indeed. Mike Galbraith has sent a cleanup patch that 
removes that duplication (and uses div64_s()).

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-29 Thread Ingo Molnar

* Li Yu <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> >i'm pleased to announce release -v14 of the CFS scheduler patchset.
> >
> >The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
> >downloaded from the usual place:
> >   
> >  http://people.redhat.com/mingo/cfs-scheduler/
> >  
> I tried this on 2.6.21.1, Good work!

thanks :)

> [...] in update_stats_enqueue(), it seems that these statements in the two 
> branches of "if (p->load_weight > NICE_0_LOAD)" are the same, is it on 
> purpose?

what do you mean?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-28 Thread Li Yu

Li Yu wrote:


But as I observed via cat /proc/sched_debug (2.6.21.1, UP, RHEL4), I 
found that the waiting fields are often either more than zero or less than zero.


IMHO, the sum of task_struct->wait_runtime is just the denominator of 
all runnable time in some ways, is that right? If so, increasing the sum 
of wait_runtime just makes the scheduling decision more precise, so what is 
the point of keeping wait_runtime zero-sum?



Please forget it; I was wrong here, sorry for pestering.

Good luck

- Li Yu

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-28 Thread Ingo Molnar

* Balbir Singh <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > i found an accounting bug in this: it didnt sum up threads correctly. 
> > The patch below fixes this. The stime == 0 problem is still there 
> > though.
> > 
> > Ingo
> > 
> 
> Thanks! I'll test the code on Monday. I do not understand the 
> sysctl_sched_smoothing functionality, so I do not understand its 
> impact on accounting. I'll take a look more closely

basically sysctl_sched_smoothing is more of a 'experimental features 
flag' kind of thing. I'll remove it soon, you should only need to 
concentrate on the functionality that it enables by default.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-27 Thread Li Yu

Ingo Molnar wrote:

i'm pleased to announce release -v14 of the CFS scheduler patchset.

The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
downloaded from the usual place:
   
  http://people.redhat.com/mingo/cfs-scheduler/
  


In the comment before distribute_fair_add(), we have this text:

/*
 * A task gets added back to the runnable tasks and gets
 * a small credit for the CPU time it missed out on while
 * it slept, so fix up all other runnable task's wait_runtime
 * so that the sum stays constant (around 0).
 *
[snip]
 */

But as I observed via cat /proc/sched_debug (2.6.21.1, UP, RHEL4), I found 
that the waiting fields are often either more than zero or less than zero.


IMHO, the sum of task_struct->wait_runtime is just the denominator of 
all runnable time in some ways, is that right? If so, increasing the sum 
of wait_runtime just makes the scheduling decision more precise, so what is 
the point of keeping wait_runtime zero-sum?


Good luck

- Li Yu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-26 Thread Li Yu

Ingo Molnar wrote:

i'm pleased to announce release -v14 of the CFS scheduler patchset.

The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
downloaded from the usual place:
   
  http://people.redhat.com/mingo/cfs-scheduler/
  

I tried this on 2.6.21.1, Good work!

I have a doubt when reading this patch: in update_stats_enqueue(), it seems 
that the statements in the two branches of "if (p->load_weight > 
NICE_0_LOAD)" are the same,

is it on purpose?

Good luck

- Li Yu

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-26 Thread S.Çağlar Onur
On Saturday 26 May 2007, S.Çağlar Onur wrote: 
> On Wednesday 23 May 2007, Ingo Molnar wrote:
> > As usual, any sort of feedback, bugreport, fix and suggestion is more
> > than welcome!
>
> I have another kaffeine [0.8.4]/xine-lib [1.1.6] problem with CFS for you
> :)
>
> Under load (compiling any Qt app. or kernel with -j1 or -j2) audio always
> goes sync with time (and i'm sure it never skips) but video starts slowdown
> and loses its sync with audio (like for the 10th sec. of a movie, audio is
> at 10th sec. also, but the shown video is from 7th sec.).
>
> After some time video suddenly wants to sync with audio and starts to play
> really fast (like fast-forward) and syncs with audio. But it will lose its
> audio/video sync after a while and loop continues like that.
>
> I also reproduced that behaviour with CFS-13, i'm not sure its reproducible
> with mainline cause for a long time i only use CFS (but i'm pretty sure
> that problem not exists or not hit me with CFS-1 to CFS-11). And its only
> reproducible with some load and mplayer plays same video without losing its
> audio/video sync with same load.

Ah, I forgot to add that you can find the "strace -o kaffine.log -f -tttTTT 
kaffeine" and ps output from while the problem exists at [1]

[1] http://cekirdek.pardus.org.tr/~caglar/kaffeine/
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!


signature.asc
Description: This is a digitally signed message part.


Re: [patch] CFS scheduler, -v14

2007-05-26 Thread S.Çağlar Onur
Hi Ingo;

On Wednesday 23 May 2007, Ingo Molnar wrote: 
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome!

I have another kaffeine [0.8.4]/xine-lib [1.1.6] problem with CFS for you :)

Under load (compiling any Qt app or the kernel with -j1 or -j2) audio always stays 
in sync with time (and I'm sure it never skips), but video starts to slow down and 
loses its sync with audio (e.g. at the 10th second of a movie the audio is at the 
10th second too, but the video being shown is from the 7th second). 

After some time the video suddenly tries to catch up with the audio and plays 
really fast (like fast-forward) until it syncs with the audio. But it loses its 
audio/video sync again after a while, and the loop continues like that.

I also reproduced that behaviour with CFS-13. I'm not sure it's reproducible 
with mainline, because for a long time I have only used CFS (but I'm pretty sure the 
problem did not exist, or did not hit me, with CFS-1 to CFS-11). And it's only 
reproducible with some load, and mplayer plays the same video without losing its 
audio/video sync under the same load.

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!


signature.asc
Description: This is a digitally signed message part.


Re: [patch] CFS scheduler, -v14

2007-05-25 Thread Balbir Singh
Ingo Molnar wrote:
> i found an accounting bug in this: it didnt sum up threads correctly. 
> The patch below fixes this. The stime == 0 problem is still there 
> though.
> 
>   Ingo
> 

Thanks! I'll test the code on Monday. I do not understand the
sysctl_sched_smoothing functionality, so I do not understand
its impact on accounting. I'll take a look more closely

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-25 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> btw., CFS does this change to fs/proc/array.c:
> 
> @@ -410,6 +408,14 @@ static int do_task_stat(struct task_stru
>   /* convert nsec -> ticks */
>   start_time = nsec_to_clock_t(start_time);
>  
> + /*
> +  * Use CFS's precise accounting, if available:
> +  */
> + if (!has_rt_policy(task)) {
> + utime = nsec_to_clock_t(task->sum_exec_runtime);
> + stime = 0;
> + }
> +
>   res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
>  %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
>  %lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
> 
> if you have some spare capacity to improve this code, it could be 
> further enhanced by not setting 'stime' to zero, but using the 
> existing jiffies based utime/stime statistics as a _ratio_ to split up 
> the precise p->sum_exec_runtime. That way we dont have to add precise 
> accounting to syscall entry/exit points (that would be quite 
> expensive), but still the sum of utime+stime would be very precise. 
> (and that's what matters most anyway)

i found an accounting bug in this: it didnt sum up threads correctly. 
The patch below fixes this. The stime == 0 problem is still there 
though.

Ingo

Index: linux/fs/proc/array.c
===
--- linux.orig/fs/proc/array.c
+++ linux/fs/proc/array.c
@@ -310,6 +310,29 @@ int proc_pid_status(struct task_struct *
return buffer - orig;
 }
 
+static clock_t task_utime(struct task_struct *p)
+{
+   /*
+* Use CFS's precise accounting, if available:
+*/
+   if (!has_rt_policy(p) && !(sysctl_sched_load_smoothing & 128))
+   return nsec_to_clock_t(p->sum_exec_runtime);
+
+   return cputime_to_clock_t(p->utime);
+}
+
+static clock_t task_stime(struct task_struct *p)
+{
+   /*
+* Use CFS's precise accounting, if available:
+*/
+   if (!has_rt_policy(p) && !(sysctl_sched_load_smoothing & 128))
+   return 0;
+
+   return cputime_to_clock_t(p->stime);
+}
+
+
 static int do_task_stat(struct task_struct *task, char * buffer, int whole)
 {
unsigned long vsize, eip, esp, wchan = ~0UL;
@@ -324,7 +347,8 @@ static int do_task_stat(struct task_stru
unsigned long long start_time;
unsigned long cmin_flt = 0, cmaj_flt = 0;
unsigned long  min_flt = 0,  maj_flt = 0;
-   cputime_t cutime, cstime, utime, stime;
+   cputime_t cutime, cstime;
+   clock_t utime, stime;
unsigned long rsslim = 0;
char tcomm[sizeof(task->comm)];
unsigned long flags;
@@ -342,7 +366,8 @@ static int do_task_stat(struct task_stru
 
	sigemptyset(&sigign);
	sigemptyset(&sigcatch);
-   cutime = cstime = utime = stime = cputime_zero;
+   cutime = cstime = cputime_zero;
+   utime = stime = 0;
 
rcu_read_lock();
	if (lock_task_sighand(task, &flags)) {
@@ -368,15 +393,15 @@ static int do_task_stat(struct task_stru
do {
min_flt += t->min_flt;
maj_flt += t->maj_flt;
-   utime = cputime_add(utime, t->utime);
-   stime = cputime_add(stime, t->stime);
+   utime += task_utime(t);
+   stime += task_stime(t);
t = next_thread(t);
} while (t != task);
 
min_flt += sig->min_flt;
maj_flt += sig->maj_flt;
-   utime = cputime_add(utime, sig->utime);
-   stime = cputime_add(stime, sig->stime);
+   utime += cputime_to_clock_t(sig->utime);
+   stime += cputime_to_clock_t(sig->stime);
}
 
sid = signal_session(sig);
@@ -392,8 +417,8 @@ static int do_task_stat(struct task_stru
if (!whole) {
min_flt = task->min_flt;
maj_flt = task->maj_flt;
-   utime = task->utime;
-   stime = task->stime;
+   utime = task_utime(task);
+   stime = task_stime(task);
}
 
/* scale priority and nice values from timeslices to -20..20 */
@@ -408,14 +433,6 @@ static int do_task_stat(struct task_stru
/* convert nsec -> ticks */
start_time = nsec_to_clock_t(start_time);
 
-   /*
-* Use CFS's precise accounting, if available:
-*/
-   if (!has_rt_policy(task)) {
-   utime = nsec_to_clock_t(task->sum_exec_runtime);
-   stime = 0;
-   }
-
res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
 %lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
-
To unsubscribe from this list: send the line "unsubscribe 

Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> it treats it as a per-cpu clock.
>   

Excellent.  I'd noticed it seems to work pretty well in a Xen guest with
lots of stolen time, but I haven't really evaluated it in detail.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Ingo Molnar

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > nice! I've merged your patch and it built/booted fine so it should show 
> > up in -v15. This should also play well with Andi's sched_clock() 
> > enhancements in -mm, slated for .23.
> >   
> 
> BTW, does CFS treat sched_clock as a per-cpu clock, or will it compare 
> time values of sched_clock()s called on different CPUs?

it treats it as a per-cpu clock.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> nice! I've merged your patch and it built/booted fine so it should show 
> up in -v15. This should also play well with Andi's sched_clock() 
> enhancements in -mm, slated for .23.
>   

BTW, does CFS treat sched_clock as a per-cpu clock, or will it compare
time values of sched_clock()s called on different CPUs?

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Balbir Singh
Ingo Molnar wrote:
> btw., i think some more consolidation could be done in this area. We've 
> now got the traditional /proc/PID/stat metrics, schedstats, taskstats 
> and delay accounting and with CFS we've got /proc/sched_debug and 
> /proc/PID/sched. There's a fair amount of overlap.
> 

Yes, true. schedstats and delay accounting share code, and taskstats is
a transport mechanism. I'll try and look at /proc/PID/stat, /proc/PID/sched
and /proc/sched_debug.

> btw., CFS does this change to fs/proc/array.c:
> 
> @@ -410,6 +408,14 @@ static int do_task_stat(struct task_stru
>   /* convert nsec -> ticks */
>   start_time = nsec_to_clock_t(start_time);
> 
> + /*
> +  * Use CFS's precise accounting, if available:
> +  */
> + if (!has_rt_policy(task)) {
> + utime = nsec_to_clock_t(task->sum_exec_runtime);
> + stime = 0;
> + }
> +
>   res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
>  %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
>  %lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
> 
> if you have some spare capacity to improve this code, it could be 
> further enhanced by not setting 'stime' to zero, but using the existing 
> jiffies based utime/stime statistics as a _ratio_ to split up the 
> precise p->sum_exec_runtime. That way we dont have to add precise 
> accounting to syscall entry/exit points (that would be quite expensive), 
> but still the sum of utime+stime would be very precise. (and that's what 
> matters most anyway)
> 
>   Ingo

I'll start looking into splitting sum_exec_time into utime and stime
based on the ratio already present in the task structure. 

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Ingo Molnar

* Balbir Singh <[EMAIL PROTECTED]> wrote:

> Hi, Ingo,
> 
> I've implemented a patch on top of v14 for better accounting of 
> sched_info statistics. Earlier, sched_info relied on jiffies for 
> accounting and I've seen applications that show "0" cpu usage 
> statistics (in delay accounting and from /proc) even though they've 
> been running on the CPU for a long time. The basic problem is that 
> accounting in jiffies is too coarse to be accurate.
> 
> The patch below uses sched_clock() for sched_info accounting.

nice! I've merged your patch and it built/booted fine so it should show 
up in -v15. This should also play well with Andi's sched_clock() 
enhancements in -mm, slated for .23.

btw., i think some more consolidation could be done in this area. We've 
now got the traditional /proc/PID/stat metrics, schedstats, taskstats 
and delay accounting and with CFS we've got /proc/sched_debug and 
/proc/PID/sched. There's a fair amount of overlap.

btw., CFS does this change to fs/proc/array.c:

@@ -410,6 +408,14 @@ static int do_task_stat(struct task_stru
/* convert nsec -> ticks */
start_time = nsec_to_clock_t(start_time);
 
+   /*
+* Use CFS's precise accounting, if available:
+*/
+   if (!has_rt_policy(task)) {
+   utime = nsec_to_clock_t(task->sum_exec_runtime);
+   stime = 0;
+   }
+
res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
 %lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",

if you have some spare capacity to improve this code, it could be 
further enhanced by not setting 'stime' to zero, but using the existing 
jiffies-based utime/stime statistics as a _ratio_ to split up the 
precise p->sum_exec_runtime. That way we don't have to add precise 
accounting to syscall entry/exit points (that would be quite expensive), 
but still the sum of utime+stime would be very precise. (and that's what 
matters most anyway)
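
A rough user-space sketch of that split (illustrative only; the function name is 
made up, 'total' stands for the precise sum_exec_runtime converted to clock ticks, 
and utime/stime are the existing jiffies-based tick counters):

	#include <stdio.h>

	/* Split a precise total (in clock ticks) according to the existing
	 * coarse utime:stime ratio, so that utime_out + stime_out == total. */
	static void split_by_ratio(unsigned long long total,
				   unsigned long long utime,
				   unsigned long long stime,
				   unsigned long long *utime_out,
				   unsigned long long *stime_out)
	{
		unsigned long long sum = utime + stime;

		if (sum) {
			*utime_out = total * utime / sum;
			*stime_out = total - *utime_out;
		} else {
			/* no coarse samples yet: attribute everything to user time */
			*utime_out = total;
			*stime_out = 0;
		}
	}

	int main(void)
	{
		unsigned long long u, s;

		/* e.g. a precise total of 1000 ticks, coarse counters at 3:1 user:system */
		split_by_ratio(1000, 300, 100, &u, &s);
		printf("utime=%llu stime=%llu\n", u, s);	/* prints 750 / 250 */
		return 0;
	}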

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Balbir Singh
On Wed, May 23, 2007 at 02:06:16PM +0200, Ingo Molnar wrote:
> 
> i'm pleased to announce release -v14 of the CFS scheduler patchset.
> 
> The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
> downloaded from the usual place:
>
>   http://people.redhat.com/mingo/cfs-scheduler/
> 
> In -v14 the biggest user-visible change is increased sleeper fairness 
> (done by Mike Galbraith and myself), which results in better 
> interactivity under load. In particular 3D apps such as compiz/Beryl or 
> games benefit from it and should be less sensitive to other apps running 
> in parallel to them - but plain X benefits from it too.
> 
> CFS is converging nicely, with no regressions reported against -v13. 
> Changes since -v13:
> 
>  - increase sleeper-fairness (Mike Galbraith, me)
> 
>  - kernel/sched_debug.c printk argument fixes for ia64 (Andrew Morton)
> 
>  - CFS documentation fixes (Pranith Kumar D)
> 
>  - increased the default rescheduling granularity to 3msecs on UP,
>6 msecs on 2-way systems
> 
>  - small update_curr() precision fix
> 
>  - added an overview section to Documentation/sched-design-CFS.txt
> 
>  - misc cleanups
> 
> As usual, any sort of feedback, bugreport, fix and suggestion is more 
> than welcome!
> 
>   Ingo

Hi, Ingo,

I've implemented a patch on top of v14 for better accounting of
sched_info statistics. Earlier, sched_info relied on jiffies for
accounting and I've seen applications that show "0" cpu usage
statistics (in delay accounting and from /proc) even though they've
been running on the CPU for a long time. The basic problem is that
accounting in jiffies is too coarse to be accurate.

The patch below uses sched_clock() for sched_info accounting.
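
To put rough numbers on the granularity (illustrative assumptions only, HZ=250): a 
jiffy is then 4 ms, so a task that always runs in sub-jiffy bursts can be charged 
zero jiffies even though sched_clock() deltas would keep accumulating the real time:

	/* Illustrative arithmetic only (assumes HZ = 250) */
	#include <stdio.h>

	int main(void)
	{
		const unsigned long long nsec_per_jiffy = 1000000000ULL / 250; /* 4,000,000 ns */
		const unsigned long long burst_ns = 1000000ULL;                /* 1 ms bursts */

		/* a 1 ms burst spans 0 whole jiffies ... */
		printf("jiffies per burst: %llu\n", burst_ns / nsec_per_jiffy);
		/* ... but 1000 such bursts add up to a full second of real CPU time */
		printf("sched_clock()-style total for 1000 bursts: %llu ns\n",
		       1000 * burst_ns);
		return 0;
	}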

Comments, suggestions, feedback is more than welcome!

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/sched.h |   10 +-
 kernel/delayacct.c|   10 +-
 kernel/sched_stats.h  |   28 ++--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff -puN kernel/sched_stats.h~move-sched-accounting-to-sched_clock 
kernel/sched_stats.h
--- linux-2.6.21/kernel/sched_stats.h~move-sched-accounting-to-sched_clock  
2007-05-24 11:23:38.0 +0530
+++ linux-2.6.21-balbir/kernel/sched_stats.h2007-05-24 11:23:38.0 
+0530
@@ -97,10 +97,10 @@ const struct file_operations proc_scheds
  * Expects runqueue lock to be held for atomicity of update
  */
 static inline void
-rq_sched_info_arrive(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_arrive(struct rq *rq, unsigned long long delta)
 {
if (rq) {
-   rq->rq_sched_info.run_delay += delta_jiffies;
+   rq->rq_sched_info.run_delay += delta;
rq->rq_sched_info.pcnt++;
}
 }
@@ -109,19 +109,19 @@ rq_sched_info_arrive(struct rq *rq, unsi
  * Expects runqueue lock to be held for atomicity of update
  */
 static inline void
-rq_sched_info_depart(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_depart(struct rq *rq, unsigned long long delta)
 {
if (rq)
-   rq->rq_sched_info.cpu_time += delta_jiffies;
+   rq->rq_sched_info.cpu_time += delta;
 }
 # define schedstat_inc(rq, field)  do { (rq)->field++; } while (0)
 # define schedstat_add(rq, field, amt) do { (rq)->field += (amt); } while (0)
 #else /* !CONFIG_SCHEDSTATS */
 static inline void
-rq_sched_info_arrive(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_arrive(struct rq *rq, unsigned long long delta)
 {}
 static inline void
-rq_sched_info_depart(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_depart(struct rq *rq, unsigned long long delta)
 {}
 # define schedstat_inc(rq, field)  do { } while (0)
 # define schedstat_add(rq, field, amt) do { } while (0)
@@ -155,16 +155,16 @@ static inline void sched_info_dequeued(s
  */
 static void sched_info_arrive(struct task_struct *t)
 {
-   unsigned long now = jiffies, delta_jiffies = 0;
+   unsigned long long now = sched_clock(), delta = 0;
 
if (t->sched_info.last_queued)
-   delta_jiffies = now - t->sched_info.last_queued;
+   delta = now - t->sched_info.last_queued;
sched_info_dequeued(t);
-   t->sched_info.run_delay += delta_jiffies;
+   t->sched_info.run_delay += delta;
t->sched_info.last_arrival = now;
t->sched_info.pcnt++;
 
-   rq_sched_info_arrive(task_rq(t), delta_jiffies);
+   rq_sched_info_arrive(task_rq(t), delta);
 }
 
 /*
@@ -186,7 +186,7 @@ static inline void sched_info_queued(str
 {
if (unlikely(sched_info_on()))
if (!t->sched_info.last_queued)
-   t->sched_info.last_queued = jiffies;
+   t->sched_info.last_queued = sched_clock();
 }
 
 /*
@@ -195,10 +195,10 @@ static inline void sched_info_queued(str
  */
 static inline void sched_info_depart(struct task_struct *t)
 {
-   unsigned long delta_jiffies = jiffies - 

Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Balbir Singh
On Wed, May 23, 2007 at 02:06:16PM +0200, Ingo Molnar wrote:
 
 i'm pleased to announce release -v14 of the CFS scheduler patchset.
 
 The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
 downloaded from the usual place:

   http://people.redhat.com/mingo/cfs-scheduler/
 
 In -v14 the biggest user-visible change is increased sleeper fairness 
 (done by Mike Galbraith and myself), which results in better 
 interactivity under load. In particular 3D apps such as compiz/Beryl or 
 games benefit from it and should be less sensitive to other apps running 
 in parallel to them - but plain X benefits from it too.
 
 CFS is converging nicely, with no regressions reported against -v13. 
 Changes since -v13:
 
  - increase sleeper-fairness (Mike Galbraith, me)
 
  - kernel/sched_debug.c printk argument fixes for ia64 (Andrew Morton)
 
  - CFS documentation fixes (Pranith Kumar D)
 
  - increased the default rescheduling granularity to 3msecs on UP,
6 msecs on 2-way systems
 
  - small update_curr() precision fix
 
  - added an overview section to Documentation/sched-design-CFS.txt
 
  - misc cleanups
 
 As usual, any sort of feedback, bugreport, fix and suggestion is more 
 than welcome!
 
   Ingo

Hi, Ingo,

I've implemented a patch on top of v14 for better accounting of
sched_info statistics. Earlier, sched_info relied on jiffies for
accounting and I've seen applications that show 0 cpu usage
statistics (in delay accounting and from /proc) even though they've
been running on the CPU for a long time. The basic problem is that
accounting in jiffies is too coarse to be accurate.

The patch below uses sched_clock() for sched_info accounting.

Comments, suggestions, feedback is more than welcome!

Signed-off-by: Balbir Singh [EMAIL PROTECTED]
---

 include/linux/sched.h |   10 +-
 kernel/delayacct.c|   10 +-
 kernel/sched_stats.h  |   28 ++--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff -puN kernel/sched_stats.h~move-sched-accounting-to-sched_clock 
kernel/sched_stats.h
--- linux-2.6.21/kernel/sched_stats.h~move-sched-accounting-to-sched_clock  
2007-05-24 11:23:38.0 +0530
+++ linux-2.6.21-balbir/kernel/sched_stats.h2007-05-24 11:23:38.0 
+0530
@@ -97,10 +97,10 @@ const struct file_operations proc_scheds
  * Expects runqueue lock to be held for atomicity of update
  */
 static inline void
-rq_sched_info_arrive(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_arrive(struct rq *rq, unsigned long long delta)
 {
if (rq) {
-   rq-rq_sched_info.run_delay += delta_jiffies;
+   rq-rq_sched_info.run_delay += delta;
rq-rq_sched_info.pcnt++;
}
 }
@@ -109,19 +109,19 @@ rq_sched_info_arrive(struct rq *rq, unsi
  * Expects runqueue lock to be held for atomicity of update
  */
 static inline void
-rq_sched_info_depart(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_depart(struct rq *rq, unsigned long long delta)
 {
if (rq)
-   rq-rq_sched_info.cpu_time += delta_jiffies;
+   rq-rq_sched_info.cpu_time += delta;
 }
 # define schedstat_inc(rq, field)  do { (rq)-field++; } while (0)
 # define schedstat_add(rq, field, amt) do { (rq)-field += (amt); } while (0)
 #else /* !CONFIG_SCHEDSTATS */
 static inline void
-rq_sched_info_arrive(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_arrive(struct rq *rq, unsigned long long delta)
 {}
 static inline void
-rq_sched_info_depart(struct rq *rq, unsigned long delta_jiffies)
+rq_sched_info_depart(struct rq *rq, unsigned long long delta)
 {}
 # define schedstat_inc(rq, field)  do { } while (0)
 # define schedstat_add(rq, field, amt) do { } while (0)
@@ -155,16 +155,16 @@ static inline void sched_info_dequeued(s
  */
 static void sched_info_arrive(struct task_struct *t)
 {
-   unsigned long now = jiffies, delta_jiffies = 0;
+   unsigned long long now = sched_clock(), delta = 0;
 
if (t-sched_info.last_queued)
-   delta_jiffies = now - t-sched_info.last_queued;
+   delta = now - t-sched_info.last_queued;
sched_info_dequeued(t);
-   t-sched_info.run_delay += delta_jiffies;
+   t-sched_info.run_delay += delta;
t-sched_info.last_arrival = now;
t-sched_info.pcnt++;
 
-   rq_sched_info_arrive(task_rq(t), delta_jiffies);
+   rq_sched_info_arrive(task_rq(t), delta);
 }
 
 /*
@@ -186,7 +186,7 @@ static inline void sched_info_queued(str
 {
if (unlikely(sched_info_on()))
if (!t-sched_info.last_queued)
-   t-sched_info.last_queued = jiffies;
+   t-sched_info.last_queued = sched_clock();
 }
 
 /*
@@ -195,10 +195,10 @@ static inline void sched_info_queued(str
  */
 static inline void sched_info_depart(struct task_struct *t)
 {
-   unsigned long delta_jiffies = jiffies - t-sched_info.last_arrival;
+   unsigned long long delta = 

Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Ingo Molnar

* Balbir Singh [EMAIL PROTECTED] wrote:

 Hi, Ingo,
 
 I've implemented a patch on top of v14 for better accounting of 
 sched_info statistics. Earlier, sched_info relied on jiffies for 
 accounting and I've seen applications that show 0 cpu usage 
 statistics (in delay accounting and from /proc) even though they've 
 been running on the CPU for a long time. The basic problem is that 
 accounting in jiffies is too coarse to be accurate.
 
 The patch below uses sched_clock() for sched_info accounting.

nice! I've merged your patch and it built/booted fine so it should show 
up in -v15. This should also play well with Andi's sched_clock() 
enhancements in -mm, slated for .23.

btw., i think some more consolidation could be done in this area. We've 
now got the traditional /proc/PID/stat metrics, schedstats, taskstats 
and delay accounting and with CFS we've got /proc/sched_debug and 
/proc/PID/sched. There's a fair amount of overlap.

btw., CFS does this change to fs/proc/array.c:

@@ -410,6 +408,14 @@ static int do_task_stat(struct task_stru
 	/* convert nsec -> ticks */
 	start_time = nsec_to_clock_t(start_time);
 
+	/*
+	 * Use CFS's precise accounting, if available:
+	 */
+	if (!has_rt_policy(task)) {
+		utime = nsec_to_clock_t(task->sum_exec_runtime);
+		stime = 0;
+	}
+
 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",

if you have some spare capacity to improve this code, it could be 
further enhanced by not setting 'stime' to zero, but using the existing 
jiffies-based utime/stime statistics as a _ratio_ to split up the 
precise p->sum_exec_runtime. That way we don't have to add precise 
accounting to syscall entry/exit points (that would be quite expensive), 
but still the sum of utime+stime would be very precise. (and that's what 
matters most anyway)
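
A minimal sketch of that ratio idea (an assumption about how it might look,
not the eventual implementation; the helper name split_precise_runtime is
made up and overflow handling is ignored for brevity):

	/*
	 * Sketch: keep the cheap tick-based utime/stime samples, but use them
	 * only as a ratio to split the precise nanosecond total.
	 */
	static void split_precise_runtime(unsigned long long sum_exec_runtime,
					  unsigned long tick_utime,
					  unsigned long tick_stime,
					  unsigned long long *utime_ns,
					  unsigned long long *stime_ns)
	{
		unsigned long long total = (unsigned long long)tick_utime + tick_stime;

		if (!total) {
			/* no tick samples yet: attribute everything to user time */
			*utime_ns = sum_exec_runtime;
			*stime_ns = 0;
			return;
		}

		/* a real version would guard against multiplication overflow here */
		*utime_ns = sum_exec_runtime * tick_utime / total;
		*stime_ns = sum_exec_runtime - *utime_ns;
	}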

Ingo


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Balbir Singh
Ingo Molnar wrote:
> btw., i think some more consolidation could be done in this area. We've 
> now got the traditional /proc/PID/stat metrics, schedstats, taskstats 
> and delay accounting and with CFS we've got /proc/sched_debug and 
> /proc/PID/sched. There's a fair amount of overlap.
> 

Yes, true. schedstats and delay accounting share code, and taskstats is
a transport mechanism. I'll try and look at /proc/PID/stat, /proc/PID/sched
and /proc/sched_debug.

> btw., CFS does this change to fs/proc/array.c:
> 
> @@ -410,6 +408,14 @@ static int do_task_stat(struct task_stru
> 	/* convert nsec -> ticks */
> 	start_time = nsec_to_clock_t(start_time);
> 
> +	/*
> +	 * Use CFS's precise accounting, if available:
> +	 */
> +	if (!has_rt_policy(task)) {
> +		utime = nsec_to_clock_t(task->sum_exec_runtime);
> +		stime = 0;
> +	}
> +
> 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
> %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
> %lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
> 
> if you have some spare capacity to improve this code, it could be 
> further enhanced by not setting 'stime' to zero, but using the existing 
> jiffies-based utime/stime statistics as a _ratio_ to split up the 
> precise p->sum_exec_runtime. That way we don't have to add precise 
> accounting to syscall entry/exit points (that would be quite expensive), 
> but still the sum of utime+stime would be very precise. (and that's what 
> matters most anyway)
> 
> 	Ingo

I'll start looking into splitting sum_exec_runtime into utime and stime
based on the ratio already present in the task structure.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> nice! I've merged your patch and it built/booted fine so it should show 
> up in -v15. This should also play well with Andi's sched_clock() 
> enhancements in -mm, slated for .23.
>

BTW, does CFS treat sched_clock as a per-cpu clock, or will it compare
time values of sched_clock()s called on different CPUs?

J


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Ingo Molnar

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > nice! I've merged your patch and it built/booted fine so it should show 
> > up in -v15. This should also play well with Andi's sched_clock() 
> > enhancements in -mm, slated for .23.
> >
> 
> BTW, does CFS treat sched_clock as a per-cpu clock, or will it compare 
> time values of sched_clock()s called on different CPUs?

it treats it as a per-cpu clock.
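
In practice a per-cpu sched_clock() means timestamps are only ever compared
with other timestamps taken on the same CPU; when a task migrates, its saved
timestamp has to be re-expressed against the destination CPU's clock. A
hedged sketch of that re-basing step (the function name and parameters here
are illustrative, not the actual scheduler API):

	/*
	 * Sketch only: re-base a sched_clock() timestamp when a task moves
	 * from one CPU to another, so that later "now - stamp" deltas are
	 * always computed against the same per-cpu clock.
	 */
	static void rebase_stamp_on_migrate(unsigned long long *stamp,
					    unsigned long long src_cpu_now,
					    unsigned long long dst_cpu_now)
	{
		/* how long ago the event happened, measured on the source CPU */
		unsigned long long age = src_cpu_now - *stamp;

		/* express the same age relative to the destination CPU's clock */
		*stamp = dst_cpu_now - age;
	}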

Ingo


Re: [patch] CFS scheduler, -v14

2007-05-24 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> it treats it as a per-cpu clock.
>

Excellent.  I'd noticed it seems to work pretty well in a Xen guest with
lots of stolen time, but I haven't really evaluated it in detail.

J


Re: [patch] CFS scheduler, -v14

2007-05-23 Thread Nicolas Mailhot
On Wednesday 23 May 2007 at 21:57 +0200, Ingo Molnar wrote:
> * Nicolas Mailhot <[EMAIL PROTECTED]> wrote:
> 
> > Ingo Molnar <mingo at elte.hu> writes:
> > 
> > Hi Ingo
> > 
> > > i'm pleased to announce release -v14 of the CFS scheduler patchset.
> > > 
> > > The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
> > > downloaded from the usual place:
> > > 
> > >   http://people.redhat.com/mingo/cfs-scheduler/
> > 
> > I get a forbidden access on 
> > http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc2-mm1-v14.patch
> 
> oops - fixed it.

Works now, thanks!

-- 
Nicolas Mailhot


signature.asc
Description: This is a digitally signed message part.


Re: [patch] CFS scheduler, -v14

2007-05-23 Thread Ingo Molnar

* Nicolas Mailhot <[EMAIL PROTECTED]> wrote:

> Ingo Molnar <mingo at elte.hu> writes:
> 
> Hi Ingo
> 
> > i'm pleased to announce release -v14 of the CFS scheduler patchset.
> > 
> > The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
> > downloaded from the usual place:
> > 
> >   http://people.redhat.com/mingo/cfs-scheduler/
> 
> I get a forbidden access on 
> http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc2-mm1-v14.patch

oops - fixed it.

Ingo


Re: [patch] CFS scheduler, -v14

2007-05-23 Thread Nicolas Mailhot
Ingo Molnar <mingo at elte.hu> writes:

Hi Ingo

> i'm pleased to announce release -v14 of the CFS scheduler patchset.
> 
> The CFS patch against v2.6.22-rc2, v2.6.21.1 or v2.6.20.10 can be 
> downloaded from the usual place:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/


I get a forbidden access on 
http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc2-mm1-v14.patch

Regards,

-- 
Nicolas Mailhot


