Re: [git] CFS-devel, latest code
* Dmitry Adamushko <[EMAIL PROTECTED]> wrote:

> results:
>
> (SCHED_FIFO)
>
> [EMAIL PROTECTED]:~/storage/prog$ sudo chrt -f 10 ./rr_interval
> time_slice: 0 : 0
>
> (SCHED_RR)
>
> [EMAIL PROTECTED]:~/storage/prog$ sudo chrt 10 ./rr_interval
> time_slice: 0 : 99984800
>
> (SCHED_NORMAL)
>
> [EMAIL PROTECTED]:~/storage/prog$ ./rr_interval
> time_slice: 0 : 19996960
>
> (SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should
> be a half of the previous result)
>
> [EMAIL PROTECTED]:~/storage/prog$ taskset 1 ./rr_interval
> time_slice: 0 : 9998480

thanks, applied.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
* Dmitry Adamushko <[EMAIL PROTECTED]> wrote:

> The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
> seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
> possibility of 'p' being always a valid address.

thanks, applied.

	Ingo
Re: [git] CFS-devel, latest code
On Tue, Oct 02, 2007 at 09:59:04PM +0200, Dmitry Adamushko wrote:
> The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
> seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
> possibility of 'p' being always a valid address.

Thanks for catching it! Patch below looks good to me.

Acked-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

> Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>
>
> ---
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 8727d17..a379456 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -473,9 +473,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  		vruntime += sched_vslice_add(cfs_rq, se);
>
>  	if (!initial) {
> -		struct task_struct *p = container_of(se, struct task_struct, se);
> -
> -		if (sched_feat(NEW_FAIR_SLEEPERS) && p->policy != SCHED_BATCH)
> +		if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
> +				task_of(se)->policy != SCHED_BATCH)
>  			vruntime -= sysctl_sched_latency;
>
>  		vruntime = max_t(s64, vruntime, se->vruntime);
>
> ---

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
possibility of 'p' being always a valid address.

Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8727d17..a379456 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -473,9 +473,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime += sched_vslice_add(cfs_rq, se);
 
 	if (!initial) {
-		struct task_struct *p = container_of(se, struct task_struct, se);
-
-		if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
+		if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
+				task_of(se)->policy != SCHED_BATCH)
 			vruntime -= sysctl_sched_latency;
 
 		vruntime = max_t(s64, vruntime, se->vruntime);
---
Re: [git] CFS-devel, latest code
On 01/10/2007, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Dmitry Adamushko <[EMAIL PROTECTED]> wrote:
>
> > here is a few patches on top of the recent 'sched-dev':
> >
> > (1) [ proposal ] make timeslices of SCHED_RR tasks constant and not
> > dependent on task's static_prio;
> >
> > (2) [ cleanup ] calc_weighted() is obsolete, remove it;
> >
> > (3) [ refactoring ] make dequeue_entity() / enqueue_entity()
> > and update_stats_dequeue() / update_stats_enqueue() look similar,
> > structure-wise.
>
> thanks - i've applied all 3 patches of yours.
>
> > (compiles well, not functionally tested yet)
>
> (it boots fine here and SCHED_RR seems to work - but i've not tested
> getinterval.)

/me is guilty... it was a bit broken :-/ here is the fix.

results:

(SCHED_FIFO)
[EMAIL PROTECTED]:~/storage/prog$ sudo chrt -f 10 ./rr_interval
time_slice: 0 : 0

(SCHED_RR)
[EMAIL PROTECTED]:~/storage/prog$ sudo chrt 10 ./rr_interval
time_slice: 0 : 99984800

(SCHED_NORMAL)
[EMAIL PROTECTED]:~/storage/prog$ ./rr_interval
time_slice: 0 : 19996960

(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should
be a half of the previous result)
[EMAIL PROTECTED]:~/storage/prog$ taskset 1 ./rr_interval
time_slice: 0 : 9998480

Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>

---
diff --git a/kernel/sched.c b/kernel/sched.c
index d835cd2..cce22ff 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4745,11 +4745,12 @@ long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
 	else if (p->policy == SCHED_RR)
 		time_slice = DEF_TIMESLICE;
 	else {
+		struct sched_entity *se = &p->se;
 		unsigned long flags;
 		struct rq *rq;
 
 		rq = task_rq_lock(p, &flags);
-		time_slice = sched_slice(&rq->cfs, &p->se);
+		time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
 		task_rq_unlock(rq, &flags);
 	}
 	read_unlock(&tasklist_lock);
---
Re: [git] CFS-devel, latest code
* Dmitry Adamushko <[EMAIL PROTECTED]> wrote:

> here is a few patches on top of the recent 'sched-dev':
>
> (1) [ proposal ] make timeslices of SCHED_RR tasks constant and not
> dependent on task's static_prio;
>
> (2) [ cleanup ] calc_weighted() is obsolete, remove it;
>
> (3) [ refactoring ] make dequeue_entity() / enqueue_entity()
> and update_stats_dequeue() / update_stats_enqueue() look similar,
> structure-wise.

thanks - i've applied all 3 patches of yours.

> (compiles well, not functionally tested yet)

(it boots fine here and SCHED_RR seems to work - but i've not tested
getinterval.)

	Ingo
Re: [git] CFS-devel, latest code
* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-09-30 at 21:15 +0200, Dmitry Adamushko wrote:
> >
> > remove obsolete code -- calc_weighted()
> >
> Here's another piece of low hanging obsolete fruit.
>
> Remove obsolete TASK_NONINTERACTIVE.
>
> Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>

thanks, applied.

	Ingo
Re: [git] CFS-devel, latest code
On Sun, 2007-09-30 at 21:15 +0200, Dmitry Adamushko wrote:
>
> remove obsolete code -- calc_weighted()
>
Here's another piece of low hanging obsolete fruit.

Remove obsolete TASK_NONINTERACTIVE.

Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>

diff -uprNX /root/dontdiff git/linux-2.6.sched-devel/fs/pipe.c linux-2.6.23-rc8.d/fs/pipe.c
--- git/linux-2.6.sched-devel/fs/pipe.c	2007-10-01 06:59:51.0 +0200
+++ linux-2.6.23-rc8.d/fs/pipe.c	2007-10-01 07:41:17.0 +0200
@@ -45,8 +45,7 @@ void pipe_wait(struct pipe_inode_info *p
 	 * Pipes are system-local resources, so sleeping on them
 	 * is considered a noninteractive wait:
 	 */
-	prepare_to_wait(&pipe->wait, &wait,
-			TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE);
+	prepare_to_wait(&pipe->wait, &wait, TASK_INTERRUPTIBLE);
 	if (pipe->inode)
 		mutex_unlock(&pipe->inode->i_mutex);
 	schedule();
diff -uprNX /root/dontdiff git/linux-2.6.sched-devel/include/linux/sched.h linux-2.6.23-rc8.d/include/linux/sched.h
--- git/linux-2.6.sched-devel/include/linux/sched.h	2007-10-01 07:00:25.0 +0200
+++ linux-2.6.23-rc8.d/include/linux/sched.h	2007-10-01 07:25:25.0 +0200
@@ -174,8 +174,7 @@ print_cfs_rq(struct seq_file *m, int cpu
 #define EXIT_ZOMBIE		16
 #define EXIT_DEAD		32	/* in tsk->state again */
-#define TASK_NONINTERACTIVE	64
-#define TASK_DEAD		128
+#define TASK_DEAD		64
 
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)
Re: [git] CFS-devel, latest code
and this one,

make dequeue_entity() / enqueue_entity() and update_stats_dequeue() /
update_stats_enqueue() look similar, structure-wise.

zero effect, functionally-wise.

Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 2674e27..ed75a04 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -366,7 +366,6 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static inline void
 update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	update_curr(cfs_rq);
 	/*
 	 * Mark the end of the wait period if dequeueing a
 	 * waiting task:
@@ -493,7 +492,7 @@ static void
 enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
 {
 	/*
-	 * Update the fair clock.
+	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
 
@@ -512,6 +511,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep)
 {
+	/*
+	 * Update run-time statistics of the 'current'.
+	 */
+	update_curr(cfs_rq);
+
 	update_stats_dequeue(cfs_rq, se);
 	if (sleep) {
 #ifdef CONFIG_SCHEDSTATS
@@ -775,8 +779,7 @@ static void yield_task_fair(struct rq *rq)
 	if (likely(!sysctl_sched_compat_yield)) {
 		__update_rq_clock(rq);
 		/*
-		 * Dequeue and enqueue the task to update its
-		 * position within the tree:
+		 * Update run-time statistics of the 'current'.
 		 */
 		update_curr(cfs_rq);
---
Re: [git] CFS-devel, latest code
remove obsolete code -- calc_weighted()

Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index fe4003d..2674e27 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -342,17 +342,6 @@ update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	schedstat_set(se->wait_start, rq_of(cfs_rq)->clock);
 }
 
-static inline unsigned long
-calc_weighted(unsigned long delta, struct sched_entity *se)
-{
-	unsigned long weight = se->load.weight;
-
-	if (unlikely(weight != NICE_0_LOAD))
-		return (u64)delta * se->load.weight >> NICE_0_SHIFT;
-	else
-		return delta;
-}
-
 /*
  * Task is being enqueued - update stats:
  */
---
Re: [git] CFS-devel, latest code
here is a few patches on top of the recent 'sched-dev':

(1) [ proposal ] make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio;

(2) [ cleanup ] calc_weighted() is obsolete, remove it;

(3) [ refactoring ] make dequeue_entity() / enqueue_entity()
and update_stats_dequeue() / update_stats_enqueue() look similar,
structure-wise.

---
(1)

- make timeslices of SCHED_RR tasks constant and not dependent on
task's static_prio [1] ;

- remove obsolete code (timeslice related bits);

- make sched_rr_get_interval() return something more meaningful [2]
for SCHED_OTHER tasks.

[1] according to the following link, the current behavior is not
compliant with SUSv3 (not sure though, what is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656

[2] the interval is dynamic and can be depicted as follows: "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by
the scheduler tick".

all in all, the code doesn't increase:

   text	   data	    bss	    dec	    hex	filename
  46585	   5102	     40	  51727	   ca0f	../build/kernel/sched.o.before
  46553	   5102	     40	  51695	   c9ef	../build/kernel/sched.o

yeah, this seems to require task_rq_lock/unlock() but this is not a
hot path.

what do you think?

(compiles well, not functionally tested yet)

Almost-Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>

---
diff --git a/kernel/sched.c b/kernel/sched.c
index 0abed89..eba7827 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -104,11 +104,9 @@ unsigned long long __attribute__((weak)) sched_clock(void)
 /*
  * These are the 'tuning knobs' of the scheduler:
  *
- * Minimum timeslice is 5 msecs (or 1 jiffy, whichever is larger),
- * default timeslice is 100 msecs, maximum timeslice is 800 msecs.
+ * default timeslice is 100 msecs (used only for SCHED_RR tasks).
  * Timeslices get refilled after they expire.
  */
-#define MIN_TIMESLICE		max(5 * HZ / 1000, 1)
 #define DEF_TIMESLICE		(100 * HZ / 1000)
 
 #ifdef CONFIG_SMP
@@ -132,24 +130,6 @@ static inline void sg_inc_cpu_power(struct sched_group *sg, u32 val)
 }
 #endif
 
-#define SCALE_PRIO(x, prio) \
-	max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_TIMESLICE)
-
-/*
- * static_prio_timeslice() scales user-nice values [ -20 ... 0 ... 19 ]
- * to time slice values: [800ms ... 100ms ... 5ms]
- */
-static unsigned int static_prio_timeslice(int static_prio)
-{
-	if (static_prio == NICE_TO_PRIO(19))
-		return 1;
-
-	if (static_prio < NICE_TO_PRIO(0))
-		return SCALE_PRIO(DEF_TIMESLICE * 4, static_prio);
-	else
-		return SCALE_PRIO(DEF_TIMESLICE, static_prio);
-}
-
 static inline int rt_policy(int policy)
 {
 	if (unlikely(policy == SCHED_FIFO) || unlikely(policy == SCHED_RR))
@@ -4759,6 +4739,7 @@ asmlinkage
 long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
 {
 	struct task_struct *p;
+	unsigned int time_slice;
 	int retval = -EINVAL;
 	struct timespec t;
 
@@ -4775,9 +4756,20 @@ long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
 	if (retval)
 		goto out_unlock;
 
-	jiffies_to_timespec(p->policy == SCHED_FIFO ?
-				 0 : static_prio_timeslice(p->static_prio), &t);
+	if (p->policy == SCHED_FIFO)
+		time_slice = 0;
+	else if (p->policy == SCHED_RR)
+		time_slice = DEF_TIMESLICE;
+	else {
+		unsigned long flags;
+		struct rq *rq;
+
+		rq = task_rq_lock(p, &flags);
+		time_slice = sched_slice(&rq->cfs, &p->se);
+		task_rq_unlock(rq, &flags);
+	}
 	read_unlock(&tasklist_lock);
+	jiffies_to_timespec(time_slice, &t);
 	retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
 out_nounlock:
 	return retval;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index dbe4d8c..5c52881 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -206,7 +206,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p)
 	if (--p->time_slice)
 		return;
 
-	p->time_slice = static_prio_timeslice(p->static_prio);
+	p->time_slice = DEF_TIMESLICE;
 
 	/*
 	 * Requeue to the end of queue if we are not the only element
---
Re: [git] CFS-devel, latest code
Ingo Molnar wrote:

> Maybe there's more to come: if we can get CONFIG_FAIR_USER_SCHED to
> work properly then your Xorg will have a load-independent 50% of CPU
> time all to itself.

It seems that perhaps that 50% makes more sense on a single/dual CPU
system than on a more robust one, such as a four-way dual-core Xeon
with HT or some such. With hotplug CPUs, and setups on various
machines, perhaps some resource limit independent of the available
resource would be useful.

Just throwing out the idea, in case it lands on fertile ground.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
Re: [git] CFS-devel, latest code
* Dmitry Adamushko <[EMAIL PROTECTED]> wrote:
> humm... I think, it'd be safer to have something like the following change in place.
>
> The thing is that __pick_next_entity() must never be called when first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node' be the very first field of 'struct sched_entity' (and it's the second).
>
> The 'nr_running != 0' check is _not_ enough, due to the fact that 'current' is not within the tree. Generic paths are ok (e.g. schedule() as put_prev_task() is called previously)... I'm more worried about e.g. migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if 'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL tasks (or imagine, some other use-cases in the future -- i.e. we should not make the outer world dependent on internal details of the sched_fair class) -- it may be a "Houston, we've got a problem" case.
>
> it's +16 bytes to the ".text". Another variant is to make 'run_node' the first data member of 'struct sched_entity' but an additional check (se != NULL) is still needed in pick_next_entity().

looks good to me - and we already have something similar in sched_rt.c. I've added your patch to the queue. (Can i add your SoB line too?)

	Ingo
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 11:47 +0200, Ingo Molnar wrote:
> Maybe there's more to come: if we can get CONFIG_FAIR_USER_SCHED to work properly then your Xorg will have a load-independent 50% of CPU time all to itself. (Group scheduling is quite impressive already: i can log in as root without feeling _any_ effect from a perpetual 'hackbench 100' running as uid mingo. Fork bombs no more.) Will the Amarok gforce plugin like that CPU time splitup? (or is most of the gforce overhead under your user uid?)
>
> it could also work out negatively, _sometimes_ X does not like being too high prio. (weird as that might be.) So we'll see.

I piddled around with fair users this morning, and it worked well. With Xorg and Gforce as one user (X and Gforce are synchronous ATM), and a make -j30 as another, I could barely tell the make was running. Watching a dvd, I couldn't tell. Latencies were pretty darn good throughout three hours of testing this and that.

	-Mike
Re: [git] CFS-devel, latest code
humm... I think, it'd be safer to have something like the following change in place.

The thing is that __pick_next_entity() must never be called when first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node' be the very first field of 'struct sched_entity' (and it's the second).

The 'nr_running != 0' check is _not_ enough, due to the fact that 'current' is not within the tree. Generic paths are ok (e.g. schedule() as put_prev_task() is called previously)... I'm more worried about e.g. migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if 'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL tasks (or imagine, some other use-cases in the future -- i.e. we should not make the outer world dependent on internal details of the sched_fair class) -- it may be a "Houston, we've got a problem" case.

it's +16 bytes to the ".text". Another variant is to make 'run_node' the first data member of 'struct sched_entity' but an additional check (se != NULL) is still needed in pick_next_entity().

what do you think?

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index dae714a..33b2376 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -563,9 +563,12 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
 {
-	struct sched_entity *se = __pick_next_entity(cfs_rq);
-
-	set_next_entity(cfs_rq, se);
+	struct sched_entity *se = NULL;
+
+	if (first_fair(cfs_rq)) {
+		se = __pick_next_entity(cfs_rq);
+		set_next_entity(cfs_rq, se);
+	}
 	return se;
 }
---
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 09:34:20PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
> > > The latest sched-devel.git tree can be pulled from:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
>
> This is required for it to compile.
>
> ---
>  include/linux/sched.h | 1 +
>  1 files changed, 1 insertion(+)
>
> Index: current/include/linux/sched.h
> ===
> --- current.orig/include/linux/sched.h
> +++ current/include/linux/sched.h
> @@ -1404,6 +1404,7 @@ extern unsigned int sysctl_sched_wakeup_
>  extern unsigned int sysctl_sched_batch_wakeup_granularity;
>  extern unsigned int sysctl_sched_child_runs_first;
>  extern unsigned int sysctl_sched_features;
> +extern unsigned int sysctl_sched_nr_latency;
>  #endif
>
>  extern unsigned int sysctl_sched_compat_yield;

and this:

---
 kernel/sched_debug.c | 1 -
 1 files changed, 1 deletion(-)

Index: current/kernel/sched_debug.c
===
--- current.orig/kernel/sched_debug.c
+++ current/kernel/sched_debug.c
@@ -210,7 +210,6 @@ static int sched_debug_show(struct seq_f
 #define PN(x) \
 	SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
 	PN(sysctl_sched_latency);
-	PN(sysctl_sched_min_granularity);
 	PN(sysctl_sched_wakeup_granularity);
 	PN(sysctl_sched_batch_wakeup_granularity);
 	PN(sysctl_sched_child_runs_first);

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
> > The latest sched-devel.git tree can be pulled from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

This is required for it to compile.

---
 include/linux/sched.h | 1 +
 1 files changed, 1 insertion(+)

Index: current/include/linux/sched.h
===
--- current.orig/include/linux/sched.h
+++ current/include/linux/sched.h
@@ -1404,6 +1404,7 @@ extern unsigned int sysctl_sched_wakeup_
 extern unsigned int sysctl_sched_batch_wakeup_granularity;
 extern unsigned int sysctl_sched_child_runs_first;
 extern unsigned int sysctl_sched_features;
+extern unsigned int sysctl_sched_nr_latency;
 #endif

 extern unsigned int sysctl_sched_compat_yield;

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 08:45 +0200, Ingo Molnar wrote:
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote:
> > > Lots of scheduler updates in the past few days, done by many people. Most importantly, the SMP latency problems reported and debugged by Mike Galbraith should be fixed for good now.
> >
> > Does this have anything to do with idle balancing ? I noticed some fairly large latencies in that code in 2.6.23-rc's ..
>
> any measurements?

Yes, I made this a while ago,

ftp://source.mvista.com/pub/dwalker/misc/long-cfs-load-balance-trace.txt

This was with PREEMPT_RT on btw, so it's not the most recent kernel. I was able to reproduce it in all the -rc's I tried.

Daniel
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 01:33:06PM +0200, Ingo Molnar wrote:
> > hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
> >
> > 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
> >
> > needs to become properly group-hierarchy aware?

You seem to have hit the nerve for this problem. The two patches I sent:

	http://lkml.org/lkml/2007/9/25/117
	http://lkml.org/lkml/2007/9/25/168

partly help, but we can do better.

> ===
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -1039,7 +1039,8 @@ void set_task_cpu(struct task_struct *p,
>  {
>  	int old_cpu = task_cpu(p);
>  	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
> -	u64 clock_offset;
> +	struct sched_entity *se;
> +	u64 clock_offset, voffset;
>
>  	clock_offset = old_rq->clock - new_rq->clock;
>
> @@ -1051,7 +1052,11 @@ void set_task_cpu(struct task_struct *p,
>  	if (p->se.block_start)
>  		p->se.block_start -= clock_offset;
>  #endif
> -	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
> +
> +	se = &p->se;
> +	voffset = old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;

This one feels wrong, although I can't express my reaction correctly ..

> +	for_each_sched_entity(se)
> +		se->vruntime -= voffset;

Note that parent entities for a task are per-cpu. So if a task A belonging to userid guest hops from CPU0 to CPU1, then it gets a new parent entity as well, which is different from its parent entity on CPU0.

Before: taskA->se.parent = guest's tg->se[0]
After:  taskA->se.parent = guest's tg->se[1]

So walking up the entity hierarchy and fixing up (parent) se->vruntime will do little good after the task has moved to a new cpu.

IMO, we need to be doing this:

- For dequeue of higher level sched entities, simulate as if they are going to "sleep"
- For enqueue of higher level entities, simulate as if they are "waking up". This will cause enqueue_entity() to reset their vruntime (to the existing value of cfs_rq->min_vruntime) when they "wakeup".
If we don't do this, then let's say a group had only one task (A) and it moves from CPU0 to CPU1. Then on CPU1, when the group-level entity for task A is enqueued, it will have a very low vruntime (since it was never running) and this will give task A unlimited cpu time, until its group entity catches up with all the "sleep" time.

Let me try a fix for this next ..

--
Regards,
vatsa
[git] CFS-devel, latest code
The latest sched-devel.git tree can be pulled from: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git This is a quick iteration after yesterday's: a couple of group scheduling bugs were found/debugged and fixed by Srivatsa Vaddagiri and Mike Galbraith. There's also a yield fix from Dmitry Adamushko, a build fix from S.Ceglar Onur and Andrew Morton, a cleanup from Hiroshi Shimamoto and the usual stream of goodies from Peter Zijlstra. Rebased it to -rc8 as well. there are no known regressions at the moment in the sched-devel.git codebase. (yay :) Ingo -> the shortlog relative to 2.6.23-rc8: Dmitry Adamushko (9): sched: clean up struct load_stat sched: clean up schedstat block in dequeue_entity() sched: sched_setscheduler() fix sched: add set_curr_task() calls sched: do not keep current in the tree and get rid of sched_entity::fair_key sched: optimize task_new_fair() sched: simplify sched_class::yield_task() sched: rework enqueue/dequeue_entity() to get rid of set_curr_task() sched: yield fix Hiroshi Shimamoto (1): sched: clean up sched_fork() Ingo Molnar (44): sched: fix new-task method sched: resched task in task_new_fair() sched: small sched_debug cleanup sched: debug: track maximum 'slice' sched: uniform tunings sched: use constants if !CONFIG_SCHED_DEBUG sched: remove stat_gran sched: remove precise CPU load sched: remove precise CPU load calculations #2 sched: track cfs_rq->curr on !group-scheduling too sched: cleanup: simplify cfs_rq_curr() methods sched: uninline __enqueue_entity()/__dequeue_entity() sched: speed up update_load_add/_sub() sched: clean up calc_weighted() sched: introduce se->vruntime sched: move sched_feat() definitions sched: optimize vruntime based scheduling sched: simplify check_preempt() methods sched: wakeup granularity fix sched: add se->vruntime debugging sched: add more vruntime statistics sched: debug: update exec_clock only when SCHED_DEBUG sched: remove wait_runtime limit sched: remove wait_runtime fields 
and features sched: x86: allow single-depth wchan output sched: fix delay accounting performance regression sched: prettify /proc/sched_debug output sched: enhance debug output sched: kernel/sched_fair.c whitespace cleanups sched: fair-group sched, cleanups sched: enable CONFIG_FAIR_GROUP_SCHED=y by default sched debug: BKL usage statistics sched: remove unneeded tunables sched debug: print settings sched debug: more width for parameter printouts sched: entity_key() fix sched: remove condition from set_task_cpu() sched: remove last_min_vruntime effect sched: undo some of the recent changes sched: fix place_entity() sched: fix sched_fork() sched: remove set_leftmost() sched: clean up schedstats, cnt -> count sched: cleanup, remove stale comment Matthias Kaehlcke (1): sched: use list_for_each_entry_safe() in __wake_up_common() Mike Galbraith (2): sched: fix SMP migration latencies sched: fix formatting of /proc/sched_debug Peter Zijlstra (12): sched: simplify SCHED_FEAT_* code sched: new task placement for vruntime sched: simplify adaptive latency sched: clean up new task placement sched: add tree based averages sched: handle vruntime overflow sched: better min_vruntime tracking sched: add vslice sched debug: check spread sched: max_vruntime() simplification sched: clean up min_vruntime use sched: speed up and simplify vslice calculations S.Ceglar Onur (1): sched debug: BKL usage statistics, fix Srivatsa Vaddagiri (9): sched: group-scheduler core sched: revert recent removal of set_curr_task() sched: fix minor bug in yield sched: print nr_running and load in /proc/sched_debug sched: print >cfs stats sched: clean up code under CONFIG_FAIR_GROUP_SCHED sched: add fair-user scheduler sched: group scheduler wakeup latency fix sched: group scheduler SMP migration fix arch/i386/Kconfig | 11 fs/proc/base.c |2 include/linux/sched.h | 55 ++- init/Kconfig| 21 + kernel/delayacct.c |2 kernel/sched.c | 577 +- kernel/sched_debug.c| 250 +++- kernel/sched_fair.c | 718 +--- 
kernel/sched_idletask.c |5 kernel/sched_rt.c | 12 kernel/sched_stats.h| 28 - kernel/sysctl.c | 31 -- kernel/user.c | 43 ++ 13 files changed, 954 insertions(+), 801 deletions(-)
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 03:35:17PM +0200, Mike Galbraith wrote:
> > I tried the following patch. I *think* I see some improvement, wrt latency seen when I type on the shell. Before this patch, I noticed oddities like "kill -9 chew-max-pid" wont kill chew-max (it is queued in runqueue waiting for a looong time to run before it can acknowledge signal and exit). With this patch, I don't see such oddities ..So I am hoping it fixes the latency problem you are seeing as well.
>
> http://lkml.org/lkml/2007/9/25/117 plus the below seems to be the Silver Bullet for the latencies I was seeing.

Cool ..Thanks for the quick feedback.

Ingo, do the two patches fix the latency problems you were seeing as well?

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 18:21 +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 12:36:17PM +0200, Ingo Molnar wrote:
> > hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
> >
> > 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
>
> This definitely does need some fixup, even though I am not sure yet if it will solve completely the latency issue.
>
> I tried the following patch. I *think* I see some improvement, wrt latency seen when I type on the shell. Before this patch, I noticed oddities like "kill -9 chew-max-pid" wont kill chew-max (it is queued in runqueue waiting for a looong time to run before it can acknowledge signal and exit). With this patch, I don't see such oddities ..So I am hoping it fixes the latency problem you are seeing as well.

http://lkml.org/lkml/2007/9/25/117 plus the below seems to be the Silver Bullet for the latencies I was seeing.

> Index: current/kernel/sched.c
> ===
> --- current.orig/kernel/sched.c
> +++ current/kernel/sched.c
> @@ -1039,6 +1039,8 @@ void set_task_cpu(struct task_struct *p,
>  {
>  	int old_cpu = task_cpu(p);
>  	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
> +	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
> +		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
>  	u64 clock_offset;
>
>  	clock_offset = old_rq->clock - new_rq->clock;
> @@ -1051,7 +1053,8 @@ void set_task_cpu(struct task_struct *p,
>  	if (p->se.block_start)
>  		p->se.block_start -= clock_offset;
>  #endif
> -	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
> +	p->se.vruntime -= old_cfsrq->min_vruntime -
> +			  new_cfsrq->min_vruntime;
>
>  	__set_task_cpu(p, new_cpu);
>  }
>
> --
> Regards,
> vatsa

	-Mike
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 14:28 +0200, Mike Galbraith wrote:
> On Tue, 2007-09-25 at 15:58 +0530, Srivatsa Vaddagiri wrote:
> > While I try recreating this myself, I wonder if this patch helps?
>
> It didn't here, nor did tweaking root's share. Booting with maxcpus=1, I was unable to produce large latencies, but didn't try very many things.

Easy way to make it pretty bad: pin a nice 0 loop to CPU0, pin a nice 19 loop to CPU1, then start an unpinned make.. more Xorg bouncing back and forth I suppose.

 se.wait_max :  14.105683
 se.wait_max : 316.943787
 se.wait_max : 692.884324
 se.wait_max :  38.165534
 se.wait_max : 732.883492
 se.wait_max : 127.059784
 se.wait_max :  63.403549
 se.wait_max : 372.933284

	-Mike
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 12:36:17PM +0200, Ingo Molnar wrote:
> hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
>
> 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;

This definitely does need some fixup, even though I am not sure yet if it will solve completely the latency issue.

I tried the following patch. I *think* I see some improvement, wrt latency seen when I type on the shell. Before this patch, I noticed oddities like "kill -9 chew-max-pid" wont kill chew-max (it is queued in runqueue waiting for a looong time to run before it can acknowledge signal and exit). With this patch, I don't see such oddities ..So I am hoping it fixes the latency problem you are seeing as well.

Index: current/kernel/sched.c
===
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -1039,6 +1039,8 @@ void set_task_cpu(struct task_struct *p,
 {
 	int old_cpu = task_cpu(p);
 	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
+	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
+		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
 	u64 clock_offset;

 	clock_offset = old_rq->clock - new_rq->clock;
@@ -1051,7 +1053,8 @@ void set_task_cpu(struct task_struct *p,
 	if (p->se.block_start)
 		p->se.block_start -= clock_offset;
 #endif
-	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
+	p->se.vruntime -= old_cfsrq->min_vruntime -
+			  new_cfsrq->min_vruntime;

 	__set_task_cpu(p, new_cpu);
 }

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 15:58 +0530, Srivatsa Vaddagiri wrote:
> While I try recreating this myself, I wonder if this patch helps?

It didn't here, nor did tweaking root's share. Booting with maxcpus=1, I was unable to produce large latencies, but didn't try very many things.

	-Mike
Re: [git] CFS-devel, latest code
* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
>
> 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
>
> needs to become properly group-hierarchy aware?

a quick first stab like the one below does not appear to solve the problem.

	Ingo

--->
Subject: sched: group scheduler SMP migration fix
From: Ingo Molnar <[EMAIL PROTECTED]>

group scheduler SMP migration fix.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1039,7 +1039,8 @@ void set_task_cpu(struct task_struct *p,
 {
 	int old_cpu = task_cpu(p);
 	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
-	u64 clock_offset;
+	struct sched_entity *se;
+	u64 clock_offset, voffset;

 	clock_offset = old_rq->clock - new_rq->clock;
@@ -1051,7 +1052,11 @@ void set_task_cpu(struct task_struct *p,
 	if (p->se.block_start)
 		p->se.block_start -= clock_offset;
 #endif
-	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
+
+	se = &p->se;
+	voffset = old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
+	for_each_sched_entity(se)
+		se->vruntime -= voffset;

 	__set_task_cpu(p, new_cpu);
 }
Re: [git] CFS-devel, latest code
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 25, 2007 at 12:41:20AM -0700, Andrew Morton wrote:
> > This doornails the Vaio. After grub handover the screen remains black and the fan goes whir.
> >
> > http://userweb.kernel.org/~akpm/config-sony.txt
>
> This seems to be UP regression. Sorry abt that. I could recreate the problem very easily with CONFIG_SMP turned off.
>
> Can you check if this patch works?

Works for me here. thanks - i've put this fix into the core group-scheduling patch.

	Ingo
Re: [git] CFS-devel, latest code
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 25, 2007 at 12:10:44PM +0200, Ingo Molnar wrote:
> > So the patch below just removes the is_same_group() condition. But i can still see bad (and obvious) latencies with Mike's 2-hogs test:
> >
> > 	taskset 01 perl -e 'while (1) {}' &
> > 	nice -19 taskset 02 perl -e 'while (1) {}' &
> >
> > So something's amiss.
>
> While I try recreating this myself, I wonder if this patch helps?

you should be able to recreate this easily by booting with maxcpus=1 and the commands above - then run a few instances of chew-max (without them being bound to any particular CPUs) and the latencies should show up.

i have tried your patch and it does not solve the problem - i think there's a more fundamental bug lurking, besides the wakeup latency problem.

Find below a /proc/sched_debug output of a really large latency. The latency is caused by the _huge_ (~450 seconds!) vruntime offset that 'loop_silent' and 'sshd' has:

 task          PID       tree-key  switches  prio   exec-runtime
 ---
 loop_silent  2391   55344.211189      203   120   55344.211189
 sshd         2440  513334.978030        4   120  513334.978030
 Rcat         2496  513672.558835        4   120  513672.558835

hm. perhaps this fixup in kernel/sched.c:set_task_cpu():

	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;

needs to become properly group-hierarchy aware?
Ingo --> Sched Debug Version: v0.05-v20, 2.6.23-rc7 #89 now at 95878.065440 msecs .sysctl_sched_latency: 20.00 .sysctl_sched_min_granularity: 2.00 .sysctl_sched_wakeup_granularity : 2.00 .sysctl_sched_batch_wakeup_granularity : 25.00 .sysctl_sched_child_runs_first : 0.01 .sysctl_sched_features : 3 cpu#0, 1828.868 MHz .nr_running: 3 .load : 3072 .nr_switches : 32032 .nr_load_updates : 95906 .nr_uninterruptible: 4294967238 .jiffies : 4294763202 .next_balance : 4294.763420 .curr->pid : 2496 .clock : 95893.484495 .idle_clock: 55385.089335 .prev_clock_raw: 84753.749367 .clock_warps : 0 .clock_overflows : 1737 .clock_deep_idle_events: 71815 .clock_max_delta : 0.999843 .cpu_load[0] : 3072 .cpu_load[1] : 2560 .cpu_load[2] : 2304 .cpu_load[3] : 2176 .cpu_load[4] : 2119 cfs_rq .exec_clock: 38202.223241 .MIN_vruntime : 36334.281860 .min_vruntime : 36334.279140 .max_vruntime : 36334.281860 .spread: 0.00 .spread0 : 0.00 .nr_running: 2 .load : 3072 .bkl_cnt : 3934 .nr_spread_over: 37 cfs_rq .exec_clock: 34769.316246 .MIN_vruntime : 55344.211189 .min_vruntime : 36334.279140 .max_vruntime : 513334.978030 .spread: 457990.766841 .spread0 : 0.00 .nr_running: 2 .load : 2048 .bkl_cnt : 3934 .nr_spread_over: 10 cfs_rq .exec_clock: 36.982394 .MIN_vruntime : 0.01 .min_vruntime : 36334.279140 .max_vruntime : 0.01 .spread: 0.00 .spread0 : 0.00 .nr_running: 0 .load : 0 .bkl_cnt : 3934 .nr_spread_over: 1 cfs_rq .exec_clock: 20.244893 .MIN_vruntime : 0.01 .min_vruntime : 36334.279140 .max_vruntime : 0.01 .spread: 0.00 .spread0 : 0.00 .nr_running: 0 .load : 0 .bkl_cnt : 3934 .nr_spread_over: 0 cfs_rq .exec_clock: 3305.155973 .MIN_vruntime : 0.01 .min_vruntime : 36334.279140 .max_vruntime : 0.01 .spread: 0.00 .spread0 : 0.00 .nr_running: 1 .load : 1024 .bkl_cnt : 3934 .nr_spread_over
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 12:10:44PM +0200, Ingo Molnar wrote:
> So the patch below just removes the is_same_group() condition. But i can still see bad (and obvious) latencies with Mike's 2-hogs test:
>
> 	taskset 01 perl -e 'while (1) {}' &
> 	nice -19 taskset 02 perl -e 'while (1) {}' &
>
> So something's amiss.

While I try recreating this myself, I wonder if this patch helps?

---
 kernel/sched_fair.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: current/kernel/sched_fair.c
===
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -794,7 +794,8 @@ static void yield_task_fair(struct rq *r
 static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
 {
 	struct task_struct *curr = rq->curr;
-	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	struct cfs_rq *cfs_rq = task_cfs_rq(curr), *pcfs_rq;
+	struct sched_entity *se = &curr->se, *pse = &p->se;

 	if (unlikely(rt_prio(p->prio))) {
 		update_rq_clock(rq);
@@ -802,11 +803,19 @@ static void check_preempt_wakeup(struct
 		resched_task(curr);
 		return;
 	}
-	if (is_same_group(curr, p)) {
-		s64 delta = curr->se.vruntime - p->se.vruntime;
-
-		if (delta > (s64)sysctl_sched_wakeup_granularity)
-			resched_task(curr);
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+		pcfs_rq = cfs_rq_of(pse);
+
+		if (cfs_rq == pcfs_rq) {
+			s64 delta = se->vruntime - pse->vruntime;
+
+			if (delta > (s64)sysctl_sched_wakeup_granularity)
+				resched_task(curr);
+			break;
+		}
+		pse = pse->parent;
 	}
 }

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote: > > > On Tue, Sep 25, 2007 at 11:13:31AM +0200, Ingo Molnar wrote: > > > ok, i'm too seeing some sort of latency weirdness with > > > CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved which runs > > > under root uid on my box - and hence gets 50% of all CPU time. > > > > > > Srivatsa, any ideas? It could either be an accounting buglet (less > > > likely, seems like the group scheduling bits stick to the 50% splitup > > > nicely), or a preemption buglet. One potential preemption buglet would > > > be for the group scheduler to not properly preempt a running task when a > > > task from another uid is woken? > > > > Yep, I noticed that too. > > > > check_preempt_wakeup() > > { > > ... > > > > if (is_same_group(curr, p)) { > > ^ > > > > resched_task(); > > } > > > > } > > > > Will try a fix to check for preemption at higher levels .. > > i bet fixing this will increase precision of group scheduling as well. > Those long latencies can be thought of as noise as well, and the > fair-scheduling "engine" might not be capable to offset all sources of > noise. So generally, while we allow a certain amount of lag in > preemption decisions (wakeup-granularity, etc.), with which the > fairness engine will cope just fine, we do not want to allow unlimited > lag. hm, i tried the naive patch. In theory the vruntime of all scheduling entities should be 'compatible' and comparable (that's the point behind using vruntime - the fairness engine drives each vruntime forward and tries to balance them). So the patch below just removes the is_same_group() condition. But i can still see bad (and obvious) latencies with Mike's 2-hogs test: taskset 01 perl -e 'while (1) {}' & nice -19 taskset 02 perl -e 'while (1) {}' & So something's amiss. Ingo ---> Subject: sched: group scheduler wakeup latency fix From: Ingo Molnar <[EMAIL PROTECTED]> group scheduler wakeup latency fix. 
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched_fair.c |    9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -785,6 +785,7 @@ static void check_preempt_wakeup(struct
 {
 	struct task_struct *curr = rq->curr;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	s64 delta;
 
 	if (unlikely(rt_prio(p->prio))) {
 		update_rq_clock(rq);
@@ -792,12 +793,10 @@ static void check_preempt_wakeup(struct
 		resched_task(curr);
 		return;
 	}
-	if (is_same_group(curr, p)) {
-		s64 delta = curr->se.vruntime - p->se.vruntime;
+	delta = curr->se.vruntime - p->se.vruntime;
 
-		if (delta > (s64)sysctl_sched_wakeup_granularity)
-			resched_task(curr);
-	}
+	if (delta > (s64)sysctl_sched_wakeup_granularity)
+		resched_task(curr);
 }
 
 static struct task_struct *pick_next_task_fair(struct rq *rq)
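The delta test that survives in Ingo's simplified version depends on the `(s64)` cast: vruntime counters are unsigned and eventually wrap, but the signed difference of two nearby values still orders them correctly across a wrap. A minimal user-space illustration (the helper name is invented; only the cast idiom mirrors the kernel code):

```c
#include <stdint.h>

/* Decide whether 'curr' has run far enough ahead of the woken task 'p'
 * to be preempted. Unsigned subtraction wraps modulo 2^64; casting the
 * result to signed recovers the correct small difference even when the
 * raw counters straddle a wrap point. */
static int needs_resched(uint64_t curr_vruntime, uint64_t p_vruntime,
                         int64_t granularity)
{
	int64_t delta = (int64_t)(curr_vruntime - p_vruntime);

	return delta > granularity;
}
```

Comparing the raw unsigned values directly (`curr_vruntime > p_vruntime + granularity`) would give the wrong answer near a wrap; the signed-delta form does not.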
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 11:47 +0200, Ingo Molnar wrote: > * Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > On Tue, 2007-09-25 at 11:13 +0200, Ingo Molnar wrote: > > > > > > [...] Latencies of up to 336ms hit me during the recompile (make -j3), > > > > with nothing else running. Since reboot, latencies are, so far, very > > > > very nice. [...] > > > > > > 'very very nice' == 'best ever' ? :-) > > > > Yes. Very VERY nice feel. > > cool :-) > > Maybe there's more to come: if we can get CONFIG_FAIR_USER_SCHED to work > properly then your Xorg will have a load-independent 50% of CPU time all > to itself. (Group scheduling is quite impressive already: i can log in > as root without feeling _any_ effect from a perpetual 'hackbench 100' > running as uid mingo. Fork bombs no more.) Will the Amarok gforce plugin > like that CPU time splitup? (or is most of the gforce overhead under > your user uid?) I run everything as root (naughty me), so I'd have to change my evil ways to reap the benefits. (I'll do that to test, but it's unlikely to ever become a permanent habit here) Amarok/Gforce will definitely like the user split as long as latency is low. Visualizations are not only bandwidth hungry, they're extremely latency sensitive. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
* Mike Galbraith <[EMAIL PROTECTED]> wrote: > On Tue, 2007-09-25 at 11:13 +0200, Ingo Molnar wrote: > > > > [...] Latencies of up to 336ms hit me during the recompile (make -j3), > > > with nothing else running. Since reboot, latencies are, so far, very > > > very nice. [...] > > > > 'very very nice' == 'best ever' ? :-) > > Yes. Very VERY nice feel. cool :-) Maybe there's more to come: if we can get CONFIG_FAIR_USER_SCHED to work properly then your Xorg will have a load-independent 50% of CPU time all to itself. (Group scheduling is quite impressive already: i can log in as root without feeling _any_ effect from a perpetual 'hackbench 100' running as uid mingo. Fork bombs no more.) Will the Amarok gforce plugin like that CPU time splitup? (or is most of the gforce overhead under your user uid?) it could also work out negatively, _sometimes_ X does not like being too high prio. (weird as that might be.) So we'll see. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote: > On Tue, Sep 25, 2007 at 11:13:31AM +0200, Ingo Molnar wrote: > > ok, i'm too seeing some sort of latency weirdness with > > CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved which runs > > under root uid on my box - and hence gets 50% of all CPU time. > > > > Srivatsa, any ideas? It could either be an accounting buglet (less > > likely, seems like the group scheduling bits stick to the 50% splitup > > nicely), or a preemption buglet. One potential preemption buglet would > > be for the group scheduler to not properly preempt a running task when a > > task from another uid is woken? > > Yep, I noticed that too. > > check_preempt_wakeup() > { > ... > > if (is_same_group(curr, p)) { > ^ > > resched_task(); > } > > } > > Will try a fix to check for preemption at higher levels .. i bet fixing this will increase precision of group scheduling as well. Those long latencies can be thought of as noise as well, and the fair-scheduling "engine" might not be capable to offset all sources of noise. So generally, while we allow a certain amount of lag in preemption decisions (wakeup-granularity, etc.), with which the fairness engine will cope just fine, we do not want to allow unlimited lag. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 11:13:31AM +0200, Ingo Molnar wrote: > ok, i'm too seeing some sort of latency weirdness with > CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved which runs > under root uid on my box - and hence gets 50% of all CPU time. > > Srivatsa, any ideas? It could either be an accounting buglet (less > likely, seems like the group scheduling bits stick to the 50% splitup > nicely), or a preemption buglet. One potential preemption buglet would > be for the group scheduler to not properly preempt a running task when a > task from another uid is woken? Yep, I noticed that too. check_preempt_wakeup() { ... if (is_same_group(curr, p)) { ^ resched_task(); } } Will try a fix to check for preemption at higher levels .. -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
* S.Çağlar Onur <[EMAIL PROTECTED]> wrote: > Seems like following trivial change needed to compile without > CONFIG_SCHEDSTATS > > [EMAIL PROTECTED] linux-2.6 $ LC_ALL=C make > CHK include/linux/version.h > CHK include/linux/utsrelease.h > CALLscripts/checksyscalls.sh > CHK include/linux/compile.h > CC kernel/sched.o > In file included from kernel/sched.c:853: > kernel/sched_debug.c: In function `print_cfs_rq': > kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt' > kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt' > make[1]: *** [kernel/sched.o] Error 1 > make: *** [kernel] Error 2 > > Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]> thanks, applied! Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 11:13 +0200, Ingo Molnar wrote:
> > [...] Latencies of up to 336ms hit me during the recompile (make -j3),
> > with nothing else running. Since reboot, latencies are, so far, very
> > very nice. [...]
>
> 'very very nice' == 'best ever' ? :-)

Yes. Very VERY nice feel.

	-Mike
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 14:41 +0530, Srivatsa Vaddagiri wrote: > On Tue, Sep 25, 2007 at 02:23:29PM +0530, Srivatsa Vaddagiri wrote: > > On Tue, Sep 25, 2007 at 10:33:27AM +0200, Mike Galbraith wrote: > > > > Darn, have news: latency thing isn't dead. Two busy loops, one at nice > > > > 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the > > > > latencies below for nice -5 Xorg. Didn't kill the box though. > > These busy loops - are they spawned by the same user? Is it the root > user? Also is this seen in UP mode also? > > Can you also pls check if tuning root user's cpu share helps? Basically, > > # echo 4096 > /proc/root_user_share > > [or any other higher value] I'll try these after I beat on the box some more. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 14:23 +0530, Srivatsa Vaddagiri wrote:
> Mike,
> 	Do you have FAIR_USER_SCHED turned on as well? Can you send me
> your .config pls?

I did have. gzipped config attached.. this is current though, after
disabling groups. I'm still beating on the basic changes (boy does it
ever feel nice [awaits other shoe]).

	-Mike

[attachment: config.gz — GNU Zip compressed data]
Re: [git] CFS-devel, latest code
* Mike Galbraith <[EMAIL PROTECTED]> wrote: > > sched_debug (attached) is.. strange. > > Disabling CONFIG_FAIR_GROUP_SCHED fixed both. [...] heh. Evil plan to enable the group scheduler by default worked out as planned! ;-) [guess how many container users would do ... interactivity tests like you do??] > [...] Latencies of up to 336ms hit me during the recompile (make -j3), > with nothing else running. Since reboot, latencies are, so far, very > very nice. [...] 'very very nice' == 'best ever' ? :-) > [...] I'm leaving it disabled for now. ok, i'm too seeing some sort of latency weirdness with CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved which runs under root uid on my box - and hence gets 50% of all CPU time. Srivatsa, any ideas? It could either be an accounting buglet (less likely, seems like the group scheduling bits stick to the 50% splitup nicely), or a preemption buglet. One potential preemption buglet would be for the group scheduler to not properly preempt a running task when a task from another uid is woken? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
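The 50% figure Ingo mentions falls out of weight-proportional group scheduling: each uid's group owns a weight, and its CPU share is that weight divided by the total, independent of how many tasks the user runs. A toy calculation (the helper name is made up; 1024 as the default group weight and 4096 as a boosted root share mirror the /proc/root_user_share discussion elsewhere in this thread):

```c
/* CPU share of one group, in percent, under weight-proportional
 * scheduling: share_i = weight_i / sum(weights). */
static long cpu_share_percent(long weight, long total_weight)
{
	return weight * 100 / total_weight;
}
```

Two users at equal weight split the CPU 50/50, which is exactly why a root-uid Xorg gets half the machine no matter how large the other user's fork bomb is.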
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 02:23:29PM +0530, Srivatsa Vaddagiri wrote: > On Tue, Sep 25, 2007 at 10:33:27AM +0200, Mike Galbraith wrote: > > > Darn, have news: latency thing isn't dead. Two busy loops, one at nice > > > 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the > > > latencies below for nice -5 Xorg. Didn't kill the box though. These busy loops - are they spawned by the same user? Is it the root user? Also is this seen in UP mode also? Can you also pls check if tuning root user's cpu share helps? Basically, # echo 4096 > /proc/root_user_share [or any other higher value] > Also how do you check se.wait_max? Ok ..I see that it is in /proc/sched_debug. -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, 25 Sep 2007 14:13:27 +0530 Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote: > On Tue, Sep 25, 2007 at 12:41:20AM -0700, Andrew Morton wrote: > > This doornails the Vaio. After grub handover the screen remains black > > and the fan goes whir. > > > > http://userweb.kernel.org/~akpm/config-sony.txt > > This seems to be UP regression. Sorry abt that. I could recreate > the problem very easily with CONFIG_SMP turned off. > > Can you check if this patch works? Works for me here. > > -- > > Fix UP breakage. > > Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]> > > > --- > kernel/sched.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: current/kernel/sched.c > === > --- current.orig/kernel/sched.c > +++ current/kernel/sched.c > @@ -1029,8 +1029,8 @@ static inline void __set_task_cpu(struct > { > #ifdef CONFIG_SMP > task_thread_info(p)->cpu = cpu; > - set_task_cfs_rq(p); > #endif > + set_task_cfs_rq(p); > } > > #ifdef CONFIG_SMP yup, that's a fix. It was 15 minutes too late for rc8-mm1 though :( - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 10:33:27AM +0200, Mike Galbraith wrote: > > Darn, have news: latency thing isn't dead. Two busy loops, one at nice > > 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the > > latencies below for nice -5 Xorg. Didn't kill the box though. > > > > se.wait_max :10.068169 > > se.wait_max : 7.465334 > > se.wait_max : 135.501816 > > se.wait_max : 0.884483 > > se.wait_max : 144.218955 > > se.wait_max : 128.578376 > > se.wait_max :93.975768 > > se.wait_max : 4.965965 > > se.wait_max : 113.655533 > > se.wait_max : 4.301075 > > > > sched_debug (attached) is.. strange. Mike, Do you have FAIR_USER_SCHED turned on as well? Can you send me your .config pls? Also how do you check se.wait_max? > Disabling CONFIG_FAIR_GROUP_SCHED fixed both. Latencies of up to 336ms > hit me during the recompile (make -j3), with nothing else running. > Since reboot, latencies are, so far, very very nice. I'm leaving it > disabled for now. -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 12:41:20AM -0700, Andrew Morton wrote:
> This doornails the Vaio. After grub handover the screen remains black
> and the fan goes whir.
>
> http://userweb.kernel.org/~akpm/config-sony.txt

This seems to be UP regression. Sorry abt that. I could recreate
the problem very easily with CONFIG_SMP turned off.

Can you check if this patch works? Works for me here.

--

Fix UP breakage.

Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---
 kernel/sched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -1029,8 +1029,8 @@ static inline void __set_task_cpu(struct
 {
 #ifdef CONFIG_SMP
 	task_thread_info(p)->cpu = cpu;
-	set_task_cfs_rq(p);
 #endif
+	set_task_cfs_rq(p);
 }

 #ifdef CONFIG_SMP

--
Regards,
vatsa
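The bug pattern behind this one-line fix is easy to reproduce in miniature: a call that must run on every configuration sits inside an `#ifdef CONFIG_SMP` block and silently vanishes from uniprocessor builds. A stand-alone sketch (stub names invented; `CONFIG_SMP` is deliberately left undefined to mimic a UP build, so only the hoisted call keeps the hook alive):

```c
/* Stands in for the task's cfs_rq linkage being established. */
static int cfs_rq_was_set;

static void set_task_cfs_rq_stub(void)
{
	cfs_rq_was_set = 1;
}

/* Fixed version: the group-scheduling hook is outside the #ifdef, so it
 * runs on UP and SMP alike. In the broken version it sat inside the
 * CONFIG_SMP block and UP kernels never linked tasks to their cfs_rq. */
static void set_task_cpu_fixed(void)
{
#ifdef CONFIG_SMP
	/* per-cpu bookkeeping, only meaningful on SMP */
#endif
	set_task_cfs_rq_stub();
}
```

Compiling without `CONFIG_SMP` and checking that the hook still fires is exactly the regression test this patch needed.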
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 09:35 +0200, Mike Galbraith wrote:
> Darn, have news: latency thing isn't dead. Two busy loops, one at nice
> 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the
> latencies below for nice -5 Xorg. Didn't kill the box though.
>
> se.wait_max : 10.068169
> se.wait_max : 7.465334
> se.wait_max : 135.501816
> se.wait_max : 0.884483
> se.wait_max : 144.218955
> se.wait_max : 128.578376
> se.wait_max : 93.975768
> se.wait_max : 4.965965
> se.wait_max : 113.655533
> se.wait_max : 4.301075
>
> sched_debug (attached) is.. strange.

Disabling CONFIG_FAIR_GROUP_SCHED fixed both. Latencies of up to 336ms
hit me during the recompile (make -j3), with nothing else running.
Since reboot, latencies are, so far, very very nice. I'm leaving it
disabled for now.

	-Mike
Re: [git] CFS-devel, latest code
On Mon, 24 Sep 2007 23:45:37 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> The latest sched-devel.git tree can be pulled from:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

This doornails the Vaio. After grub handover the screen remains black
and the fan goes whir.

http://userweb.kernel.org/~akpm/config-sony.txt
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 08:10 +0200, Mike Galbraith wrote:
> no news is good news.

Darn, have news: latency thing isn't dead. Two busy loops, one at nice
0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the
latencies below for nice -5 Xorg. Didn't kill the box though.

se.wait_max : 10.068169
se.wait_max : 7.465334
se.wait_max : 135.501816
se.wait_max : 0.884483
se.wait_max : 144.218955
se.wait_max : 128.578376
se.wait_max : 93.975768
se.wait_max : 4.965965
se.wait_max : 113.655533
se.wait_max : 4.301075

sched_debug (attached) is.. strange.

	-Mike

[attachment: sched_debug.gz — GNU Zip compressed data]
Re: [git] CFS-devel, latest code
Hi; 25 Eyl 2007 Sal tarihinde, Ingo Molnar şunları yazmıştı: > > The latest sched-devel.git tree can be pulled from: > > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git > > Lots of scheduler updates in the past few days, done by many people. > Most importantly, the SMP latency problems reported and debugged by Mike > Galbraith should be fixed for good now. > > I've also included the latest and greatest group-fairness scheduling > patch from Srivatsa Vaddagiri, which can now be used without containers > as well (in a simplified, each-uid-gets-its-fair-share mode). This > feature (CONFIG_FAIR_USER_SCHED) is now default-enabled. > > Peter Zijlstra has been busy enhancing the math of the scheduler: we've > got the new 'vslice' forked-task code that should enable snappier shell > commands during load while still keeping kbuild workloads in check. > > On my testsystems this codebase starts looking like something that could > be merged into v2.6.24, so please give it a good workout and let us know > if there's anything bad going on. (If this works out fine then i'll > propagate these changes back into the CFS backport, for wider testing.) 
Seems like the following trivial change is needed to compile without
CONFIG_SCHEDSTATS:

[EMAIL PROTECTED] linux-2.6 $ LC_ALL=C make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  CC      kernel/sched.o
In file included from kernel/sched.c:853:
kernel/sched_debug.c: In function `print_cfs_rq':
kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt'
kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt'
make[1]: *** [kernel/sched.o] Error 1
make: *** [kernel] Error 2

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index b68e593..4659c90 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -136,8 +136,10 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			SPLIT_NS(spread0));
 	SEQ_printf(m, "  .%-30s: %ld\n", "nr_running", cfs_rq->nr_running);
 	SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
+#ifdef CONFIG_SCHEDSTATS
 	SEQ_printf(m, "  .%-30s: %ld\n", "bkl_cnt",
 			rq->bkl_cnt);
+#endif
 	SEQ_printf(m, "  .%-30s: %ld\n", "nr_spread_over",
 			cfs_rq->nr_spread_over);
 }

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
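The fix above is an instance of a general rule: a struct member that only exists under a config option must be referenced only under the same guard, otherwise every other configuration fails to compile exactly as in the error log. A stand-alone sketch (struct and helper names invented; `CONFIG_SCHEDSTATS` is left undefined here to mimic the failing configuration):

```c
#include <stdio.h>
#include <string.h>

/* bkl_cnt only exists in schedstats builds, like the kernel field. */
struct rq_stats {
	long nr_running;
#ifdef CONFIG_SCHEDSTATS
	long bkl_cnt;
#endif
};

/* Print the stats into 'buf'; the bkl_cnt line is compiled only when
 * the field actually exists, so non-schedstats builds still compile. */
static int print_rq(char *buf, size_t len, const struct rq_stats *rq)
{
	int n = snprintf(buf, len, ".nr_running: %ld\n", rq->nr_running);

#ifdef CONFIG_SCHEDSTATS
	n += snprintf(buf + n, len - n, ".bkl_cnt: %ld\n", rq->bkl_cnt);
#endif
	return n;
}
```

Guarding only the struct definition but not the printout (or vice versa) recreates the `structure has no member named 'bkl_cnt'` error.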
Re: [git] CFS-devel, latest code
* Daniel Walker <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote:
> > Lots of scheduler updates in the past few days, done by many people.
> > Most importantly, the SMP latency problems reported and debugged by
> > Mike Galbraith should be fixed for good now.
>
> Does this have anything to do with idle balancing ? I noticed some
> fairly large latencies in that code in 2.6.23-rc's ..

any measurements?

	Ingo
Re: [git] CFS-devel, latest code
On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote:

> Mike Galbraith (2):
>       sched: fix SMP migration latencies
>       sched: fix formatting of /proc/sched_debug

Off-by-one bug in attribution, rocks and sticks (down boy!) don't count
;-) I just built, and will spend the morning beating on it... no news
is good news.

	-Mike
Re: [git] CFS-devel, latest code
On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote: Mike Galbraith (2): sched: fix SMP migration latencies sched: fix formatting of /proc/sched_debug Off-by-one bug in attribution, rocks and sticks (down boy!) don't count ;-) I just built, and will spend the morning beating on it... no news is good news. -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
* Daniel Walker [EMAIL PROTECTED] wrote: On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote: Lots of scheduler updates in the past few days, done by many people. Most importantly, the SMP latency problems reported and debugged by Mike Galbraith should be fixed for good now. Does this have anything to do with idle balancing ? I noticed some fairly large latencies in that code in 2.6.23-rc's .. any measurements? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
Hi; 25 Eyl 2007 Sal tarihinde, Ingo Molnar şunları yazmıştı: The latest sched-devel.git tree can be pulled from: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git Lots of scheduler updates in the past few days, done by many people. Most importantly, the SMP latency problems reported and debugged by Mike Galbraith should be fixed for good now. I've also included the latest and greatest group-fairness scheduling patch from Srivatsa Vaddagiri, which can now be used without containers as well (in a simplified, each-uid-gets-its-fair-share mode). This feature (CONFIG_FAIR_USER_SCHED) is now default-enabled. Peter Zijlstra has been busy enhancing the math of the scheduler: we've got the new 'vslice' forked-task code that should enable snappier shell commands during load while still keeping kbuild workloads in check. On my testsystems this codebase starts looking like something that could be merged into v2.6.24, so please give it a good workout and let us know if there's anything bad going on. (If this works out fine then i'll propagate these changes back into the CFS backport, for wider testing.) 
Seems like following trivial change needed to compile without CONFIG_SCHEDSTATS [EMAIL PROTECTED] linux-2.6 $ LC_ALL=C make CHK include/linux/version.h CHK include/linux/utsrelease.h CALLscripts/checksyscalls.sh CHK include/linux/compile.h CC kernel/sched.o In file included from kernel/sched.c:853: kernel/sched_debug.c: In function `print_cfs_rq': kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt' kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt' make[1]: *** [kernel/sched.o] Error 1 make: *** [kernel] Error 2 Signed-off-by: S.Çağlar Onur [EMAIL PROTECTED] diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c index b68e593..4659c90 100644 --- a/kernel/sched_debug.c +++ b/kernel/sched_debug.c @@ -136,8 +136,10 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) SPLIT_NS(spread0)); SEQ_printf(m, .%-30s: %ld\n, nr_running, cfs_rq-nr_running); SEQ_printf(m, .%-30s: %ld\n, load, cfs_rq-load.weight); +#ifdef CONFIG_SCHEDSTATS SEQ_printf(m, .%-30s: %ld\n, bkl_cnt, rq-bkl_cnt); +#endif SEQ_printf(m, .%-30s: %ld\n, nr_spread_over, cfs_rq-nr_spread_over); } Cheers -- S.Çağlar Onur [EMAIL PROTECTED] http://cekirdek.pardus.org.tr/~caglar/ Linux is like living in a teepee. No Windows, no Gates and an Apache in house! - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 08:10 +0200, Mike Galbraith wrote: no news is good news. Darn, have news: latency thing isn't dead. Two busy loops, one at nice 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the latencies below for nice -5 Xorg. Didn't kill the box though. se.wait_max :10.068169 se.wait_max : 7.465334 se.wait_max : 135.501816 se.wait_max : 0.884483 se.wait_max : 144.218955 se.wait_max : 128.578376 se.wait_max :93.975768 se.wait_max : 4.965965 se.wait_max : 113.655533 se.wait_max : 4.301075 sched_debug (attached) is.. strange. -Mike sched_debug.gz Description: GNU Zip compressed data
Re: [git] CFS-devel, latest code
On Mon, 24 Sep 2007 23:45:37 +0200 Ingo Molnar [EMAIL PROTECTED] wrote: The latest sched-devel.git tree can be pulled from: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git This doornails the Vaio. After grub handover the screen remains black and the fan goes whir. http://userweb.kernel.org/~akpm/config-sony.txt - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 12:41:20AM -0700, Andrew Morton wrote: This doornails the Vaio. After grub handover the screen remains black and the fan goes whir. http://userweb.kernel.org/~akpm/config-sony.txt This seems to be UP regression. Sorry abt that. I could recreate the problem very easily with CONFIG_SMP turned off. Can you check if this patch works? Works for me here. -- Fix UP breakage. Signed-off-by : Srivatsa Vaddagiri [EMAIL PROTECTED] --- kernel/sched.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: current/kernel/sched.c === --- current.orig/kernel/sched.c +++ current/kernel/sched.c @@ -1029,8 +1029,8 @@ static inline void __set_task_cpu(struct { #ifdef CONFIG_SMP task_thread_info(p)-cpu = cpu; - set_task_cfs_rq(p); #endif + set_task_cfs_rq(p); } #ifdef CONFIG_SMP -- Regards, vatsa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 09:35 +0200, Mike Galbraith wrote: Darn, have news: latency thing isn't dead. Two busy loops, one at nice 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the latencies below for nice -5 Xorg. Didn't kill the box though. se.wait_max :10.068169 se.wait_max : 7.465334 se.wait_max : 135.501816 se.wait_max : 0.884483 se.wait_max : 144.218955 se.wait_max : 128.578376 se.wait_max :93.975768 se.wait_max : 4.965965 se.wait_max : 113.655533 se.wait_max : 4.301075 sched_debug (attached) is.. strange. Disabling CONFIG_FAIR_GROUP_SCHED fixed both. Latencies of up to 336ms hit me during the recompile (make -j3), with nothing else running. Since reboot, latencies are, so far, very very nice. I'm leaving it disabled for now. -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 10:33:27AM +0200, Mike Galbraith wrote: Darn, have news: latency thing isn't dead. Two busy loops, one at nice 0 pinned to CPU0, and one at nice 19 pinned to CPU1 produced the latencies below for nice -5 Xorg. Didn't kill the box though. se.wait_max :10.068169 se.wait_max : 7.465334 se.wait_max : 135.501816 se.wait_max : 0.884483 se.wait_max : 144.218955 se.wait_max : 128.578376 se.wait_max :93.975768 se.wait_max : 4.965965 se.wait_max : 113.655533 se.wait_max : 4.301075 sched_debug (attached) is.. strange. Mike, Do you have FAIR_USER_SCHED turned on as well? Can you send me your .config pls? Also how do you check se.wait_max? Disabling CONFIG_FAIR_GROUP_SCHED fixed both. Latencies of up to 336ms hit me during the recompile (make -j3), with nothing else running. Since reboot, latencies are, so far, very very nice. I'm leaving it disabled for now. -- Regards, vatsa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git] CFS-devel, latest code
On Tue, 25 Sep 2007 14:13:27 +0530 Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 25, 2007 at 12:41:20AM -0700, Andrew Morton wrote:
> > This doornails the Vaio. After grub handover the screen remains black
> > and the fan goes whir.
> > http://userweb.kernel.org/~akpm/config-sony.txt
>
> This seems to be UP regression. Sorry abt that. I could recreate the
> problem very easily with CONFIG_SMP turned off. Can you check if this
> patch works? Works for me here.
>
> --
>
> Fix UP breakage.
>
> Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>
>
> ---
>  kernel/sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: current/kernel/sched.c
> ===================================================================
> --- current.orig/kernel/sched.c
> +++ current/kernel/sched.c
> @@ -1029,8 +1029,8 @@ static inline void __set_task_cpu(struct
>  {
>  #ifdef CONFIG_SMP
>  	task_thread_info(p)->cpu = cpu;
> -	set_task_cfs_rq(p);
>  #endif
> +	set_task_cfs_rq(p);
>  }
>
>  #ifdef CONFIG_SMP

yup, that's a fix. It was 15 minutes too late for rc8-mm1 though :(
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 02:23:29PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 10:33:27AM +0200, Mike Galbraith wrote:
> > Darn, have news: latency thing isn't dead. Two busy loops, one at
> > nice 0 pinned to CPU0 and one at nice 19 pinned to CPU1, produced the
> > latencies below for nice -5 Xorg. Didn't kill the box though.

These busy loops - are they spawned by the same user? Is it the root user?
Also is this seen in UP mode also?

Can you also pls check if tuning root user's cpu share helps? Basically,

	# echo 4096 > /proc/root_user_share

[or any other higher value]

> Also how do you check se.wait_max?

Ok ..I see that it is in /proc/sched_debug.

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 14:23 +0530, Srivatsa Vaddagiri wrote:
> Mike,
> 	Do you have FAIR_USER_SCHED turned on as well? Can you send me
> your .config pls?

I did have. gzipped config attached.. this is current though, after
disabling groups. I'm still beating on the basic changes (boy does it ever
feel nice [awaits other shoe]).

	-Mike

[Attachment: config.gz — GNU Zip compressed data]
Re: [git] CFS-devel, latest code
* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> sched_debug (attached) is.. strange.
>
> Disabling CONFIG_FAIR_GROUP_SCHED fixed both. [...]

heh. Evil plan to enable the group scheduler by default worked out as
planned! ;-) [guess how many container users would do ... interactivity
tests like you do??]

> [...] Latencies of up to 336ms hit me during the recompile (make -j3),
> with nothing else running. Since reboot, latencies are, so far, very
> very nice. [...]

'very very nice' == 'best ever' ? :-)

> [...] I'm leaving it disabled for now.

ok, i'm too seeing some sort of latency weirdness with
CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved - which runs
under root uid on my box and hence gets 50% of all CPU time.

Srivatsa, any ideas? It could either be an accounting buglet (less likely,
seems like the group scheduling bits stick to the 50% splitup nicely), or a
preemption buglet. One potential preemption buglet would be for the group
scheduler to not properly preempt a running task when a task from another
uid is woken?

	Ingo
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 14:41 +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 02:23:29PM +0530, Srivatsa Vaddagiri wrote:
> > On Tue, Sep 25, 2007 at 10:33:27AM +0200, Mike Galbraith wrote:
> > > Darn, have news: latency thing isn't dead. Two busy loops, one at
> > > nice 0 pinned to CPU0 and one at nice 19 pinned to CPU1, produced
> > > the latencies below for nice -5 Xorg. Didn't kill the box though.
> >
> > These busy loops - are they spawned by the same user? Is it the root
> > user? Also is this seen in UP mode also?
>
> Can you also pls check if tuning root user's cpu share helps? Basically,
>
> 	# echo 4096 > /proc/root_user_share
>
> [or any other higher value]

I'll try these after I beat on the box some more.

	-Mike
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 11:13 +0200, Ingo Molnar wrote:
> > [...] Latencies of up to 336ms hit me during the recompile (make -j3),
> > with nothing else running. Since reboot, latencies are, so far, very
> > very nice. [...]
>
> 'very very nice' == 'best ever' ? :-)

Yes. Very VERY nice feel.

	-Mike
Re: [git] CFS-devel, latest code
* S.Çağlar Onur <[EMAIL PROTECTED]> wrote:

> Seems like following trivial change needed to compile without
> CONFIG_SCHEDSTATS
>
> [EMAIL PROTECTED] linux-2.6 $ LC_ALL=C make
>   CHK     include/linux/version.h
>   CHK     include/linux/utsrelease.h
>   CALL    scripts/checksyscalls.sh
>   CHK     include/linux/compile.h
>   CC      kernel/sched.o
> In file included from kernel/sched.c:853:
> kernel/sched_debug.c: In function `print_cfs_rq':
> kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt'
> kernel/sched_debug.c:139: error: structure has no member named `bkl_cnt'
> make[1]: *** [kernel/sched.o] Error 1
> make: *** [kernel] Error 2
>
> Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

thanks, applied!

	Ingo
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 11:13:31AM +0200, Ingo Molnar wrote:
> ok, i'm too seeing some sort of latency weirdness with
> CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved - which
> runs under root uid on my box and hence gets 50% of all CPU time.
>
> Srivatsa, any ideas? It could either be an accounting buglet (less
> likely, seems like the group scheduling bits stick to the 50% splitup
> nicely), or a preemption buglet. One potential preemption buglet would
> be for the group scheduler to not properly preempt a running task when
> a task from another uid is woken?

Yep, I noticed that too.

check_preempt_wakeup()
{
	...
	if (is_same_group(curr, p)) {
	^^^^^^^^^^^^^^^^^^^^^^^^^^^
		resched_task();
	}
}

Will try a fix to check for preemption at higher levels ..

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:

> On Tue, Sep 25, 2007 at 11:13:31AM +0200, Ingo Molnar wrote:
> > ok, i'm too seeing some sort of latency weirdness with
> > CONFIG_FAIR_GROUP_SCHED enabled, _if_ there's Xorg involved - which
> > runs under root uid on my box and hence gets 50% of all CPU time.
> >
> > Srivatsa, any ideas? It could either be an accounting buglet (less
> > likely, seems like the group scheduling bits stick to the 50% splitup
> > nicely), or a preemption buglet. One potential preemption buglet
> > would be for the group scheduler to not properly preempt a running
> > task when a task from another uid is woken?
>
> Yep, I noticed that too.
>
> check_preempt_wakeup()
> {
> 	...
> 	if (is_same_group(curr, p)) {
> 	^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 		resched_task();
> 	}
> }
>
> Will try a fix to check for preemption at higher levels ..

i bet fixing this will increase precision of group scheduling as well.
Those long latencies can be thought of as noise as well, and the
fair-scheduling engine might not be capable to offset all sources of
noise. So generally, while we allow a certain amount of lag in preemption
decisions (wakeup-granularity, etc.), with which the fairness engine will
cope just fine, we do not want to allow unlimited lag.

	Ingo
Re: [git] CFS-devel, latest code
* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-09-25 at 11:13 +0200, Ingo Molnar wrote:
> > > [...] Latencies of up to 336ms hit me during the recompile (make
> > > -j3), with nothing else running. Since reboot, latencies are, so
> > > far, very very nice. [...]
> >
> > 'very very nice' == 'best ever' ? :-)
>
> Yes. Very VERY nice feel.

cool :-) Maybe there's more to come: if we can get CONFIG_FAIR_USER_SCHED
to work properly then your Xorg will have a load-independent 50% of CPU
time all to itself. (Group scheduling is quite impressive already: i can
log in as root without feeling _any_ effect from a perpetual 'hackbench
100' running as uid mingo. Fork bombs no more.)

Will the Amarok gforce plugin like that CPU time splitup? (or is most of
the gforce overhead under your user uid?) it could also work out
negatively, _sometimes_ X does not like being too high prio. (weird as
that might be.) So we'll see.

	Ingo
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 11:47 +0200, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-09-25 at 11:13 +0200, Ingo Molnar wrote:
> > > > [...] Latencies of up to 336ms hit me during the recompile (make
> > > > -j3), with nothing else running. Since reboot, latencies are, so
> > > > far, very very nice. [...]
> > >
> > > 'very very nice' == 'best ever' ? :-)
> >
> > Yes. Very VERY nice feel.
>
> cool :-) Maybe there's more to come: if we can get
> CONFIG_FAIR_USER_SCHED to work properly then your Xorg will have a
> load-independent 50% of CPU time all to itself. (Group scheduling is
> quite impressive already: i can log in as root without feeling _any_
> effect from a perpetual 'hackbench 100' running as uid mingo. Fork
> bombs no more.)
>
> Will the Amarok gforce plugin like that CPU time splitup? (or is most
> of the gforce overhead under your user uid?)

I run everything as root (naughty me), so I'd have to change my evil ways
to reap the benefits. (I'll do that to test, but it's unlikely to ever
become a permanent habit here)

Amarok/Gforce will definitely like the user split as long as latency is
low. Visualizations are not only bandwidth hungry, they're extremely
latency sensitive.

	-Mike
Re: [git] CFS-devel, latest code
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> * Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> > Yep, I noticed that too.
> >
> > check_preempt_wakeup()
> > {
> > 	...
> > 	if (is_same_group(curr, p)) {
> > 	^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 		resched_task();
> > 	}
> > }
> >
> > Will try a fix to check for preemption at higher levels ..
>
> i bet fixing this will increase precision of group scheduling as well.
> Those long latencies can be thought of as noise as well, and the
> fair-scheduling engine might not be capable to offset all sources of
> noise. So generally, while we allow a certain amount of lag in
> preemption decisions (wakeup-granularity, etc.), with which the
> fairness engine will cope just fine, we do not want to allow unlimited
> lag.

hm, i tried the naive patch. In theory the vruntime of all scheduling
entities should be 'compatible' and comparable (that's the point behind
using vruntime - the fairness engine drives each vruntime forward and
tries to balance them). So the patch below just removes the
is_same_group() condition. But i can still see bad (and obvious) latencies
with Mike's 2-hogs test:

	taskset 01 perl -e 'while (1) {}'
	nice -19 taskset 02 perl -e 'while (1) {}'

So something's amiss.

	Ingo

--->
Subject: sched: group scheduler wakeup latency fix
From: Ingo Molnar <[EMAIL PROTECTED]>

group scheduler wakeup latency fix.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched_fair.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -785,6 +785,7 @@ static void check_preempt_wakeup(struct
 {
 	struct task_struct *curr = rq->curr;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	s64 delta;
 
 	if (unlikely(rt_prio(p->prio))) {
 		update_rq_clock(rq);
@@ -792,12 +793,10 @@ static void check_preempt_wakeup(struct
 		resched_task(curr);
 		return;
 	}
-	if (is_same_group(curr, p)) {
-		s64 delta = curr->se.vruntime - p->se.vruntime;
+	delta = curr->se.vruntime - p->se.vruntime;
 
-		if (delta > (s64)sysctl_sched_wakeup_granularity)
-			resched_task(curr);
-	}
+	if (delta > (s64)sysctl_sched_wakeup_granularity)
+		resched_task(curr);
 }
 
 static struct task_struct *pick_next_task_fair(struct rq *rq)
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 12:10:44PM +0200, Ingo Molnar wrote:
> So the patch below just removes the is_same_group() condition. But i
> can still see bad (and obvious) latencies with Mike's 2-hogs test:
>
> 	taskset 01 perl -e 'while (1) {}'
> 	nice -19 taskset 02 perl -e 'while (1) {}'
>
> So something's amiss.

While I try recreating this myself, I wonder if this patch helps?

---
 kernel/sched_fair.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -794,7 +794,8 @@ static void yield_task_fair(struct rq *r
 static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
 {
 	struct task_struct *curr = rq->curr;
-	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	struct cfs_rq *cfs_rq = task_cfs_rq(curr), *pcfs_rq;
+	struct sched_entity *se = &curr->se, *pse = &p->se;
 
 	if (unlikely(rt_prio(p->prio))) {
 		update_rq_clock(rq);
@@ -802,11 +803,19 @@ static void check_preempt_wakeup(struct
 		resched_task(curr);
 		return;
 	}
-	if (is_same_group(curr, p)) {
-		s64 delta = curr->se.vruntime - p->se.vruntime;
 
-		if (delta > (s64)sysctl_sched_wakeup_granularity)
-			resched_task(curr);
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+		pcfs_rq = cfs_rq_of(pse);
+
+		if (cfs_rq == pcfs_rq) {
+			s64 delta = se->vruntime - pse->vruntime;
+
+			if (delta > (s64)sysctl_sched_wakeup_granularity)
+				resched_task(curr);
+			break;
+		}
+		pse = pse->parent;
 	}
 }

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:

> On Tue, Sep 25, 2007 at 12:10:44PM +0200, Ingo Molnar wrote:
> > So the patch below just removes the is_same_group() condition. But i
> > can still see bad (and obvious) latencies with Mike's 2-hogs test:
> >
> > 	taskset 01 perl -e 'while (1) {}'
> > 	nice -19 taskset 02 perl -e 'while (1) {}'
> >
> > So something's amiss.
>
> While I try recreating this myself, I wonder if this patch helps?

you should be able to recreate this easily by booting with maxcpus=1 and
the commands above - then run a few instances of chew-max (without them
being bound to any particular CPUs) and the latencies should show up.

i have tried your patch and it does not solve the problem - i think
there's a more fundamental bug lurking, besides the wakeup latency
problem.

Find below a /proc/sched_debug output of a really large latency. The
latency is caused by the _huge_ (~450 seconds!) vruntime offset that
'loop_silent' and 'sshd' has:

  task          PID   tree-key       switches  prio  exec-runtime
  ---------------------------------------------------------------
  loop_silent  2391   55344.211189      203    120    55344.211189
  sshd         2440  513334.978030        4    120   513334.978030
  Rcat         2496  513672.558835        4    120   513672.558835

hm. perhaps this fixup in kernel/sched.c:set_task_cpu():

	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;

needs to become properly group-hierarchy aware?

	Ingo

--
Sched Debug Version: v0.05-v20, 2.6.23-rc7 #89
now at 95878.065440 msecs
  .sysctl_sched_latency                   : 20.00
  .sysctl_sched_min_granularity           : 2.00
  .sysctl_sched_wakeup_granularity        : 2.00
  .sysctl_sched_batch_wakeup_granularity  : 25.00
  .sysctl_sched_child_runs_first          : 0.01
  .sysctl_sched_features                  : 3

cpu#0, 1828.868 MHz
  .nr_running             : 3
  .load                   : 3072
  .nr_switches            : 32032
  .nr_load_updates        : 95906
  .nr_uninterruptible     : 4294967238
  .jiffies                : 4294763202
  .next_balance           : 4294.763420
  .curr->pid              : 2496
  .clock                  : 95893.484495
  .idle_clock             : 55385.089335
  .prev_clock_raw         : 84753.749367
  .clock_warps            : 0
  .clock_overflows        : 1737
  .clock_deep_idle_events : 71815
  .clock_max_delta        : 0.999843
  .cpu_load[0]            : 3072
  .cpu_load[1]            : 2560
  .cpu_load[2]            : 2304
  .cpu_load[3]            : 2176
  .cpu_load[4]            : 2119

cfs_rq
  .exec_clock             : 38202.223241
  .MIN_vruntime           : 36334.281860
  .min_vruntime           : 36334.279140
  .max_vruntime           : 36334.281860
  .spread                 : 0.00
  .spread0                : 0.00
  .nr_running             : 2
  .load                   : 3072
  .bkl_cnt                : 3934
  .nr_spread_over         : 37

cfs_rq
  .exec_clock             : 34769.316246
  .MIN_vruntime           : 55344.211189
  .min_vruntime           : 36334.279140
  .max_vruntime           : 513334.978030
  .spread                 : 457990.766841
  .spread0                : 0.00
  .nr_running             : 2
  .load                   : 2048
  .bkl_cnt                : 3934
  .nr_spread_over         : 10

cfs_rq
  .exec_clock             : 36.982394
  .MIN_vruntime           : 0.01
  .min_vruntime           : 36334.279140
  .max_vruntime           : 0.01
  .spread                 : 0.00
  .spread0                : 0.00
  .nr_running             : 0
  .load                   : 0
  .bkl_cnt                : 3934
  .nr_spread_over         : 1

cfs_rq
  .exec_clock             : 20.244893
  .MIN_vruntime           : 0.01
  .min_vruntime           : 36334.279140
  .max_vruntime           : 0.01
  .spread                 : 0.00
  .spread0                : 0.00
  .nr_running             : 0
  .load                   : 0
  .bkl_cnt                : 3934
  .nr_spread_over         : 0

cfs_rq
  .exec_clock             : 3305.155973
  .MIN_vruntime           : 0.01
  .min_vruntime           : 36334.279140
  .max_vruntime           : 0.01
  .spread                 : 0.00
  .spread0                : 0.00
  .nr_running             : 1
  .load                   : 1024
  .bkl_cnt                : 3934
  .nr_spread_over         : 13
Re: [git] CFS-devel, latest code
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:

> On Tue, Sep 25, 2007 at 12:41:20AM -0700, Andrew Morton wrote:
> > This doornails the Vaio. After grub handover the screen remains black
> > and the fan goes whir.
> > http://userweb.kernel.org/~akpm/config-sony.txt
>
> This seems to be UP regression. Sorry abt that. I could recreate the
> problem very easily with CONFIG_SMP turned off. Can you check if this
> patch works? Works for me here.

thanks - i've put this fix into the core group-scheduling patch.

	Ingo
Re: [git] CFS-devel, latest code
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
>
> 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
>
> needs to become properly group-hierarchy aware?

a quick first stab like the one below does not appear to solve the
problem.

	Ingo

--->
Subject: sched: group scheduler SMP migration fix
From: Ingo Molnar <[EMAIL PROTECTED]>

group scheduler SMP migration fix.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1039,7 +1039,8 @@ void set_task_cpu(struct task_struct *p,
 {
 	int old_cpu = task_cpu(p);
 	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
-	u64 clock_offset;
+	struct sched_entity *se;
+	u64 clock_offset, voffset;
 
 	clock_offset = old_rq->clock - new_rq->clock;
 
@@ -1051,7 +1052,11 @@ void set_task_cpu(struct task_struct *p,
 	if (p->se.block_start)
 		p->se.block_start -= clock_offset;
 #endif
-	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
+
+	se = &p->se;
+	voffset = old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
+	for_each_sched_entity(se)
+		se->vruntime -= voffset;
 
 	__set_task_cpu(p, new_cpu);
 }
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 15:58 +0530, Srivatsa Vaddagiri wrote:
> While I try recreating this myself, I wonder if this patch helps?

It didn't here, nor did tweaking root's share. Booting with maxcpus=1, I
was unable to produce large latencies, but didn't try very many things.

	-Mike
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 12:36:17PM +0200, Ingo Molnar wrote:
> hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
>
> 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;

This definitely does need some fixup, even though I am not sure yet if it
will solve completely the latency issue.

I tried the following patch. I *think* I see some improvement wrt latency
seen when I type on the shell. Before this patch, I noticed oddities like
"kill -9 <chew-max-pid>" won't kill chew-max (it is queued in the runqueue
waiting for a looong time to run before it can acknowledge the signal and
exit). With this patch, I don't see such oddities ..So I am hoping it
fixes the latency problem you are seeing as well.

Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -1039,6 +1039,8 @@ void set_task_cpu(struct task_struct *p,
 {
 	int old_cpu = task_cpu(p);
 	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
+	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
+		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
 	u64 clock_offset;
 
 	clock_offset = old_rq->clock - new_rq->clock;
@@ -1051,7 +1053,8 @@ void set_task_cpu(struct task_struct *p,
 	if (p->se.block_start)
 		p->se.block_start -= clock_offset;
 #endif
-	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
+	p->se.vruntime -= old_cfsrq->min_vruntime -
+					 new_cfsrq->min_vruntime;
 
 	__set_task_cpu(p, new_cpu);
 }

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 14:28 +0200, Mike Galbraith wrote:
> On Tue, 2007-09-25 at 15:58 +0530, Srivatsa Vaddagiri wrote:
> > While I try recreating this myself, I wonder if this patch helps?
>
> It didn't here, nor did tweaking root's share. Booting with maxcpus=1,
> I was unable to produce large latencies, but didn't try very many
> things.

Easy way to make it pretty bad: pin a nice 0 loop to CPU0, pin a nice 19
loop to CPU1, then start an unpinned make.. more Xorg bouncing back and
forth I suppose.

  se.wait_max  :  14.105683
  se.wait_max  : 316.943787
  se.wait_max  : 692.884324
  se.wait_max  :  38.165534
  se.wait_max  : 732.883492
  se.wait_max  : 127.059784
  se.wait_max  :  63.403549
  se.wait_max  : 372.933284

	-Mike
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 18:21 +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 12:36:17PM +0200, Ingo Molnar wrote:
> > hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
> >
> > 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
>
> This definitely does need some fixup, even though I am not sure yet if
> it will solve completely the latency issue.
>
> I tried the following patch. I *think* I see some improvement wrt
> latency seen when I type on the shell. Before this patch, I noticed
> oddities like "kill -9 <chew-max-pid>" won't kill chew-max (it is
> queued in the runqueue waiting for a looong time to run before it can
> acknowledge the signal and exit). With this patch, I don't see such
> oddities ..So I am hoping it fixes the latency problem you are seeing
> as well.

http://lkml.org/lkml/2007/9/25/117 plus the below seems to be the Silver
Bullet for the latencies I was seeing.

> Index: current/kernel/sched.c
> ===================================================================
> --- current.orig/kernel/sched.c
> +++ current/kernel/sched.c
> @@ -1039,6 +1039,8 @@ void set_task_cpu(struct task_struct *p,
>  {
>  	int old_cpu = task_cpu(p);
>  	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
> +	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
> +		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
>  	u64 clock_offset;
>
>  	clock_offset = old_rq->clock - new_rq->clock;
> @@ -1051,7 +1053,8 @@ void set_task_cpu(struct task_struct *p,
>  	if (p->se.block_start)
>  		p->se.block_start -= clock_offset;
>  #endif
> -	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
> +	p->se.vruntime -= old_cfsrq->min_vruntime -
> +					 new_cfsrq->min_vruntime;
>
>  	__set_task_cpu(p, new_cpu);
>  }
>
> --
> Regards,
> vatsa
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 03:35:17PM +0200, Mike Galbraith wrote:
> > I tried the following patch. I *think* I see some improvement wrt
> > latency seen when I type on the shell. Before this patch, I noticed
> > oddities like "kill -9 <chew-max-pid>" won't kill chew-max (it is
> > queued in the runqueue waiting for a looong time to run before it can
> > acknowledge the signal and exit). With this patch, I don't see such
> > oddities ..So I am hoping it fixes the latency problem you are seeing
> > as well.
>
> http://lkml.org/lkml/2007/9/25/117 plus the below seems to be the
> Silver Bullet for the latencies I was seeing.

Cool ..Thanks for the quick feedback.

Ingo, do the two patches fix the latency problems you were seeing as well?

--
Regards,
vatsa
[git] CFS-devel, latest code
The latest sched-devel.git tree can be pulled from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

This is a quick iteration after yesterday's: a couple of group scheduling
bugs were found/debugged and fixed by Srivatsa Vaddagiri and Mike
Galbraith. There's also a yield fix from Dmitry Adamushko, a build fix
from S.Ceglar Onur and Andrew Morton, a cleanup from Hiroshi Shimamoto and
the usual stream of goodies from Peter Zijlstra. Rebased it to -rc8 as
well.

there are no known regressions at the moment in the sched-devel.git
codebase. (yay :)

	Ingo

the shortlog relative to 2.6.23-rc8:

Dmitry Adamushko (9):
      sched: clean up struct load_stat
      sched: clean up schedstat block in dequeue_entity()
      sched: sched_setscheduler() fix
      sched: add set_curr_task() calls
      sched: do not keep current in the tree and get rid of sched_entity::fair_key
      sched: optimize task_new_fair()
      sched: simplify sched_class::yield_task()
      sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
      sched: yield fix

Hiroshi Shimamoto (1):
      sched: clean up sched_fork()

Ingo Molnar (44):
      sched: fix new-task method
      sched: resched task in task_new_fair()
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sched: clean up calc_weighted()
      sched: introduce se->vruntime
      sched: move sched_feat() definitions
      sched: optimize vruntime based scheduling
      sched: simplify check_preempt() methods
      sched: wakeup granularity fix
      sched: add se->vruntime debugging
      sched: add more vruntime statistics
      sched: debug: update exec_clock only when SCHED_DEBUG
      sched: remove wait_runtime limit
      sched: remove wait_runtime fields and features
      sched: x86: allow single-depth wchan output
      sched: fix delay accounting performance regression
      sched: prettify /proc/sched_debug output
      sched: enhance debug output
      sched: kernel/sched_fair.c whitespace cleanups
      sched: fair-group sched, cleanups
      sched: enable CONFIG_FAIR_GROUP_SCHED=y by default
      sched debug: BKL usage statistics
      sched: remove unneeded tunables
      sched debug: print settings
      sched debug: more width for parameter printouts
      sched: entity_key() fix
      sched: remove condition from set_task_cpu()
      sched: remove last_min_vruntime effect
      sched: undo some of the recent changes
      sched: fix place_entity()
      sched: fix sched_fork()
      sched: remove set_leftmost()
      sched: clean up schedstats, cnt -> count
      sched: cleanup, remove stale comment

Matthias Kaehlcke (1):
      sched: use list_for_each_entry_safe() in __wake_up_common()

Mike Galbraith (2):
      sched: fix SMP migration latencies
      sched: fix formatting of /proc/sched_debug

Peter Zijlstra (12):
      sched: simplify SCHED_FEAT_* code
      sched: new task placement for vruntime
      sched: simplify adaptive latency
      sched: clean up new task placement
      sched: add tree based averages
      sched: handle vruntime overflow
      sched: better min_vruntime tracking
      sched: add vslice
      sched debug: check spread
      sched: max_vruntime() simplification
      sched: clean up min_vruntime use
      sched: speed up and simplify vslice calculations

S.Ceglar Onur (1):
      sched debug: BKL usage statistics, fix

Srivatsa Vaddagiri (9):
      sched: group-scheduler core
      sched: revert recent removal of set_curr_task()
      sched: fix minor bug in yield
      sched: print nr_running and load in /proc/sched_debug
      sched: print rq->cfs stats
      sched: clean up code under CONFIG_FAIR_GROUP_SCHED
      sched: add fair-user scheduler
      sched: group scheduler wakeup latency fix
      sched: group scheduler SMP migration fix

 arch/i386/Kconfig       |  11 
 fs/proc/base.c          |   2 
 include/linux/sched.h   |  55 ++-
 init/Kconfig            |  21 +
 kernel/delayacct.c      |   2 
 kernel/sched.c          | 577 +-
 kernel/sched_debug.c    | 250 +++-
 kernel/sched_fair.c     | 718 +---
 kernel/sched_idletask.c |   5 
 kernel/sched_rt.c       |  12 
 kernel/sched_stats.h    |  28 -
 kernel/sysctl.c         |  31 --
 kernel/user.c           |  43 ++
 13 files changed, 954 insertions(+), 801 deletions(-)
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 01:33:06PM +0200, Ingo Molnar wrote:
> hm. perhaps this fixup in kernel/sched.c:set_task_cpu():
>
> 	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
>
> needs to become properly group-hierarchy aware?

You seem to have hit the nerve for this problem. The two patches I sent:

	http://lkml.org/lkml/2007/9/25/117
	http://lkml.org/lkml/2007/9/25/168

partly help, but we can do better.

> ===================================================================
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -1039,7 +1039,8 @@ void set_task_cpu(struct task_struct *p,
>  {
>  	int old_cpu = task_cpu(p);
>  	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
> -	u64 clock_offset;
> +	struct sched_entity *se;
> +	u64 clock_offset, voffset;
>
>  	clock_offset = old_rq->clock - new_rq->clock;
>
> @@ -1051,7 +1052,11 @@ void set_task_cpu(struct task_struct *p,
>  	if (p->se.block_start)
>  		p->se.block_start -= clock_offset;
>  #endif
> -	p->se.vruntime -= old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;
> +
> +	se = &p->se;
> +	voffset = old_rq->cfs.min_vruntime - new_rq->cfs.min_vruntime;

This one feels wrong, although I can't express my reaction correctly ..

> +	for_each_sched_entity(se)
> +		se->vruntime -= voffset;

Note that parent entities for a task are per-cpu. So if a task A belonging
to userid "guest" hops from CPU0 to CPU1, then it gets a new parent entity
as well, which is different from its parent entity on CPU0.

	Before: taskA->se.parent = guest's tg->se[0]
	After:  taskA->se.parent = guest's tg->se[1]

So walking up the entity hierarchy and fixing up (parent) se->vruntime
will do little good after the task has moved to a new cpu.

IMO, we need to be doing this:

	- For dequeue of higher level sched entities, simulate as if they
	  are going to sleep

	- For enqueue of higher level entities, simulate as if they are
	  waking up

This will cause enqueue_entity() to reset their vruntime (to the existing
value of cfs_rq->min_vruntime) when they wake up. If we don't do this,
then let's say a group had only one task (A) and it moves from CPU0 to
CPU1. Then on CPU1, when the group-level entity for task A is enqueued, it
will have a very low vruntime (since it was never running) and this will
give task A unlimited cpu time, until its group entity catches up with all
the "sleep" time.

Let me try a fix for this next ..

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, 2007-09-25 at 08:45 +0200, Ingo Molnar wrote:
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote:
> > > Lots of scheduler updates in the past few days, done by many people.
> > > Most importantly, the SMP latency problems reported and debugged by
> > > Mike Galbraith should be fixed for good now.
> >
> > Does this have anything to do with idle balancing ? I noticed some
> > fairly large latencies in that code in 2.6.23-rc's ..
>
> any measurements?

Yes, I made this a while ago,

ftp://source.mvista.com/pub/dwalker/misc/long-cfs-load-balance-trace.txt

This was with PREEMPT_RT on btw, so it's not the most recent kernel. I
was able to reproduce it in all the -rc's I tried.

Daniel
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
> The latest sched-devel.git tree can be pulled from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

This is required for it to compile.

---
 include/linux/sched.h |    1 +
 1 files changed, 1 insertion(+)

Index: current/include/linux/sched.h
===================================================================
--- current.orig/include/linux/sched.h
+++ current/include/linux/sched.h
@@ -1404,6 +1404,7 @@ extern unsigned int sysctl_sched_wakeup_
 extern unsigned int sysctl_sched_batch_wakeup_granularity;
 extern unsigned int sysctl_sched_child_runs_first;
 extern unsigned int sysctl_sched_features;
+extern unsigned int sysctl_sched_nr_latency;
 #endif

 extern unsigned int sysctl_sched_compat_yield;

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
On Tue, Sep 25, 2007 at 09:34:20PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
> > The latest sched-devel.git tree can be pulled from:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
>
> This is required for it to compile.
>
> ---
>  include/linux/sched.h |    1 +
>  1 files changed, 1 insertion(+)
>
> Index: current/include/linux/sched.h
> ===================================================================
> --- current.orig/include/linux/sched.h
> +++ current/include/linux/sched.h
> @@ -1404,6 +1404,7 @@ extern unsigned int sysctl_sched_wakeup_
>  extern unsigned int sysctl_sched_batch_wakeup_granularity;
>  extern unsigned int sysctl_sched_child_runs_first;
>  extern unsigned int sysctl_sched_features;
> +extern unsigned int sysctl_sched_nr_latency;
>  #endif
>
>  extern unsigned int sysctl_sched_compat_yield;

and this:

---
 kernel/sched_debug.c |    1 -
 1 files changed, 1 deletion(-)

Index: current/kernel/sched_debug.c
===================================================================
--- current.orig/kernel/sched_debug.c
+++ current/kernel/sched_debug.c
@@ -210,7 +210,6 @@ static int sched_debug_show(struct seq_f
 #define PN(x) \
 	SEQ_printf(m, "  .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
 	PN(sysctl_sched_latency);
-	PN(sysctl_sched_min_granularity);
 	PN(sysctl_sched_wakeup_granularity);
 	PN(sysctl_sched_batch_wakeup_granularity);
 	PN(sysctl_sched_child_runs_first);

--
Regards,
vatsa
Re: [git] CFS-devel, latest code
humm... I think it'd be safer to have something like the following
change in place.

The thing is that __pick_next_entity() must never be called when
first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node'
be the very first field of 'struct sched_entity' (and it's the second).
The 'nr_running != 0' check is _not_ enough, due to the fact that
'current' is not within the tree.

Generic paths are ok (e.g. schedule(), as put_prev_task() is called
previously)... I'm more worried about e.g. migration_call() ->
CPU_DEAD_FROZEN -> migrate_dead_tasks()... if 'current' == rq->idle, no
problems.. if it's one of the SCHED_NORMAL tasks (or imagine, some
other use-cases in the future -- i.e. we should not make the outer
world dependent on internal details of the sched_fair class) -- it may
be a "Houston, we've got a problem" case.

it's +16 bytes to the .text. Another variant is to make 'run_node' the
first data member of 'struct sched_entity', but an additional check
(se != NULL) is still needed in pick_next_entity().

what do you think?

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index dae714a..33b2376 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -563,9 +563,12 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)

 static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
 {
-	struct sched_entity *se = __pick_next_entity(cfs_rq);
-
-	set_next_entity(cfs_rq, se);
+	struct sched_entity *se = NULL;
+
+	if (first_fair(cfs_rq)) {
+		se = __pick_next_entity(cfs_rq);
+		set_next_entity(cfs_rq, se);
+	}

 	return se;
 }
---
Re: [git] CFS-devel, latest code
On Mon, 2007-09-24 at 23:45 +0200, Ingo Molnar wrote:
> Lots of scheduler updates in the past few days, done by many people.
> Most importantly, the SMP latency problems reported and debugged by
> Mike Galbraith should be fixed for good now.

Does this have anything to do with idle balancing ? I noticed some
fairly large latencies in that code in 2.6.23-rc's ..

Daniel
Re: [git] CFS-devel, latest code
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Mon, 24 Sep 2007 23:45:37 +0200
> Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > > The latest sched-devel.git tree can be pulled from:
> > >
> > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
>
> I'm pulling linux-2.6-sched.git, and it's oopsing all over the place
> on ia64, and Lee's observations about set_leftmost()'s weirdness are
> pertinent.
>
> Should I instead be pulling linux-2.6-sched-devel.git?

yeah, please pull that one. linux-2.6-sched.git by mistake contained an
older sched-devel tree for about a day (where your scripts picked it
up). I've restored that one to -rc7 meanwhile. It's only supposed to
contain strict fixes for upstream. (none at the moment)

	Ingo