Re: [patch 5/5] sched: consolidate sbe sbf

2005-04-07 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> We could just do a set_cpus_allowed, or take the lock, 
> set_cpus_allowed, and take the new lock, but that's probably a bit 
> heavy if we can avoid it. In the interests of speed in this fast path, 
> do you think we can do this in sched_fork, before the task has even 
> been put on the tasklist?

yeah, that shouldn't be a problem. Technically we set cpus_allowed up 
under the tasklist lock just to be non-preemptible and to copy the 
parent's _current_ affinity to the child. But sched_fork() is called 
just before that, and if the parent's affinity changed between the two 
calls, so what? I'd move all of this code into sched_fork().
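
For illustration, a rough sketch of what that relocation could look like. The
field and helper names (cpus_allowed, set_task_cpu(), get_cpu()) are the
existing ones, but the exact shape is only an assumption about the direction
discussed here, not a hunk from this thread:

	/*
	 * Sketch only: the child is not yet on the tasklist, so its
	 * affinity and starting CPU can be set up without taking the
	 * tasklist lock or a runqueue lock.
	 */
	void fastcall sched_fork(task_t *p)
	{
		int cpu = get_cpu();

		/* copy the parent's *current* affinity to the child */
		p->cpus_allowed = current->cpus_allowed;
		if (unlikely(!cpu_isset(cpu, p->cpus_allowed)))
			cpu = any_online_cpu(p->cpus_allowed);
		set_task_cpu(p, cpu);

		/* ... existing sched_fork() initialisation continues here ... */
		put_cpu();
	}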

> That would avoid all locking problems. Passing clone_flags into 
> sched_fork would not be a problem if we want to distinguish fork() and 
> clone(CLONE_VM).

sure, that was the plan all along with sched_fork() anyway. (i think the 
initial versions had the flag)
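
As a sketch, the interface change being talked about is roughly the following
(hypothetical, since the actual follow-up patch is not in this thread):

	/* let sched_fork() see the clone flags */
	-void fastcall sched_fork(task_t *p)
	+void fastcall sched_fork(task_t *p, unsigned long clone_flags)

	/* and in copy_process(): */
	-	sched_fork(p);
	+	sched_fork(p, clone_flags);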

> Yes? I'll cut a new patch to do just that.

sure, fine by me.

Ingo


Re: [patch 5/5] sched: consolidate sbe sbf

2005-04-07 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> Nick Piggin wrote:
> 
> >
> >One problem I just noticed, sorry. This is doing set_cpus_allowed
> >without holding the runqueue lock and without checking the hard
> >affinity mask either.
> >
> 
> Err, that is to say set_task_cpu, not set_cpus_allowed.

yes. The whole cpus_allowed+set_task_cpu() section in copy_process() 
should move into sched_fork().

Ingo


Re: [patch 5/5] sched: consolidate sbe sbf

2005-04-06 Thread Nick Piggin
Nick Piggin wrote:
> One problem I just noticed, sorry. This is doing set_cpus_allowed
> without holding the runqueue lock and without checking the hard
> affinity mask either.

Err, that is to say set_task_cpu, not set_cpus_allowed.
--
SUSE Labs, Novell Inc.


Re: [patch 5/5] sched: consolidate sbe sbf

2005-04-06 Thread Nick Piggin
Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > 5/5
> > 
> > Any ideas about what to do with schedstats?
> > Do we really need balance on exec and fork as separate
> > statistics?
> > 
> > Consolidate balance-on-exec with balance-on-fork. This is made easy
> > by the sched-domains RCU patches.
> > 
> > As well as the general goodness of code reduction, this allows
> > the runqueues to be unlocked during balance-on-fork.
> > 
> > schedstats is a problem. Maybe just have balance-on-event instead
> > of distinguishing fork and exec?
> > 
> > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> 
> looks good.

One problem I just noticed, sorry. This is doing set_cpus_allowed
without holding the runqueue lock and without checking the hard
affinity mask either.

We could just do a set_cpus_allowed, or take the lock, set_cpus_allowed,
and take the new lock, but that's probably a bit heavy if we can avoid it.
In the interests of speed in this fast path, do you think we can do this
in sched_fork, before the task has even been put on the tasklist?

That would avoid all locking problems. Passing clone_flags into sched_fork
would not be a problem if we want to distinguish fork() and clone(CLONE_VM).

Yes? I'll cut a new patch to do just that.

> Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
> 
> while the code is now consolidated, i think we still need the separate
> fork/exec stats for schedstat.

This makes it a bit harder, then, to get good stats in the sched-domain
(which is really what we want). It would basically mean doing

	if (balance fork)
		schedstat_inc(sbf_cnt);
	else if (balance exec)
		schedstat_inc(sbe_cnt);

etc. That should all get optimised out by the compiler, but still a bit ugly.
Any ideas?
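
A minimal sketch of that option as it would sit inside sched_balance_self():
the sbf_*/sbe_* counters are the existing schedstat fields, but dispatching on
the balance flag like this is just the idea being floated, not code from the
patch:

	/* sketch: keep separate fork/exec counters, selected by the flag */
	if (flag & SD_BALANCE_FORK)
		schedstat_inc(sd, sbf_cnt);
	else if (flag & SD_BALANCE_EXEC)
		schedstat_inc(sd, sbe_cnt);
	/* ...and the same dispatch for the *_balanced and *_pushed counters */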
--
SUSE Labs, Novell Inc.


Re: [patch 5/5] sched: consolidate sbe sbf

2005-04-06 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> 5/5
> 
> Any ideas about what to do with schedstats?
> Do we really need balance on exec and fork as separate
> statistics?

> Consolidate balance-on-exec with balance-on-fork. This is made easy
> by the sched-domains RCU patches.
> 
> As well as the general goodness of code reduction, this allows
> the runqueues to be unlocked during balance-on-fork.
> 
> schedstats is a problem. Maybe just have balance-on-event instead
> of distinguishing fork and exec?
> 
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

looks good.

 Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

while the code is now consolidated, i think we still need the separate 
fork/exec stats for schedstat. We have 3 conceptual ways to start off a 
new context: fork(), pthread_create() and execve(), and applications use 
them in different patterns. We have different flags and tuning 
parameters for two of them (which might have to become 3; i'm not 
entirely convinced we'll be able to ignore the 'process vs. thread' 
condition in wake_up_new_task(), and STREAM is a really bad example in 
that sense), so we need 2 (or 3) separate stats.
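
For reference, the consolidated helper still sees which path it is on through
its flag argument, which is what would let per-path stats survive; roughly,
following the patch at the end of this thread:

	/* fork path, from wake_up_new_task(): */
	new_cpu = sched_balance_self(cpu, SD_BALANCE_FORK);

	/* exec path, from sched_exec(): */
	new_cpu = sched_balance_self(this_cpu, SD_BALANCE_EXEC);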

Ingo


[patch 5/5] sched: consolidate sbe sbf

2005-04-05 Thread Nick Piggin
5/5
Any ideas about what to do with schedstats?
Do we really need balance on exec and fork as separate
statistics?
Consolidate balance-on-exec with balance-on-fork. This is made easy
by the sched-domains RCU patches.

As well as the general goodness of code reduction, this allows
the runqueues to be unlocked during balance-on-fork.

schedstats is a problem. Maybe just have balance-on-event instead
of distinguishing fork and exec?

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c   2005-04-05 18:39:14.0 +1000
+++ linux-2.6/kernel/sched.c2005-04-05 18:40:18.0 +1000
@@ -1013,8 +1013,57 @@ static int find_idlest_cpu(struct sched_
return idlest;
 }
 
+/*
+ * sched_balance_self: balance the current task (running on cpu) in domains
+ * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
+ * SD_BALANCE_EXEC.
+ *
+ * Balance, ie. select the least loaded group.
+ *
+ * Returns the target CPU number, or the same CPU if no balancing is needed.
+ *
+ * preempt must be disabled.
+ */
+static int sched_balance_self(int cpu, int flag)
+{
+   struct task_struct *t = current;
+   struct sched_domain *tmp, *sd = NULL;
 
-#endif
+   for_each_domain(cpu, tmp)
+   if (tmp->flags & flag)
+   sd = tmp;
+
+   while (sd) {
+   cpumask_t span;
+   struct sched_group *group;
+   int new_cpu;
+
+   span = sd->span;
+   group = find_idlest_group(sd, t, cpu);
+   if (!group)
+   goto nextlevel;
+
+   new_cpu = find_idlest_cpu(group, cpu);
+   if (new_cpu == -1 || new_cpu == cpu)
+   goto nextlevel;
+
+   /* Now try balancing at a lower domain level */
+   cpu = new_cpu;
+nextlevel:
+   sd = NULL;
+   for_each_domain(cpu, tmp) {
+   if (cpus_subset(span, tmp->span))
+   break;
+   if (tmp->flags & flag)
+   sd = tmp;
+   }
+   /* while loop will break here if sd == NULL */
+   }
+
+   return cpu;
+}
+
+#endif /* CONFIG_SMP */
 
 /*
  * wake_idle() will wake a task on an idle cpu if task->cpu is
@@ -1295,63 +1344,22 @@ void fastcall wake_up_new_task(task_t * 
int this_cpu, cpu;
runqueue_t *rq, *this_rq;
 #ifdef CONFIG_SMP
-   struct sched_domain *tmp, *sd = NULL;
-#endif
+   int new_cpu;
 
+   cpu = task_cpu(p);
+   preempt_disable();
+   new_cpu = sched_balance_self(cpu, SD_BALANCE_FORK);
+   preempt_enable();
+   if (new_cpu != cpu)
+   set_task_cpu(p, new_cpu);
+#endif
+   
+   cpu = task_cpu(p);
rq = task_rq_lock(p, &flags);
-   BUG_ON(p->state != TASK_RUNNING);
this_cpu = smp_processor_id();
-   cpu = task_cpu(p);
-
-#ifdef CONFIG_SMP
-   for_each_domain(cpu, tmp)
-   if (tmp->flags & SD_BALANCE_FORK)
-   sd = tmp;
-
-   if (sd) {
-   cpumask_t span;
-   int new_cpu;
-   struct sched_group *group;
-
-again:
-   schedstat_inc(sd, sbf_cnt);
-   span = sd->span;
-   cpu = task_cpu(p);
-   group = find_idlest_group(sd, p, cpu);
-   if (!group) {
-   schedstat_inc(sd, sbf_balanced);
-   goto nextlevel;
-   }
 
-   new_cpu = find_idlest_cpu(group, cpu);
-   if (new_cpu == -1 || new_cpu == cpu) {
-   schedstat_inc(sd, sbf_balanced);
-   goto nextlevel;
-   }
-
-   if (cpu_isset(new_cpu, p->cpus_allowed)) {
-   schedstat_inc(sd, sbf_pushed);
-   set_task_cpu(p, new_cpu);
-   task_rq_unlock(rq, &flags);
-   rq = task_rq_lock(p, &flags);
-   cpu = task_cpu(p);
-   }
-
-   /* Now try balancing at a lower domain level */
-nextlevel:
-   sd = NULL;
-   for_each_domain(cpu, tmp) {
-   if (cpus_subset(span, tmp->span))
-   break;
-   if (tmp->flags & SD_BALANCE_FORK)
-   sd = tmp;
-   }
-
-   if (sd)
-   goto again;
-   }
+   BUG_ON(p->state != TASK_RUNNING);
 
-#endif
/*
 * We decrease the sleep average of forking parents
 * and children as well, to keep max-interactive tasks
@@ -1699,59 +1707,17 @@ out:
task_rq_unlock(rq, &flags);
 }
 
-/*
- * sched_exec(): find the highest-level, exec-balance-capable
- * domain and try to migrate the task to the least 

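The diff is truncated above. For context, a short sketch of where the exec
side ends up once it is built on the same sched_balance_self() helper; this is
a reconstruction from the description, not the archived hunk itself, and it
assumes the existing sched_migrate_task() helper is kept for the actual move:

	/*
	 * sched_exec(): pick the least loaded CPU in the highest-level
	 * domain with SD_BALANCE_EXEC set, and migrate the current task
	 * there if it differs from the CPU we are running on.
	 */
	void sched_exec(void)
	{
		int new_cpu, this_cpu = get_cpu();

		new_cpu = sched_balance_self(this_cpu, SD_BALANCE_EXEC);
		put_cpu();
		if (new_cpu != this_cpu)
			sched_migrate_task(current, new_cpu);
	}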