Re: [patch 5/5] sched: consolidate sbe sbf
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> We could just do a set_cpus_allowed, or take the lock,
> set_cpus_allowed, and take the new lock, but that's probably a bit
> heavy if we can avoid it. In the interests of speed in this fast path,
> do you think we can do this in sched_fork, before the task has even
> been put on the tasklist?

yeah, that shouldn't be a problem. Technically we set cpus_allowed up
under the tasklist lock just to be non-preemptible and to copy the
parent's _current_ affinity to the child. But sched_fork() is called
just before, and if the parent got its affinity changed between the two
calls, so what? I'd move all of this code into sched_fork().

> That would avoid all locking problems. Passing clone_flags into
> sched_fork would not be a problem if we want to distinguish fork() and
> clone(CLONE_VM).

sure, that was the plan all along with sched_fork() anyway. (i think
the initial versions had the flag)

> Yes? I'll cut a new patch to do just that.

sure, fine by me.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
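A minimal user-space sketch of the idea being agreed on above, with
simplified stand-in types (the names sched_fork, cpus_allowed and
balance_on_fork mirror the kernel's, but nothing here is the real
implementation): because the child is not yet on any task list, copying
the parent's affinity and choosing the child's CPU can happen in
sched_fork() without any runqueue or tasklist locking.

```c
#include <assert.h>

typedef unsigned long cpumask_t;          /* toy stand-in for the kernel type */

struct task {
    cpumask_t cpus_allowed;
    int cpu;
};

/* Pick the first allowed CPU; a trivial stand-in for balance-on-fork. */
static int balance_on_fork(cpumask_t allowed)
{
    for (int c = 0; c < 64; c++)
        if (allowed & (1UL << c))
            return c;
    return 0;
}

/*
 * Toy sched_fork(): the child is not yet visible on any task list, so
 * copying the parent's affinity and setting the child's CPU needs no
 * locking.  If the parent's affinity changes concurrently, the child
 * gets a slightly stale copy -- which, as the mail notes, is harmless.
 */
static void sched_fork(struct task *child, const struct task *parent,
                       unsigned long clone_flags)
{
    (void)clone_flags;     /* could later select fork vs clone(CLONE_VM) policy */
    child->cpus_allowed = parent->cpus_allowed;
    child->cpu = balance_on_fork(child->cpus_allowed);
}
```

The point of the sketch is only the ordering: everything happens before
the child becomes reachable by anyone else, so no lock is required.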
Re: [patch 5/5] sched: consolidate sbe sbf
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> Nick Piggin wrote:
> > One problem I just noticed, sorry. This is doing set_cpus_allowed
> > without holding the runqueue lock and without checking the hard
> > affinity mask either.
>
> Err, that is to say set_task_cpu, not set_cpus_allowed.

yes. The whole cpus_allowed+set_task_cpu() section in copy_process()
should move into sched_fork().

	Ingo
Re: [patch 5/5] sched: consolidate sbe sbf
Nick Piggin wrote:
> One problem I just noticed, sorry. This is doing set_cpus_allowed
> without holding the runqueue lock and without checking the hard
> affinity mask either.

Err, that is to say set_task_cpu, not set_cpus_allowed.

-- 
SUSE Labs, Novell Inc.
Re: [patch 5/5] sched: consolidate sbe sbf
Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > 5/5
> >
> > Any ideas about what to do with schedstats? Do we really need
> > balance on exec and fork as separate statistics?
> >
> > Consolidate balance-on-exec with balance-on-fork. This is made easy
> > by the sched-domains RCU patches.
> >
> > As well as the general goodness of code reduction, this allows
> > the runqueues to be unlocked during balance-on-fork.
> >
> > schedstats is a problem. Maybe just have balance-on-event instead
> > of distinguishing fork and exec?
> >
> > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
>
> looks good.

One problem I just noticed, sorry. This is doing set_cpus_allowed
without holding the runqueue lock and without checking the hard
affinity mask either.

We could just do a set_cpus_allowed, or take the lock, set_cpus_allowed,
and take the new lock, but that's probably a bit heavy if we can avoid
it. In the interests of speed in this fast path, do you think we can do
this in sched_fork, before the task has even been put on the tasklist?
That would avoid all locking problems. Passing clone_flags into
sched_fork would not be a problem if we want to distinguish fork() and
clone(CLONE_VM).

Yes? I'll cut a new patch to do just that.

> Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
>
> while the code is now consolidated, i think we still need the separate
> fork/exec stats for schedstat.

This makes it a bit harder, then, to get good stats in the sched-domain
(which is really what we want). It would basically mean doing

	if (balance fork)
		schedstat_inc(sbf_cnt);
	else if (balance exec)
		schedstat_inc(sbe_cnt);

etc. That should all get optimised out by the compiler, but still a bit
ugly. Any ideas?

-- 
SUSE Labs, Novell Inc.
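The "ugly but optimised out" pattern above can be sketched in plain C
(counter and flag names here echo the thread's sbf_cnt/sbe_cnt but are
just illustrative stand-ins): one shared balance path keeps the per-event
statistics separate via a flag test, and when the flag is a compile-time
constant at an inlined call site, the compiler folds the branch away.

```c
#include <assert.h>

/* Toy stand-ins for the sched-domain balance flags and schedstat counters. */
enum { SD_BALANCE_FORK = 1, SD_BALANCE_EXEC = 2 };

static int sbf_cnt, sbe_cnt;

/*
 * One consolidated accounting helper for the shared balance path.
 * With a constant 'flag' argument and inlining, the dead branch is
 * eliminated, so the runtime cost is a single increment.
 */
static inline void account_balance(int flag)
{
    if (flag == SD_BALANCE_FORK)
        sbf_cnt++;
    else if (flag == SD_BALANCE_EXEC)
        sbe_cnt++;
}
```

So the consolidated code path can still feed two separate statistics
without paying for the distinction at runtime.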
Re: [patch 5/5] sched: consolidate sbe sbf
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> 5/5
>
> Any ideas about what to do with schedstats?
> Do we really need balance on exec and fork as separate
> statistics?
>
> Consolidate balance-on-exec with balance-on-fork. This is made easy
> by the sched-domains RCU patches.
>
> As well as the general goodness of code reduction, this allows
> the runqueues to be unlocked during balance-on-fork.
>
> schedstats is a problem. Maybe just have balance-on-event instead
> of distinguishing fork and exec?
>
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

looks good.

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

while the code is now consolidated, i think we still need the separate
fork/exec stats for schedstat. We have 3 conceptual ways to start off a
new context: fork(), pthread_create() and execve(), and applications
use them in different patterns. We have different flags and tuning
parameters for two of them (which might have to become 3; i'm not
entirely convinced we'll be able to ignore the 'process vs. thread'
condition in wake_up_new_task(), and STREAM is a really bad example in
that sense), so we need 2 (or 3) separate stats.

	Ingo
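The three context-creation paths Ingo distinguishes can be told apart
from information the scheduler already has; a hypothetical classifier
(context_kind is an invented name, only the CLONE_VM flag value is the
real kernel ABI constant) might look like:

```c
#include <assert.h>
#include <string.h>

#define CLONE_VM 0x00000100     /* real clone(2) flag value; only one used here */

/*
 * Toy classifier for the three conceptual ways of starting a context:
 * execve() rebalances an existing task, clone(CLONE_VM) is the
 * pthread_create() case, and plain fork() is everything else.  A real
 * scheduler would key its separate schedstat counters (and tuning
 * flags) off the same distinction.
 */
static const char *context_kind(unsigned long clone_flags, int is_exec)
{
    if (is_exec)
        return "execve";        /* balance-on-exec */
    if (clone_flags & CLONE_VM)
        return "thread";        /* pthread_create() -> clone(CLONE_VM) */
    return "fork";              /* plain fork() */
}
```

This is why passing clone_flags into sched_fork(), as discussed
elsewhere in the thread, is enough to keep 2 (or 3) separate stats even
with a consolidated balance path.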
[patch 5/5] sched: consolidate sbe sbf
5/5

Any ideas about what to do with schedstats?
Do we really need balance on exec and fork as separate
statistics?

Consolidate balance-on-exec with balance-on-fork. This is made easy
by the sched-domains RCU patches.

As well as the general goodness of code reduction, this allows
the runqueues to be unlocked during balance-on-fork.

schedstats is a problem. Maybe just have balance-on-event instead
of distinguishing fork and exec?

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c	2005-04-05 18:39:14.000000000 +1000
+++ linux-2.6/kernel/sched.c	2005-04-05 18:40:18.000000000 +1000
@@ -1013,8 +1013,57 @@ static int find_idlest_cpu(struct sched_
 	return idlest;
 }
 
+/*
+ * sched_balance_self: balance the current task (running on cpu) in domains
+ * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
+ * SD_BALANCE_EXEC.
+ *
+ * Balance, ie. select the least loaded group.
+ *
+ * Returns the target CPU number, or the same CPU if no balancing is needed.
+ *
+ * preempt must be disabled.
+ */
+static int sched_balance_self(int cpu, int flag)
+{
+	struct task_struct *t = current;
+	struct sched_domain *tmp, *sd = NULL;
 
-#endif
+	for_each_domain(cpu, tmp)
+		if (tmp->flags & flag)
+			sd = tmp;
+
+	while (sd) {
+		cpumask_t span;
+		struct sched_group *group;
+		int new_cpu;
+
+		span = sd->span;
+		group = find_idlest_group(sd, t, cpu);
+		if (!group)
+			goto nextlevel;
+
+		new_cpu = find_idlest_cpu(group, cpu);
+		if (new_cpu == -1 || new_cpu == cpu)
+			goto nextlevel;
+
+		/* Now try balancing at a lower domain level */
+		cpu = new_cpu;
+nextlevel:
+		sd = NULL;
+		for_each_domain(cpu, tmp) {
+			if (cpus_subset(span, tmp->span))
+				break;
+			if (tmp->flags & flag)
+				sd = tmp;
+		}
+		/* while loop will break here if sd == NULL */
+	}
+
+	return cpu;
+}
+
+#endif /* CONFIG_SMP */
 
 /*
  * wake_idle() will wake a task on an idle cpu if task->cpu is
@@ -1295,63 +1344,22 @@ void fastcall wake_up_new_task(task_t *
 	int this_cpu, cpu;
 	runqueue_t *rq, *this_rq;
 #ifdef CONFIG_SMP
-	struct sched_domain *tmp, *sd = NULL;
-#endif
+	int new_cpu;
 
+	cpu = task_cpu(p);
+	preempt_disable();
+	new_cpu = sched_balance_self(cpu, SD_BALANCE_FORK);
+	preempt_enable();
+	if (new_cpu != cpu)
+		set_task_cpu(p, new_cpu);
+#endif
+
+	cpu = task_cpu(p);
 	rq = task_rq_lock(p, &flags);
-	BUG_ON(p->state != TASK_RUNNING);
 	this_cpu = smp_processor_id();
-	cpu = task_cpu(p);
-
-#ifdef CONFIG_SMP
-	for_each_domain(cpu, tmp)
-		if (tmp->flags & SD_BALANCE_FORK)
-			sd = tmp;
-
-	if (sd) {
-		cpumask_t span;
-		int new_cpu;
-		struct sched_group *group;
-
-again:
-		schedstat_inc(sd, sbf_cnt);
-		span = sd->span;
-		cpu = task_cpu(p);
-		group = find_idlest_group(sd, p, cpu);
-		if (!group) {
-			schedstat_inc(sd, sbf_balanced);
-			goto nextlevel;
-		}
-		new_cpu = find_idlest_cpu(group, cpu);
-		if (new_cpu == -1 || new_cpu == cpu) {
-			schedstat_inc(sd, sbf_balanced);
-			goto nextlevel;
-		}
-
-		if (cpu_isset(new_cpu, p->cpus_allowed)) {
-			schedstat_inc(sd, sbf_pushed);
-			set_task_cpu(p, new_cpu);
-			task_rq_unlock(rq, &flags);
-			rq = task_rq_lock(p, &flags);
-			cpu = task_cpu(p);
-		}
-
-		/* Now try balancing at a lower domain level */
-nextlevel:
-		sd = NULL;
-		for_each_domain(cpu, tmp) {
-			if (cpus_subset(span, tmp->span))
-				break;
-			if (tmp->flags & SD_BALANCE_FORK)
-				sd = tmp;
-		}
-
-		if (sd)
-			goto again;
-	}
+	BUG_ON(p->state != TASK_RUNNING);
 
-#endif
 	/*
 	 * We decrease the sleep average of forking parents
 	 * and children as well, to keep max-interactive tasks
@@ -1699,59 +1707,17 @@ out:
 	task_rq_unlock(rq, &flags);
 }
 
-/*
- * sched_exec(): find the highest-level, exec-balance-capable
- * domain and try to migrate the task to the least
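The sched_balance_self() loop in the patch can be modelled in a small
user-space program (the domain structures below are simplified
stand-ins, not the kernel's sched_domain/sched_group): start at the
widest domain that has the flag set, move to the least loaded CPU at
that level, then descend and repeat.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a level in the sched-domain hierarchy. */
struct domain {
    struct domain *child;      /* next narrower level, NULL at the bottom */
    const int *cpus;           /* CPUs spanned by this domain */
    int ncpus;
};

/* Toy per-CPU load figures; the kernel derives these from runqueues. */
static int cpu_load[8] = { 5, 1, 7, 3 };

/* Least loaded CPU in a domain's span; stands in for the
 * find_idlest_group()/find_idlest_cpu() pair in the patch. */
static int find_idlest(const struct domain *sd, int cur)
{
    int best = cur;
    for (int i = 0; i < sd->ncpus; i++)
        if (cpu_load[sd->cpus[i]] < cpu_load[best])
            best = sd->cpus[i];
    return best;
}

/*
 * Model of sched_balance_self(): rebalance level by level, moving to a
 * less loaded CPU at each step, until no lower level remains.  The
 * kernel version additionally re-derives the next domain from the new
 * CPU's domain tree; a single child chain is enough for this sketch.
 */
static int balance_self(const struct domain *sd, int cpu)
{
    while (sd) {
        int new_cpu = find_idlest(sd, cpu);
        if (new_cpu != cpu)
            cpu = new_cpu;     /* now try balancing at a lower level */
        sd = sd->child;
    }
    return cpu;
}
```

The descent is what lets the runqueues stay unlocked: the loop only
reads load estimates and picks a CPU, and the caller applies the result
with a single set_task_cpu() afterwards.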