Re: [patch 1/5] sched: remove degenerate domains
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> [...] Although I'd imagine it may be something distros may want. For
> example, a generic x86-64 kernel for both AMD and Intel systems could
> easily have SMT and NUMA turned on.

yes, that's true - in fact reducing the number of separate kernel
packages is of utmost importance to all distributions. (I'm not sure we
are there yet with CONFIG_NUMA, but small steps won't hurt.)

> I agree with the downside of exercising fewer code paths though.

if we make CONFIG_NUMA good enough on small boxes so that distributors
can turn it on, then in the long run the loss could be offset by the win
that the extra QA gives.

> > is there any case where we'd want to simplify the domain tree? One more
> > domain level is just one (and very minor) aspect of CONFIG_NUMA - i'd
> > not want to run a CONFIG_NUMA kernel on a non-NUMA box, even if the
> > domain tree got optimized. Hm?
>
> I guess there is the SMT issue too, and even booting an SMP kernel on
> a UP system. Also small ia64 NUMA systems will probably have one
> redundant NUMA level.

i think most of the reasons for not running an SMP kernel on a UP box are
not due to scheduler overhead: the biggest cost is spinlock overhead.
Someone should try a little prototype: use the 'alternate instructions'
framework to patch out calls to spinlock functions to NOPs, and
benchmark the resulting kernel against UP. If it's "good enough", distros
will use it. Having just a single binary kernel RPM that supports
everything from NUMA through SMP to UP is the holy grail of distros.
(especially the ones that offer commercial support and services.)

this is probably not possible on x86 - e.g. it would probably be
expensive (in terms of runtime cost) to make the PAE/non-PAE decision at
runtime (the distro boot kernel needs to be non-PAE). But for newer
arches like x64 it should be easier.
> If/when topologies get more complex (for example, the recent Altix
> discussions we had with Paul), it will be generally easier to set up
> all levels in a generic way, then weed them out using something like
> this, rather than put the logic in the domain setup code.

ok. That should also make it easier to put more of the arch domain setup
code into sched.c. E.g. i'm still uneasy about it having so much
scheduler code in arch/ia64/kernel/domain.c, and all the ripple effects.
(the #ifdefs, include file impact, etc.)

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 1/5] sched: remove degenerate domains
Ingo Molnar wrote:
> * Siddha, Suresh B <[EMAIL PROTECTED]> wrote:
>
> > Similarly I am working on adding a new core domain for dual-core
> > systems! All these domains are unnecessary and cause performance
> > issues on non Multi-threading/Multi-core capable cpus! Agreed that
> > performance impact will be minor but still...
>
> ok, let's keep it then. It may in fact simplify the domain setup code: we
> could generate the 'most generic' layout for a given arch all the time,
> and then optimize it automatically. I.e. in theory we could have just a
> single domain-setup routine, which would e.g. generate the NUMA domains
> on SMP too, which would then be optimized away.

Yep, exactly. Even so, Andrew: please ignore this patch series and I'll
redo it for you when we all agree on everything.

Thanks.

-- 
SUSE Labs, Novell Inc.
Re: [patch 1/5] sched: remove degenerate domains
Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > This is Suresh's patch with some modifications.
>
> > Remove degenerate scheduler domains during the sched-domain init.
>
> actually, i'd suggest to not do this patch. The point of booting with a
> CONFIG_NUMA kernel on a non-NUMA box is mostly for testing, and the
> 'degenerate' toplevel domain exposed conceptual bugs in the
> sched-domains code. In that sense removing such 'unnecessary' domains
> inhibits debuggability to a certain degree. If we had this patch earlier
> we'd not have experienced the wrong decisions taken by the scheduler,
> only on the much rarer 'really NUMA' boxes.

True. Although I'd imagine it may be something distros may want. For
example, a generic x86-64 kernel for both AMD and Intel systems could
easily have SMT and NUMA turned on.

I agree with the downside of exercising fewer code paths though. What
about putting it in as a config option (default off for 2.6) in the
embedded config menu?

> is there any case where we'd want to simplify the domain tree? One more
> domain level is just one (and very minor) aspect of CONFIG_NUMA - i'd
> not want to run a CONFIG_NUMA kernel on a non-NUMA box, even if the
> domain tree got optimized. Hm?

I guess there is the SMT issue too, and even booting an SMP kernel on
a UP system. Also small ia64 NUMA systems will probably have one
redundant NUMA level.

If/when topologies get more complex (for example, the recent Altix
discussions we had with Paul), it will be generally easier to set up
all levels in a generic way, then weed them out using something like
this, rather than put the logic in the domain setup code.

Nick

-- 
SUSE Labs, Novell Inc.
Re: [patch 1/5] sched: remove degenerate domains
* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> Similarly I am working on adding a new core domain for dual-core
> systems! All these domains are unnecessary and cause performance
> issues on non Multi-threading/Multi-core capable cpus! Agreed that
> performance impact will be minor but still...

ok, let's keep it then. It may in fact simplify the domain setup code: we
could generate the 'most generic' layout for a given arch all the time,
and then optimize it automatically. I.e. in theory we could have just a
single domain-setup routine, which would e.g. generate the NUMA domains
on SMP too, which would then be optimized away.

	Ingo
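The "always generate the most generic layout, then optimize it away" idea can be illustrated with a minimal C sketch. Everything in it is hypothetical: the uniform `struct toy_topology`, the three fixed levels (SMT, package assumed to span the CPU's node, NUMA), the `toy_*` helpers, and a span-only pruning rule that ignores the `SD_*` flags entirely, so it is deliberately simpler than the degenerate-domain test in the patch under discussion. It only shows that one routine can emit every level on every box and a generic pass can then drop the levels that add nothing.

```c
#include <assert.h>

/* Hypothetical uniform topology: dense CPU numbering, every node has the
 * same number of cores and every core the same number of threads. */
struct toy_topology {
	int threads_per_core;
	int cores_per_node;
	int nr_nodes;
};

enum { TOY_SMT, TOY_PHYS, TOY_NUMA, TOY_NR_LEVELS };

/* Bitmask with bits [first, first + count) set. */
static unsigned long toy_range(int first, int count)
{
	unsigned long mask = 0;

	for (int i = 0; i < count; i++)
		mask |= 1UL << (first + i);
	return mask;
}

/* Always build every level, as if the box had both SMT and NUMA. */
static void toy_build_generic(const struct toy_topology *t, int cpu,
			      unsigned long span[TOY_NR_LEVELS])
{
	int per_node = t->threads_per_core * t->cores_per_node;

	assert(per_node * t->nr_nodes <= (int)(8 * sizeof(unsigned long)));
	assert(cpu >= 0 && cpu < per_node * t->nr_nodes);
	span[TOY_SMT] = toy_range(cpu - cpu % t->threads_per_core,
				  t->threads_per_core);
	span[TOY_PHYS] = toy_range(cpu - cpu % per_node, per_node);
	span[TOY_NUMA] = toy_range(0, per_node * t->nr_nodes);
}

/* Span-only pruning: drop a level that covers a single CPU or adds no CPUs
 * over the last level kept. Returns how many levels survive in kept[]. */
static int toy_prune_spans(const unsigned long span[TOY_NR_LEVELS],
			   int kept[TOY_NR_LEVELS])
{
	unsigned long prev = 0;
	int n = 0;

	for (int lvl = 0; lvl < TOY_NR_LEVELS; lvl++) {
		/* x & (x - 1) == 0 means at most one bit (one CPU) is set */
		int single = (span[lvl] & (span[lvl] - 1)) == 0;

		if (single || span[lvl] == prev)
			continue;
		kept[n++] = lvl;
		prev = span[lvl];
	}
	return n;
}
```

On a hypothetical one-node, two-core box without HT, only the package level survives for CPU 0 (the SMT level spans one CPU and the NUMA level repeats the package span); on a hypothetical two-node box with two HT cores per node, all three levels survive.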
Re: [patch 1/5] sched: remove degenerate domains
On Wed, Apr 06, 2005 at 07:44:12AM +0200, Ingo Molnar wrote:
>
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > This is Suresh's patch with some modifications.
>
> > Remove degenerate scheduler domains during the sched-domain init.
>
> actually, i'd suggest to not do this patch. The point of booting with a
> CONFIG_NUMA kernel on a non-NUMA box is mostly for testing, and the

Not really. All of the x86_64 kernels are NUMA enabled, and most Intel
x86_64 systems today are non-NUMA.

> 'degenerate' toplevel domain exposed conceptual bugs in the
> sched-domains code. In that sense removing such 'unnecessary' domains
> inhibits debuggability to a certain degree. If we had this patch earlier
> we'd not have experienced the wrong decisions taken by the scheduler,
> only on the much rarer 'really NUMA' boxes.
>
> is there any case where we'd want to simplify the domain tree? One more
> domain level is just one (and very minor) aspect of CONFIG_NUMA - i'd
> not want to run a CONFIG_NUMA kernel on a non-NUMA box, even if the
> domain tree got optimized. Hm?
>

Ingo, pardon me! Actually I used the NUMA domain as an excuse to push the
domain degenerate patch. As I mentioned earlier, we should remove the SMT
domain on a non-HT capable system.

Similarly I am working on adding a new core domain for dual-core
systems! All these domains are unnecessary and cause performance
issues on non Multi-threading/Multi-core capable cpus! Agreed that
performance impact will be minor but still...

thanks,
suresh
Re: [patch 1/5] sched: remove degenerate domains
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> This is Suresh's patch with some modifications.

> Remove degenerate scheduler domains during the sched-domain init.

actually, i'd suggest to not do this patch. The point of booting with a
CONFIG_NUMA kernel on a non-NUMA box is mostly for testing, and the
'degenerate' toplevel domain exposed conceptual bugs in the
sched-domains code. In that sense removing such 'unnecessary' domains
inhibits debuggability to a certain degree. If we had this patch earlier
we'd not have experienced the wrong decisions taken by the scheduler,
only on the much rarer 'really NUMA' boxes.

is there any case where we'd want to simplify the domain tree? One more
domain level is just one (and very minor) aspect of CONFIG_NUMA - i'd
not want to run a CONFIG_NUMA kernel on a non-NUMA box, even if the
domain tree got optimized. Hm?

	Ingo
[patch 1/5] sched: remove degenerate domains
This is Suresh's patch with some modifications.

-- 
SUSE Labs, Novell Inc.

Remove degenerate scheduler domains during the sched-domain init.

For example on x86_64, we always have NUMA configured in. On Intel EM64T
systems, the topmost sched domain will be the NUMA domain, with only one
sched_group in it.

With fork/exec balancing (Nick's recent fixes in the -mm tree), we always
end up taking wrong decisions because of this topmost domain (as it
contains only one group and find_idlest_group always returns NULL). We
end up loading an HT package completely first, letting active load
balancing kick in and correct it.

In general, this patch also makes sense without Nick's recent fixes
in -mm.

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

Modified to account for more than just sched_groups when scanning for
degenerate domains by Nick Piggin. Allow a runqueue's sd to go NULL, which
required small changes to the smtnice code.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c	2005-04-05 16:38:21.0 +1000
+++ linux-2.6/kernel/sched.c	2005-04-05 18:39:09.0 +1000
@@ -2583,11 +2583,15 @@ out:
 #ifdef CONFIG_SCHED_SMT
 static inline void wake_sleeping_dependent(int this_cpu, runqueue_t *this_rq)
 {
-	struct sched_domain *sd = this_rq->sd;
+	struct sched_domain *tmp, *sd = NULL;
 	cpumask_t sibling_map;
 	int i;
+
+	for_each_domain(this_cpu, tmp)
+		if (tmp->flags & SD_SHARE_CPUPOWER)
+			sd = tmp;
 
-	if (!(sd->flags & SD_SHARE_CPUPOWER))
+	if (!sd)
 		return;
 
 	/*
@@ -2628,13 +2632,17 @@ static inline void wake_sleeping_depende
 
 static inline int dependent_sleeper(int this_cpu, runqueue_t *this_rq)
 {
-	struct sched_domain *sd = this_rq->sd;
+	struct sched_domain *tmp, *sd = NULL;
 	cpumask_t sibling_map;
 	prio_array_t *array;
 	int ret = 0, i;
 	task_t *p;
 
-	if (!(sd->flags & SD_SHARE_CPUPOWER))
+	for_each_domain(this_cpu, tmp)
+		if (tmp->flags & SD_SHARE_CPUPOWER)
+			sd = tmp;
+
+	if (!sd)
 		return 0;
 
 	/*
@@ -4604,6 +4612,11 @@ static void sched_domain_debug(struct sc
 {
 	int level = 0;
 
+	if (!sd) {
+		printk(KERN_DEBUG "CPU%d attaching NULL sched-domain.\n", cpu);
+		return;
+	}
+
 	printk(KERN_DEBUG "CPU%d attaching sched-domain:\n", cpu);
 
 	do {
@@ -4809,6 +4822,50 @@ static void init_sched_domain_sysctl(voi
 }
 #endif
 
+static int __devinit sd_degenerate(struct sched_domain *sd)
+{
+	if (cpus_weight(sd->span) == 1)
+		return 1;
+
+	/* Following flags need at least 2 groups */
+	if (sd->flags & (SD_LOAD_BALANCE |
+			 SD_BALANCE_NEWIDLE |
+			 SD_BALANCE_FORK |
+			 SD_BALANCE_EXEC)) {
+		if (sd->groups != sd->groups->next)
+			return 0;
+	}
+
+	/* Following flags don't use groups */
+	if (sd->flags & (SD_WAKE_IDLE |
+			 SD_WAKE_AFFINE |
+			 SD_WAKE_BALANCE))
+		return 0;
+
+	return 1;
+}
+
+static int __devinit sd_parent_degenerate(struct sched_domain *sd,
+					  struct sched_domain *parent)
+{
+	unsigned long cflags = sd->flags, pflags = parent->flags;
+
+	if (sd_degenerate(parent))
+		return 1;
+
+	if (!cpus_equal(sd->span, parent->span))
+		return 0;
+
+	/* Does parent contain flags not in child? */
+	/* WAKE_BALANCE is a subset of WAKE_AFFINE */
+	if (cflags & SD_WAKE_AFFINE)
+		pflags &= ~SD_WAKE_BALANCE;
+	if ((~sd->flags) & parent->flags)
+		return 0;
+
+	return 1;
+}
+
 /*
  * Attach the domain 'sd' to 'cpu' as its base domain. Callers must
  * hold the hotplug lock.
@@ -4819,6 +4876,19 @@ void __devinit cpu_attach_domain(struct
 	unsigned long flags;
 	runqueue_t *rq = cpu_rq(cpu);
 	int local = 1;
+	struct sched_domain *tmp;
+
+	/* Remove the sched domains which do not contribute to scheduling. */
+	for (tmp = sd; tmp; tmp = tmp->parent) {
+		struct sched_domain *parent = tmp->parent;
+		if (!parent)
+			break;
+		if (sd_parent_degenerate(tmp, parent))
+			tmp->parent = parent->parent;
+	}
+
+	if (sd_degenerate(sd))
+		sd = sd->parent;
 
 	sched_domain_debug(sd, cpu);
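To make the pruning rules in this patch easier to follow, here is a minimal userspace sketch in C that models `sd_degenerate()`, `sd_parent_degenerate()`, the pruning loop from `cpu_attach_domain()`, and the upward `SD_SHARE_CPUPOWER` search used by the smtnice changes. It is a model under stated assumptions, not kernel code: `struct toy_domain`, the `toy_*` helpers, the numeric flag values, and the three-level example box are all hypothetical; an `unsigned long` bitmask stands in for `cpumask_t`, and a group count stands in for walking the `sched_group` ring. One deliberate difference: the final flag test here uses the masked `cflags`/`pflags` copies, which is what the patch's "WAKE_BALANCE is a subset of WAKE_AFFINE" comment appears to intend. As posted, the patch computes `pflags` but then tests `(~sd->flags) & parent->flags`, so that masking has no effect there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; the kernel has its own. */
#define SD_LOAD_BALANCE    0x01UL
#define SD_BALANCE_NEWIDLE 0x02UL
#define SD_BALANCE_EXEC    0x04UL
#define SD_BALANCE_FORK    0x08UL
#define SD_WAKE_IDLE       0x10UL
#define SD_WAKE_AFFINE     0x20UL
#define SD_WAKE_BALANCE    0x40UL
#define SD_SHARE_CPUPOWER  0x80UL
/* Balancing flags: only useful when a domain has at least two groups. */
#define TOY_NEEDS_GROUPS (SD_LOAD_BALANCE | SD_BALANCE_NEWIDLE | \
			  SD_BALANCE_FORK | SD_BALANCE_EXEC)
/* Wakeup flags: meaningful even without groups. */
#define TOY_NO_GROUPS (SD_WAKE_IDLE | SD_WAKE_AFFINE | SD_WAKE_BALANCE)

struct toy_domain {
	unsigned long span;        /* CPU bitmask, stands in for cpumask_t */
	int nr_groups;             /* stands in for the sched_group ring */
	unsigned long flags;
	struct toy_domain *parent;
};

static int toy_weight(unsigned long mask)
{
	int n = 0;
	for (; mask; n++)
		mask &= mask - 1;  /* clear the lowest set bit */
	return n;
}

/* Models sd_degenerate(): 1 if the domain cannot affect scheduling. */
static int toy_degenerate(const struct toy_domain *sd)
{
	if (toy_weight(sd->span) == 1)
		return 1;
	if ((sd->flags & TOY_NEEDS_GROUPS) && sd->nr_groups > 1)
		return 0;
	if (sd->flags & TOY_NO_GROUPS)
		return 0;
	return 1;
}

/* Models sd_parent_degenerate(), but tests the masked cflags/pflags. */
static int toy_parent_degenerate(const struct toy_domain *sd,
				 const struct toy_domain *parent)
{
	unsigned long cflags = sd->flags, pflags = parent->flags;
	if (toy_degenerate(parent))
		return 1;
	if (sd->span != parent->span)
		return 0;
	if (cflags & SD_WAKE_AFFINE)
		pflags &= ~SD_WAKE_BALANCE;  /* WAKE_BALANCE is a subset of WAKE_AFFINE */
	return (~cflags & pflags) == 0;  /* redundant only if the parent adds no flags */
}

/* Same loop shape as cpu_attach_domain(); returns the new base domain. */
static struct toy_domain *toy_prune(struct toy_domain *sd)
{
	struct toy_domain *tmp;
	assert(sd != NULL);
	for (tmp = sd; tmp; tmp = tmp->parent) {
		struct toy_domain *parent = tmp->parent;
		if (!parent)
			break;
		if (toy_parent_degenerate(tmp, parent))
			tmp->parent = parent->parent;  /* splice the parent out */
	}
	if (toy_degenerate(sd))
		sd = sd->parent;
	return sd;
}

/* Like the smtnice change: topmost SD_SHARE_CPUPOWER domain, or NULL. */
static struct toy_domain *toy_find_cpupower(struct toy_domain *base)
{
	struct toy_domain *tmp, *sd = NULL;
	for (tmp = base; tmp; tmp = tmp->parent)
		if (tmp->flags & SD_SHARE_CPUPOWER)
			sd = tmp;
	return sd;
}

/* Hypothetical 2-CPU, non-HT, single-node box seen from CPU 0: d[0] = SMT, d[1] = package, d[2] = NUMA. */
static void toy_build_two_cpu_box(struct toy_domain d[3])
{
	d[0] = (struct toy_domain){ 0x1UL, 1, SD_LOAD_BALANCE | SD_BALANCE_NEWIDLE |
				    SD_WAKE_AFFINE | SD_SHARE_CPUPOWER, &d[1] };
	d[1] = (struct toy_domain){ 0x3UL, 2, TOY_NEEDS_GROUPS | SD_WAKE_AFFINE, &d[2] };
	d[2] = (struct toy_domain){ 0x3UL, 1, SD_LOAD_BALANCE | SD_BALANCE_EXEC |
				    SD_BALANCE_FORK | SD_WAKE_BALANCE, NULL };
}
```

On this hypothetical box, pruning from CPU 0's SMT level leaves only the package level: the SMT level spans a single CPU, and the NUMA level has the same span as its child and adds only `SD_WAKE_BALANCE`, which the `SD_WAKE_AFFINE` rule masks out. Searching upward from the pruned base for `SD_SHARE_CPUPOWER` then finds nothing, which mirrors why the smtnice code in the patch has to cope with a missing SMT domain instead of assuming `this_rq->sd` is one.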