Re: [patch 4/5] sched: RCU sched domains
Paul E. McKenney wrote:
> On Thu, Apr 07, 2005 at 05:58:40PM +1000, Nick Piggin wrote:
>> OK thanks for the good explanation. So I'll keep it as is for now,
>> and whatever needs cleaning up later can be worked out as it comes
>> up.
>
> Looking forward to the split of synchronize_kernel() into
> synchronize_rcu() and synchronize_sched(), the two choices are:
>
> o	Use synchronize_rcu(), but insert rcu_read_lock()/
> 	rcu_read_unlock() pairs on the read side.
>
> o	Use synchronize_sched(), and make sure all read-side code is
> 	under preempt_disable().

Yep, I think we'll go for the second option initially (because that
pretty closely matches the homebrew locking scheme that it used to
use).

> Either way, there may also need to be some rcu_dereference()s when
> picking up the pointer and rcu_assign_pointer()s when updating the
> pointers. For example, if traversing the domain parent list is to be
> RCU protected, the for_each_domain() macro should change to something
> like:

Yes, I think you're right, because there are no barriers or
synchronisation when attaching a new domain. Just a small point though:

> #define for_each_domain(cpu, domain) \
> 	for (domain = cpu_rq(cpu)->sd; domain; domain = rcu_dereference(domain->parent))

This should probably be done like so:

#define for_each_domain(cpu, domain) \
	for (domain = rcu_dereference(cpu_rq(cpu)->sd); domain; domain = domain->parent)

And I think it would be wise to use rcu_assign_pointer in the update
too.

Thanks Paul.

-- 
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 4/5] sched: RCU sched domains
On Thu, Apr 07, 2005 at 05:58:40PM +1000, Nick Piggin wrote:
> Ingo Molnar wrote:
> > * Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> > > > At a minimum i think we need the fix+comment below.
> > >
> > > Well if we say "this is actually RCU", then yes. And we should
> > > probably change the preempt_{dis|en}ables in other places to
> > > rcu_read_lock.
> > >
> > > OTOH, if we say we just want all running threads to process
> > > through a preemption stage, then this would just be a
> > > preempt_disable/enable pair.
> > >
> > > In practice that makes no difference yet, but it looks like you
> > > and Paul are working to distinguish these two cases in the RCU
> > > code, to accommodate your low latency RCU stuff?
> >
> > it doesn't impact PREEMPT_RCU/PREEMPT_RT directly, because the
> > scheduler itself always needs to be non-preemptible.
> >
> > those few places where we currently do preempt_disable(), which
> > should thus be rcu_read_lock(), are never in codepaths that can
> > take a lot of time.
> >
> > but yes, in principle you are right, but in this particular (and
> > special) case it's not a big issue. We should document the RCU
> > read-lock dependencies cleanly and make all rcu-read-lock cases
> > truly rcu_read_lock(), but it's not a pressing issue even
> > considering possible future features like PREEMPT_RT.
> >
> > the only danger in this area is to PREEMPT_RT: it is a bug on
> > PREEMPT_RT if kernel code has an implicit 'spinlock means
> > preempt-off and thus RCU-read-lock' assumption. Most of the time
> > these get discovered via PREEMPT_DEBUG. (preempt_disable() disables
> > preemption on PREEMPT_RT too, so that is not a problem either.)
>
> OK thanks for the good explanation. So I'll keep it as is for now,
> and whatever needs cleaning up later can be worked out as it comes
> up.

Looking forward to the split of synchronize_kernel() into
synchronize_rcu() and synchronize_sched(), the two choices are:

o	Use synchronize_rcu(), but insert rcu_read_lock()/
	rcu_read_unlock() pairs on the read side.

o	Use synchronize_sched(), and make sure all read-side code is
	under preempt_disable().

Either way, there may also need to be some rcu_dereference()s when
picking up the pointer and rcu_assign_pointer()s when updating the
pointers.  For example, if traversing the domain parent list is to be
RCU protected, the for_each_domain() macro should change to something
like:

#define for_each_domain(cpu, domain) \
	for (domain = cpu_rq(cpu)->sd; domain; domain = rcu_dereference(domain->parent))

						Thanx, Paul
Re: [patch 4/5] sched: RCU sched domains
Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > > At a minimum i think we need the fix+comment below.
> >
> > Well if we say "this is actually RCU", then yes. And we should
> > probably change the preempt_{dis|en}ables in other places to
> > rcu_read_lock.
> >
> > OTOH, if we say we just want all running threads to process through
> > a preemption stage, then this would just be a preempt_disable/enable
> > pair.
> >
> > In practice that makes no difference yet, but it looks like you and
> > Paul are working to distinguish these two cases in the RCU code, to
> > accommodate your low latency RCU stuff?
>
> it doesn't impact PREEMPT_RCU/PREEMPT_RT directly, because the
> scheduler itself always needs to be non-preemptible.
>
> those few places where we currently do preempt_disable(), which should
> thus be rcu_read_lock(), are never in codepaths that can take a lot of
> time.
>
> but yes, in principle you are right, but in this particular (and
> special) case it's not a big issue. We should document the RCU
> read-lock dependencies cleanly and make all rcu-read-lock cases truly
> rcu_read_lock(), but it's not a pressing issue even considering
> possible future features like PREEMPT_RT.
>
> the only danger in this area is to PREEMPT_RT: it is a bug on
> PREEMPT_RT if kernel code has an implicit 'spinlock means preempt-off
> and thus RCU-read-lock' assumption. Most of the time these get
> discovered via PREEMPT_DEBUG. (preempt_disable() disables preemption
> on PREEMPT_RT too, so that is not a problem either.)

OK thanks for the good explanation. So I'll keep it as is for now,
and whatever needs cleaning up later can be worked out as it comes up.

-- 
SUSE Labs, Novell Inc.
Re: [patch 4/5] sched: RCU sched domains
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > At a minimum i think we need the fix+comment below.
>
> Well if we say "this is actually RCU", then yes. And we should
> probably change the preempt_{dis|en}ables in other places to
> rcu_read_lock.
>
> OTOH, if we say we just want all running threads to process through a
> preemption stage, then this would just be a preempt_disable/enable
> pair.
>
> In practice that makes no difference yet, but it looks like you and
> Paul are working to distinguish these two cases in the RCU code, to
> accommodate your low latency RCU stuff?

it doesn't impact PREEMPT_RCU/PREEMPT_RT directly, because the
scheduler itself always needs to be non-preemptible.

those few places where we currently do preempt_disable(), which should
thus be rcu_read_lock(), are never in codepaths that can take a lot of
time.

but yes, in principle you are right, but in this particular (and
special) case it's not a big issue. We should document the RCU
read-lock dependencies cleanly and make all rcu-read-lock cases truly
rcu_read_lock(), but it's not a pressing issue even considering
possible future features like PREEMPT_RT.

the only danger in this area is to PREEMPT_RT: it is a bug on
PREEMPT_RT if kernel code has an implicit 'spinlock means preempt-off
and thus RCU-read-lock' assumption. Most of the time these get
discovered via PREEMPT_DEBUG. (preempt_disable() disables preemption on
PREEMPT_RT too, so that is not a problem either.)

	Ingo
Re: [patch 4/5] sched: RCU sched domains
Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > 4/5
> >
> > One of the problems with the multilevel balance-on-fork/exec is
> > that it needs to jump through hoops to satisfy sched-domain's
> > locking semantics (that is, you may traverse your own domain when
> > not preemptable, and you may traverse others' domains when holding
> > their runqueue lock).
> >
> > balance-on-exec had to potentially migrate between more than one
> > CPU before finding a final CPU to migrate to, and balance-on-fork
> > needed to potentially take multiple runqueue locks.
> >
> > So bite the bullet and make sched-domains go completely RCU. This
> > actually simplifies the code quite a bit.
> >
> > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
>
> i like it conceptually, so:
>
>   Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Oh good, thanks.

> from now on, all domain-tree readonly uses have to be
> rcu_read_lock()-ed (or otherwise have to be in a non-preemptible
> section). But there's a bug in show_schedstat() which does a
> for_each_domain() from within a preemptible section. (It was a bug
> with the current hotplug logic too i think.)

Ah, thanks. That looks like a bug in the code with the locking we have
now too...

> At a minimum i think we need the fix+comment below.

Well if we say "this is actually RCU", then yes. And we should
probably change the preempt_{dis|en}ables in other places to
rcu_read_lock.

OTOH, if we say we just want all running threads to process through a
preemption stage, then this would just be a preempt_disable/enable
pair.

In practice that makes no difference yet, but it looks like you and
Paul are working to distinguish these two cases in the RCU code, to
accommodate your low latency RCU stuff?

I'd prefer the latter (ie. just disable preempt, and use
synchronize_sched), but I'm not too sure of what is going on with your
low latency RCU work...?

> 	Ingo
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

Thanks for catching that. I may just push it through first as a fix to
the current 2.6 schedstats code (using preempt_disable), and afterwards
we can change it to rcu_read_lock if that is required.

-- 
SUSE Labs, Novell Inc.
Re: [patch 4/5] sched: RCU sched domains
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> 4/5
>
> One of the problems with the multilevel balance-on-fork/exec is that
> it needs to jump through hoops to satisfy sched-domain's locking
> semantics (that is, you may traverse your own domain when not
> preemptable, and you may traverse others' domains when holding their
> runqueue lock).
>
> balance-on-exec had to potentially migrate between more than one CPU
> before finding a final CPU to migrate to, and balance-on-fork needed
> to potentially take multiple runqueue locks.
>
> So bite the bullet and make sched-domains go completely RCU. This
> actually simplifies the code quite a bit.
>
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

i like it conceptually, so:

  Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

from now on, all domain-tree readonly uses have to be
rcu_read_lock()-ed (or otherwise have to be in a non-preemptible
section). But there's a bug in show_schedstat() which does a
for_each_domain() from within a preemptible section. (It was a bug with
the current hotplug logic too i think.)

At a minimum i think we need the fix+comment below.

	Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -260,6 +260,10 @@ struct runqueue {
 
 static DEFINE_PER_CPU(struct runqueue, runqueues);
 
+/*
+ * The domain tree (rq->sd) is RCU locked. I.e. it may only be accessed
+ * from within an rcu_read_lock() [or otherwise preempt-disabled] section.
+ */
 #define for_each_domain(cpu, domain) \
 	for (domain = cpu_rq(cpu)->sd; domain; domain = domain->parent)
 
@@ -338,6 +342,7 @@ static int show_schedstat(struct seq_fil
 #ifdef CONFIG_SMP
 		/* domain-specific stats */
+		rcu_read_lock();
 		for_each_domain(cpu, sd) {
 			enum idle_type itype;
 			char mask_str[NR_CPUS];
@@ -361,6 +366,7 @@ static int show_schedstat(struct seq_fil
 				sd->sbe_pushed, sd->sbe_attempts,
 				sd->ttwu_wake_remote, sd->ttwu_move_affine,
 				sd->ttwu_move_balance);
 		}
+		rcu_read_unlock();
 #endif
 	}
 	return 0;
[patch 4/5] sched: RCU sched domains
4/5

One of the problems with the multilevel balance-on-fork/exec is that
it needs to jump through hoops to satisfy sched-domain's locking
semantics (that is, you may traverse your own domain when not
preemptable, and you may traverse others' domains when holding their
runqueue lock).

balance-on-exec had to potentially migrate between more than one CPU
before finding a final CPU to migrate to, and balance-on-fork needed
to potentially take multiple runqueue locks.

So bite the bullet and make sched-domains go completely RCU. This
actually simplifies the code quite a bit.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c	2005-04-05 16:39:14.0 +1000
+++ linux-2.6/kernel/sched.c	2005-04-05 18:39:05.0 +1000
@@ -825,22 +825,12 @@ inline int task_curr(const task_t *p)
 }
 
 #ifdef CONFIG_SMP
-enum request_type {
-	REQ_MOVE_TASK,
-	REQ_SET_DOMAIN,
-};
-
 typedef struct {
 	struct list_head list;
-	enum request_type type;
 
-	/* For REQ_MOVE_TASK */
 	task_t *task;
 	int dest_cpu;
 
-	/* For REQ_SET_DOMAIN */
-	struct sched_domain *sd;
-
 	struct completion done;
 } migration_req_t;
 
@@ -862,7 +852,6 @@ static int migrate_task(task_t *p, int d
 	}
 
 	init_completion(&req->done);
-	req->type = REQ_MOVE_TASK;
 	req->task = p;
 	req->dest_cpu = dest_cpu;
 	list_add(&req->list, &rq->migration_queue);
@@ -4365,17 +4354,9 @@ static int migration_thread(void * data)
 		req = list_entry(head->next, migration_req_t, list);
 		list_del_init(head->next);
 
-		if (req->type == REQ_MOVE_TASK) {
-			spin_unlock(&rq->lock);
-			__migrate_task(req->task, cpu, req->dest_cpu);
-			local_irq_enable();
-		} else if (req->type == REQ_SET_DOMAIN) {
-			rq->sd = req->sd;
-			spin_unlock_irq(&rq->lock);
-		} else {
-			spin_unlock_irq(&rq->lock);
-			WARN_ON(1);
-		}
+		spin_unlock(&rq->lock);
+		__migrate_task(req->task, cpu, req->dest_cpu);
+		local_irq_enable();
 
 		complete(&req->done);
 	}
@@ -4606,7 +4587,6 @@ static int migration_call(struct notifie
 			migration_req_t *req;
 			req = list_entry(rq->migration_queue.next,
 					migration_req_t, list);
-			BUG_ON(req->type != REQ_MOVE_TASK);
 			list_del_init(&req->list);
 			complete(&req->done);
 		}
@@ -4903,10 +4883,7 @@ static int __devinit sd_parent_degenerat
  */
 void __devinit cpu_attach_domain(struct sched_domain *sd, int cpu)
 {
-	migration_req_t req;
-	unsigned long flags;
 	runqueue_t *rq = cpu_rq(cpu);
-	int local = 1;
 	struct sched_domain *tmp;
 
 	/* Remove the sched domains which do not contribute to scheduling. */
@@ -4923,24 +4900,7 @@ void __devinit cpu_attach_domain(struct
 
 	sched_domain_debug(sd, cpu);
 
-	spin_lock_irqsave(&rq->lock, flags);
-
-	if (cpu == smp_processor_id() || !cpu_online(cpu)) {
-		rq->sd = sd;
-	} else {
-		init_completion(&req.done);
-		req.type = REQ_SET_DOMAIN;
-		req.sd = sd;
-		list_add(&req.list, &rq->migration_queue);
-		local = 0;
-	}
-
-	spin_unlock_irqrestore(&rq->lock, flags);
-
-	if (!local) {
-		wake_up_process(rq->migration_thread);
-		wait_for_completion(&req.done);
-	}
+	rq->sd = sd;
 }
 
 /* cpus with isolated domains */
@@ -5215,6 +5175,7 @@ static int update_sched_domains(struct n
 	case CPU_DOWN_PREPARE:
 		for_each_online_cpu(i)
 			cpu_attach_domain(NULL, i);
+		synchronize_kernel();
 		arch_destroy_sched_domains();
 		return NOTIFY_OK;