Re: [patch 4/5] sched: RCU sched domains

2005-04-11 Thread Nick Piggin
Paul E. McKenney wrote:
>On Thu, Apr 07, 2005 at 05:58:40PM +1000, Nick Piggin wrote:
>
>>OK thanks for the good explanation. So I'll keep it as is for now,
>>and whatever needs cleaning up later can be worked out as it comes
>>up.
>
>Looking forward to the split of synchronize_kernel() into synchronize_rcu()
>and synchronize_sched(), the two choices are:
>
>o   Use synchronize_rcu(), but insert rcu_read_lock()/rcu_read_unlock()
>    pairs on the read side.
>
>o   Use synchronize_sched(), and make sure all read-side code is
>    under preempt_disable().
Yep, I think we'll go for the second option initially (because that
pretty closely matches the homebrew locking scheme that it used to
use).
>Either way, there may also need to be some rcu_dereference()s when picking
>up pointers and rcu_assign_pointer()s when updating the pointers.
>For example, if traversing the domain parent list is to be RCU protected,
>the for_each_domain() macro should change to something like:
Yes, I think you're right, because there's no barriers or synchronisation
when attaching a new domain. Just a small point though:
>#define for_each_domain(cpu, domain) \
>	for (domain = cpu_rq(cpu)->sd; domain; domain = rcu_dereference(domain->parent))
This should probably be done like so?
#define for_each_domain(cpu, domain) \
	for (domain = rcu_dereference(cpu_rq(cpu)->sd); domain; domain = domain->parent)
And I think it would be wise to use rcu_assign_pointer in the update too.
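
A sketch of what the update side might then look like (illustrative only,
not the exact patch; rq here is cpu_rq(cpu) as in cpu_attach_domain()):

	/*
	 * Publish the new domain tree. rcu_assign_pointer() supplies the
	 * write barrier that orders the tree's initialisation before the
	 * pointer store, pairing with rcu_dereference() in readers.
	 */
	rcu_assign_pointer(rq->sd, sd);
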
Thanks Paul.
--
SUSE Labs, Novell Inc.


Re: [patch 4/5] sched: RCU sched domains

2005-04-11 Thread Paul E. McKenney
On Thu, Apr 07, 2005 at 05:58:40PM +1000, Nick Piggin wrote:
> Ingo Molnar wrote:
> >* Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> >
> >>>At a minimum i think we need the fix+comment below.
> >>
> >>Well if we say "this is actually RCU", then yes. And we should 
> >>probably change the preempt_{dis|en}ables in other places to 
> >>rcu_read_lock.
> >>
> >>OTOH, if we say we just want all running threads to process through a 
> >>preemption stage, then this would just be a preempt_disable/enable 
> >>pair.
> >>
> >>In practice that makes no difference yet, but it looks like you and 
> >>Paul are working to distinguish these two cases in the RCU code, to 
> >>accommodate your low latency RCU stuff?
> >
> >
> >it doesn't impact PREEMPT_RCU/PREEMPT_RT directly, because the scheduler 
> >itself always needs to be non-preemptible.
> >
> >those few places where we currently do preempt_disable(), which should 
> >thus be rcu_read_lock(), are never in codepaths that can take a lot of 
> >time.
> >
> >but yes, in principle you are right, but in this particular (and 
> >special) case it's not a big issue. We should document the RCU read-lock 
> >dependencies cleanly and make all rcu-read-lock cases truly 
> >rcu_read_lock(), but it's not a pressing issue even considering possible 
> >future features like PREEMPT_RT.
> >
> >the only danger in this area is to PREEMPT_RT: it is a bug on PREEMPT_RT 
> >if kernel code has an implicit 'spinlock means preempt-off and thus 
> >RCU-read-lock' assumption. Most of the time these get discovered via 
> >PREEMPT_DEBUG. (preempt_disable() disables preemption on PREEMPT_RT too, 
> >so that is not a problem either.)
> >
> 
> OK thanks for the good explanation. So I'll keep it as is for now,
> and whatever needs cleaning up later can be worked out as it comes
> up.

Looking forward to the split of synchronize_kernel() into synchronize_rcu()
and synchronize_sched(), the two choices are:

o   Use synchronize_rcu(), but insert rcu_read_lock()/rcu_read_unlock()
    pairs on the read side.

o   Use synchronize_sched(), and make sure all read-side code is
    under preempt_disable().
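
A minimal sketch of the two read-side styles (fragments only; cpu and sd
are assumed declared as in show_schedstat(), and the loop body stands in
for whatever the reader actually does with each domain):

	/* Choice 1: explicit RCU read-side critical section; the
	 * update side then waits with synchronize_rcu(). */
	rcu_read_lock();
	for_each_domain(cpu, sd) {
		/* ... examine sd ... */
	}
	rcu_read_unlock();

	/* Choice 2: preemption disabled across the traversal; the
	 * update side then waits with synchronize_sched(). */
	preempt_disable();
	for_each_domain(cpu, sd) {
		/* ... examine sd ... */
	}
	preempt_enable();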

Either way, there may also need to be some rcu_dereference()s when picking
up pointers and rcu_assign_pointer()s when updating the pointers.
For example, if traversing the domain parent list is to be RCU protected,
the for_each_domain() macro should change to something like:

#define for_each_domain(cpu, domain) \
	for (domain = cpu_rq(cpu)->sd; domain; domain = rcu_dereference(domain->parent))

Thanx, Paul


Re: [patch 4/5] sched: RCU sched domains

2005-04-07 Thread Nick Piggin
Ingo Molnar wrote:
>* Nick Piggin <[EMAIL PROTECTED]> wrote:
>
>>>At a minimum i think we need the fix+comment below.
>>
>>Well if we say "this is actually RCU", then yes. And we should
>>probably change the preempt_{dis|en}ables in other places to
>>rcu_read_lock.
>>
>>OTOH, if we say we just want all running threads to process through a
>>preemption stage, then this would just be a preempt_disable/enable
>>pair.
>>
>>In practice that makes no difference yet, but it looks like you and
>>Paul are working to distinguish these two cases in the RCU code, to
>>accommodate your low latency RCU stuff?
>
>it doesn't impact PREEMPT_RCU/PREEMPT_RT directly, because the scheduler
>itself always needs to be non-preemptible.
>
>those few places where we currently do preempt_disable(), which should
>thus be rcu_read_lock(), are never in codepaths that can take a lot of
>time.
>
>but yes, in principle you are right, but in this particular (and
>special) case it's not a big issue. We should document the RCU read-lock
>dependencies cleanly and make all rcu-read-lock cases truly
>rcu_read_lock(), but it's not a pressing issue even considering possible
>future features like PREEMPT_RT.
>
>the only danger in this area is to PREEMPT_RT: it is a bug on PREEMPT_RT
>if kernel code has an implicit 'spinlock means preempt-off and thus
>RCU-read-lock' assumption. Most of the time these get discovered via
>PREEMPT_DEBUG. (preempt_disable() disables preemption on PREEMPT_RT too,
>so that is not a problem either.)

OK thanks for the good explanation. So I'll keep it as is for now,
and whatever needs cleaning up later can be worked out as it comes
up.
--
SUSE Labs, Novell Inc.


Re: [patch 4/5] sched: RCU sched domains

2005-04-07 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > At a minimum i think we need the fix+comment below.
> 
> Well if we say "this is actually RCU", then yes. And we should 
> probably change the preempt_{dis|en}ables in other places to 
> rcu_read_lock.
> 
> OTOH, if we say we just want all running threads to process through a 
> preemption stage, then this would just be a preempt_disable/enable 
> pair.
> 
> In practice that makes no difference yet, but it looks like you and 
> Paul are working to distinguish these two cases in the RCU code, to 
> accommodate your low latency RCU stuff?

it doesn't impact PREEMPT_RCU/PREEMPT_RT directly, because the scheduler 
itself always needs to be non-preemptible.

those few places where we currently do preempt_disable(), which should 
thus be rcu_read_lock(), are never in codepaths that can take a lot of 
time.

but yes, in principle you are right, but in this particular (and 
special) case it's not a big issue. We should document the RCU read-lock 
dependencies cleanly and make all rcu-read-lock cases truly 
rcu_read_lock(), but it's not a pressing issue even considering possible 
future features like PREEMPT_RT.

the only danger in this area is to PREEMPT_RT: it is a bug on PREEMPT_RT 
if kernel code has an implicit 'spinlock means preempt-off and thus 
RCU-read-lock' assumption. Most of the time these get discovered via 
PREEMPT_DEBUG. (preempt_disable() disables preemption on PREEMPT_RT too, 
so that is not a problem either.)
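
To illustrate the assumption in question (a sketch; some_lock and
walk_domains() are hypothetical, standing in for any lock and any
domain-tree reader):

	/* Unsafe on PREEMPT_RT: spinlocks become sleeping locks there,
	 * so holding one no longer implies an RCU read-side section. */
	spin_lock(&some_lock);
	walk_domains(rcu_dereference(cpu_rq(cpu)->sd));
	spin_unlock(&some_lock);

	/* Safe everywhere: state the RCU dependency explicitly. */
	rcu_read_lock();
	walk_domains(rcu_dereference(cpu_rq(cpu)->sd));
	rcu_read_unlock();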

Ingo


Re: [patch 4/5] sched: RCU sched domains

2005-04-06 Thread Nick Piggin
Ingo Molnar wrote:
>* Nick Piggin <[EMAIL PROTECTED]> wrote:
>
>>4/5
>>
>>One of the problems with the multilevel balance-on-fork/exec is that
>>it needs to jump through hoops to satisfy sched-domain's locking
>>semantics (that is, you may traverse your own domain when not
>>preemptable, and you may traverse others' domains when holding their
>>runqueue lock).
>>
>>balance-on-exec had to potentially migrate between more than one CPU
>>before finding a final CPU to migrate to, and balance-on-fork needed
>>to potentially take multiple runqueue locks.
>>
>>So bite the bullet and make sched-domains go completely RCU. This
>>actually simplifies the code quite a bit.
>>
>>Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
>
>i like it conceptually, so:
>
>Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Oh good, thanks.

>from now on, all domain-tree readonly uses have to be rcu_read_lock()-ed
>(or otherwise have to be in a non-preemptible section). But there's a
>bug in show_schedstat() which does a for_each_domain() from within a
>preemptible section. (It was a bug with the current hotplug logic too i
>think.)

Ah, thanks. That looks like a bug in the code with the locking
we have now too...

>At a minimum i think we need the fix+comment below.

Well if we say "this is actually RCU", then yes. And we should
probably change the preempt_{dis|en}ables in other places to
rcu_read_lock.

OTOH, if we say we just want all running threads to process through
a preemption stage, then this would just be a preempt_disable/enable
pair.

In practice that makes no difference yet, but it looks like you and
Paul are working to distinguish these two cases in the RCU code, to
accommodate your low latency RCU stuff?

I'd prefer the latter (ie. just disable preempt, and use
synchronize_sched), but I'm not too sure of what is going on with
your low-latency RCU work...?

>	Ingo
>
>Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

Thanks for catching that. I may just push it through first as a fix
to the current 2.6 schedstats code (using preempt_disable), and
afterwards we can change it to rcu_read_lock if that is required.
--
SUSE Labs, Novell Inc.


Re: [patch 4/5] sched: RCU sched domains

2005-04-06 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> 4/5

> One of the problems with the multilevel balance-on-fork/exec is that 
> it needs to jump through hoops to satisfy sched-domain's locking 
> semantics (that is, you may traverse your own domain when not 
> preemptable, and you may traverse others' domains when holding their 
> runqueue lock).
> 
> balance-on-exec had to potentially migrate between more than one CPU 
> before finding a final CPU to migrate to, and balance-on-fork needed 
> to potentially take multiple runqueue locks.
> 
> So bite the bullet and make sched-domains go completely RCU. This 
> actually simplifies the code quite a bit.
> 
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

i like it conceptually, so:

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

from now on, all domain-tree readonly uses have to be rcu_read_lock()-ed 
(or otherwise have to be in a non-preemptible section). But there's a 
bug in show_schedstat() which does a for_each_domain() from within a 
preemptible section. (It was a bug with the current hotplug logic too i 
think.)

At a minimum i think we need the fix+comment below.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -260,6 +260,10 @@ struct runqueue {
 
 static DEFINE_PER_CPU(struct runqueue, runqueues);
 
+/*
+ * The domain tree (rq->sd) is RCU locked. I.e. it may only be accessed
+ * from within an rcu_read_lock() [or otherwise preempt-disabled] section.
+ */
 #define for_each_domain(cpu, domain) \
for (domain = cpu_rq(cpu)->sd; domain; domain = domain->parent)
 
@@ -338,6 +342,7 @@ static int show_schedstat(struct seq_fil
 
 #ifdef CONFIG_SMP
/* domain-specific stats */
+   rcu_read_lock();
for_each_domain(cpu, sd) {
enum idle_type itype;
char mask_str[NR_CPUS];
@@ -361,6 +366,7 @@ static int show_schedstat(struct seq_fil
sd->sbe_pushed, sd->sbe_attempts,
sd->ttwu_wake_remote, sd->ttwu_move_affine, 
sd->ttwu_move_balance);
}
+   rcu_read_unlock();
 #endif
}
return 0;
 


[patch 4/5] sched: RCU sched domains

2005-04-05 Thread Nick Piggin
4/5
One of the problems with the multilevel balance-on-fork/exec is that it
needs to jump through hoops to satisfy sched-domain's locking semantics
(that is, you may traverse your own domain when not preemptable, and
you may traverse others' domains when holding their runqueue lock).

balance-on-exec had to potentially migrate between more than one CPU before
finding a final CPU to migrate to, and balance-on-fork needed to potentially
take multiple runqueue locks.

So bite the bullet and make sched-domains go completely RCU. This actually
simplifies the code quite a bit.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c   2005-04-05 16:39:14.0 +1000
+++ linux-2.6/kernel/sched.c	2005-04-05 18:39:05.0 +1000
@@ -825,22 +825,12 @@ inline int task_curr(const task_t *p)
 }
 
 #ifdef CONFIG_SMP
-enum request_type {
-   REQ_MOVE_TASK,
-   REQ_SET_DOMAIN,
-};
-
 typedef struct {
struct list_head list;
-   enum request_type type;
 
-   /* For REQ_MOVE_TASK */
task_t *task;
int dest_cpu;
 
-   /* For REQ_SET_DOMAIN */
-   struct sched_domain *sd;
-
struct completion done;
 } migration_req_t;
 
@@ -862,7 +852,6 @@ static int migrate_task(task_t *p, int d
}
 
init_completion(&req->done);
-   req->type = REQ_MOVE_TASK;
req->task = p;
req->dest_cpu = dest_cpu;
list_add(&req->list, &rq->migration_queue);
@@ -4365,17 +4354,9 @@ static int migration_thread(void * data)
req = list_entry(head->next, migration_req_t, list);
list_del_init(head->next);
 
-   if (req->type == REQ_MOVE_TASK) {
-   spin_unlock(&rq->lock);
-   __migrate_task(req->task, cpu, req->dest_cpu);
-   local_irq_enable();
-   } else if (req->type == REQ_SET_DOMAIN) {
-   rq->sd = req->sd;
-   spin_unlock_irq(&rq->lock);
-   } else {
-   spin_unlock_irq(&rq->lock);
-   WARN_ON(1);
-   }
+   spin_unlock(&rq->lock);
+   __migrate_task(req->task, cpu, req->dest_cpu);
+   local_irq_enable();

complete(&req->done);
}
@@ -4606,7 +4587,6 @@ static int migration_call(struct notifie
migration_req_t *req;
req = list_entry(rq->migration_queue.next,
 migration_req_t, list);
-   BUG_ON(req->type != REQ_MOVE_TASK);
list_del_init(&req->list);
complete(&req->done);
}
@@ -4903,10 +4883,7 @@ static int __devinit sd_parent_degenerat
  */
 void __devinit cpu_attach_domain(struct sched_domain *sd, int cpu)
 {
-   migration_req_t req;
-   unsigned long flags;
runqueue_t *rq = cpu_rq(cpu);
-   int local = 1;
struct sched_domain *tmp;
 
/* Remove the sched domains which do not contribute to scheduling. */
@@ -4923,24 +4900,7 @@ void __devinit cpu_attach_domain(struct 
 
sched_domain_debug(sd, cpu);
 
-   spin_lock_irqsave(&rq->lock, flags);
-
-   if (cpu == smp_processor_id() || !cpu_online(cpu)) {
-   rq->sd = sd;
-   } else {
-   init_completion(&req.done);
-   req.type = REQ_SET_DOMAIN;
-   req.sd = sd;
-   list_add(&req.list, &rq->migration_queue);
-   local = 0;
-   }
-
-   spin_unlock_irqrestore(&rq->lock, flags);
-
-   if (!local) {
-   wake_up_process(rq->migration_thread);
-   wait_for_completion(&req.done);
-   }
+   rq->sd = sd;
 }
 
 /* cpus with isolated domains */
@@ -5215,6 +5175,7 @@ static int update_sched_domains(struct n
case CPU_DOWN_PREPARE:
for_each_online_cpu(i)
cpu_attach_domain(NULL, i);
+   synchronize_kernel();
arch_destroy_sched_domains();
return NOTIFY_OK;
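
In essence, the CPU_DOWN_PREPARE hunk above is the classic RCU
unpublish/wait/free sequence (an annotated sketch of the same code):

	for_each_online_cpu(i)
		cpu_attach_domain(NULL, i);	/* unpublish: detach old trees */
	synchronize_kernel();			/* wait for pre-existing readers */
	arch_destroy_sched_domains();		/* now safe to free old domains */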
 

