Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyansky


Paul Jackson wrote:
> Max K wrote:
>>> And for another thing, we already declare externs in cpumask.h for
>>> the other, more widely used, cpu_*_map variables cpu_possible_map,
>>> cpu_online_map, and cpu_present_map.
>> Well, to address #2 and #3 isolated map will need to be exported as well.
>> Those other maps do not really have much to do with the scheduler code.
>> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
>> for them.
> 
> Well, if you need it to be exported for #2 or #3, then that's ok
> by me - export it.
> 
> I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
> I'd prefer you not put it there, as lib/cpumask.c just contains the
> implementation details of the abstract data type cpumask_t, not any of
> its uses.  If you mean kernel/cpuset.c, then that's not a good choice
> either, as that just contains the implementation details of the cpuset
> subsystem.  You should usually define such things in one of the files
> using it, and unless there is clearly a -better- place to move the
> definition, it's usually better to just leave it where it is.

I was thinking of creating the new file kernel/cpumask.c, but it probably does
not make sense just for the masks. I'm now thinking kernel/cpu.c is the best
place for them. It contains all the cpu hotplug logic that deals with those
maps, and at the very top it has stuff like:

/* Serializes the updates to cpu_online_map, cpu_present_map */
static DEFINE_MUTEX(cpu_add_remove_lock);

So it seems to make sense to keep the maps in there.
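
As a rough illustration only (this is a sketch, not from any posted patch, and
the exact placement is an assumption), the consolidation could look something
like this, with the definitions next to the hotplug lock in kernel/cpu.c and
the externs staying in cpumask.h:

/* kernel/cpu.c (sketch) */
cpumask_t cpu_possible_map __read_mostly;
cpumask_t cpu_online_map __read_mostly;
cpumask_t cpu_present_map __read_mostly;
cpumask_t cpu_isolated_map __read_mostly = CPU_MASK_NONE;
EXPORT_SYMBOL(cpu_isolated_map);

/* include/linux/cpumask.h (sketch) */
extern cpumask_t cpu_isolated_map;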

Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max K wrote:
> > And for another thing, we already declare externs in cpumask.h for
> > the other, more widely used, cpu_*_map variables cpu_possible_map,
> > cpu_online_map, and cpu_present_map.
> Well, to address #2 and #3 isolated map will need to be exported as well.
> Those other maps do not really have much to do with the scheduler code.
> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
> for them.

Well, if you need it to be exported for #2 or #3, then that's ok
by me - export it.

I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
I'd prefer you not put it there, as lib/cpumask.c just contains the
implementation details of the abstract data type cpumask_t, not any of
its uses.  If you mean kernel/cpuset.c, then that's not a good choice
either, as that just contains the implementation details of the cpuset
subsystem.  You should usually define such things in one of the files
using it, and unless there is clearly a -better- place to move the
definition, it's usually better to just leave it where it is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyanskiy

Paul Jackson wrote:
> Max wrote:
>> Looks like I failed to explain what I'm trying to achieve. So let me try again.
>
> Well done.  I read through that, expecting to disagree or at least
> to not understand at some point, and got all the way through nodding
> my head in agreement.  Good.
>
> Whether the earlier confusions were lack of clarity in the presentation,
> or lack of competence in my brain ... well guess I don't want to ask that
> question ;).

:)

> Well ... just one minor point:
>
> Max wrote in reply to pj:
>>> The cpu_isolated_map is a file static variable known only within
>>> the kernel/sched.c file; this should not change.
>> I completely disagree. In fact I think all the cpu_xxx_map (online, present,
>> isolated) variables do not belong in the scheduler code. I'm thinking of
>> submitting a patch that factors them out into kernel/cpumask.c. We already
>> have cpumask.h.
>
> Huh?  Why would you want to do that?
>
> For one thing, the map being discussed here, cpu_isolated_map,
> is only used in sched.c, so why publish it wider?
>
> And for another thing, we already declare externs in cpumask.h for
> the other, more widely used, cpu_*_map variables cpu_possible_map,
> cpu_online_map, and cpu_present_map.

Well, to address #2 and #3 isolated map will need to be exported as well.
Those other maps do not really have much to do with the scheduler code.
That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place
for them.

> Other than that detail, we seem to be communicating and in agreement on
> your first item, isolating CPU scheduler load balancing.  Good.
>
> On your other two items, irq and workqueue isolation, which I had
> suggested doing via cpuset sched_load_balance, I now agree that that
> wasn't a good idea.
>
> I am still a little surprised at using isolation extensions to
> disable irqs on select CPUs; but others have thought far more about
> irqs than I have, so I'll be quiet.

Please note that we're not talking about completely disabling IRQs. We're
talking about not routing them to the isolated CPUs by default. It's still
possible to explicitly reroute an IRQ to an isolated CPU.
Why is this needed? It is actually very easy to explain. IRQs are the major
source of latency and overhead. IRQ handlers themselves are mostly ok, but
they typically schedule softirqs, work queues and timers on the same CPU where
the IRQ is handled. In other words, if an isolated CPU is receiving IRQs it's
not really isolated, because it's running a whole bunch of different kernel
code (i.e. we're talking latencies, cache usage, etc).
Of course some folks may want to explicitly route certain IRQs to the isolated
CPUs. For example, if an app depends on the network stack it may make sense to
route an IRQ from the NIC to the same CPU the app is running on.
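
As a minimal sketch (not from the posted patches; the helper name is made up)
of the kind of check non-scheduler subsystems could use once cpu_isolated_map
is visible outside kernel/sched.c:

/* Hypothetical helper, assuming cpu_isolated_map is exported as proposed. */
static inline int cpu_usable(int cpu)
{
	return cpu_online(cpu) && !cpu_isset(cpu, cpu_isolated_map);
}

A default IRQ-affinity or workqueue path could then skip CPUs for which this
returns 0, while an administrator could still bind a specific IRQ to an
isolated CPU explicitly.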


Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max wrote:
> Looks like I failed to explain what I'm trying to achieve. So let me try 
> again.

Well done.  I read through that, expecting to disagree or at least
to not understand at some point, and got all the way through nodding
my head in agreement.  Good.

Whether the earlier confusions were lack of clarity in the presentation,
or lack of competence in my brain ... well guess I don't want to ask that
question ;).

Well ... just one minor point:

Max wrote in reply to pj:
> > The cpu_isolated_map is a file static variable known only within
> > the kernel/sched.c file; this should not change.
> I completely disagree. In fact I think all the cpu_xxx_map (online, present, 
> isolated)
> variables do not belong in the scheduler code. I'm thinking of submitting a 
> patch that
> factors them out into kernel/cpumask.c We already have cpumask.h.

Huh?  Why would you want to do that?

For one thing, the map being discussed here, cpu_isolated_map,
is only used in sched.c, so why publish it wider?

And for another thing, we already declare externs in cpumask.h for
the other, more widely used, cpu_*_map variables cpu_possible_map,
cpu_online_map, and cpu_present_map.
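
For reference, the declarations in question look roughly like this in trees of
that era (paraphrased from memory, not an exact quote of the headers):

/* include/linux/cpumask.h */
extern cpumask_t cpu_possible_map;
extern cpumask_t cpu_online_map;
extern cpumask_t cpu_present_map;

/* kernel/sched.c */
static cpumask_t cpu_isolated_map = CPU_MASK_NONE;	/* set via isolcpus= */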

Other than that detail, we seem to be communicating and in agreement on
your first item, isolating CPU scheduler load balancing.  Good.

On your other two items, irq and workqueue isolation, which I had
suggested doing via cpuset sched_load_balance, I now agree that that
wasn't a good idea.

I am still a little surprised at using isolation extensions to
disable irqs on select CPUs; but others have thought far more about
irqs than I have, so I'll be quiet.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-03 Thread Max Krasnyansky
Paul Jackson wrote:
> Max wrote:
>> Paul, I actually mentioned at the beginning of my email that I did read that 
>> thread
>> started by Peter. I did learn quite a bit from it :)
> 
> Ah - sorry - I missed that part.  However, I'm still getting the feeling
> that there were some key points in that thread that we have not managed
> to communicate successfully.
I think you are assuming that I only need to deal with the RT scheduler and
scheduler domains, which is not correct. See below.

>> Sounds like at this point we're in agreement that sched_load_balance is not 
>> suitable
>> for what I'd like to achieve.
> 
> I don't think we're in agreement; I think we're in confusion ;)
Yeah. I don't believe I'm the confused side though ;-)

> Yes, sched_load_balance does not *directly* have anything to do with this.
> 
> But indirectly it is a critical element in what I think you'd like to
> achieve.  It affects how the cpuset code sets up sched_domains, and
> if I understand correctly, you require either (1) some sched_domains to
> only contain RT tasks, or (2) some CPUs to be in no sched_domain at all.
> 
> Proper configuration of the cpuset hierarchy, including the setting of
> the per-cpuset sched_load_balance flag, can provide either of these
> sched_domain partitions, as desired.
Again you're assuming that scheduling domain partitioning satisfies my
requirements or addresses my use case. It does not. See below for more details.
 
>> But how about making cpusets aware of the cpu_isolated_map ?
> 
> No.  That's confusing cpusets and the scheduler again.
> 
> The cpu_isolated_map is a file static variable known only within
> the kernel/sched.c file; this should not change.
I completely disagree. In fact I think all the cpu_xxx_map (online, present,
isolated) variables do not belong in the scheduler code. I'm thinking of
submitting a patch that factors them out into kernel/cpumask.c. We already
have cpumask.h.

> Presently, the boot parameter isolcpus= is just used to initialize
> what CPUs are isolated at boot, and then the sched_domain partitioning,
> as done in kernel/sched.c:partition_sched_domains() (the hook into
> the sched code that cpusets uses) determines which CPUs are isolated
> from that point forward.  I doubt that this should change either.
Sure, I did not even touch that part. I just proposed to extend the meaning of
the 'isolated' bit.

> In that thread referenced above, did you see the part where RT is
> achieved not by isolating CPUs from any scheduler, but rather by
> polymorphically having several schedulers available to operate on each
> sched_domain, and having RT threads self-select the RT scheduler?
Absolutely, yes, I saw that part. But it has nothing to do with my use case.

Looks like I failed to explain what I'm trying to achieve. So let me try again.
I'd like to be able to run a CPU intensive (100%) RT task on one of the
processors without adversely affecting or being affected by the other system
activities. System activities here include _kernel_ activities as well. Hence
the proposal is to extend the current CPU isolation feature.

The new definition of the CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing.
   Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts must not be routed to the isolated CPU(s).
   Users must route interrupts (if any) explicitly.

3. In general kernel subsystems must avoid activity on the isolated CPU(s) as
   much as possible. This includes workqueues, per CPU threads, etc.
   This feature is configurable and is disabled by default.
---

#1 affects the scheduler and scheduler domains. It's already supported, either
by using the isolcpus= boot option or by setting "sched_load_balance" in
cpusets. I'm totally happy with the current behavior and my original patch did
not mess with this functionality in any way.
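
(For example, booting with something like

	isolcpus=2,3

already takes CPUs 2 and 3 out of the general scheduler domains; the extensions
discussed here would additionally keep IRQs and kernel threads off those CPUs
by default.)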

#2 and #3 have _nothing_ to do with the scheduler or scheduler domains. I've
been trying to explain that for a few days now ;-). When you saw my patches for
#2 and #3 you told me that you'd be interested to see them implemented on top
of the "sched_load_balance" flag. Here is your original reply:
http://marc.info/?l=linux-kernel&m=120153260217699&w=2

So I looked into that and provided an explanation of why it would not work, or
would work but would add lots of complexity (access to internal cpuset
structures, locking, etc). My email on that is here:
http://marc.info/?l=linux-kernel&m=120180692331461&w=2

Now, I felt from the beginning that cpusets is not the right mechanism to
address #2 and #3. The best mechanism IMO is to simply provide access to the
cpu_isolated_map to the rest of the kernel. Again, the fact that
cpu_isolated_map currently lives in the scheduler code does not change anything
here, because as I explained I'm proposing to extend the meaning of "CPU
isolation". I provided dynamic access to the "isolated" bit only for
convenience; it does _not_ change existing


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-02 Thread Paul Jackson
Max wrote:
> Paul, I actually mentioned at the beginning of my email that I did read that 
> thread
> started by Peter. I did learn quite a bit from it :)

Ah - sorry - I missed that part.  However, I'm still getting the feeling
that there were some key points in that thread that we have not managed
to communicate successfully.

> Sounds like at this point we're in agreement that sched_load_balance is not 
> suitable
> for what I'd like to achieve.

I don't think we're in agreement; I think we're in confusion ;)

Yes, sched_load_balance does not *directly* have anything to do with
this.

But indirectly it is a critical element in what I think you'd like to
achieve.  It affects how the cpuset code sets up sched_domains, and
if I understand correctly, you require either (1) some sched_domains to
only contain RT tasks, or (2) some CPUs to be in no sched_domain at all.

Proper configuration of the cpuset hierarchy, including the setting of
the per-cpuset sched_load_balance flag, can provide either of these
sched_domain partitions, as desired.

> But how about making cpusets aware of the cpu_isolated_map ?

No.  That's confusing cpusets and the scheduler again.

The cpu_isolated_map is a file static variable known only within
the kernel/sched.c file; this should not change.

Presently, the boot parameter isolcpus= is just used to initialize
what CPUs are isolated at boot, and then the sched_domain partitioning,
as done in kernel/sched.c:partition_sched_domains() (the hook into
the sched code that cpusets uses) determines which CPUs are isolated
from that point forward.  I doubt that this should change either.
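
For context, the hook in question has roughly this shape in kernels of that
era (from memory, so the exact signature may differ slightly between releases):

/* kernel/sched.c: cpusets hands the scheduler an array of cpumasks,
 * one per sched_domain partition; CPUs covered by none of them end up
 * in the null domain, i.e. out of load balancing. */
void partition_sched_domains(int ndoms_new, cpumask_t *doms_new);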

In that thread referenced above, did you see the part where RT is
achieved not by isolating CPUs from any scheduler, but rather by
polymorphically having several schedulers available to operate on each
sched_domain, and having RT threads self-select the RT scheduler?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-02 Thread Max Krasnyansky
Paul Jackson wrote:
> Max wrote:
>> Here is the list of things of issues with sched_load_balance flag from CPU 
>> isolation 
>> perspective:
> 
> A separate thread happened to start up on lkml.org, shortly after
> yours, that went into this in considerable detail.
> 
> For example, the interaction of cpusets, sched_load_balance,
> sched_domains and real time scheduling is examined in some detail on
> this thread.  Everyone participating on that thread learned something
> (we all came into it with less than a full picture of what's there.)
> 
> I would encourage you to read it closely.  For example, the scheduler
> code should not be trying to access per-cpuset attributes such as
> the sched_load_balance flag (you are correct that this would be
> difficult to do because of the locking; however by design, that is
> not to be done.)
> 
> This thread begins at:
> 
> scheduler scalability - cgroups, cpusets and load-balancing
> http://lkml.org/lkml/2008/1/29/60
> 
> Too bad we didn't think to include you in the CC list of that
> thread from the beginning.

Paul, I actually mentioned at the beginning of my email that I did read that
thread started by Peter. I did learn quite a bit from it :)
You guys did not discuss isolation stuff though. The thread was only about
scheduling, and my cpu isolation extension patches deal with other aspects.

Sounds like at this point we're in agreement that sched_load_balance is not
suitable for what I'd like to achieve. But how about making cpusets aware of
the cpu_isolated_map?
Even without my patches it's somewhat of an issue right now. I mean, if you use
the isolcpus= boot option to put cpus into the null domain, cpusets will not be
aware of it. The result may be a bit confusing if an isolated cpu is added to
some cpuset.

Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-01 Thread Paul Jackson
Max wrote:
> Here is the list of things of issues with sched_load_balance flag from CPU 
> isolation 
> perspective:

A separate thread happened to start up on lkml.org, shortly after
yours, that went into this in considerable detail.

For example, the interaction of cpusets, sched_load_balance,
sched_domains and real time scheduling is examined in some detail on
this thread.  Everyone participating on that thread learned something
(we all came into it with less than a full picture of what's there.)

I would encourage you to read it closely.  For example, the scheduler
code should not be trying to access per-cpuset attributes such as
the sched_load_balance flag (you are correct that this would be
difficult to do because of the locking; however by design, that is
not to be done.)

This thread begins at:

scheduler scalability - cgroups, cpusets and load-balancing
http://lkml.org/lkml/2008/1/29/60

Too bad we didn't think to include you in the CC list of that
thread from the beginning.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-01-31 Thread Max Krasnyanskiy

Paul Jackson wrote:
> Max wrote:
>> So far it seems that extending cpu_isolated_map
>> is more natural way of propagating this notion to the rest of the kernel.
>> Since it's very similar to the cpu_online_map concept and it's easy to
>> integrated with the code that already uses it.
>
> If it were just realtime support, then I suspect I'd agree that
> extending cpu_isolated_map makes more sense.
>
> But some people use realtime on systems that are also heavily
> managed using cpusets.  The two have to work together.  I have
> customers with systems running realtime on a few CPUs, at the
> same time that they have a large batch scheduler (which is layered
> on top of cpusets) managing jobs on a few hundred other CPUs.
> Hence with the cpuset 'sched_load_balance' flag I think I've already
> done what I think is one part of what your patches achieve by extending
> the cpu_isolated_map.
>
> This is a common situation with "resource management" mechanisms such
> as cpusets (and more recently cgroups and the subsystem modules it
> supports.)  They cut across existing core kernel code that manages such
> key resources as CPUs and memory.  As best we can, they have to work
> with each other.


Hi Paul,

I thought some more about your proposal to use the sched_load_balance flag in
cpusets instead of extending cpu_isolated_map. I looked at cpusets, cgroups,
and the latest thread started by Peter (about sched domains and stuff), and
here are my thoughts on this.

Here is the list of issues with the sched_load_balance flag from a CPU
isolation perspective:

--
(1) Boot time isolation is not possible. There is currently no way to set up a
cpuset at boot time. For example, we won't be able to isolate cpus from irqs
and workqueues at boot. Not a major issue but still an inconvenience.

--
(2) There is currently no easy way to figure out what cpuset a cpu belongs to
in order to query its sched_load_balance flag. In order to do that we need a
method that iterates all active cpusets and checks their cpus_allowed masks.
This implies holding the cgroup and cpuset mutexes. It's not clear whether it's
ok to do that from the contexts CPU isolation happens in (apic, sched,
workqueue). It seems that the cgroup/cpuset API is designed for top-down
access, i.e. adding a cpu to a set and then recomputing domains, which makes
perfect sense for the common cpuset use case but is not what cpu isolation
needs. In other words, I think it's much simpler and cleaner to use the
cpu_isolated_map for isolation purposes.

--
(3) cpusets are a bit too dynamic :). What I mean by this is that the
sched_load_balance flag can be changed at any time without bringing a CPU
offline. What that means is that we'll need some notifier mechanisms for
killing and restarting workqueue threads when that flag changes. Also, we'd
need some logic that makes sure that a user does not disable load balancing on
all cpus, because that would effectively kill workqueues on all the cpus.

This particular case is already handled very nicely in my patches. The isolated
bit can be set only when a cpu is offline, and it cannot be set on the first
online cpu. Workqueues and other subsystems already handle cpu hotplug events
nicely and can easily ignore isolated cpus when they come online.

-

#1 is probably unfixable. #2 and #3 can be fixed, but at the expense of extra
complexity across the board. I seriously doubt that I'll be able to push that
through the reviews ;-).

Also, personally I still think cpusets and cpu isolation attack two different
problems. cpusets is about partitioning cpus and memory nodes, and managing
tasks. Most of the cgroups/cpuset APIs are designed to deal with tasks. CPU
isolation is much simpler and is at a lower layer. It deals with IRQs, kernel
per cpu threads, etc. The only intersection I see is that both features affect
scheduling domains (cpu isolation is again simple here: it just puts cpus into
null domains, and that's existing logic in sched.c, nothing new here).

So here are some proposals on how we can make them play nicely with each other.


--
(A) Make cpusets aware of isolated cpus.
All we have to do here is to change
	guarantee_online_cpus()
	common_cpu_mem_hotplug_unplug()
to exclude cpu_isolated_map from cpu_online_map before using it. And we'd need
to change
	update_cpumasks()
to simply ignore isolated cpus.

That way if a cpu is isolated it'll be ignored by the cpusets logic, which I
believe would be correct behavior.
We're talking a trivial ~5 line patch which will be a noop if cpu isolation is
disabled.
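
As a minimal sketch of that change (illustrative only, not a posted patch; the
helper name is made up), the idea is simply to compute "online minus isolated"
wherever cpusets currently uses cpu_online_map directly:

/* Sketch: the cpus cpusets should treat as usable, assuming
 * cpu_isolated_map is exported outside kernel/sched.c. */
static inline void online_minus_isolated(cpumask_t *pmask)
{
	cpus_andnot(*pmask, cpu_online_map, cpu_isolated_map);
}

guarantee_online_cpus() and common_cpu_mem_hotplug_unplug() would then start
from this mask instead of cpu_online_map.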


(B) Ignore the isolated map in cpusets. That's the current state of affairs
with my patches applied. Looks like your customers are happy with what they
have now, so they will probably not enable cpu isolation anyway :).


(C) Introduce cpu_usable_map. That map will be recomputed on hotplug events.
Essentially it'd be cpu_online_map AND ~cpu_isolated_map. Convert things like
cpusets to use that map instead of the online map.
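
A rough sketch of how such a map could be maintained from a cpu hotplug
notifier (names and placement are assumptions, nothing here is from a posted
patch):

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/notifier.h>

/* Sketch: keep cpu_usable_map == cpu_online_map & ~cpu_isolated_map. */
cpumask_t cpu_usable_map;

static int usable_map_cpu_callback(struct notifier_block *nb,
				   unsigned long action, void *hcpu)
{
	/* Recompute on every online/offline event. */
	cpus_andnot(cpu_usable_map, cpu_online_map, cpu_isolated_map);
	return NOTIFY_OK;
}

static struct notifier_block usable_map_nb = {
	.notifier_call = usable_map_cpu_callback,
};

/* Registered once during init:  register_cpu_notifier(&usable_map_nb); */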


We can probably come up with other 
