Re: cpuisol: CPU isolation extensions (take 2)

2008-02-06 Thread Max Krasnyanskiy
CC'ing linux-rt-users because I think my explanation below may be interesting for the 
RT folks.


Mark Hounschell wrote:

Max Krasnyanskiy wrote:


With CPU isolation it's very easy to achieve single digit usec worst case and around 200 nsec
average response times on off-the-shelf multi-processor/core systems (vanilla kernel plus these
patches) even under extreme system load.


Hi Max, could you elaborate on what sort of events your response times are
from?


Sure. As I mentioned before, I'm working with our legal team on releasing a hard RT engine 
that uses isolated CPUs. You can think of that engine as a giant SW PLL. 
It requires a time source that it locks onto. For example the time source can be the 
kernel clock (gtod), some kind of memory-mapped counter, or some external event. 
In my case the HW sends me an Ethernet packet every 24 milliseconds. 
Once the PLL locks onto the time source the engine executes a predefined "timeline". 
The timeline basically specifies tasks with offsets in nanoseconds from the start of 
the cycle (ie "at 100 nsec run task1", "at 15000 nsec run task2", etc). The tasks are just 
callbacks.
The jitter in running those tasks is what I meant by "response time". Essentially it's 
a polling design where the SW knows precisely when to expect an event. It's not a 
general-purpose solution but it works beautifully for things like wireless PHY/MAC layers 
where the framing structure is very deterministic and must be strictly enforced. It works 
for other applications as well once you get your head wrapped around the idea :). ie That 
you do not get interrupts for every single event; the SW already knows when that event will come.

btw The engine also enforces the deadlines. For example it knows right away if a task is 
late and it knows exactly how late. That helps in debugging, a lot :).
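
To make the timeline idea concrete, here is a minimal user-space sketch of that polling
design. It is purely illustrative: the engine itself has not been released, so the structure
names, the use of clock_gettime() as the time source, and the single cycle shown here are
assumptions, not the actual implementation.

#include <time.h>

/* Illustrative only: one "task" is a callback scheduled at a fixed
 * nanosecond offset from the start of each cycle. */
struct timeline_task {
        long offset_ns;         /* offset from the start of the cycle */
        void (*fn)(void);       /* callback to run */
};

static void task1(void) { /* e.g. hand samples to the HW */ }
static void task2(void) { /* e.g. process the next frame */ }

static const struct timeline_task timeline[] = {
        { 100,   task1 },
        { 15000, task2 },
};

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run one cycle: poll the clock and fire each task at its offset.
 * A real engine would lock onto an external time source (PLL style)
 * and record how late each task actually ran. */
static void run_cycle(long long cycle_start)
{
        unsigned int i;

        for (i = 0; i < sizeof(timeline) / sizeof(timeline[0]); i++) {
                while (now_ns() - cycle_start < timeline[i].offset_ns)
                        ;       /* busy-poll: no interrupts, no sleeping */
                timeline[i].fn();
        }
}

int main(void)
{
        run_cycle(now_ns());
        return 0;
}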

The other option is to run normal pthreads on the isolated CPUs. As long as the threads 
are carefully designed not to do certain things, you can get very decent worst-case latencies 
(10-12 usec on Opterons and Core2) even with vanilla kernels (patched with the isolation 
patches, of course) because all the latency sources have been removed from those CPUs.
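
As a concrete illustration of that second option, here is a minimal sketch of binding a
pthread to an isolated CPU. The CPU number is just an example and assumes CPU 1 was
isolated (e.g. with isolcpus=1).

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *rt_worker(void *arg)
{
        /* The latency-sensitive work runs here, alone on the isolated CPU. */
        return NULL;
}

int main(void)
{
        pthread_t thread;
        cpu_set_t cpus;

        /* The scheduler will not balance anything onto an isolated CPU,
         * so the thread has to be bound to it explicitly. */
        CPU_ZERO(&cpus);
        CPU_SET(1, &cpus);

        pthread_create(&thread, NULL, rt_worker, NULL);
        pthread_setaffinity_np(thread, sizeof(cpus), &cpus);
        pthread_join(thread, NULL);
        return 0;
}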


Max


Re: cpuisol: CPU isolation extensions (take 2)

2008-02-06 Thread Mark Hounschell
Max Krasnyanskiy wrote:

> With CPU isolation
> it's very easy to achieve single digit usec worst case and around 200
> nsec average response times on off-the-shelf
> multi-processor/core systems (vanilla kernel plus these patches) even
> under extreme system load. 

Hi Max, could you elaborate on what sort of events your response times are
from?

Regards
Mark



cpuisol: CPU isolation extensions (take 2)

2008-02-05 Thread Max Krasnyanskiy

It seems that git-send-email for some reason did not send the introductory email,
so I'm sending it manually. Sorry if you get it twice.

---

The following patch series extends CPU isolation support. Yes, most people want to virtualize 
CPUs these days, and I want to isolate them :).


The primary idea here is to be able to use some CPU cores as dedicated engines for running 
user-space code with minimal kernel overhead/intervention; think of it as an SPE in the 
Cell processor. I'd like to be able to run a CPU intensive (100%) RT task on one of the 
processors without adversely affecting or being affected by the other system activities. 
System activities here include _kernel_ activities as well. 

I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi-processor/core systems (vanilla kernel plus these patches) even under extreme system load. 
I'm working with legal folks on releasing a hard RT user-space framework for that.
I believe with the current multi-core CPU trend we will see more and more applications that 
explore this capability: RT gaming engines, simulators, hard RT apps, etc.


Hence the proposal is to extend the current CPU isolation feature.
The new definition of CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing
  Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts must not be routed to the isolated CPU(s)
  User must route interrupts (if any) to those CPUs explicitly.

3. In general kernel subsystems must avoid activity on the isolated CPU(s) as 
much as possible
  Includes workqueues, per CPU threads, etc.
  This feature is configurable and is disabled by default.  
---


I've been maintaining this stuff since around 2.6.18 and it's been running in a production 
environment for a couple of years now. It's been tested on all kinds of machines, from NUMA 
boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
The messiest part used to be SLAB garbage collector changes. With the new SLUB all that mess 
goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug much better than O(1) 
did (ie domains are recomputed dynamically) so that isolation can be done at any time (via sysfs). 
So this seems like a good time to merge. 


We've had scheduler support for CPU isolation ever since the O(1) scheduler went in. In other 
words #1 is already supported. These patches do not change/affect that functionality in any way. 
#2 is a trivial one-liner change to the IRQ init code. #3 is addressed by a couple of separate 
patches.
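
To illustrate the shape of that IRQ change (this is not the actual diff; the helper name and
the use of cpus_andnot() here are assumptions), the idea is simply to exclude isolated CPUs
from the mask that interrupts are routed to by default:

static void cpuisol_default_irq_mask(cpumask_t *mask)
{
        /* Route IRQs only to CPUs that are online and not isolated. */
        cpus_andnot(*mask, cpu_online_map, cpu_isolated_map);
}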


The patchset consists of 4 patches. The first two are very simple. They simply make "CPU isolation" a 
configurable feature, export cpu_isolated_map and provide some helper functions to access it (just 
like cpu_online() and friends).
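
For readers who have not seen the patches yet, here is a rough sketch of what such an export
and accessors could look like, modeled on the cpu_online() family; the names (including the
CONFIG_CPUISOL symbol) and placement are guesses, and the real patches may differ.

/* include/linux/cpumask.h -- sketch only */
extern cpumask_t cpu_isolated_map;

#ifdef CONFIG_CPUISOL
#define cpu_isolated(cpu)               cpu_isset((cpu), cpu_isolated_map)
#define for_each_isolated_cpu(cpu)      for_each_cpu_mask((cpu), cpu_isolated_map)
#else
#define cpu_isolated(cpu)               0
#endif

/* kernel/cpu.c (or wherever the map ends up living) -- sketch only */
cpumask_t cpu_isolated_map __read_mostly = CPU_MASK_NONE;
EXPORT_SYMBOL(cpu_isolated_map);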

The last two patches add support for isolating CPUs from running workqueues and from stop 
machine.
More details in the individual patch descriptions.

Folks involved in the scheduler/cpuset development provided a lot of feedback on the first 
series of patches. I believe I managed to explain and clarify every aspect. 
Paul Jackson initially suggested implementing #2 and #3 using the cpusets subsystem. Paul and I 
looked at it more closely and determined that exporting cpu_isolated_map instead is a better 
option. 

The last patch, to the stop machine code, is potentially unsafe and is marked as highly 
experimental. Unfortunately it's currently the only option that allows dynamic module 
insertion/removal for the above scenarios. If people still feel that it's too ugly I can 
revert that change and keep it in a separate tree for now.


Ideally I'd like all of this to go in during this merge window. 
Linus, please pull this patch set from

git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git

That tree is rebased against latest (as of today) Linus' tree.

Thanx
Max

b/arch/x86/Kconfig  |1 
b/arch/x86/kernel/genapic_flat_64.c |5 ++-

b/drivers/base/cpu.c|   48 +++
b/include/linux/cpumask.h   |3 ++
b/kernel/Kconfig.cpuisol|   15 +++
b/kernel/Makefile   |4 +-
b/kernel/cpu.c  |   49 
b/kernel/sched.c|   37 ---
b/kernel/stop_machine.c |9 +-
b/kernel/workqueue.c|   31 --
kernel/Kconfig.cpuisol  |   26 ++-
11 files changed, 176 insertions(+), 52 deletions(-)


cpuisol: Make cpu isolation configurable and export isolated map
cpuisol: Do not route IRQs to the CPUs isolated at boot
cpuisol: Do not schedule workqueues on the isolated CPUs
cpuisol: Do not halt isolated CPUs with Stop Machine






Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyansky


Paul Jackson wrote:
> Max K wrote:
>>> And for another thing, we already declare externs in cpumask.h for
>>> the other, more widely used, cpu_*_map variables cpu_possible_map,
>>> cpu_online_map, and cpu_present_map.
>> Well, to address #2 and #3 isolated map will need to be exported as well.
>> Those other maps do not really have much to do with the scheduler code.
>> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
>> for them.
> 
> Well, if you have need it to be exported for #2 or #3, then that's ok
> by me - export it.
> 
> I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
> I'd prefer you not put it there, as lib/cpumask.c just contains the
> implementation details of the abstract data type cpumask_t, not any of
> its uses.  If you mean kernel/cpuset.c, then that's not a good choice
> either, as that just contains the implementation details of the cpuset
> subsystem.  You should usually define such things in one of the files
> using it, and unless there is clearly a -better- place to move the
> definition, it's usually better to just leave it where it is.

I was thinking of creating a new file, kernel/cpumask.c. But it probably does not make sense 
just for the masks. I'm now thinking kernel/cpu.c is the best place for it. It contains all 
the cpu hotplug logic that deals with those maps, and at the very top it has stuff like:

/* Serializes the updates to cpu_online_map, cpu_present_map */
static DEFINE_MUTEX(cpu_add_remove_lock);

So it seems to make sense to keep the maps in there.

Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max K wrote:
> > And for another thing, we already declare externs in cpumask.h for
> > the other, more widely used, cpu_*_map variables cpu_possible_map,
> > cpu_online_map, and cpu_present_map.
> Well, to address #2 and #3 isolated map will need to be exported as well.
> Those other maps do not really have much to do with the scheduler code.
> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
> for them.

Well, if you need it to be exported for #2 or #3, then that's ok
by me - export it.

I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
I'd prefer you not put it there, as lib/cpumask.c just contains the
implementation details of the abstract data type cpumask_t, not any of
its uses.  If you mean kernel/cpuset.c, then that's not a good choice
either, as that just contains the implementation details of the cpuset
subsystem.  You should usually define such things in one of the files
using it, and unless there is clearly a -better- place to move the
definition, it's usually better to just leave it where it is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


CPU isolation and workqueues [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyanskiy


Peter Zijlstra wrote:

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:

On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:

  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and friends 
expect _all_ their workqueue threads to report back when they do certain things like 
flushing buffers and stuff. The reason I added this is because my machines were getting 
stuck because CPU0 was waiting for CPU1 to run NFS workqueue threads even though no IRQs 
or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.


Peter, Steven,

I think I convinced you guys last time but I did not have a convincing example. 
So here is some
more info on why workqueues need to be aware of isolated cpus.

Here is how a work queue gets flushed.

static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
        int active;

        if (cwq->thread == current) {
                /*
                 * Probably keventd trying to flush its own queue. So simply run
                 * it by hand rather than deadlocking.
                 */
                run_workqueue(cwq);
                active = 1;
        } else {
                struct wq_barrier barr;

                active = 0;
                spin_lock_irq(&cwq->lock);
                if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
                        insert_wq_barrier(cwq, &barr, 1);
                        active = 1;
                }
                spin_unlock_irq(&cwq->lock);

                if (active)
                        wait_for_completion(&barr.done);
        }

        return active;
}

void fastcall flush_workqueue(struct workqueue_struct *wq)
{
        const cpumask_t *cpu_map = wq_cpu_map(wq);
        int cpu;

        might_sleep();
        lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
        lock_release(&wq->lockdep_map, 1, _THIS_IP_);
        for_each_cpu_mask(cpu, *cpu_map)
                flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
}

In other words it schedules some work on each cpu and expects the workqueue thread to run and 
trigger the completion. This is what I meant by _all_ threads being expected to report 
back even if there is nothing running on that CPU.


So my patch simply makes sure that isolated CPUs are ignored (if workqueue isolation is 
enabled), so that workqueue threads are not started on the CPUs that are isolated.
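
A hedged sketch of that check (not the actual patch; cpu_isolated() stands for whatever
accessor the series exports, and the function name here is made up): the workqueue code
would skip isolated CPUs both when starting per-CPU worker threads and when building the
map that flush_workqueue() iterates over.

static inline int cpu_usable_for_workqueues(int cpu)
{
        /* Per-CPU workqueue threads are started and flushed only on CPUs
         * that are online and not isolated (when workqueue isolation is
         * enabled). */
        return cpu_online(cpu) && !cpu_isolated(cpu);
}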

Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

Looks like I failed to explain what I'm trying to achieve. So let me try again.


Well done.  I read through that, expecting to disagree or at least
to not understand at some point, and got all the way through nodding
my head in agreement.  Good.

Whether the earlier confusions were lack of clarity in the presentation,
or lack of competence in my brain ... well guess I don't want to ask that
question ;).

:)


Well ... just one minor point:

Max wrote in reply to pj:

The cpu_isolated_map is a file static variable known only within
the kernel/sched.c file; this should not change.

I completely disagree. In fact I think all the cpu_xxx_map (online, present, 
isolated)
variables do not belong in the scheduler code. I'm thinking of submitting a 
patch that
factors them out into kernel/cpumask.c We already have cpumask.h.


Huh?  Why would you want to do that?

For one thing, the map being discussed here, cpu_isolated_map,
is only used in sched.c, so why publish it wider?

And for another thing, we already declare externs in cpumask.h for
the other, more widely used, cpu_*_map variables cpu_possible_map,
cpu_online_map, and cpu_present_map.

Well, to address #2 and #3 isolated map will need to be exported as well.
Those other maps do not really have much to do with the scheduler code.
That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
for them.


Other than that detail, we seem to be communicating and in agreement on
your first item, isolating CPU scheduler load balancing.  Good.

On your other two items, irq and workqueue isolation, which I had
suggested doing via cpuset sched_load_balance, I now agree that that
wasn't a good idea.

I am still a little surprised at using isolation extensions to
disable irqs on select CPUs; but others have thought far more about
irqs than I have, so I'll be quiet.

Please note that we're not talking about completely disabling IRQs. We're talking about 
not routing them to the isolated CPUs by default. It's still possible to explicitly reroute 
an IRQ to an isolated CPU.
Why is this needed? It is actually very easy to explain. IRQs are the major source of latency 
and overhead. IRQ handlers themselves are mostly ok, but they typically schedule soft irqs, work 
queues and timers on the same CPU where the IRQ is handled. In other words if an isolated CPU is 
receiving IRQs it's not really isolated, because it's running a whole bunch of different kernel 
code (ie we're talking latencies, cache usage, etc). 
Of course some folks may want to explicitly route certain IRQs to the isolated CPUs. For example 
if an app depends on the network stack it may make sense to route an IRQ from the NIC to the same 
CPU the app is running on.
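
As an example of the kind of explicit rerouting meant here, the standard
/proc/irq/<N>/smp_affinity interface can be used from user space; the IRQ number and CPU
below are just placeholders.

#include <stdio.h>

int main(void)
{
        /* Route IRQ 24 (e.g. the NIC) to CPU 2 only; both values are
         * examples.  The file takes a hex bitmask of allowed CPUs. */
        FILE *f = fopen("/proc/irq/24/smp_affinity", "w");

        if (!f) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "%x\n", 1 << 2);
        fclose(f);
        return 0;
}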


Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max wrote:
> Looks like I failed to explain what I'm trying to achieve. So let me try 
> again.

Well done.  I read through that, expecting to disagree or at least
to not understand at some point, and got all the way through nodding
my head in agreement.  Good.

Whether the earlier confusions were lack of clarity in the presentation,
or lack of competence in my brain ... well guess I don't want to ask that
question ;).

Well ... just one minor point:

Max wrote in reply to pj:
> > The cpu_isolated_map is a file static variable known only within
> > the kernel/sched.c file; this should not change.
> I completely disagree. In fact I think all the cpu_xxx_map (online, present, 
> isolated)
> variables do not belong in the scheduler code. I'm thinking of submitting a 
> patch that
> factors them out into kernel/cpumask.c We already have cpumask.h.

Huh?  Why would you want to do that?

For one thing, the map being discussed here, cpu_isolated_map,
is only used in sched.c, so why publish it wider?

And for another thing, we already declare externs in cpumask.h for
the other, more widely used, cpu_*_map variables cpu_possible_map,
cpu_online_map, and cpu_present_map.

Other than that detail, we seem to be communicating and in agreement on
your first item, isolating CPU scheduler load balancing.  Good.

On your other two items, irq and workqueue isolation, which I had
suggested doing via cpuset sched_load_balance, I now agree that that
wasn't a good idea.

I am still a little surprised at using isolation extensions to
disable irqs on select CPUs; but others have thought far more about
irqs than I have, so I'll be quiet.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: [CPUISOL] CPU isolation extensions

2008-02-03 Thread Max Krasnyansky
Hi Daniel,

Sorry for not replying right away.

Daniel Walker wrote:
> On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:
> 
>> Not accurate enough and way too much overhead for what I need. I know at 
>> this point it probably sounds like I'm talking BS :). I wish I'd released 
>> the engine and examples by now. Anyway let me just say that SW MAC has crazy 
>> tight deadlines with lots of small tasks. Using nanosleep() & 
>> gettimeofday() is simply not practical. So it's all TSC based with clever 
>> time sync logic between HW and SW.
> 
> I don't know if it's BS or not, you clearly fixed your own problem which
> is good .. Although when you say "RT patches cannot achieve what I
> needed. Even RTAI/Xenomai can't do that.", and HRT is "Not accurate
> enough and way too much overhead" .. Given the hardware you're using,
> that's all difficult to believe.. You also said this code has been
> running on production systems for two years, which means it's at least
> two years old .. There's been some good-sized leaps in real time linux
> in the past two years ..

I've actually been tracking the RT patches fairly closely. I can't say I've tried all of them 
but I do try them from time to time. I just got the latest 2.6.24-rt1 running on an HP xw9300. 
Looks like it does not handle CPU hotplug very well; I managed to kill it by bringing cpu 1 
off-line. So I cannot run any tests right now; I will run some tomorrow.
 
For now let me mention that I have a simple test that sleeps for a millisecond, then does some 
bitbanging for 200 usec. It measures jitter caused by the periodic scheduler tick, IPIs 
and other kernel activities.
With high-res timers disabled, on most of the machines I mentioned before it shows around 
1-1.2 usec worst case. With high-res timers enabled it shows 5-6 usec. This is with 2.6.24 
running on an isolated CPU. Forget about using a user-space timer (nanosleep(), etc); even 
the scheduler tick itself is fairly heavy.
A gettimeofday() call on that machine takes on average 2-3 usec (not a vsyscall), and SW MAC 
is all about precise timing. That's why I said that it's not practical to use that stuff for 
me. I do not see anything in the -rt kernel that would improve this.
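
The test itself has not been posted, but a minimal sketch of the same idea (sleep for about a
millisecond, then spin for about 200 usec and record the worst gap between consecutive clock
reads) might look like this; clock_gettime() stands in for the TSC-based timing the real test
uses.

#include <stdio.h>
#include <time.h>

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
        const struct timespec msec = { 0, 1000000 };    /* ~1 msec */
        long long worst = 0;
        int i;

        for (i = 0; i < 10000; i++) {
                long long prev, end, t;

                nanosleep(&msec, NULL);

                /* "Bitbang" for ~200 usec; any large gap between two
                 * consecutive clock reads means the kernel stole the CPU
                 * (scheduler tick, IPI, etc). */
                prev = now_ns();
                end = prev + 200000;
                while ((t = now_ns()) < end) {
                        if (t - prev > worst)
                                worst = t - prev;
                        prev = t;
                }
        }
        printf("worst observed gap: %lld nsec\n", worst);
        return 0;
}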

This is btw not to say that the -rt kernel is not useful for my app in general. We have a bunch 
of soft-RT threads that talk to the MAC thread. Those would definitely benefit. I think cpu 
isolation + -rt would work beautifully for wireless basestations.

Max 


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-03 Thread Max Krasnyansky
Paul Jackson wrote:
> Max wrote:
>> Paul, I actually mentioned at the beginning of my email that I did read that 
>> thread
>> started by Peter. I did learn quite a bit from it :)
> 
> Ah - sorry - I missed that part.  However, I'm still getting the feeling
> that there were some key points in that thread that we have not managed
> to communicate successfully.
I think you are assuming that I only need to deal with the RT scheduler and scheduler
domains, which is not correct. See below.

>> Sounds like at this point we're in agreement that sched_load_balance is not 
>> suitable
>> for what I'd like to achieve.
> 
> I don't think we're in agreement; I think we're in confusion ;)
Yeah. I don't believe I'm the confused side though ;-)

> Yes, sched_load_balance does not *directly* have anything to do with this.
> 
> But indirectly it is a critical element in what I think you'd like to
> achieve.  It affects how the cpuset code sets up sched_domains, and
> if I understand correctly, you require either (1) some sched_domains to
> only contain RT tasks, or (2) some CPUs to be in no sched_domain at all.
> 
> Proper configuration of the cpuset hierarchy, including the setting of
> the per-cpuset sched_load_balance flag, can provide either of these
> sched_domain partitions, as desired.
Again you're assuming that scheduling domain partitioning satisfies my 
requirements
or addresses my use case. It does not. See below for more details.
 
>> But how about making cpusets aware of the cpu_isolated_map ?
> 
> No.  That's confusing cpusets and the scheduler again.
> 
> The cpu_isolated_map is a file static variable known only within
> the kernel/sched.c file; this should not change.
I completely disagree. In fact I think all the cpu_xxx_map (online, present, isolated) 
variables do not belong in the scheduler code. I'm thinking of submitting a patch that 
factors them out into kernel/cpumask.c. We already have cpumask.h.

> Presently, the boot parameter isolcpus= is just used to initialize
> what CPUs are isolated at boot, and then the sched_domain partitioning,
> as done in kernel/sched.c:partition_sched_domains() (the hook into
> the sched code that cpusets uses) determines which CPUs are isolated
> from that point forward.  I doubt that this should change either.
Sure, I did not even touch that part. I just proposed to extend the meaning of 
the 
'isolated' bit.

> In that thread referenced above, did you see the part where RT is
> achieved not by isolating CPUs from any scheduler, but rather by
> polymorphically having several schedulers available to operate on each
> sched_domain, and having RT threads self-select the RT scheduler?
Absolutely. Yes that is. I saw that part. But it has nothing to do with my use 
case.

Looks like I failed to explain what I'm trying to achieve. So let me try again.
I'd like to be able to run a CPU intensive (100%) RT task on one of the processors without 
adversely affecting or being affected by the other system activities. System activities 
here include _kernel_ activities as well. Hence the proposal is to extend the current CPU 
isolation feature.

The new definition of the CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing
   Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts must not be routed to the isolated CPU(s)
   User must route interrupts (if any) explicitly.

3. In general kernel subsystems must avoid activity on the isolated CPU(s) as 
much as possible
   Includes workqueues, per CPU threads, etc.
   This feature is configurable and is disabled by default.  
---

#1 affects the scheduler and scheduler domains. It's already supported either by using the 
isolcpus= boot option or by setting "sched_load_balance" in cpusets. I'm totally happy with 
the current behavior and my original patch did not mess with this functionality in any way.

#2 and #3 have _nothing_ to do with the scheduler or scheduler domains. I've been trying to 
explain that for a few days now ;-). When you saw my patches for #2 and #3 you told me that 
you'd be interested to see them implemented on top of the "sched_load_balance" flag. Here is 
your original reply:
http://marc.info/?l=linux-kernel&m=120153260217699&w=2

So I looked into that and provided an explanation of why it would not work, or would work but 
would add lots of complexity (access to internal cpuset structures, locking, etc).
My email on that is here:
http://marc.info/?l=linux-kernel&m=120180692331461&w=2

Now, I felt from the beginning that cpusets is not the right mechanism to address #2 and #3.
The best mechanism IMO is to simply provide access to the cpu_isolated_map to the rest of the 
kernel. Again, the fact that cpu_isolated_map currently lives in the scheduler code does not 
change anything here because, as I explained, I'm proposing to extend the meaning of "CPU 
isolation". I provided dynamic access to the "isolated" bit only for convenience; it does 
_not_ change existing


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-02 Thread Paul Jackson
Max wrote:
> Paul, I actually mentioned at the beginning of my email that I did read that 
> thread
> started by Peter. I did learn quite a bit from it :)

Ah - sorry - I missed that part.  However, I'm still getting the feeling
that there were some key points in that thread that we have not managed
to communicate successfully.

> Sounds like at this point we're in agreement that sched_load_balance is not 
> suitable
> for what I'd like to achieve.

I don't think we're in agreement; I think we're in confusion ;)

Yes, sched_load_balance does not *directly* have anything to do with
this.

But indirectly it is a critical element in what I think you'd like to
achieve.  It affects how the cpuset code sets up sched_domains, and
if I understand correctly, you require either (1) some sched_domains to
only contain RT tasks, or (2) some CPUs to be in no sched_domain at all.

Proper configuration of the cpuset hierarchy, including the setting of
the per-cpuset sched_load_balance flag, can provide either of these
sched_domain partitions, as desired.

> But how about making cpusets aware of the cpu_isolated_map ?

No.  That's confusing cpusets and the scheduler again.

The cpu_isolated_map is a file static variable known only within
the kernel/sched.c file; this should not change.

Presently, the boot parameter isolcpus= is just used to initialize
what CPUs are isolated at boot, and then the sched_domain partitioning,
as done in kernel/sched.c:partition_sched_domains() (the hook into
the sched code that cpusets uses) determines which CPUs are isolated
from that point forward.  I doubt that this should change either.

In that thread referenced above, did you see the part where RT is
achieved not by isolating CPUs from any scheduler, but rather by
polymorphically having several schedulers available to operate on each
sched_domain, and having RT threads self-select the RT scheduler?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-02 Thread Max Krasnyansky
Paul Jackson wrote:
> Max wrote:
>> Here is the list of things of issues with sched_load_balance flag from CPU 
>> isolation 
>> perspective:
> 
> A separate thread happened to start up on lkml.org, shortly after
> yours, that went into this in considerable detail.
> 
> For example, the interaction of cpusets, sched_load_balance,
> sched_domains and real time scheduling is examined in some detail on
> this thread.  Everyone participating on that thread learned something
> (we all came into it with less than a full picture of what's there.)
> 
> I would encourage you to read it closely.  For example, the scheduler
> code should not be trying to access per-cpuset attributes such as
> the sched_load_balance flag (you are correct that this would be
> difficult to do because of the locking; however by design, that is
> not to be done.)
> 
> This thread begins at:
> 
> scheduler scalability - cgroups, cpusets and load-balancing
> http://lkml.org/lkml/2008/1/29/60
> 
> Too bad we didn't think to include you in the CC list of that
> thread from the beginning.

Paul, I actually mentioned at the beginning of my email that I did read that 
thread
started by Peter. I did learn quite a bit from it :)
You guys did not discuss isolation stuff though. The thread was only about 
scheduling
and my cpu isolation extension patches deal with other aspects. 

Sounds like at this point we're in agreement that sched_load_balance is not suitable
for what I'd like to achieve. But how about making cpusets aware of the cpu_isolated_map?
Even without my patches it's somewhat of an issue right now. I mean if you use the isolcpus=
boot option to put cpus into the null domain, cpusets will not be aware of it. The result may be
a bit confusing if an isolated cpu is added to some cpuset.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-01 Thread Paul Jackson
Max wrote:
> Here is the list of things of issues with sched_load_balance flag from CPU 
> isolation 
> perspective:

A separate thread happened to start up on lkml.org, shortly after
yours, that went into this in considerable detail.

For example, the interaction of cpusets, sched_load_balance,
sched_domains and real time scheduling is examined in some detail on
this thread.  Everyone participating on that thread learned something
(we all came into it with less than a full picture of what's there.)

I would encourage you to read it closely.  For example, the scheduler
code should not be trying to access per-cpuset attributes such as
the sched_load_balance flag (you are correct that this would be
difficult to do because of the locking; however by design, that is
not to be done.)

This thread begins at:

scheduler scalability - cgroups, cpusets and load-balancing
http://lkml.org/lkml/2008/1/29/60

Too bad we didn't think to include you in the CC list of that
thread from the beginning.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-31 Thread Max Krasnyanskiy

Hi Mark,

[EMAIL PROTECTED] wrote:
Following patch series extends CPU isolation support. Yes, most people want to virtualize 
CPUs these days and I want to isolate them :).

The primary idea here is to be able to use some CPU cores as dedicated engines 
for running
user-space code with minimal kernel overhead/intervention, think of it as an SPE in the 
Cell processor.


We've had scheduler support for CPU isolation ever since the O(1) scheduler went in. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
In fact that's the primary distinction that I'm making between say "CPU sets" and 
"CPU isolation". "CPU sets" let you manage user-space load while "CPU isolation" provides
a way to isolate a CPU as much as possible (including kernel activities).

I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi-processor/core systems under extreme system load. I'm working with legal folks on releasing 
a hard RT user-space framework for that.

I can also see other applications like simulators and stuff that can benefit 
from this.

I've been maintaining this stuff since around 2.6.18 and it's been running in a 
production
environment for a couple of years now. It's been tested on all kinds of 
machines, from NUMA
boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
The messiest part used to be SLAB garbage collector changes. With the new SLUB all that mess 
goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug much better than O(1) 
did (ie domains are recomputed dynamically) so that isolation can be done at any time (via sysfs). 
So this seems like a good time to merge. 


Anyway. The patchset consists of 5 patches. The first three are very simple and 
non-controversial.
They simply make "CPU isolation" a configurable feature, export 
cpu_isolated_map and provide
some helper functions to access it (just like cpu_online() and friends).
The last two patches add support for isolating CPUs from running workqueues and stop 
machine.
More details in the individual patch descriptions.
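
The patches themselves are not quoted in this thread, so as a rough idea of what "helper functions ... just like cpu_online() and friends" might look like (hypothetical names and config symbol, not copied from the actual patch):

extern cpumask_t cpu_isolated_map;

/* Hypothetical accessors mirroring cpu_online()/cpu_possible(). */
#ifdef CONFIG_CPUISOL	/* config symbol name is a guess */
#define cpu_isolated(cpu)		cpu_isset((cpu), cpu_isolated_map)
#define for_each_isolated_cpu(cpu)	for_each_cpu_mask((cpu), cpu_isolated_map)
#else
#define cpu_isolated(cpu)		0
#endif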

Ideally I'd like all of this to go in during this merge window. If people think it's acceptable 
Linus or Andrew (or whoever is more appropriate Ingo maybe) can pull this patch set from

git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git



It's good to hear from someone else that thinks a multi-processor
box _should_ be able to run a CPU intensive (100%) RT app on one of the
processors without adversely affecting or being affected by the others.
I have had issues that were _traced_ back to the fact that I am doing
just that. All I got was, you can't do that or we don't support that
kind of thing in the Linux kernel.

One example: Andrew Morton's feedback to the LKML thread "floppy.c soft lockup"

Good luck with this. I hope this gets someone's attention.

Thanks for the support. I do the best I can because, just like you, I believe 
that it's
a perfectly valid workload and there are a lot of interesting applications that 
will benefit
from mainline support.


BTW, I have tried your patches against a vanilla 2.6.24 kernel but am
not successful.

# echo '1' > /sys/devices/system/cpu/cpu1/isolated
bash: echo: write error: Device or resource busy

You have to bring it offline first.
In other words:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/isolated
echo 1 > /sys/devices/system/cpu/cpu1/online


The cpuisol=1 cmdline option yields:

harley:# cat /sys/devices/system/cpu/cpu1/isolated
0

harley:# cat /proc/cmdline
root=/dev/sda3 vga=normal apm=off selinux=0 noresume splash=silent
kmalloc=192M cpuisol=1

Sorry, my bad. I had a typo in the patch description; the option is "isolcpus=N".
We've had that option for a while now. I mean it's not even part of my patch.

Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-01-31 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

So far it seems that extending cpu_isolated_map
is a more natural way of propagating this notion to the rest of the kernel,
since it's very similar to the cpu_online_map concept and it's easy to integrate
with the code that already uses it. 


If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with "resource management" mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.


Hi Paul,

I thought some more about your proposal to use sched_load_balance flag in 
cpusets instead
of extending cpu_isolated_map. I looked at the cpusets, cgroups, latest thread 
started by
Peter (about sched domains and stuff) and here are my thoughts on this.

Here is the list of issues with the sched_load_balance flag from a CPU isolation 
perspective:

--
(1) Boot time isolation is not possible. There is currently no way to setup a 
cpuset at
boot time. For example we won't be able to isolate cpus from irqs and 
workqueues at boot.
Not a major issue but still an inconvenience.

--
(2) There is currently no easy way to figure out what cpuset a cpu belongs to in order 
to query its sched_load_balance flag. In order to do that we need a method that iterates
all active cpusets and checks their cpus_allowed masks. This implies holding cgroup and 
cpuset mutexes. It's not clear whether it's ok to do that from the contexts CPU 
isolation happens in (apic, sched, workqueue). It seems that the cgroup/cpuset api is designed
for top-down access, i.e. adding a cpu to a set and then recomputing domains. Which makes
perfect sense for the common cpuset usecase but is not what cpu isolation needs.
In other words I think it's much simpler and cleaner to use the cpu_isolated_map for isolation
purposes.

--
(3) cpusets are a bit too dynamic :). What I mean by this is that 
sched_load_balance flag
can be changed at any time without bringing a CPU offline. What that means is 
that we'll
need some notifier mechanisms for killing and restarting workqueue threads when that flag 
changes. Also we'd need some logic that makes sure that a user does not disable load balancing 
on all cpus because that effectively will kill workqueues on all the cpus.

This particular case is already handled very nicely in my patches. The isolated bit 
can be set
only when a cpu is offline and it cannot be set on the first online cpu. 
Workqueues and other
subsystems already handle cpu hotplug events nicely and can easily ignore 
isolated cpus when
they come online.

-

#1 is probably unfixable. #2 and #3 can be fixed but at the expense of extra 
complexity across
the board. I seriously doubt that I'll be able to push that through the reviews ;-). 

Also personally I still think cpusets and cpu isolation attack two different problems. cpusets 
is about partitioning cpus and memory nodes, and managing tasks. Most of the cgroups/cpuset APIs 
are designed to deal with tasks. CPU isolation is much simpler and is at a lower layer. It deals 
with IRQs, kernel per-cpu threads, etc. The only intersection I see is that both features affect 
scheduling domains (cpu isolation is again simple here: it just puts cpus into null domains, and
that's existing logic in sched.c, nothing new here).
So here are some proposals on how we can make them play nicely with each other. 


--
(A) Make cpusets aware of isolated cpus.
All we have to do here is to change
	guarantee_online_cpus()
	common_cpu_mem_hotplug_unplug()
to exclude cpu_isolated_map from cpu_online_map before using it.
And we'd need to change
	update_cpumasks()
to simply ignore isolated cpus.

That way if a cpu is isolated it'll be ignored by the cpusets logic, which I believe would be
correct behavior.
We're talking about a trivial ~5-line patch which will be a noop if cpu isolation is disabled.
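
To make the "~5 line" claim for (A) concrete, an illustrative (not actual) form the change could take is a single helper that cpusets calls before trusting the online map:

/* Illustrative only: mask isolated CPUs out of whatever cpusets is about
 * to use, so an isolated CPU never shows up in a cpuset's effective mask. */
static inline void cpuset_strip_isolated(cpumask_t *mask)
{
	cpus_andnot(*mask, *mask, cpu_isolated_map);
}

guarantee_online_cpus() and common_cpu_mem_hotplug_unplug() would call this right after computing their masks; with CPU isolation configured out it could be an empty stub.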


(B) Ignore isolated map in cpuset. That's the current state of affairs with my 
patches applied.
Looks like your customers are happy with what they have now so they will probably not enable 
cpu isolation anyway :).


(C) Introduce cpu_usable_map. That map will be recomputed on hotplug events. 
Essentially it'd be
cpu_online_map AND ~cpu_isolated_map. Convert things like cpusets to use that map instead of 
online map.


We can probably come up with other 

Re: [CPUISOL] CPU isolation extensions

2008-01-31 Thread Mark Hounschell
[EMAIL PROTECTED] wrote:
> Following patch series extends CPU isolation support. Yes, most people want 
> to virtuallize 
> CPUs these days and I want to isolate them :).
> The primary idea here is to be able to use some CPU cores as dedicated 
> engines for running
> user-space code with minimal kernel overhead/intervention, think of it as an 
> SPE in the 
> Cell processor.
> 
> We've had scheduler support for CPU isolation ever since O(1) scheduler went 
> it. 
> I'd like to extend it further to avoid kernel activity on those CPUs as much 
> as possible.
> In fact that the primary distinction that I'm making between say "CPU sets" 
> and 
> "CPU isolation". "CPU sets" let you manage user-space load while "CPU 
> isolation" provides
> a way to isolate a CPU as much as possible (including kernel activities).
> 
> I'm personally using this for hard realtime purposes. With CPU isolation it's 
> very easy to 
> achieve single digit usec worst case and around 200 nsec average response 
> times on off-the-shelf
> multi- processor/core systems under exteme system load. I'm working with 
> legal folks on releasing 
> hard RT user-space framework for that.
> I can also see other application like simulators and stuff that can benefit 
> from this.
> 
> I've been maintaining this stuff since around 2.6.18 and it's been running in 
> production
> environment for a couple of years now. It's been tested on all kinds of 
> machines, from NUMA
> boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
> The messiest part used to be SLAB garbage collector changes. With the new 
> SLUB all that mess 
> goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug 
> much better than O(1) 
> did (ie domains are recomputed dynamically) so that isolation can be done at 
> any time (via sysfs). 
> So this seems like a good time to merge. 
> 
> Anyway. The patchset consist of 5 patches. First three are very simple and 
> non-controversial.
> They simply make "CPU isolation" a configurable feature, export 
> cpu_isolated_map and provide
> some helper functions to access it (just like cpu_online() and friends).
> Last two patches add support for isolating CPUs from running workqueus and 
> stop machine.
> More details in the individual patch descriptions.
> 
> Ideally I'd like all of this to go in during this merge window. If people 
> think it's acceptable 
> Linus or Andrew (or whoever is more appropriate Ingo maybe) can pull this 
> patch set from
>   git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git
> 

It's good to hear from someone else that thinks a multi-processor
box _should_ be able to run a CPU intensive (100%) RT app on one of the
processors without adversely affecting or being affected by the others.
I have had issues that were _traced_ back to the fact that I am doing
just that. All I got was, you can't do that or we don't support that
kind of thing in the Linux kernel.

One example: Andrew Morton's feedback to the LKML thread "floppy.c soft
lockup"

Good luck with this. I hope this gets someone's attention.

BTW, I have tried your patches against a vanilla 2.6.24 kernel but am
not successful.

# echo '1' > /sys/devices/system/cpu/cpu1/isolated
bash: echo: write error: Device or resource busy

The cpuisol=1 cmdline option yields:

harley:# cat /sys/devices/system/cpu/cpu1/isolated
0

harley:# cat /proc/cmdline
root=/dev/sda3 vga=normal apm=off selinux=0 noresume splash=silent
kmalloc=192M cpuisol=1




Regards
Mark



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:

> Not accurate enough and way too much overhead for what I need. I know at this 
> point it probably 
> sounds like I'm talking BS :). I wish I've released the engine and examples 
> by now. Anyway let 
> me just say that SW MAC has crazy tight deadlines with lots of small tasks. 
> Using nanosleep() & 
> gettimeofday() is simply not practical. So it's all TSC based with clever 
> time sync logic between
> HW and SW.

I don't know if it's BS or not, you clearly fixed your own problem which
is good .. Although when you say "RT patches cannot achieve what I
needed. Even RTAI/Xenomai can't do that.", and HRT is "Not accurate
enough and way too much overhead" .. Given the hardware you're using,
that's all difficult to believe .. You also said this code has been
running on production systems for two years, which means it's at least
two years old .. There have been some good-sized leaps in real time Linux
in the past two years ..

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Daniel Walker wrote:

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:

Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
can't do that.
For example I have separate tasks with hard deadlines that must be enforced in the 50usec kind 
of range and basically no idle time whatsoever. Just to give more background, it's a wireless
basestation with SW MAC/Scheduler. Another requirement is for the SW to know precise timing.
For example there is no way we can do predictable 1-2 usec sleeps. 
So I wrote a user-space engine that does all this; it requires full control of the CPU, i.e. minimal
overhead from the kernel, just IPIs for memory management and that's basically it. When my legal 
department lets me I'll do a presentation on this stuff at a Linux RT conference or something. 


What kind of hardware are you doing this on? 

All kinds of HW. I mentioned it in the intro email.
Here are the highlights:
	HP XW9300 (Dual Opteron NUMA box) and XW9400 (Dual Core Opteron)
	HP DL145 G2 (Dual Opteron) and G3 (Dual Core Opteron)
	Dell Precision workstations (Core2 Duo and Quad)
	Various Core2 Duo based uTCA boards:
		Mercury AXA110 (1.5Ghz)
		Concurrent Tech AM110 (2.1Ghz)

This scheme should work on anything that lets you disable SMI on the isolated 
core(s).


Also I should note there is HRT (High resolution timers) which provided 
microsecond level
granularity ..
Not accurate enough and way too much overhead for what I need. I know at this point it probably 
sounds like I'm talking BS :). I wish I'd released the engine and examples by now. Anyway let 
me just say that the SW MAC has crazy tight deadlines with lots of small tasks. Using nanosleep() & 
gettimeofday() is simply not practical. So it's all TSC-based with clever time sync logic between
HW and SW.
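
Max's engine itself is not public, so purely as an illustration of what TSC-based waiting looks like in user space on x86 (the frequency constant and the offsets are made up and would have to be calibrated against the real time source):

#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Busy-wait until the TSC reaches a target value; cheap and predictable
 * as long as the kernel leaves this CPU alone. */
static void spin_until(uint64_t target)
{
	while (rdtsc() < target)
		;
}

int main(void)
{
	const uint64_t tsc_per_usec = 2400;	/* assume 2.4 GHz, calibrate in practice */
	uint64_t cycle = rdtsc();		/* start of the current cycle */

	spin_until(cycle + 100 * tsc_per_usec);	/* run task1 at +100 usec */
	/* task1(); */
	spin_until(cycle + 350 * tsc_per_usec);	/* run task2 at +350 usec */
	/* task2(); */
	return 0;
}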

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:
> Just this patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
> can't do that.
> For example I have separate tasks with hard deadlines that must be enforced 
> in 50usec kind 
> of range and basically no idle time whatsoever. Just to give more background 
> it's a wireless
> basestation with SW MAC/Scheduler. Another requirement is for the SW to know 
> precise timing
> because SW. For example there is no way we can do predictable 1-2 usec 
> sleeps. 
> So I wrote a user-space engine that does all this, it requires full control 
> of the CPU ie minimal
> overhead from the kernel, just IPIs for memory management and that's 
> basically it. When my legal 
> department lets me I'll do a presentation on this stuff at Linux RT 
> conference or something. 

What kind of hardware are you doing this on? Also I should note there is
HRT (high resolution timers) which provides microsecond-level
granularity ..

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

So far it seems that extending cpu_isolated_map
is a more natural way of propagating this notion to the rest of the kernel,
since it's very similar to the cpu_online_map concept and it's easy to integrate
with the code that already uses it. 


If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with "resource management" mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.


Thanks for the info Paul. I'll definitely look into using this flag instead 
and reply with pros and cons (if any).


Max


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:

On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:

  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and 
friends
expect _all_ their workqueue threads to report back when they do certain things 
like
flushing buffers and stuff. The reason I added this is because my machines were 
getting
stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
though no IRQs
or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.


Actually NFS was just one example. I cannot remember off the top of my head what 
else was there,
but there are definitely other users of work queues that expect all the threads 
to run at
some point in time.
Also, if you think about it, the patch does _exactly_ what you propose. It removes workqueue 
threads from isolated CPUs. But instead of doing it just for NFS and/or other subsystems 
separately, it just does it in a generic way by simply not starting those threads in the first 
place.
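
As a rough illustration of that "don't start them in the first place" approach (this is not the actual patch, and cpu_isolated() is the hypothetical helper sketched earlier in the thread):

static int workqueue_cpu_callback(struct notifier_block *nfb,
				  unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;

	if (cpu_isolated(cpu))
		return NOTIFY_OK;	/* never create per-cpu workers here */

	/* ... existing CPU_UP_PREPARE / CPU_ONLINE / CPU_DEAD handling ... */
	return NOTIFY_OK;
}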



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

I agree in general. The thing is though that stop machine just kills any kind 
of latency
guarantees. Without the patch the machine just hangs waiting for the 
stop-machine to run
when a module is inserted/removed. And running without dynamic module loading is 
not very
practical on general purpose machines. So I'd rather have an option with a big 
red warning
than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


I'm in favour of better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


100% agree. That's why I mentioned that this patch is controversial in the first place. 
Right now, short of rewriting module loading to not use stop machine, there is no other 
option. I'll think some more about it. If you guys have other ideas please drop me a note.


Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:
> 
> On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
> > >>   [PATCH] [CPUISOL] Support for workqueue isolation
> > >
> > > The thing about workqueues is that they should only be woken on a CPU if
> > > something on that CPU accessed them. IOW, the workqueue on a CPU handles
> > > work that was called by something on that CPU. Which means that
> > > something that high prio task did triggered a workqueue to do some work.
> > > But this can also be triggered by interrupts, so by keeping interrupts
> > > off the CPU no workqueue should be activated.
> 
> > No no no. That's what I though too ;-). The problem is that things like NFS 
> > and friends
> > expect _all_ their workqueue threads to report back when they do certain 
> > things like
> > flushing buffers and stuff. The reason I added this is because my machines 
> > were getting
> > stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
> > though no IRQs
> > or other things are running on it.
> 
> This sounds more like we should fix NFS than add this for all workqueues.
> Again, we want workqueues to run on the behalf of whatever is running on
> that CPU, including those tasks that are running on an isolcpu.

agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.

> 
> >
> > >>   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> > >
> > > This I find very dangerous. We are making an assumption that tasks on an
> > > isolated CPU wont be doing things that stopmachine requires. What stops
> > > a task on an isolated CPU from calling something into the kernel that
> > > stop_machine requires to halt?
> 
> > I agree in general. The thing is though that stop machine just kills any 
> > kind of latency
> > guaranties. Without the patch the machine just hangs waiting for the 
> > stop-machine to run
> > when module is inserted/removed. And running without dynamic module loading 
> > is not very
> > practical on general purpose machines. So I'd rather have an option with a 
> > big red warning
> > than no option at all :).
> 
> Well, that's something one of the greater powers (Linus, Andrew, Ingo)
> must decide. ;-)

I'm in favour of better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
> Also "CPU sets" seem to mostly deal with the scheduler domains.

True - though "cpusets" (no space ;) sched_load_balance flag can
be used to see that some CPUs are not in any scheduler domain,
which is equivalent to not having the scheduler run on them.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
> So far it seems that extending cpu_isolated_map
> is more natural way of propagating this notion to the rest of the kernel.
> Since it's very similar to the cpu_online_map concept and it's easy to 
> integrated
> with the code that already uses it. 

If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with "resource management" mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt


On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
> >>   [PATCH] [CPUISOL] Support for workqueue isolation
> >
> > The thing about workqueues is that they should only be woken on a CPU if
> > something on that CPU accessed them. IOW, the workqueue on a CPU handles
> > work that was called by something on that CPU. Which means that
> > something that high prio task did triggered a workqueue to do some work.
> > But this can also be triggered by interrupts, so by keeping interrupts
> > off the CPU no workqueue should be activated.

> No no no. That's what I though too ;-). The problem is that things like NFS 
> and friends
> expect _all_ their workqueue threads to report back when they do certain 
> things like
> flushing buffers and stuff. The reason I added this is because my machines 
> were getting
> stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
> though no IRQs
> or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


>
> >>   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> >
> > This I find very dangerous. We are making an assumption that tasks on an
> > isolated CPU wont be doing things that stopmachine requires. What stops
> > a task on an isolated CPU from calling something into the kernel that
> > stop_machine requires to halt?

> I agree in general. The thing is though that stop machine just kills any kind 
> of latency
> guaranties. Without the patch the machine just hangs waiting for the 
> stop-machine to run
> when module is inserted/removed. And running without dynamic module loading 
> is not very
> practical on general purpose machines. So I'd rather have an option with a 
> big red warning
> than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.

Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.


While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

Peter, I think you missed the point of this patch. It's just a convenience 
feature.
It simply excludes isolated CPUs from IRQ smp affinity masks. That's all. What 
did you
mean by "flat out denying all IRQs to these cpus" ? IRQs can still be routed to them 
by writing to /proc/irq/N/smp_affinity.
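
For completeness, routing an IRQ back onto an isolated CPU from user space is just a one-line write; a trivial C version (the IRQ number 16 is made up, pick one from /proc/interrupts):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/irq/16/smp_affinity", "w");

	if (!f) {
		perror("/proc/irq/16/smp_affinity");
		return 1;
	}
	fprintf(f, "%x\n", 1 << 2);	/* hex CPU bitmask: bit 2 = CPU 2 */
	return fclose(f) ? 1 : 0;
}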


Also, this happens naturally when we bring a CPU off-line and then bring it 
back online.
i.e. when a CPU comes back online it's excluded from the IRQ smp_affinity masks 
even without
my patch.


  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.


Quite so, if nobody uses it, there is no harm in having them around. If
they are used, it's by someone already allowed on the cpu.


No no no. I just replied to Steven about that. The problem is that things like NFS and 
friends expect _all_ their workqueue threads to report back when they do certain things 
like flushing buffers and stuff. The reason I added this is because my machines were 
getting stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even though 
no IRQs, softirqs or other things are running on it.



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?


Very dangerous indeed!

Please see my reply to Steven. I agree it's somewhat dangerous. What we could 
do is make it
configurable with a big fat warning. In other words I'd rather have an option 
than one that just says
"do not use dynamic module loading" on those systems.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.


Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot


I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

Also note that it's just a convenience feature. In other words it's not that with
this patch we'll never route IRQs to those CPUs. They can still be explicitly routed
by writing to /proc/irq/N/smp_affinity.



  [PATCH] [CPUISOL] Support for workqueue isolation


The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and
friends expect _all_ their workqueue threads to report back when they do certain
things like flushing buffers and stuff. The reason I added this is because my
machines were getting stuck because CPU0 was waiting for CPU1 to run NFS work queue
threads even though no IRQs or other things are running on it.


  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"


This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?
I agree in general. The thing is though that stop machine just kills any kind of
latency guarantees. Without the patch the machine just hangs waiting for the
stop-machine to run when a module is inserted or removed. And running without dynamic
module loading is not very practical on general purpose machines. So I'd rather have
an option with a big red warning than no option at all :).
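For context, this is a simplified sketch of the setup loop inside stop_machine() in
kernels of this era (heavily trimmed, names approximate; the hypothetical
cpu_isolated() check marks where the patch under discussion would skip isolated CPUs):

	static int example_stop_machine_setup(void)
	{
		unsigned int cpu, threads = 0;

		for_each_online_cpu(cpu) {
			if (cpu == smp_processor_id())
				continue;
			/* the patch would add:  if (cpu_isolated(cpu)) continue; */
			if (kernel_thread(stopmachine, (void *)(long)cpu,
					  CLONE_KERNEL) >= 0)
				threads++;
		}
		/* the caller then waits for every spawned thread to check in with
		 * interrupts disabled; a CPU that never schedules its thread
		 * (e.g. one monopolised by a hard-RT task) stalls the whole box */
		return threads;
	}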


Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.


I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so like to
have those two capabilities co-operate with each other.

I'll definitely take a look. So far it seems that extending cpu_isolated_map is a
more natural way of propagating this notion to the rest of the kernel, since it's
very similar to the cpu_online_map concept and easy to integrate with the code that
already uses it.
Anyway, I'll take a look at the cpuset flag that you mentioned and report back.


Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Hi Peter,

Peter Zijlstra wrote:

[ You really ought to CC people :-) ]

I was not sure who, though :)
Do we have a mailing list for scheduler development btw ?
Or is it just the folks that you included in CC ?
Some of the latest scheduler patches break things that I'm doing and I'd like to make
them configurable (RT watchdog, etc).


On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
Following patch series extends CPU isolation support. Yes, most people want to virtuallize 
CPUs these days and I want to isolate them :).

The primary idea here is to be able to use some CPU cores as dedicated engines 
for running
user-space code with minimal kernel overhead/intervention, think of it as an SPE in the 
Cell processor.


We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
In fact that's the primary distinction that I'm making between say "CPU sets" and
"CPU isolation". "CPU sets" let you manage user-space load while "CPU isolation" provides
a way to isolate a CPU as much as possible (including kernel activities).


Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

It's not really new. The CPU isolation bits just have not been exported before, that's
all. Also "CPU sets" seem to mostly deal with the scheduler domains. I'll reply to
Paul's proposal to use that instead.


I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi- processor/core systems under exteme system load. I'm working with legal folks on releasing 
hard RT user-space framework for that.

I can also see other application like simulators and stuff that can benefit 
from this.


have you been using just this, or in combination with the -rt effort?

Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai can't
do that. For example I have separate tasks with hard deadlines that must be enforced
in a 50usec kind of range and basically no idle time whatsoever. Just to give more
background, it's a wireless basestation with SW MAC/Scheduler. Another requirement is
for the SW to know precise timing; for example there is no way we can do predictable
1-2 usec sleeps. So I wrote a user-space engine that does all this. It requires full
control of the CPU, ie minimal overhead from the kernel, just IPIs for memory
management and that's basically it. When my legal department lets me I'll do a
presentation on this stuff at a Linux RT conference or something.


Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:
> On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
> > Thanks for the CC, Peter.
> 
> Thanks from me too.
> 
> > Max wrote:
> > > We've had scheduler support for CPU isolation ever since O(1) scheduler 
> > > went it. 
> > > I'd like to extend it further to avoid kernel activity on those CPUs as 
> > > much as possible.
> > 
> > I recently added the per-cpuset flag 'sched_load_balance' for some
> > other realtime folks, so that they can disable the kernel scheduler
> > load balancing on isolated CPUs.  It essentially allows for dynamic
> > control of which CPUs are isolated by the scheduler, using the cpuset
> > hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
> > 'isolated_cpus' mask remained a minimal kernel boottime parameter.
> > I believe this went to Linus's tree about Oct 2007.
> > 
> > It looks like you have three additional tweaks for realtime in this
> > patch set, with your patches:
> > 
> >   [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
> 
> I didn't know we still routed IRQs to isolated CPUs. I guess I need to
> look deeper into the code on this one. But I agree that isolated CPUs
> should not have IRQs routed to them.

While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

> >   [PATCH] [CPUISOL] Support for workqueue isolation
> 
> The thing about workqueues is that they should only be woken on a CPU if
> something on that CPU accessed them. IOW, the workqueue on a CPU handles
> work that was called by something on that CPU. Which means that
> something that high prio task did triggered a workqueue to do some work.
> But this can also be triggered by interrupts, so by keeping interrupts
> off the CPU no workqueue should be activated.

Quite so, if nobody uses it, there is no harm in having them around. If
they are used, it's by someone already allowed on the cpu.

> >   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> 
> This I find very dangerous. We are making an assumption that tasks on an
> isolated CPU wont be doing things that stopmachine requires. What stops
> a task on an isolated CPU from calling something into the kernel that
> stop_machine requires to halt?

Very dangerous indeed!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt
On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
> Thanks for the CC, Peter.

Thanks from me too.

> Max wrote:
> > We've had scheduler support for CPU isolation ever since O(1) scheduler 
> > went it. 
> > I'd like to extend it further to avoid kernel activity on those CPUs as 
> > much as possible.
> 
> I recently added the per-cpuset flag 'sched_load_balance' for some
> other realtime folks, so that they can disable the kernel scheduler
> load balancing on isolated CPUs.  It essentially allows for dynamic
> control of which CPUs are isolated by the scheduler, using the cpuset
> hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
> 'isolated_cpus' mask remained a minimal kernel boottime parameter.
> I believe this went to Linus's tree about Oct 2007.
> 
> It looks like you have three additional tweaks for realtime in this
> patch set, with your patches:
> 
>   [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

>   [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

>   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU won't be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

-- Steve


> 
> It would be interesting to see a patchset with the above three realtime
> tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
> than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
> run realtime and cpuset-intensive loads on the same system, so like to
> have those two capabilities co-operate with each other.
> 
> Ingo - what's your sense of the value of the above three realtime tweaks
>(the last three patches in Max's patch set)?
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
> We've had scheduler support for CPU isolation ever since O(1) scheduler went 
> it. 
> I'd like to extend it further to avoid kernel activity on those CPUs as much 
> as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.
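For readers who haven't used it, a minimal sketch of driving that flag from C
(assuming the cpuset filesystem is mounted at /dev/cpuset and a child cpuset named
"rt" already holds the CPUs to be isolated; paths follow the cpuset documentation of
that era, adjust to taste):

	#include <stdio.h>

	static int cpuset_write(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");
		if (!f)
			return -1;
		fputs(val, f);
		return fclose(f);
	}

	int main(void)
	{
		cpuset_write("/dev/cpuset/sched_load_balance", "0");	/* root */
		cpuset_write("/dev/cpuset/rt/cpus", "2-3");
		cpuset_write("/dev/cpuset/rt/sched_load_balance", "0");
		return 0;
	}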

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so we like to
have those two capabilities co-operate with each other.

Ingo - what's your sense of the value of the above three realtime tweaks
   (the last three patches in Max's patch set)?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra
[ You really ought to CC people :-) ]

On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
> Following patch series extends CPU isolation support. Yes, most people want 
> to virtuallize 
> CPUs these days and I want to isolate them :).
> The primary idea here is to be able to use some CPU cores as dedicated 
> engines for running
> user-space code with minimal kernel overhead/intervention, think of it as an 
> SPE in the 
> Cell processor.
> 
> We've had scheduler support for CPU isolation ever since O(1) scheduler went 
> it. 
> I'd like to extend it further to avoid kernel activity on those CPUs as much 
> as possible.
> In fact that the primary distinction that I'm making between say "CPU sets" 
> and 
> "CPU isolation". "CPU sets" let you manage user-space load while "CPU 
> isolation" provides
> a way to isolate a CPU as much as possible (including kernel activities).

Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

> I'm personally using this for hard realtime purposes. With CPU isolation it's 
> very easy to 
> achieve single digit usec worst case and around 200 nsec average response 
> times on off-the-shelf
> multi- processor/core systems under exteme system load. I'm working with 
> legal folks on releasing 
> hard RT user-space framework for that.
> I can also see other application like simulators and stuff that can benefit 
> from this.

have you been using just this, or in combination with the -rt effort?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.

Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.


While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

Peter, I think you missed the point of this patch. It's just a convenience feature.
It simply excludes isolated CPUs from the IRQ smp_affinity masks. That's all. What did
you mean by "flat out denying all IRQs to these cpus"? IRQs can still be routed to them
by writing to /proc/irq/N/smp_affinity.


Also, this happens naturally when we bring a CPU off-line and then bring it 
back online.
ie When CPU comes back online it's excluded from the IRQ smp_affinity masks 
even without
my patch.


  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.


Quite so, if nobody uses it, there is no harm in having them around. If
they are used, its by someone already allowed on the cpu.


No no no. I just replied to Steven about that. The problem is that things like NFS and 
friends expect _all_ their workqueue threads to report back when they do certain things 
like flushing buffers and stuff. The reason I added this is because my machines were 
getting stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even though 
no IRQs, softirqs or other things are running on it.



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?


Very dangerous indeed!

Please see my reply to Steven. I agree it's somewhat dangerous. What we could do is
make it configurable with a big fat warning. In other words I'd rather have an option
that just says "do not use dynamic module loading" on those systems.

Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:
 
 On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
 [PATCH] [CPUISOL] Support for workqueue isolation
  
   The thing about workqueues is that they should only be woken on a CPU if
   something on that CPU accessed them. IOW, the workqueue on a CPU handles
   work that was called by something on that CPU. Which means that
   something that high prio task did triggered a workqueue to do some work.
   But this can also be triggered by interrupts, so by keeping interrupts
   off the CPU no workqueue should be activated.
 
  No no no. That's what I though too ;-). The problem is that things like NFS 
  and friends
  expect _all_ their workqueue threads to report back when they do certain 
  things like
  flushing buffers and stuff. The reason I added this is because my machines 
  were getting
  stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
  though no IRQs
  or other things are running on it.
 
 This sounds more like we should fix NFS than add this for all workqueues.
 Again, we want workqueues to run on the behalf of whatever is running on
 that CPU, including those tasks that are running on an isolcpu.

agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.
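Something along those lines, presumably; a sketch using the 2.6.24-era cpumask API
(cpu_isolated_map being the mask from Max's patches, 'task' standing for the kthread
in question, the rest are existing helpers):

	/* keep a kernel thread off the isolated CPUs */
	cpumask_t allowed;

	cpus_andnot(allowed, cpu_online_map, cpu_isolated_map);
	set_cpus_allowed(task, allowed);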

 
 
 [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine
  
   This I find very dangerous. We are making an assumption that tasks on an
   isolated CPU wont be doing things that stopmachine requires. What stops
   a task on an isolated CPU from calling something into the kernel that
   stop_machine requires to halt?
 
  I agree in general. The thing is though that stop machine just kills any 
  kind of latency
  guaranties. Without the patch the machine just hangs waiting for the 
  stop-machine to run
  when module is inserted/removed. And running without dynamic module loading 
  is not very
  practical on general purpose machines. So I'd rather have an option with a 
  big red warning
  than no option at all :).
 
 Well, that's something one of the greater powers (Linus, Andrew, Ingo)
 must decide. ;-)

I'm in favour of a better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
 Also CPU sets seem to mostly deal with the scheduler domains.

True - though cpusets (no space ;) sched_load_balance flag can
be used to see that some CPUs are not in any scheduler domain,
which is equivalent to not having the scheduler run on them.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.940.382.4214
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:

On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:

  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I though too ;-). The problem is that things like NFS and 
friends
expect _all_ their workqueue threads to report back when they do certain things 
like
flushing buffers and stuff. The reason I added this is because my machines were 
getting
stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
though no IRQs
or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound by in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.


Actually NFS was just one example. I cannot remember off the top of my head what else
was there, but there are definitely other users of work queues that expect all the
threads to run at some point in time.
Also, if you think about it, the patch does _exactly_ what you propose. It removes
workqueue threads from isolated CPUs. But instead of doing that just for NFS and/or
other subsystems separately, it does it in a generic way by simply not starting those
threads in the first place.



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

I agree in general. The thing is though that stop machine just kills any kind 
of latency
guaranties. Without the patch the machine just hangs waiting for the 
stop-machine to run
when module is inserted/removed. And running without dynamic module loading is 
not very
practical on general purpose machines. So I'd rather have an option with a big 
red warning
than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


I'm in favour of better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


100% agree. That's why I mentioned that this patch is controversial in the first place.
Right now, short of rewriting module loading to not use stop machine, there is no other
option. I'll think some more about it. If you guys have other ideas please drop me a note.


Thanx
Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

So far it seems that extending cpu_isolated_map
is more natural way of propagating this notion to the rest of the kernel.
Since it's very similar to the cpu_online_map concept and it's easy to 
integrated
with the code that already uses it. 


If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with resource management mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.


Thanks for the info Paul. I'll definitely look into using this flag instead 
and reply with pros and cons (if any).


Max


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
 We've had scheduler support for CPU isolation ever since O(1) scheduler went 
 it. 
 I'd like to extend it further to avoid kernel activity on those CPUs as much 
 as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so like to
have those two capabilities co-operate with each other.

Ingo - what's your sense of the value of the above three realtime tweaks
   (the last three patches in Max's patch set)?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.940.382.4214
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
 So far it seems that extending cpu_isolated_map
 is more natural way of propagating this notion to the rest of the kernel.
 Since it's very similar to the cpu_online_map concept and it's easy to 
 integrated
 with the code that already uses it. 

If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with resource management mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.940.382.4214
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:
 On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
  Thanks for the CC, Peter.
 
 Thanks from me too.
 
  Max wrote:
   We've had scheduler support for CPU isolation ever since O(1) scheduler 
   went it. 
   I'd like to extend it further to avoid kernel activity on those CPUs as 
   much as possible.
  
  I recently added the per-cpuset flag 'sched_load_balance' for some
  other realtime folks, so that they can disable the kernel scheduler
  load balancing on isolated CPUs.  It essentially allows for dynamic
  control of which CPUs are isolated by the scheduler, using the cpuset
  hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
  'isolated_cpus' mask remained a minimal kernel boottime parameter.
  I believe this went to Linus's tree about Oct 2007.
  
  It looks like you have three additional tweaks for realtime in this
  patch set, with your patches:
  
[PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
 
 I didn't know we still routed IRQs to isolated CPUs. I guess I need to
 look deeper into the code on this one. But I agree that isolated CPUs
 should not have IRQs routed to them.

While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

[PATCH] [CPUISOL] Support for workqueue isolation
 
 The thing about workqueues is that they should only be woken on a CPU if
 something on that CPU accessed them. IOW, the workqueue on a CPU handles
 work that was called by something on that CPU. Which means that
 something that high prio task did triggered a workqueue to do some work.
 But this can also be triggered by interrupts, so by keeping interrupts
 off the CPU no workqueue should be activated.

Quite so, if nobody uses it, there is no harm in having them around. If
they are used, its by someone already allowed on the cpu.

[PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine
 
 This I find very dangerous. We are making an assumption that tasks on an
 isolated CPU wont be doing things that stopmachine requires. What stops
 a task on an isolated CPU from calling something into the kernel that
 stop_machine requires to halt?

Very dangerous indeed!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:
 Just this patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
 can't do that.
 For example I have separate tasks with hard deadlines that must be enforced 
 in 50usec kind 
 of range and basically no idle time whatsoever. Just to give more background 
 it's a wireless
 basestation with SW MAC/Scheduler. Another requirement is for the SW to know 
 precise timing
 because SW. For example there is no way we can do predictable 1-2 usec 
 sleeps. 
 So I wrote a user-space engine that does all this, it requires full control 
 of the CPU ie minimal
 overhead from the kernel, just IPIs for memory management and that's 
 basically it. When my legal 
 department lets me I'll do a presentation on this stuff at Linux RT 
 conference or something. 

What kind of hardware are you doing this on? Also I should note there is
HRT (High resolution timers) which provides microsecond level
granularity ..

Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Daniel Walker wrote:

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:

Just this patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
can't do that.
For example I have separate tasks with hard deadlines that must be enforced in 50usec kind 
of range and basically no idle time whatsoever. Just to give more background it's a wireless

basestation with SW MAC/Scheduler. Another requirement is for the SW to know 
precise timing
because SW. For example there is no way we can do predictable 1-2 usec sleeps. 
So I wrote a user-space engine that does all this, it requires full control of the CPU ie minimal
overhead from the kernel, just IPIs for memory management and that's basically it. When my legal 
department lets me I'll do a presentation on this stuff at Linux RT conference or something. 


What kind of hardware are you doing this on? 

All kinds of HW. I mentioned it in the intro email.
Here are the highlights:
	HP XW9300 (Dual Opteron NUMA box) and XW9400 (Dual Core Opteron)
	HP DL145 G2 (Dual Opteron) and G3 (Dual Core Opteron)
	Dell Precision workstations (Core2 Duo and Quad)
	Various Core2 Duo based systems
uTCA boards:
	Mercury AXA110 (1.5Ghz)
	Concurrent Tech AM110 (2.1Ghz)

This scheme should work on anything that lets you disable SMI on the isolated
core(s).


Also I should note there is HRT (High resolution timers) which provided 
microsecond level
granularity ..
Not accurate enough and way too much overhead for what I need. I know at this point it
probably sounds like I'm talking BS :). I wish I'd released the engine and examples by
now. Anyway let me just say that the SW MAC has crazy tight deadlines with lots of small
tasks. Using nanosleep() / gettimeofday() is simply not practical. So it's all TSC based
with clever time sync logic between HW and SW.
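For flavour, the kind of primitive such an engine is built on (illustrative only:
x86, assumes a constant-rate TSC, and tsc_hz would have to be calibrated against the
chosen time source):

	#include <stdint.h>

	/* read the time stamp counter */
	static inline uint64_t rdtsc(void)
	{
		uint32_t lo, hi;
		asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
		return ((uint64_t)hi << 32) | lo;
	}

	/* busy-wait on the isolated CPU until an absolute TSC deadline */
	static void spin_until(uint64_t deadline_tsc)
	{
		while (rdtsc() < deadline_tsc)
			;	/* nothing else is allowed to run here anyway */
	}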

Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:

 Not accurate enough and way too much overhead for what I need. I know at this 
 point it probably 
 sounds like I'm talking BS :). I wish I've released the engine and examples 
 by now. Anyway let 
 me just say that SW MAC has crazy tight deadlines with lots of small tasks. 
 Using nanosleep()  
 gettimeofday() is simply not practical. So it's all TSC based with clever 
 time sync logic between
 HW and SW.

I don't know if it's BS or not, you clearly fixed your own problem which
is good .. Although when you say "RT patches cannot achieve what I
needed. Even RTAI/Xenomai can't do that.", and HRT is "Not accurate
enough and way too much overhead" .. Given the hardware you're using,
that's all difficult to believe.. You also said this code has been
running on production systems for two years, which means it's at least
two years old .. There's been some good sized leaps in real time linux
in the past two years ..

Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra
[ You really ought to CC people :-) ]

On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
 Following patch series extends CPU isolation support. Yes, most people want 
 to virtuallize 
 CPUs these days and I want to isolate them :).
 The primary idea here is to be able to use some CPU cores as dedicated 
 engines for running
 user-space code with minimal kernel overhead/intervention, think of it as an 
 SPE in the 
 Cell processor.
 
 We've had scheduler support for CPU isolation ever since O(1) scheduler went 
 it. 
 I'd like to extend it further to avoid kernel activity on those CPUs as much 
 as possible.
 In fact that the primary distinction that I'm making between say CPU sets 
 and 
 CPU isolation. CPU sets let you manage user-space load while CPU 
 isolation provides
 a way to isolate a CPU as much as possible (including kernel activities).

Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

 I'm personally using this for hard realtime purposes. With CPU isolation it's 
 very easy to 
 achieve single digit usec worst case and around 200 nsec average response 
 times on off-the-shelf
 multi- processor/core systems under exteme system load. I'm working with 
 legal folks on releasing 
 hard RT user-space framework for that.
 I can also see other application like simulators and stuff that can benefit 
 from this.

have you been using just this, or in combination with the -rt effort?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt


On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
[PATCH] [CPUISOL] Support for workqueue isolation
 
  The thing about workqueues is that they should only be woken on a CPU if
  something on that CPU accessed them. IOW, the workqueue on a CPU handles
  work that was called by something on that CPU. Which means that
  something that high prio task did triggered a workqueue to do some work.
  But this can also be triggered by interrupts, so by keeping interrupts
  off the CPU no workqueue should be activated.

 No no no. That's what I though too ;-). The problem is that things like NFS 
 and friends
 expect _all_ their workqueue threads to report back when they do certain 
 things like
 flushing buffers and stuff. The reason I added this is because my machines 
 were getting
 stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
 though no IRQs
 or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.



[PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine
 
  This I find very dangerous. We are making an assumption that tasks on an
  isolated CPU wont be doing things that stopmachine requires. What stops
  a task on an isolated CPU from calling something into the kernel that
  stop_machine requires to halt?

 I agree in general. The thing is though that stop machine just kills any kind 
 of latency
 guaranties. Without the patch the machine just hangs waiting for the 
 stop-machine to run
 when module is inserted/removed. And running without dynamic module loading 
 is not very
 practical on general purpose machines. So I'd rather have an option with a 
 big red warning
 than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt
On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
 Thanks for the CC, Peter.

Thanks from me too.

 Max wrote:
  We've had scheduler support for CPU isolation ever since O(1) scheduler 
  went it. 
  I'd like to extend it further to avoid kernel activity on those CPUs as 
  much as possible.
 
 I recently added the per-cpuset flag 'sched_load_balance' for some
 other realtime folks, so that they can disable the kernel scheduler
 load balancing on isolated CPUs.  It essentially allows for dynamic
 control of which CPUs are isolated by the scheduler, using the cpuset
 hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
 'isolated_cpus' mask remained a minimal kernel boottime parameter.
 I believe this went to Linus's tree about Oct 2007.
 
 It looks like you have three additional tweaks for realtime in this
 patch set, with your patches:
 
   [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

   [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU won't be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

-- Steve


 
 It would be interesting to see a patchset with the above three realtime
 tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
 than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
 run realtime and cpuset-intensive loads on the same system, so like to
 have those two capabilities co-operate with each other.
 
 Ingo - what's your sense of the value of the above three realtime tweaks
(the last three patches in Max's patch set)?
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.


Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot


I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

Also note that it's just a convenience feature. In other words it's not that with
this patch we'll never route IRQs to those CPUs. They can still be explicitly routed
by writing to /proc/irq/N/smp_affinity.



  [PATCH] [CPUISOL] Support for workqueue isolation


The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and
friends expect _all_ their workqueue threads to report back when they do certain
things like flushing buffers and stuff. The reason I added this is because my
machines were getting stuck because CPU0 was waiting for CPU1 to run NFS work queue
threads even though no IRQs or other things are running on it.


  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine


This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?
I agree in general. The thing is though that stop machine just kills any kind of
latency guarantees. Without the patch the machine just hangs waiting for the
stop-machine to run when a module is inserted or removed. And running without dynamic
module loading is not very practical on general purpose machines. So I'd rather have
an option with a big red warning than no option at all :).


Thanx
Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.


I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so like to
have those two capabilities co-operate with each other.

I'll definitely take a look. So far it seems that extending cpu_isolated_map is a
more natural way of propagating this notion to the rest of the kernel, since it's
very similar to the cpu_online_map concept and easy to integrate with the code that
already uses it.
Anyway, I'll take a look at the cpuset flag that you mentioned and report back.


Thanx
Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Hi Peter,

Peter Zijlstra wrote:

[ You really ought to CC people :-) ]

I was not sure who, though :)
Do we have a mailing list for scheduler development btw ?
Or is it just the folks that you included in CC ?
Some of the latest scheduler patches break things that I'm doing and I'd like to make
them configurable (RT watchdog, etc).


On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
Following patch series extends CPU isolation support. Yes, most people want to virtuallize 
CPUs these days and I want to isolate them :).

The primary idea here is to be able to use some CPU cores as dedicated engines 
for running
user-space code with minimal kernel overhead/intervention, think of it as an SPE in the 
Cell processor.


We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
In fact that the primary distinction that I'm making between say CPU sets and 
CPU isolation. CPU sets let you manage user-space load while CPU isolation provides

a way to isolate a CPU as much as possible (including kernel activities).


Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

It's not really new. The CPU isolation bits just have not been exported before, that's
all. Also "CPU sets" seem to mostly deal with the scheduler domains. I'll reply to
Paul's proposal to use that instead.


I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi- processor/core systems under exteme system load. I'm working with legal folks on releasing 
hard RT user-space framework for that.

I can also see other application like simulators and stuff that can benefit 
from this.


have you been using just this, or in combination with the -rt effort?

Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai can't
do that. For example I have separate tasks with hard deadlines that must be enforced
in a 50usec kind of range and basically no idle time whatsoever. Just to give more
background, it's a wireless basestation with SW MAC/Scheduler. Another requirement is
for the SW to know precise timing; for example there is no way we can do predictable
1-2 usec sleeps. So I wrote a user-space engine that does all this. It requires full
control of the CPU, ie minimal overhead from the kernel, just IPIs for memory
management and that's basically it. When my legal department lets me I'll do a
presentation on this stuff at a Linux RT conference or something.


Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[CPUISOL] CPU isolation extensions

2008-01-27 Thread maxk

The following patch series extends CPU isolation support. Yes, most people want to
virtualize CPUs these days and I want to isolate them :).
The primary idea here is to be able to use some CPU cores as dedicated engines for
running user-space code with minimal kernel overhead/intervention; think of it as an
SPE in the Cell processor.

We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
I'd like to extend it further to avoid kernel activity on those CPUs as much as
possible. In fact that's the primary distinction that I'm making between say "CPU
sets" and "CPU isolation". "CPU sets" let you manage user-space load while "CPU
isolation" provides a way to isolate a CPU as much as possible (including kernel
activities).

I'm personally using this for hard realtime purposes. With CPU isolation it's very
easy to achieve single digit usec worst case and around 200 nsec average response
times on off-the-shelf multi-processor/core systems under extreme system load. I'm
working with the legal folks on releasing a hard RT user-space framework for that.
I can also see other applications, like simulators and stuff, that can benefit from
this.

I've been maintaining this stuff since around 2.6.18 and it's been running in a
production environment for a couple of years now. It's been tested on all kinds of
machines, from NUMA boxes like the HP xw9300/9400 to tiny uTCA boards like the
Mercury AXA110.
The messiest part used to be the SLAB garbage collector changes. With the new SLUB
all that mess goes away (ie no changes necessary). Also CFS seems to handle CPU
hotplug much better than O(1) did (ie domains are recomputed dynamically), so
isolation can be done at any time (via sysfs).
So this seems like a good time to merge.

Anyway. The patchset consists of 5 patches. The first three are very simple and non-controversial.
They simply make "CPU isolation" a configurable feature, export cpu_isolated_map and provide
some helper functions to access it (just like cpu_online() and friends).
The last two patches add support for isolating CPUs from running workqueues and stop machine.
More details are in the individual patch descriptions.
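
For reference, the exported interface is roughly of the following shape, by analogy
with cpu_online() and friends. This is only a sketch: the config symbol and exact
spelling here are assumptions, so see the patches for the real definitions.

    /* Sketch by analogy with cpu_online()/cpu_online_map; the CONFIG_CPUISOL
     * symbol and macro name are assumptions for illustration. */
    extern cpumask_t cpu_isolated_map;

    #ifdef CONFIG_CPUISOL
    #define cpu_isolated(cpu)   cpu_isset((cpu), cpu_isolated_map)
    #else
    #define cpu_isolated(cpu)   0
    #endif

    /* Callers such as the workqueue code can then skip isolated CPUs: */
    static void example_skip_isolated(void)
    {
        int cpu;

        for_each_online_cpu(cpu) {
            if (cpu_isolated(cpu))
                continue;
            /* ... do per-CPU work that should avoid isolated CPUs ... */
        }
    }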

Ideally I'd like all of this to go in during this merge window. If people think it's acceptable,
Linus or Andrew (or whoever is more appropriate, Ingo maybe) can pull this patch set from
git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git

That tree is rebased against latest (as of yesterday) Linus' tree.

Thanx
Max

 arch/x86/Kconfig                  |    1 
 arch/x86/kernel/genapic_flat_64.c |    5 ++--
 drivers/base/cpu.c                |   47 ++
 include/linux/cpumask.h           |    3 ++
 kernel/Kconfig.cpuisol            |   25 +++-
 kernel/sched.c                    |   13 ++
 kernel/stop_machine.c             |    3 --
 kernel/workqueue.c                |   31 ++---
 8 files changed, 110 insertions(+), 18 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

