Re: cpuisol: CPU isolation extensions (take 2)

2008-02-06 Thread Max Krasnyanskiy
CC'ing linux-rt-users because I think my explanation below may be interesting for the 
RT folks.


Mark Hounschell wrote:

Max Krasnyanskiy wrote:


With CPU isolation it's very easy to achieve single digit usec worst case and around 200 nsec
average response times on off-the-shelf multi-processor/core systems (vanilla kernel plus these
patches) even under extreme system load.


Hi Max, could you elaborate on what sort of events your response times are
from?


Sure. As I mentioned before, I'm working with our legal team on releasing a hard RT engine 
that uses isolated CPUs. You can think of that engine as a giant SW PLL. 
It requires a time source that it locks onto. For example the time source can be the 
kernel clock (gtod), some kind of memory-mapped counter, or some external event. 
In my case the HW sends me an Ethernet packet every 24 milliseconds. 
Once the PLL locks onto the time source the engine executes a predefined "timeline". 
The timeline basically specifies tasks with offsets in nanoseconds from the start of 
the cycle (ie "at 100 nsec run task1", "at 15000 nsec run task2", etc). The tasks are just 
callbacks.
The jitter in running those tasks is what I meant by "response time". Essentially it's 
a polling design where the SW knows precisely when to expect an event. It's not a 
general-purpose solution but it works beautifully for things like wireless PHY/MAC layers 
where the framing structure is very deterministic and must be strictly enforced. It works 
for other applications as well once you get your head wrapped around the idea :). ie That 
you do not get interrupts for every single event; the SW already knows when that event will come.

btw The engine also enforces the deadlines. For example it knows right away if a task is 
late and it knows exactly how late. That helps in debugging, a lot :).
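
To make the timeline idea concrete, here is a minimal user-space sketch of that polling
design. It is purely illustrative: the engine itself has not been released, so the structure
names, the use of clock_gettime() as the time source, and the single cycle shown here are
assumptions, not the actual implementation.

#include <time.h>

/* Illustrative only: one "task" is a callback scheduled at a fixed
 * nanosecond offset from the start of each cycle. */
struct timeline_task {
        long offset_ns;         /* offset from the start of the cycle */
        void (*fn)(void);       /* callback to run */
};

static void task1(void) { /* e.g. hand samples to the HW */ }
static void task2(void) { /* e.g. process the next frame */ }

static const struct timeline_task timeline[] = {
        { 100,   task1 },
        { 15000, task2 },
};

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run one cycle: poll the clock and fire each task at its offset.
 * A real engine would lock onto an external time source (PLL style)
 * and record how late each task actually ran. */
static void run_cycle(long long cycle_start)
{
        unsigned int i;

        for (i = 0; i < sizeof(timeline) / sizeof(timeline[0]); i++) {
                while (now_ns() - cycle_start < timeline[i].offset_ns)
                        ;       /* busy-poll: no interrupts, no sleeping */
                timeline[i].fn();
        }
}

int main(void)
{
        run_cycle(now_ns());
        return 0;
}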

The other option is to run normal pthreads on the isolated CPUs. As long as the threads 
are carefully designed not to do certain things, you can get very decent worst-case latencies 
(10-12 usec on Opterons and Core2) even with vanilla kernels (patched with the isolation 
patches, of course) because all the latency sources have been removed from those CPUs.
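
As a concrete illustration of that second option, here is a minimal sketch of binding a
pthread to an isolated CPU. The CPU number is just an example and assumes CPU 1 was
isolated (e.g. with isolcpus=1).

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *rt_worker(void *arg)
{
        /* The latency-sensitive work runs here, alone on the isolated CPU. */
        return NULL;
}

int main(void)
{
        pthread_t thread;
        cpu_set_t cpus;

        /* The scheduler will not balance anything onto an isolated CPU,
         * so the thread has to be bound to it explicitly. */
        CPU_ZERO(&cpus);
        CPU_SET(1, &cpus);

        pthread_create(&thread, NULL, rt_worker, NULL);
        pthread_setaffinity_np(thread, sizeof(cpus), &cpus);
        pthread_join(thread, NULL);
        return 0;
}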


Max


Re: cpuisol: CPU isolation extensions (take 2)

2008-02-06 Thread Mark Hounschell
Max Krasnyanskiy wrote:

> With CPU isolation
> it's very easy to achieve single digit usec worst case and around 200
> nsec average response times on off-the-shelf
> multi-processor/core systems (vanilla kernel plus these patches) even
> under extreme system load. 

Hi Max, could you elaborate on what sort of events your response times are
from?

Regards
Mark



cpuisol: CPU isolation extensions (take 2)

2008-02-05 Thread Max Krasnyanskiy

It seems that git-send-email for some reason did not send the introductory email,
so I'm sending it manually. Sorry if you get it twice.

---

The following patch series extends CPU isolation support. Yes, most people want to virtualize 
CPUs these days, and I want to isolate them :).


The primary idea here is to be able to use some CPU cores as dedicated engines for running 
user-space code with minimal kernel overhead/intervention; think of it as an SPE in the 
Cell processor. I'd like to be able to run a CPU intensive (100%) RT task on one of the 
processors without adversely affecting or being affected by the other system activities. 
System activities here include _kernel_ activities as well. 

I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi-processor/core systems (vanilla kernel plus these patches) even under extreme system load. 
I'm working with legal folks on releasing a hard RT user-space framework for that.
I believe with the current multi-core CPU trend we will see more and more applications that 
explore this capability: RT gaming engines, simulators, hard RT apps, etc.


Hence the proposal is to extend the current CPU isolation feature.
The new definition of CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing
  Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts must not be routed to the isolated CPU(s)
  User must route interrupts (if any) to those CPUs explicitly.

3. In general kernel subsystems must avoid activity on the isolated CPU(s) as 
much as possible
  Includes workqueues, per CPU threads, etc.
  This feature is configurable and is disabled by default.  
---


I've been maintaining this stuff since around 2.6.18 and it's been running in a production 
environment for a couple of years now. It's been tested on all kinds of machines, from NUMA 
boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
The messiest part used to be SLAB garbage collector changes. With the new SLUB all that mess 
goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug much better than O(1) 
did (ie domains are recomputed dynamically) so that isolation can be done at any time (via sysfs). 
So this seems like a good time to merge. 


We've had scheduler support for CPU isolation ever since the O(1) scheduler went in. In other 
words #1 is already supported. These patches do not change/affect that functionality in any way. 
#2 is a trivial one-liner change to the IRQ init code. #3 is addressed by a couple of separate 
patches.
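
To illustrate the shape of that IRQ change (this is not the actual diff; the helper name and
the use of cpus_andnot() here are assumptions), the idea is simply to exclude isolated CPUs
from the mask that interrupts are routed to by default:

static void cpuisol_default_irq_mask(cpumask_t *mask)
{
        /* Route IRQs only to CPUs that are online and not isolated. */
        cpus_andnot(*mask, cpu_online_map, cpu_isolated_map);
}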


The patchset consists of 4 patches. The first two are very simple. They simply make "CPU isolation" a 
configurable feature, export cpu_isolated_map and provide some helper functions to access it (just 
like cpu_online() and friends).
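
For readers who have not seen the patches yet, here is a rough sketch of what such an export
and accessors could look like, modeled on the cpu_online() family; the names (including the
CONFIG_CPUISOL symbol) and placement are guesses, and the real patches may differ.

/* include/linux/cpumask.h -- sketch only */
extern cpumask_t cpu_isolated_map;

#ifdef CONFIG_CPUISOL
#define cpu_isolated(cpu)               cpu_isset((cpu), cpu_isolated_map)
#define for_each_isolated_cpu(cpu)      for_each_cpu_mask((cpu), cpu_isolated_map)
#else
#define cpu_isolated(cpu)               0
#endif

/* kernel/cpu.c (or wherever the map ends up living) -- sketch only */
cpumask_t cpu_isolated_map __read_mostly = CPU_MASK_NONE;
EXPORT_SYMBOL(cpu_isolated_map);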

The last two patches add support for isolating CPUs from running workqueues and from stop 
machine.
More details in the individual patch descriptions.

Folks involved in the scheduler/cpuset development provided a lot of feedback on the first 
series of patches. I believe I managed to explain and clarify every aspect. 
Paul Jackson initially suggested implementing #2 and #3 using the cpusets subsystem. Paul and I 
looked at it more closely and determined that exporting cpu_isolated_map instead is a better 
option. 

The last patch, to the stop machine code, is potentially unsafe and is marked as highly 
experimental. Unfortunately it's currently the only option that allows dynamic module 
insertion/removal for the above scenarios. If people still feel that it's too ugly I can 
revert that change and keep it in a separate tree for now.


Ideally I'd like all of this to go in during this merge window. 
Linus, please pull this patch set from

git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git

That tree is rebased against latest (as of today) Linus' tree.

Thanx
Max

b/arch/x86/Kconfig  |1 
b/arch/x86/kernel/genapic_flat_64.c |5 ++-

b/drivers/base/cpu.c|   48 +++
b/include/linux/cpumask.h   |3 ++
b/kernel/Kconfig.cpuisol|   15 +++
b/kernel/Makefile   |4 +-
b/kernel/cpu.c  |   49 
b/kernel/sched.c|   37 ---
b/kernel/stop_machine.c |9 +-
b/kernel/workqueue.c|   31 --
kernel/Kconfig.cpuisol  |   26 ++-
11 files changed, 176 insertions(+), 52 deletions(-)


cpuisol: Make cpu isolation configurable and export isolated map
cpuisol: Do not route IRQs to the CPUs isolated at boot
cpuisol: Do not schedule workqueues on the isolated CPUs
cpuisol: Do not halt isolated CPUs with Stop Machine






Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyansky


Paul Jackson wrote:
> Max K wrote:
>>> And for another thing, we already declare externs in cpumask.h for
>>> the other, more widely used, cpu_*_map variables cpu_possible_map,
>>> cpu_online_map, and cpu_present_map.
>> Well, to address #2 and #3 isolated map will need to be exported as well.
>> Those other maps do not really have much to do with the scheduler code.
>> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
>> for them.
> 
> Well, if you have need it to be exported for #2 or #3, then that's ok
> by me - export it.
> 
> I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
> I'd prefer you not put it there, as lib/cpumask.c just contains the
> implementation details of the abstract data type cpumask_t, not any of
> its uses.  If you mean kernel/cpuset.c, then that's not a good choice
> either, as that just contains the implementation details of the cpuset
> subsystem.  You should usually define such things in one of the files
> using it, and unless there is clearly a -better- place to move the
> definition, it's usually better to just leave it where it is.

I was thinking of creating a new file, kernel/cpumask.c. But it probably does not make sense 
just for the masks. I'm now thinking kernel/cpu.c is the best place for it. It contains all 
the cpu hotplug logic that deals with those maps, and at the very top it has stuff like:

/* Serializes the updates to cpu_online_map, cpu_present_map */
static DEFINE_MUTEX(cpu_add_remove_lock);

So it seems to make sense to keep the maps in there.

Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max K wrote:
> > And for another thing, we already declare externs in cpumask.h for
> > the other, more widely used, cpu_*_map variables cpu_possible_map,
> > cpu_online_map, and cpu_present_map.
> Well, to address #2 and #3 isolated map will need to be exported as well.
> Those other maps do not really have much to do with the scheduler code.
> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
> for them.

Well, if you need it to be exported for #2 or #3, then that's ok
by me - export it.

I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
I'd prefer you not put it there, as lib/cpumask.c just contains the
implementation details of the abstract data type cpumask_t, not any of
its uses.  If you mean kernel/cpuset.c, then that's not a good choice
either, as that just contains the implementation details of the cpuset
subsystem.  You should usually define such things in one of the files
using it, and unless there is clearly a -better- place to move the
definition, it's usually better to just leave it where it is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


CPU isolation and workqueues [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyanskiy


Peter Zijlstra wrote:

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:

On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:

  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and friends 
expect _all_ their workqueue threads to report back when they do certain things like 
flushing buffers and stuff. The reason I added this is because my machines were getting 
stuck because CPU0 was waiting for CPU1 to run NFS workqueue threads even though no IRQs 
or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.


Peter, Steven,

I think I convinced you guys last time but I did not have a convincing example. 
So here is some
more info on why workqueues need to be aware of isolated cpus.

Here is how a work queue gets flushed.

static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
        int active;

        if (cwq->thread == current) {
                /*
                 * Probably keventd trying to flush its own queue. So simply run
                 * it by hand rather than deadlocking.
                 */
                run_workqueue(cwq);
                active = 1;
        } else {
                struct wq_barrier barr;

                active = 0;
                spin_lock_irq(&cwq->lock);
                if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
                        insert_wq_barrier(cwq, &barr, 1);
                        active = 1;
                }
                spin_unlock_irq(&cwq->lock);

                if (active)
                        wait_for_completion(&barr.done);
        }

        return active;
}

void fastcall flush_workqueue(struct workqueue_struct *wq)
{
        const cpumask_t *cpu_map = wq_cpu_map(wq);
        int cpu;

        might_sleep();
        lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
        lock_release(&wq->lockdep_map, 1, _THIS_IP_);
        for_each_cpu_mask(cpu, *cpu_map)
                flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
}

In other words it schedules some work on each cpu and expects the workqueue thread to run and 
trigger the completion. This is what I meant by _all_ threads being expected to report 
back even if there is nothing running on that CPU.


So my patch simply makes sure that isolated CPUs are ignored (if workqueue isolation is 
enabled), so that workqueue threads are not started on the CPUs that are isolated.
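
A hedged sketch of that check (not the actual patch; cpu_isolated() stands for whatever
accessor the series exports, and the function name here is made up): the workqueue code
would skip isolated CPUs both when starting per-CPU worker threads and when building the
map that flush_workqueue() iterates over.

static inline int cpu_usable_for_workqueues(int cpu)
{
        /* Per-CPU workqueue threads are started and flushed only on CPUs
         * that are online and not isolated (when workqueue isolation is
         * enabled). */
        return cpu_online(cpu) && !cpu_isolated(cpu);
}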

Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

Looks like I failed to explain what I'm trying to achieve. So let me try again.


Well done.  I read through that, expecting to disagree or at least
to not understand at some point, and got all the way through nodding
my head in agreement.  Good.

Whether the earlier confusions were lack of clarity in the presentation,
or lack of competence in my brain ... well guess I don't want to ask that
question ;).

:)


Well ... just one minor point:

Max wrote in reply to pj:

The cpu_isolated_map is a file static variable known only within
the kernel/sched.c file; this should not change.

I completely disagree. In fact I think all the cpu_xxx_map (online, present, 
isolated)
variables do not belong in the scheduler code. I'm thinking of submitting a 
patch that
factors them out into kernel/cpumask.c We already have cpumask.h.


Huh?  Why would you want to do that?

For one thing, the map being discussed here, cpu_isolated_map,
is only used in sched.c, so why publish it wider?

And for another thing, we already declare externs in cpumask.h for
the other, more widely used, cpu_*_map variables cpu_possible_map,
cpu_online_map, and cpu_present_map.

Well, to address #2 and #3 isolated map will need to be exported as well.
Those other maps do not really have much to do with the scheduler code.
That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
for them.


Other than that detail, we seem to be communicating and in agreement on
your first item, isolating CPU scheduler load balancing.  Good.

On your other two items, irq and workqueue isolation, which I had
suggested doing via cpuset sched_load_balance, I now agree that that
wasn't a good idea.

I am still a little surprised at using isolation extensions to
disable irqs on select CPUs; but others have thought far more about
irqs than I have, so I'll be quiet.

Please note that we're not talking about completely disabling IRQs. We're talking about 
not routing them to the isolated CPUs by default. It's still possible to explicitly reroute 
an IRQ to an isolated CPU.
Why is this needed? It is actually very easy to explain. IRQs are the major source of latency 
and overhead. IRQ handlers themselves are mostly ok, but they typically schedule soft irqs, work 
queues and timers on the same CPU where the IRQ is handled. In other words if an isolated CPU is 
receiving IRQs it's not really isolated, because it's running a whole bunch of different kernel 
code (ie we're talking latencies, cache usage, etc). 
Of course some folks may want to explicitly route certain IRQs to the isolated CPUs. For example 
if an app depends on the network stack it may make sense to route an IRQ from the NIC to the same 
CPU the app is running on.
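
As an example of the kind of explicit rerouting meant here, the standard
/proc/irq/<N>/smp_affinity interface can be used from user space; the IRQ number and CPU
below are just placeholders.

#include <stdio.h>

int main(void)
{
        /* Route IRQ 24 (e.g. the NIC) to CPU 2 only; both values are
         * examples.  The file takes a hex bitmask of allowed CPUs. */
        FILE *f = fopen("/proc/irq/24/smp_affinity", "w");

        if (!f) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "%x\n", 1 << 2);
        fclose(f);
        return 0;
}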


Max


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max wrote:
> Looks like I failed to explain what I'm trying to achieve. So let me try 
> again.

Well done.  I read through that, expecting to disagree or at least
to not understand at some point, and got all the way through nodding
my head in agreement.  Good.

Whether the earlier confusions were lack of clarity in the presentation,
or lack of competence in my brain ... well guess I don't want to ask that
question ;).

Well ... just one minor point:

Max wrote in reply to pj:
> > The cpu_isolated_map is a file static variable known only within
> > the kernel/sched.c file; this should not change.
> I completely disagree. In fact I think all the cpu_xxx_map (online, present, 
> isolated)
> variables do not belong in the scheduler code. I'm thinking of submitting a 
> patch that
> factors them out into kernel/cpumask.c We already have cpumask.h.

Huh?  Why would you want to do that?

For one thing, the map being discussed here, cpu_isolated_map,
is only used in sched.c, so why publish it wider?

And for another thing, we already declare externs in cpumask.h for
the other, more widely used, cpu_*_map variables cpu_possible_map,
cpu_online_map, and cpu_present_map.

Other than that detail, we seem to be communicating and in agreement on
your first item, isolating CPU scheduler load balancing.  Good.

On your other two items, irq and workqueue isolation, which I had
suggested doing via cpuset sched_load_balance, I now agree that that
wasn't a good idea.

I am still a little surprised at using isolation extensions to
disable irqs on select CPUs; but others have thought far more about
irqs than I have, so I'll be quiet.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: [CPUISOL] CPU isolation extensions

2008-02-03 Thread Max Krasnyansky
Hi Daniel,

Sorry for not replying right away.

Daniel Walker wrote:
> On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:
> 
>> Not accurate enough and way too much overhead for what I need. I know at 
>> this point it probably sounds like I'm talking BS :). I wish I'd released 
>> the engine and examples by now. Anyway let me just say that SW MAC has crazy 
>> tight deadlines with lots of small tasks. Using nanosleep() & 
>> gettimeofday() is simply not practical. So it's all TSC based with clever 
>> time sync logic between HW and SW.
> 
> I don't know if it's BS or not, you clearly fixed your own problem which
> is good .. Although when you say "RT patches cannot achieve what I
> needed. Even RTAI/Xenomai can't do that.", and HRT is "Not accurate
> enough and way too much overhead" .. Given the hardware you're using,
> that's all difficult to believe.. You also said this code has been
> running on production systems for two years, which means it's at least
> two years old .. There's been some good-sized leaps in real time linux
> in the past two years ..

I've actually been tracking the RT patches fairly closely. I can't say I've tried all of them 
but I do try them from time to time. I just got the latest 2.6.24-rt1 running on an HP xw9300. 
Looks like it does not handle CPU hotplug very well; I managed to kill it by bringing cpu 1 
off-line. So I cannot run any tests right now; I will run some tomorrow.
 
For now let me mention that I have a simple test that sleeps for a millisecond, then does some 
bitbanging for 200 usec. It measures jitter caused by the periodic scheduler tick, IPIs 
and other kernel activities.
With high-res timers disabled, on most of the machines I mentioned before it shows around 
1-1.2 usec worst case. With high-res timers enabled it shows 5-6 usec. This is with 2.6.24 
running on an isolated CPU. Forget about using a user-space timer (nanosleep(), etc); even 
the scheduler tick itself is fairly heavy.
A gettimeofday() call on that machine takes on average 2-3 usec (not a vsyscall), and SW MAC 
is all about precise timing. That's why I said that it's not practical to use that stuff for 
me. I do not see anything in the -rt kernel that would improve this.
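
The test itself has not been posted, but a minimal sketch of the same idea (sleep for about a
millisecond, then spin for about 200 usec and record the worst gap between consecutive clock
reads) might look like this; clock_gettime() stands in for the TSC-based timing the real test
uses.

#include <stdio.h>
#include <time.h>

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
        const struct timespec msec = { 0, 1000000 };    /* ~1 msec */
        long long worst = 0;
        int i;

        for (i = 0; i < 10000; i++) {
                long long prev, end, t;

                nanosleep(&msec, NULL);

                /* "Bitbang" for ~200 usec; any large gap between two
                 * consecutive clock reads means the kernel stole the CPU
                 * (scheduler tick, IPI, etc). */
                prev = now_ns();
                end = prev + 200000;
                while ((t = now_ns()) < end) {
                        if (t - prev > worst)
                                worst = t - prev;
                        prev = t;
                }
        }
        printf("worst observed gap: %lld nsec\n", worst);
        return 0;
}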

This is btw not to say that the -rt kernel is not useful for my app in general. We have a bunch 
of soft-RT threads that talk to the MAC thread. Those would definitely benefit. I think cpu 
isolation + -rt would work beautifully for wireless basestations.

Max 


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-03 Thread Max Krasnyansky
Paul Jackson wrote:
> Max wrote:
>> Paul, I actually mentioned at the beginning of my email that I did read that 
>> thread
>> started by Peter. I did learn quite a bit from it :)
> 
> Ah - sorry - I missed that part.  However, I'm still getting the feeling
> that there were some key points in that thread that we have not managed
> to communicate successfully.
I think you are assuming that I only need to deal with the RT scheduler and scheduler
domains, which is not correct. See below.

>> Sounds like at this point we're in agreement that sched_load_balance is not 
>> suitable
>> for what I'd like to achieve.
> 
> I don't think we're in agreement; I think we're in confusion ;)
Yeah. I don't believe I'm the confused side though ;-)

> Yes, sched_load_balance does not *directly* have anything to do with this.
> 
> But indirectly it is a critical element in what I think you'd like to
> achieve.  It affects how the cpuset code sets up sched_domains, and
> if I understand correctly, you require either (1) some sched_domains to
> only contain RT tasks, or (2) some CPUs to be in no sched_domain at all.
> 
> Proper configuration of the cpuset hierarchy, including the setting of
> the per-cpuset sched_load_balance flag, can provide either of these
> sched_domain partitions, as desired.
Again you're assuming that scheduling domain partitioning satisfies my 
requirements
or addresses my use case. It does not. See below for more details.
 
>> But how about making cpusets aware of the cpu_isolated_map ?
> 
> No.  That's confusing cpusets and the scheduler again.
> 
> The cpu_isolated_map is a file static variable known only within
> the kernel/sched.c file; this should not change.
I completely disagree. In fact I think all the cpu_xxx_map (online, present, isolated) 
variables do not belong in the scheduler code. I'm thinking of submitting a patch that 
factors them out into kernel/cpumask.c. We already have cpumask.h.

> Presently, the boot parameter isolcpus= is just used to initialize
> what CPUs are isolated at boot, and then the sched_domain partitioning,
> as done in kernel/sched.c:partition_sched_domains() (the hook into
> the sched code that cpusets uses) determines which CPUs are isolated
> from that point forward.  I doubt that this should change either.
Sure, I did not even touch that part. I just proposed to extend the meaning of 
the 
'isolated' bit.

> In that thread referenced above, did you see the part where RT is
> achieved not by isolating CPUs from any scheduler, but rather by
> polymorphically having several schedulers available to operate on each
> sched_domain, and having RT threads self-select the RT scheduler?
Absolutely. Yes that is. I saw that part. But it has nothing to do with my use 
case.

Looks like I failed to explain what I'm trying to achieve. So let me try again.
I'd like to be able to run a CPU intensive (100%) RT task on one of the processors without 
adversely affecting or being affected by the other system activities. System activities 
here include _kernel_ activities as well. Hence the proposal is to extend the current CPU 
isolation feature.

The new definition of the CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing
   Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts must not be routed to the isolated CPU(s)
   User must route interrupts (if any) explicitly.

3. In general kernel subsystems must avoid activity on the isolated CPU(s) as 
much as possible
   Includes workqueues, per CPU threads, etc.
   This feature is configurable and is disabled by default.  
---

#1 affects the scheduler and scheduler domains. It's already supported either by using the 
isolcpus= boot option or by setting "sched_load_balance" in cpusets. I'm totally happy with 
the current behavior and my original patch did not mess with this functionality in any way.

#2 and #3 have _nothing_ to do with the scheduler or scheduler domains. I've been trying to 
explain that for a few days now ;-). When you saw my patches for #2 and #3 you told me that 
you'd be interested to see them implemented on top of the "sched_load_balance" flag. Here is 
your original reply:
http://marc.info/?l=linux-kernel&m=120153260217699&w=2

So I looked into that and provided an explanation of why it would not work, or would work but 
would add lots of complexity (access to internal cpuset structures, locking, etc).
My email on that is here:
http://marc.info/?l=linux-kernel&m=120180692331461&w=2

Now, I felt from the beginning that cpusets is not the right mechanism to address #2 and #3.
The best mechanism IMO is to simply provide access to the cpu_isolated_map to the rest of the 
kernel. Again, the fact that cpu_isolated_map currently lives in the scheduler code does not 
change anything here because, as I explained, I'm proposing to extend the meaning of "CPU 
isolation". I provided dynamic access to the "isolated" bit only for convenience; it does 
_not_ change existing


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-02 Thread Paul Jackson
Max wrote:
> Paul, I actually mentioned at the beginning of my email that I did read that 
> thread
> started by Peter. I did learn quite a bit from it :)

Ah - sorry - I missed that part.  However, I'm still getting the feeling
that there were some key points in that thread that we have not managed
to communicate successfully.

> Sounds like at this point we're in agreement that sched_load_balance is not 
> suitable
> for what I'd like to achieve.

I don't think we're in agreement; I think we're in confusion ;)

Yes, sched_load_balance does not *directly* have anything to do with
this.

But indirectly it is a critical element in what I think you'd like to
achieve.  It affects how the cpuset code sets up sched_domains, and
if I understand correctly, you require either (1) some sched_domains to
only contain RT tasks, or (2) some CPUs to be in no sched_domain at all.

Proper configuration of the cpuset hierarchy, including the setting of
the per-cpuset sched_load_balance flag, can provide either of these
sched_domain partitions, as desired.

> But how about making cpusets aware of the cpu_isolated_map ?

No.  That's confusing cpusets and the scheduler again.

The cpu_isolated_map is a file static variable known only within
the kernel/sched.c file; this should not change.

Presently, the boot parameter isolcpus= is just used to initialize
what CPUs are isolated at boot, and then the sched_domain partitioning,
as done in kernel/sched.c:partition_sched_domains() (the hook into
the sched code that cpusets uses) determines which CPUs are isolated
from that point forward.  I doubt that this should change either.

In that thread referenced above, did you see the part where RT is
achieved not by isolating CPUs from any scheduler, but rather by
polymorphically having several schedulers available to operate on each
sched_domain, and having RT threads self-select the RT scheduler?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-02 Thread Max Krasnyansky
Paul Jackson wrote:
> Max wrote:
>> Here is the list of things of issues with sched_load_balance flag from CPU 
>> isolation 
>> perspective:
> 
> A separate thread happened to start up on lkml.org, shortly after
> yours, that went into this in considerable detail.
> 
> For example, the interaction of cpusets, sched_load_balance,
> sched_domains and real time scheduling is examined in some detail on
> this thread.  Everyone participating on that thread learned something
> (we all came into it with less than a full picture of what's there.)
> 
> I would encourage you to read it closely.  For example, the scheduler
> code should not be trying to access per-cpuset attributes such as
> the sched_load_balance flag (you are correct that this would be
> difficult to do because of the locking; however by design, that is
> not to be done.)
> 
> This thread begins at:
> 
> scheduler scalability - cgroups, cpusets and load-balancing
> http://lkml.org/lkml/2008/1/29/60
> 
> Too bad we didn't think to include you in the CC list of that
> thread from the beginning.

Paul, I actually mentioned at the beginning of my email that I did read that 
thread
started by Peter. I did learn quite a bit from it :)
You guys did not discuss isolation stuff though. The thread was only about 
scheduling
and my cpu isolation extension patches deal with other aspects. 

Sounds like at this point we're in agreement that sched_load_balance is not suitable
for what I'd like to achieve. But how about making cpusets aware of the cpu_isolated_map?
Even without my patches it's somewhat of an issue right now. I mean if you use the isolcpus=
boot option to put cpus into the null domain, cpusets will not be aware of it. The result may be
a bit confusing if an isolated cpu is added to some cpuset.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-01 Thread Paul Jackson
Max wrote:
> Here is the list of things of issues with sched_load_balance flag from CPU 
> isolation 
> perspective:

A separate thread happened to start up on lkml.org, shortly after
yours, that went into this in considerable detail.

For example, the interaction of cpusets, sched_load_balance,
sched_domains and real time scheduling is examined in some detail on
this thread.  Everyone participating on that thread learned something
(we all came into it with less than a full picture of what's there.)

I would encourage you to read it closely.  For example, the scheduler
code should not be trying to access per-cpuset attributes such as
the sched_load_balance flag (you are correct that this would be
difficult to do because of the locking; however by design, that is
not to be done.)

This thread begins at:

scheduler scalability - cgroups, cpusets and load-balancing
http://lkml.org/lkml/2008/1/29/60

Too bad we didn't think to include you in the CC list of that
thread from the beginning.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-31 Thread Max Krasnyanskiy

Hi Mark,

[EMAIL PROTECTED] wrote:
Following patch series extends CPU isolation support. Yes, most people want to virtualize 
CPUs these days and I want to isolate them :).

The primary idea here is to be able to use some CPU cores as dedicated engines 
for running
user-space code with minimal kernel overhead/intervention, think of it as an SPE in the 
Cell processor.


We've had scheduler support for CPU isolation ever since the O(1) scheduler went in. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
In fact that's the primary distinction that I'm making between say "CPU sets" and 
"CPU isolation". "CPU sets" let you manage user-space load while "CPU isolation" provides
a way to isolate a CPU as much as possible (including kernel activities).

I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi-processor/core systems under extreme system load. I'm working with legal folks on releasing 
a hard RT user-space framework for that.

I can also see other applications like simulators and stuff that can benefit 
from this.

I've been maintaining this stuff since around 2.6.18 and it's been running in a 
production
environment for a couple of years now. It's been tested on all kinds of 
machines, from NUMA
boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
The messiest part used to be SLAB garbage collector changes. With the new SLUB all that mess 
goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug much better than O(1) 
did (ie domains are recomputed dynamically) so that isolation can be done at any time (via sysfs). 
So this seems like a good time to merge. 


Anyway. The patchset consists of 5 patches. The first three are very simple and 
non-controversial.
They simply make "CPU isolation" a configurable feature, export 
cpu_isolated_map and provide
some helper functions to access it (just like cpu_online() and friends).
The last two patches add support for isolating CPUs from running workqueues and stop 
machine.
More details in the individual patch descriptions.
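
The patches themselves are not quoted in this thread, so as a rough idea of what "helper functions ... just like cpu_online() and friends" might look like (hypothetical names and config symbol, not copied from the actual patch):

extern cpumask_t cpu_isolated_map;

/* Hypothetical accessors mirroring cpu_online()/cpu_possible(). */
#ifdef CONFIG_CPUISOL	/* config symbol name is a guess */
#define cpu_isolated(cpu)		cpu_isset((cpu), cpu_isolated_map)
#define for_each_isolated_cpu(cpu)	for_each_cpu_mask((cpu), cpu_isolated_map)
#else
#define cpu_isolated(cpu)		0
#endif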

Ideally I'd like all of this to go in during this merge window. If people think it's acceptable 
Linus or Andrew (or whoever is more appropriate Ingo maybe) can pull this patch set from

git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git



It's good to hear from someone else that thinks a multi-processor
box _should_ be able to run a CPU intensive (100%) RT app on one of the
processors without adversely affecting or being affected by the others.
I have had issues that were _traced_ back to the fact that I am doing
just that. All I got was, you can't do that or we don't support that
kind of thing in the Linux kernel.

One example: Andrew Morton's feedback to the LKML thread "floppy.c soft lockup"

Good luck with this. I hope this gets someone's attention.

Thanks for the support. I do the best I can because, just like you, I believe 
that it's
a perfectly valid workload and there are a lot of interesting applications that 
will benefit
from mainline support.


BTW, I have tried your patches against a vanilla 2.6.24 kernel but am
not successful.

# echo '1' > /sys/devices/system/cpu/cpu1/isolated
bash: echo: write error: Device or resource busy

You have to bring it offline first.
In other words:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/isolated
echo 1 > /sys/devices/system/cpu/cpu1/online


The cpuisol=1 cmdline option yields:

harley:# cat /sys/devices/system/cpu/cpu1/isolated
0

harley:# cat /proc/cmdline
root=/dev/sda3 vga=normal apm=off selinux=0 noresume splash=silent
kmalloc=192M cpuisol=1

Sorry, my bad. I had a typo in the patch description; the option is "isolcpus=N".
We've had that option for a while now. I mean it's not even part of my patch.

Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-01-31 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

So far it seems that extending cpu_isolated_map
is a more natural way of propagating this notion to the rest of the kernel,
since it's very similar to the cpu_online_map concept and it's easy to integrate
with the code that already uses it. 


If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with "resource management" mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.


Hi Paul,

I thought some more about your proposal to use sched_load_balance flag in 
cpusets instead
of extending cpu_isolated_map. I looked at the cpusets, cgroups, latest thread 
started by
Peter (about sched domains and stuff) and here are my thoughts on this.

Here is the list of issues with the sched_load_balance flag from a CPU isolation 
perspective:

--
(1) Boot time isolation is not possible. There is currently no way to setup a 
cpuset at
boot time. For example we won't be able to isolate cpus from irqs and 
workqueues at boot.
Not a major issue but still an inconvenience.

--
(2) There is currently no easy way to figure out what cpuset a cpu belongs to in order 
to query its sched_load_balance flag. In order to do that we need a method that iterates
all active cpusets and checks their cpus_allowed masks. This implies holding cgroup and 
cpuset mutexes. It's not clear whether it's ok to do that from the contexts CPU 
isolation happens in (apic, sched, workqueue). It seems that the cgroup/cpuset api is designed
for top-down access, i.e. adding a cpu to a set and then recomputing domains. Which makes
perfect sense for the common cpuset usecase but is not what cpu isolation needs.
In other words I think it's much simpler and cleaner to use the cpu_isolated_map for isolation
purposes.

--
(3) cpusets are a bit too dynamic :). What I mean by this is that 
sched_load_balance flag
can be changed at any time without bringing a CPU offline. What that means is 
that we'll
need some notifier mechanisms for killing and restarting workqueue threads when that flag 
changes. Also we'd need some logic that makes sure that a user does not disable load balancing 
on all cpus because that effectively will kill workqueues on all the cpus.

This particular case is already handled very nicely in my patches. The isolated bit 
can be set
only when a cpu is offline and it cannot be set on the first online cpu. 
Workqueues and other
subsystems already handle cpu hotplug events nicely and can easily ignore 
isolated cpus when
they come online.

-

#1 is probably unfixable. #2 and #3 can be fixed but at the expense of extra 
complexity across
the board. I seriously doubt that I'll be able to push that through the reviews ;-). 

Also personally I still think cpusets and cpu isolation attack two different problems. cpusets 
is about partitioning cpus and memory nodes, and managing tasks. Most of the cgroups/cpuset APIs 
are designed to deal with tasks. CPU isolation is much simpler and is at a lower layer. It deals 
with IRQs, kernel per-cpu threads, etc. The only intersection I see is that both features affect 
scheduling domains (cpu isolation is again simple here: it just puts cpus into null domains, and
that's existing logic in sched.c, nothing new here).
So here are some proposals on how we can make them play nicely with each other. 


--
(A) Make cpusets aware of isolated cpus.
All we have to do here is to change
	guarantee_online_cpus()
	common_cpu_mem_hotplug_unplug()
to exclude cpu_isolated_map from cpu_online_map before using it.
And we'd need to change
	update_cpumasks()
to simply ignore isolated cpus.

That way if a cpu is isolated it'll be ignored by the cpusets logic, which I believe would be
correct behavior.
We're talking about a trivial ~5-line patch which will be a noop if cpu isolation is disabled.
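
To make the "~5 line" claim for (A) concrete, an illustrative (not actual) form the change could take is a single helper that cpusets calls before trusting the online map:

/* Illustrative only: mask isolated CPUs out of whatever cpusets is about
 * to use, so an isolated CPU never shows up in a cpuset's effective mask. */
static inline void cpuset_strip_isolated(cpumask_t *mask)
{
	cpus_andnot(*mask, *mask, cpu_isolated_map);
}

guarantee_online_cpus() and common_cpu_mem_hotplug_unplug() would call this right after computing their masks; with CPU isolation configured out it could be an empty stub.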


(B) Ignore isolated map in cpuset. That's the current state of affairs with my 
patches applied.
Looks like your customers are happy with what they have now so they will probably not enable 
cpu isolation anyway :).


(C) Introduce cpu_usable_map. That map will be recomputed on hotplug events. 
Essentially it'd be
cpu_online_map AND ~cpu_isolated_map. Convert things like cpusets to use that map instead of 
online map.


We can probably come up with other 

Re: [CPUISOL] CPU isolation extensions

2008-01-31 Thread Mark Hounschell
[EMAIL PROTECTED] wrote:
> Following patch series extends CPU isolation support. Yes, most people want 
> to virtuallize 
> CPUs these days and I want to isolate them :).
> The primary idea here is to be able to use some CPU cores as dedicated 
> engines for running
> user-space code with minimal kernel overhead/intervention, think of it as an 
> SPE in the 
> Cell processor.
> 
> We've had scheduler support for CPU isolation ever since O(1) scheduler went 
> it. 
> I'd like to extend it further to avoid kernel activity on those CPUs as much 
> as possible.
> In fact that the primary distinction that I'm making between say "CPU sets" 
> and 
> "CPU isolation". "CPU sets" let you manage user-space load while "CPU 
> isolation" provides
> a way to isolate a CPU as much as possible (including kernel activities).
> 
> I'm personally using this for hard realtime purposes. With CPU isolation it's 
> very easy to 
> achieve single digit usec worst case and around 200 nsec average response 
> times on off-the-shelf
> multi- processor/core systems under exteme system load. I'm working with 
> legal folks on releasing 
> hard RT user-space framework for that.
> I can also see other application like simulators and stuff that can benefit 
> from this.
> 
> I've been maintaining this stuff since around 2.6.18 and it's been running in 
> production
> environment for a couple of years now. It's been tested on all kinds of 
> machines, from NUMA
> boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
> The messiest part used to be SLAB garbage collector changes. With the new 
> SLUB all that mess 
> goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug 
> much better than O(1) 
> did (ie domains are recomputed dynamically) so that isolation can be done at 
> any time (via sysfs). 
> So this seems like a good time to merge. 
> 
> Anyway. The patchset consist of 5 patches. First three are very simple and 
> non-controversial.
> They simply make "CPU isolation" a configurable feature, export 
> cpu_isolated_map and provide
> some helper functions to access it (just like cpu_online() and friends).
> Last two patches add support for isolating CPUs from running workqueus and 
> stop machine.
> More details in the individual patch descriptions.
> 
> Ideally I'd like all of this to go in during this merge window. If people 
> think it's acceptable 
> Linus or Andrew (or whoever is more appropriate Ingo maybe) can pull this 
> patch set from
>   git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git
> 

It's good to hear from someone else that thinks a multi-processor
box _should_ be able to run a CPU intensive (100%) RT app on one of the
processors without adversely affecting or being affected by the others.
I have had issues that were _traced_ back to the fact that I am doing
just that. All I got was, you can't do that or we don't support that
kind of thing in the Linux kernel.

One example: Andrew Morton's feedback to the LKML thread "floppy.c soft
lockup"

Good luck with this. I hope this gets someone's attention.

BTW, I have tried your patches against a vanilla 2.6.24 kernel but am
not successful.

# echo '1' > /sys/devices/system/cpu/cpu1/isolated
bash: echo: write error: Device or resource busy

The cpuisol=1 cmdline option yields:

harley:# cat /sys/devices/system/cpu/cpu1/isolated
0

harley:# cat /proc/cmdline
root=/dev/sda3 vga=normal apm=off selinux=0 noresume splash=silent
kmalloc=192M cpuisol=1




Regards
Mark



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:

> Not accurate enough and way too much overhead for what I need. I know at this 
> point it probably 
> sounds like I'm talking BS :). I wish I've released the engine and examples 
> by now. Anyway let 
> me just say that SW MAC has crazy tight deadlines with lots of small tasks. 
> Using nanosleep() & 
> gettimeofday() is simply not practical. So it's all TSC based with clever 
> time sync logic between
> HW and SW.

I don't know if it's BS or not, you clearly fixed your own problem which
is good .. Although when you say "RT patches cannot achieve what I
needed. Even RTAI/Xenomai can't do that.", and HRT is "Not accurate
enough and way too much overhead" .. Given the hardware you're using,
that's all difficult to believe .. You also said this code has been
running on production systems for two years, which means it's at least
two years old .. There have been some good-sized leaps in real time Linux
in the past two years ..

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Daniel Walker wrote:

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:

Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
can't do that.
For example I have separate tasks with hard deadlines that must be enforced in the 50usec kind 
of range and basically no idle time whatsoever. Just to give more background, it's a wireless
basestation with SW MAC/Scheduler. Another requirement is for the SW to know precise timing.
For example there is no way we can do predictable 1-2 usec sleeps. 
So I wrote a user-space engine that does all this; it requires full control of the CPU, i.e. minimal
overhead from the kernel, just IPIs for memory management and that's basically it. When my legal 
department lets me I'll do a presentation on this stuff at a Linux RT conference or something. 


What kind of hardware are you doing this on? 

All kinds of HW. I mentioned it in the intro email.
Here are the highlights:
	HP XW9300 (Dual Opteron NUMA box) and XW9400 (Dual Core Opteron)
	HP DL145 G2 (Dual Opteron) and G3 (Dual Core Opteron)
	Dell Precision workstations (Core2 Duo and Quad)
	Various Core2 Duo based uTCA boards:
		Mercury AXA110 (1.5Ghz)
		Concurrent Tech AM110 (2.1Ghz)

This scheme should work on anything that lets you disable SMI on the isolated 
core(s).


Also I should note there is HRT (High resolution timers) which provided 
microsecond level
granularity ..
Not accurate enough and way too much overhead for what I need. I know at this point it probably 
sounds like I'm talking BS :). I wish I'd released the engine and examples by now. Anyway let 
me just say that the SW MAC has crazy tight deadlines with lots of small tasks. Using nanosleep() & 
gettimeofday() is simply not practical. So it's all TSC-based with clever time sync logic between
HW and SW.
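
Max's engine itself is not public, so purely as an illustration of what TSC-based waiting looks like in user space on x86 (the frequency constant and the offsets are made up and would have to be calibrated against the real time source):

#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Busy-wait until the TSC reaches a target value; cheap and predictable
 * as long as the kernel leaves this CPU alone. */
static void spin_until(uint64_t target)
{
	while (rdtsc() < target)
		;
}

int main(void)
{
	const uint64_t tsc_per_usec = 2400;	/* assume 2.4 GHz, calibrate in practice */
	uint64_t cycle = rdtsc();		/* start of the current cycle */

	spin_until(cycle + 100 * tsc_per_usec);	/* run task1 at +100 usec */
	/* task1(); */
	spin_until(cycle + 350 * tsc_per_usec);	/* run task2 at +350 usec */
	/* task2(); */
	return 0;
}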

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:
> Just this patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
> can't do that.
> For example I have separate tasks with hard deadlines that must be enforced 
> in 50usec kind 
> of range and basically no idle time whatsoever. Just to give more background 
> it's a wireless
> basestation with SW MAC/Scheduler. Another requirement is for the SW to know 
> precise timing
> because SW. For example there is no way we can do predictable 1-2 usec 
> sleeps. 
> So I wrote a user-space engine that does all this, it requires full control 
> of the CPU ie minimal
> overhead from the kernel, just IPIs for memory management and that's 
> basically it. When my legal 
> department lets me I'll do a presentation on this stuff at Linux RT 
> conference or something. 

What kind of hardware are you doing this on? Also I should note there is
HRT (high resolution timers) which provides microsecond-level
granularity ..

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

So far it seems that extending cpu_isolated_map
is a more natural way of propagating this notion to the rest of the kernel,
since it's very similar to the cpu_online_map concept and it's easy to integrate
with the code that already uses it. 


If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with "resource management" mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.


Thanks for the info Paul. I'll definitely look into using this flag instead 
and reply with pros and cons (if any).


Max


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:

On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:

  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and 
friends
expect _all_ their workqueue threads to report back when they do certain things 
like
flushing buffers and stuff. The reason I added this is because my machines were 
getting
stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
though no IRQs
or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.


Actually NFS was just one example. I cannot remember off the top of my head what 
else was there,
but there are definitely other users of work queues that expect all the threads 
to run at
some point in time.
Also, if you think about it, the patch does _exactly_ what you propose. It removes workqueue 
threads from isolated CPUs. But instead of doing it just for NFS and/or other subsystems 
separately, it just does it in a generic way by simply not starting those threads in the first 
place.
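
As a rough illustration of that "don't start them in the first place" approach (this is not the actual patch, and cpu_isolated() is the hypothetical helper sketched earlier in the thread):

static int workqueue_cpu_callback(struct notifier_block *nfb,
				  unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;

	if (cpu_isolated(cpu))
		return NOTIFY_OK;	/* never create per-cpu workers here */

	/* ... existing CPU_UP_PREPARE / CPU_ONLINE / CPU_DEAD handling ... */
	return NOTIFY_OK;
}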



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

I agree in general. The thing is though that stop machine just kills any kind 
of latency
guarantees. Without the patch the machine just hangs waiting for the 
stop-machine to run
when a module is inserted/removed. And running without dynamic module loading is 
not very
practical on general purpose machines. So I'd rather have an option with a big 
red warning
than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


I'm in favour of better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


100% agree. That's why I mentioned that this patch is controversial in the first place. 
Right now, short of rewriting module loading to not use stop machine, there is no other 
option. I'll think some more about it. If you guys have other ideas please drop me a note.


Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:
> 
> On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
> > >>   [PATCH] [CPUISOL] Support for workqueue isolation
> > >
> > > The thing about workqueues is that they should only be woken on a CPU if
> > > something on that CPU accessed them. IOW, the workqueue on a CPU handles
> > > work that was called by something on that CPU. Which means that
> > > something that high prio task did triggered a workqueue to do some work.
> > > But this can also be triggered by interrupts, so by keeping interrupts
> > > off the CPU no workqueue should be activated.
> 
> > No no no. That's what I though too ;-). The problem is that things like NFS 
> > and friends
> > expect _all_ their workqueue threads to report back when they do certain 
> > things like
> > flushing buffers and stuff. The reason I added this is because my machines 
> > were getting
> > stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
> > though no IRQs
> > or other things are running on it.
> 
> This sounds more like we should fix NFS than add this for all workqueues.
> Again, we want workqueues to run on the behalf of whatever is running on
> that CPU, including those tasks that are running on an isolcpu.

agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.

> 
> >
> > >>   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> > >
> > > This I find very dangerous. We are making an assumption that tasks on an
> > > isolated CPU wont be doing things that stopmachine requires. What stops
> > > a task on an isolated CPU from calling something into the kernel that
> > > stop_machine requires to halt?
> 
> > I agree in general. The thing is though that stop machine just kills any 
> > kind of latency
> > guaranties. Without the patch the machine just hangs waiting for the 
> > stop-machine to run
> > when module is inserted/removed. And running without dynamic module loading 
> > is not very
> > practical on general purpose machines. So I'd rather have an option with a 
> > big red warning
> > than no option at all :).
> 
> Well, that's something one of the greater powers (Linus, Andrew, Ingo)
> must decide. ;-)

I'm in favour of better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
> Also "CPU sets" seem to mostly deal with the scheduler domains.

True - though "cpusets" (no space ;) sched_load_balance flag can
be used to see that some CPUs are not in any scheduler domain,
which is equivalent to not having the scheduler run on them.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
> So far it seems that extending cpu_isolated_map
> is more natural way of propagating this notion to the rest of the kernel.
> Since it's very similar to the cpu_online_map concept and it's easy to 
> integrated
> with the code that already uses it. 

If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with "resource management" mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt


On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
> >>   [PATCH] [CPUISOL] Support for workqueue isolation
> >
> > The thing about workqueues is that they should only be woken on a CPU if
> > something on that CPU accessed them. IOW, the workqueue on a CPU handles
> > work that was called by something on that CPU. Which means that
> > something that high prio task did triggered a workqueue to do some work.
> > But this can also be triggered by interrupts, so by keeping interrupts
> > off the CPU no workqueue should be activated.

> No no no. That's what I though too ;-). The problem is that things like NFS 
> and friends
> expect _all_ their workqueue threads to report back when they do certain 
> things like
> flushing buffers and stuff. The reason I added this is because my machines 
> were getting
> stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
> though no IRQs
> or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


>
> >>   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> >
> > This I find very dangerous. We are making an assumption that tasks on an
> > isolated CPU wont be doing things that stopmachine requires. What stops
> > a task on an isolated CPU from calling something into the kernel that
> > stop_machine requires to halt?

> I agree in general. The thing is though that stop machine just kills any kind 
> of latency
> guaranties. Without the patch the machine just hangs waiting for the 
> stop-machine to run
> when module is inserted/removed. And running without dynamic module loading 
> is not very
> practical on general purpose machines. So I'd rather have an option with a 
> big red warning
> than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.

Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.


While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

Peter, I think you missed the point of this patch. It's just a convenience 
feature.
It simply excludes isolated CPUs from IRQ smp affinity masks. That's all. What 
did you
mean by "flat out denying all IRQs to these cpus" ? IRQs can still be routed to them 
by writing to /proc/irq/N/smp_affinity.
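
For completeness, routing an IRQ back onto an isolated CPU from user space is just a one-line write; a trivial C version (the IRQ number 16 is made up, pick one from /proc/interrupts):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/irq/16/smp_affinity", "w");

	if (!f) {
		perror("/proc/irq/16/smp_affinity");
		return 1;
	}
	fprintf(f, "%x\n", 1 << 2);	/* hex CPU bitmask: bit 2 = CPU 2 */
	return fclose(f) ? 1 : 0;
}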


Also, this happens naturally when we bring a CPU off-line and then bring it 
back online.
i.e. when a CPU comes back online it's excluded from the IRQ smp_affinity masks 
even without
my patch.


  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.


Quite so, if nobody uses it, there is no harm in having them around. If
they are used, it's by someone already allowed on the cpu.


No no no. I just replied to Steven about that. The problem is that things like NFS and 
friends expect _all_ their workqueue threads to report back when they do certain things 
like flushing buffers and stuff. The reason I added this is because my machines were 
getting stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even though 
no IRQs, softirqs or other things are running on it.



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?


Very dangerous indeed!

Please see my reply to Steven. I agree it's somewhat dangerous. What we could 
do is make it
configurable with a big fat warning. In other words I'd rather have an option 
than one that just says
"do not use dynamic module loading" on those systems.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.


Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot


I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

Also note that it's just a convenience feature. In other words it's not that with
this patch we'll never route IRQs to those CPUs. They can still be explicitly routed
by writing to /proc/irq/N/smp_affinity.



  [PATCH] [CPUISOL] Support for workqueue isolation


The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and
friends expect _all_ their workqueue threads to report back when they do certain
things like flushing buffers and stuff. The reason I added this is because my
machines were getting stuck because CPU0 was waiting for CPU1 to run NFS work queue
threads even though no IRQs or other things are running on it.


  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"


This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?
I agree in general. The thing is though that stop machine just kills any kind of
latency guarantees. Without the patch the machine just hangs waiting for the
stop-machine to run when a module is inserted or removed. And running without dynamic
module loading is not very practical on general purpose machines. So I'd rather have
an option with a big red warning than no option at all :).
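For context, this is a simplified sketch of the setup loop inside stop_machine() in
kernels of this era (heavily trimmed, names approximate; the hypothetical
cpu_isolated() check marks where the patch under discussion would skip isolated CPUs):

	static int example_stop_machine_setup(void)
	{
		unsigned int cpu, threads = 0;

		for_each_online_cpu(cpu) {
			if (cpu == smp_processor_id())
				continue;
			/* the patch would add:  if (cpu_isolated(cpu)) continue; */
			if (kernel_thread(stopmachine, (void *)(long)cpu,
					  CLONE_KERNEL) >= 0)
				threads++;
		}
		/* the caller then waits for every spawned thread to check in with
		 * interrupts disabled; a CPU that never schedules its thread
		 * (e.g. one monopolised by a hard-RT task) stalls the whole box */
		return threads;
	}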


Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.


I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so like to
have those two capabilities co-operate with each other.

I'll definitely take a look. So far it seems that extending cpu_isolated_map is a
more natural way of propagating this notion to the rest of the kernel, since it's
very similar to the cpu_online_map concept and easy to integrate with the code that
already uses it.
Anyway, I'll take a look at the cpuset flag that you mentioned and report back.


Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Hi Peter,

Peter Zijlstra wrote:

[ You really ought to CC people :-) ]

I was not sure who, though :)
Do we have a mailing list for scheduler development btw ?
Or is it just the folks that you included in CC ?
Some of the latest scheduler patches break things that I'm doing and I'd like to make
them configurable (RT watchdog, etc).


On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
Following patch series extends CPU isolation support. Yes, most people want to virtuallize 
CPUs these days and I want to isolate them :).

The primary idea here is to be able to use some CPU cores as dedicated engines 
for running
user-space code with minimal kernel overhead/intervention, think of it as an SPE in the 
Cell processor.


We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
In fact that's the primary distinction that I'm making between say "CPU sets" and
"CPU isolation". "CPU sets" let you manage user-space load while "CPU isolation" provides
a way to isolate a CPU as much as possible (including kernel activities).


Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

It's not really new. The CPU isolation bits just have not been exported before, that's
all. Also "CPU sets" seem to mostly deal with the scheduler domains. I'll reply to
Paul's proposal to use that instead.


I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi- processor/core systems under exteme system load. I'm working with legal folks on releasing 
hard RT user-space framework for that.

I can also see other application like simulators and stuff that can benefit 
from this.


have you been using just this, or in combination with the -rt effort?

Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai can't
do that. For example I have separate tasks with hard deadlines that must be enforced
in a 50usec kind of range and basically no idle time whatsoever. Just to give more
background, it's a wireless basestation with SW MAC/Scheduler. Another requirement is
for the SW to know precise timing; for example there is no way we can do predictable
1-2 usec sleeps. So I wrote a user-space engine that does all this. It requires full
control of the CPU, ie minimal overhead from the kernel, just IPIs for memory
management and that's basically it. When my legal department lets me I'll do a
presentation on this stuff at a Linux RT conference or something.


Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:
> On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
> > Thanks for the CC, Peter.
> 
> Thanks from me too.
> 
> > Max wrote:
> > > We've had scheduler support for CPU isolation ever since O(1) scheduler 
> > > went it. 
> > > I'd like to extend it further to avoid kernel activity on those CPUs as 
> > > much as possible.
> > 
> > I recently added the per-cpuset flag 'sched_load_balance' for some
> > other realtime folks, so that they can disable the kernel scheduler
> > load balancing on isolated CPUs.  It essentially allows for dynamic
> > control of which CPUs are isolated by the scheduler, using the cpuset
> > hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
> > 'isolated_cpus' mask remained a minimal kernel boottime parameter.
> > I believe this went to Linus's tree about Oct 2007.
> > 
> > It looks like you have three additional tweaks for realtime in this
> > patch set, with your patches:
> > 
> >   [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
> 
> I didn't know we still routed IRQs to isolated CPUs. I guess I need to
> look deeper into the code on this one. But I agree that isolated CPUs
> should not have IRQs routed to them.

While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

> >   [PATCH] [CPUISOL] Support for workqueue isolation
> 
> The thing about workqueues is that they should only be woken on a CPU if
> something on that CPU accessed them. IOW, the workqueue on a CPU handles
> work that was called by something on that CPU. Which means that
> something that high prio task did triggered a workqueue to do some work.
> But this can also be triggered by interrupts, so by keeping interrupts
> off the CPU no workqueue should be activated.

Quite so, if nobody uses it, there is no harm in having them around. If
they are used, it's by someone already allowed on the cpu.

> >   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> 
> This I find very dangerous. We are making an assumption that tasks on an
> isolated CPU wont be doing things that stopmachine requires. What stops
> a task on an isolated CPU from calling something into the kernel that
> stop_machine requires to halt?

Very dangerous indeed!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt
On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
> Thanks for the CC, Peter.

Thanks from me too.

> Max wrote:
> > We've had scheduler support for CPU isolation ever since O(1) scheduler 
> > went it. 
> > I'd like to extend it further to avoid kernel activity on those CPUs as 
> > much as possible.
> 
> I recently added the per-cpuset flag 'sched_load_balance' for some
> other realtime folks, so that they can disable the kernel scheduler
> load balancing on isolated CPUs.  It essentially allows for dynamic
> control of which CPUs are isolated by the scheduler, using the cpuset
> hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
> 'isolated_cpus' mask remained a minimal kernel boottime parameter.
> I believe this went to Linus's tree about Oct 2007.
> 
> It looks like you have three additional tweaks for realtime in this
> patch set, with your patches:
> 
>   [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

>   [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

>   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU won't be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

-- Steve


> 
> It would be interesting to see a patchset with the above three realtime
> tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
> than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
> run realtime and cpuset-intensive loads on the same system, so like to
> have those two capabilities co-operate with each other.
> 
> Ingo - what's your sense of the value of the above three realtime tweaks
>(the last three patches in Max's patch set)?
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
> We've had scheduler support for CPU isolation ever since O(1) scheduler went 
> it. 
> I'd like to extend it further to avoid kernel activity on those CPUs as much 
> as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.
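For readers who haven't used it, a minimal sketch of driving that flag from C
(assuming the cpuset filesystem is mounted at /dev/cpuset and a child cpuset named
"rt" already holds the CPUs to be isolated; paths follow the cpuset documentation of
that era, adjust to taste):

	#include <stdio.h>

	static int cpuset_write(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");
		if (!f)
			return -1;
		fputs(val, f);
		return fclose(f);
	}

	int main(void)
	{
		cpuset_write("/dev/cpuset/sched_load_balance", "0");	/* root */
		cpuset_write("/dev/cpuset/rt/cpus", "2-3");
		cpuset_write("/dev/cpuset/rt/sched_load_balance", "0");
		return 0;
	}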

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so we like to
have those two capabilities co-operate with each other.

Ingo - what's your sense of the value of the above three realtime tweaks
   (the last three patches in Max's patch set)?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra
[ You really ought to CC people :-) ]

On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
> Following patch series extends CPU isolation support. Yes, most people want 
> to virtuallize 
> CPUs these days and I want to isolate them :).
> The primary idea here is to be able to use some CPU cores as dedicated 
> engines for running
> user-space code with minimal kernel overhead/intervention, think of it as an 
> SPE in the 
> Cell processor.
> 
> We've had scheduler support for CPU isolation ever since O(1) scheduler went 
> it. 
> I'd like to extend it further to avoid kernel activity on those CPUs as much 
> as possible.
> In fact that the primary distinction that I'm making between say "CPU sets" 
> and 
> "CPU isolation". "CPU sets" let you manage user-space load while "CPU 
> isolation" provides
> a way to isolate a CPU as much as possible (including kernel activities).

Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

> I'm personally using this for hard realtime purposes. With CPU isolation it's 
> very easy to 
> achieve single digit usec worst case and around 200 nsec average response 
> times on off-the-shelf
> multi- processor/core systems under exteme system load. I'm working with 
> legal folks on releasing 
> hard RT user-space framework for that.
> I can also see other application like simulators and stuff that can benefit 
> from this.

have you been using just this, or in combination with the -rt effort?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.

Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.


While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

Peter, I think you missed the point of this patch. It's just a convenience feature.
It simply excludes isolated CPUs from the IRQ smp_affinity masks. That's all. What did
you mean by "flat out denying all IRQs to these cpus"? IRQs can still be routed to them
by writing to /proc/irq/N/smp_affinity.


Also, this happens naturally when we bring a CPU off-line and then bring it 
back online.
ie When CPU comes back online it's excluded from the IRQ smp_affinity masks 
even without
my patch.


  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.


Quite so, if nobody uses it, there is no harm in having them around. If
they are used, its by someone already allowed on the cpu.


No no no. I just replied to Steven about that. The problem is that things like NFS and 
friends expect _all_ their workqueue threads to report back when they do certain things 
like flushing buffers and stuff. The reason I added this is because my machines were 
getting stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even though 
no IRQs, softirqs or other things are running on it.



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?


Very dangerous indeed!

Please see my reply to Steven. I agree it's somewhat dangerous. What we could do is
make it configurable with a big fat warning. In other words I'd rather have an option
that just says "do not use dynamic module loading" on those systems.

Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:
 
 On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
 [PATCH] [CPUISOL] Support for workqueue isolation
  
   The thing about workqueues is that they should only be woken on a CPU if
   something on that CPU accessed them. IOW, the workqueue on a CPU handles
   work that was called by something on that CPU. Which means that
   something that high prio task did triggered a workqueue to do some work.
   But this can also be triggered by interrupts, so by keeping interrupts
   off the CPU no workqueue should be activated.
 
  No no no. That's what I though too ;-). The problem is that things like NFS 
  and friends
  expect _all_ their workqueue threads to report back when they do certain 
  things like
  flushing buffers and stuff. The reason I added this is because my machines 
  were getting
  stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
  though no IRQs
  or other things are running on it.
 
 This sounds more like we should fix NFS than add this for all workqueues.
 Again, we want workqueues to run on the behalf of whatever is running on
 that CPU, including those tasks that are running on an isolcpu.

agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.
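Something along those lines, presumably; a sketch using the 2.6.24-era cpumask API
(cpu_isolated_map being the mask from Max's patches, 'task' standing for the kthread
in question, the rest are existing helpers):

	/* keep a kernel thread off the isolated CPUs */
	cpumask_t allowed;

	cpus_andnot(allowed, cpu_online_map, cpu_isolated_map);
	set_cpus_allowed(task, allowed);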

 
 
 [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine
  
   This I find very dangerous. We are making an assumption that tasks on an
   isolated CPU wont be doing things that stopmachine requires. What stops
   a task on an isolated CPU from calling something into the kernel that
   stop_machine requires to halt?
 
  I agree in general. The thing is though that stop machine just kills any 
  kind of latency
  guaranties. Without the patch the machine just hangs waiting for the 
  stop-machine to run
  when module is inserted/removed. And running without dynamic module loading 
  is not very
  practical on general purpose machines. So I'd rather have an option with a 
  big red warning
  than no option at all :).
 
 Well, that's something one of the greater powers (Linus, Andrew, Ingo)
 must decide. ;-)

I'm in favour of a better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
 Also CPU sets seem to mostly deal with the scheduler domains.

True - though cpusets (no space ;) sched_load_balance flag can
be used to see that some CPUs are not in any scheduler domain,
which is equivalent to not having the scheduler run on them.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.940.382.4214
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Peter Zijlstra wrote:

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:

On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:

  [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I though too ;-). The problem is that things like NFS and 
friends
expect _all_ their workqueue threads to report back when they do certain things 
like
flushing buffers and stuff. The reason I added this is because my machines were 
getting
stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
though no IRQs
or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.


agreed, by looking at my top output (and not the nfs code) it looks like
it just spawns a configurable number of active kernel threads which are
not cpu bound by in any way. I think just removing the isolated cpus
from their runnable mask should take care of them.


Actually NFS was just one example. I cannot remember off the top of my head what else
was there, but there are definitely other users of work queues that expect all the
threads to run at some point in time.
Also, if you think about it, the patch does _exactly_ what you propose. It removes
workqueue threads from isolated CPUs. But instead of doing that just for NFS and/or
other subsystems separately, it does it in a generic way by simply not starting those
threads in the first place.



  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

I agree in general. The thing is though that stop machine just kills any kind 
of latency
guaranties. Without the patch the machine just hangs waiting for the 
stop-machine to run
when module is inserted/removed. And running without dynamic module loading is 
not very
practical on general purpose machines. So I'd rather have an option with a big 
red warning
than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


I'm in favour of better engineered method, that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should have a higher standard when it
comes to upstream - we all have to live many years with whatever we put
in there, we'd better think well about it.


100% agree. That's why I mentioned that this patch is controversial in the first place.
Right now, short of rewriting module loading to not use stop machine, there is no other
option. I'll think some more about it. If you guys have other ideas please drop me a note.


Thanx
Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Max wrote:

So far it seems that extending cpu_isolated_map
is more natural way of propagating this notion to the rest of the kernel.
Since it's very similar to the cpu_online_map concept and it's easy to 
integrated
with the code that already uses it. 


If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with resource management mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.


Thanks for the info Paul. I'll definitely look into using this flag instead 
and reply with pros and cons (if any).


Max


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
 We've had scheduler support for CPU isolation ever since O(1) scheduler went 
 it. 
 I'd like to extend it further to avoid kernel activity on those CPUs as much 
 as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so like to
have those two capabilities co-operate with each other.

Ingo - what's your sense of the value of the above three realtime tweaks
   (the last three patches in Max's patch set)?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.940.382.4214
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Paul Jackson
Max wrote:
 So far it seems that extending cpu_isolated_map
 is more natural way of propagating this notion to the rest of the kernel.
 Since it's very similar to the cpu_online_map concept and it's easy to 
 integrated
 with the code that already uses it. 

If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets.  The two have to work together.  I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence with the cpuset 'sched_load_balance' flag I think I've already
done what I think is one part of what your patches achieve by extending
the cpu_isolated_map.

This is a common situation with resource management mechanisms such
as cpusets (and more recently cgroups and the subsystem modules it
supports.)  They cut across existing core kernel code that manages such
key resources as CPUs and memory.  As best we can, they have to work
with each other.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.940.382.4214
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra

On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:
 On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
  Thanks for the CC, Peter.
 
 Thanks from me too.
 
  Max wrote:
   We've had scheduler support for CPU isolation ever since O(1) scheduler 
   went it. 
   I'd like to extend it further to avoid kernel activity on those CPUs as 
   much as possible.
  
  I recently added the per-cpuset flag 'sched_load_balance' for some
  other realtime folks, so that they can disable the kernel scheduler
  load balancing on isolated CPUs.  It essentially allows for dynamic
  control of which CPUs are isolated by the scheduler, using the cpuset
  hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
  'isolated_cpus' mask remained a minimal kernel boottime parameter.
  I believe this went to Linus's tree about Oct 2007.
  
  It looks like you have three additional tweaks for realtime in this
  patch set, with your patches:
  
[PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
 
 I didn't know we still routed IRQs to isolated CPUs. I guess I need to
 look deeper into the code on this one. But I agree that isolated CPUs
 should not have IRQs routed to them.

While I agree with this in principle, I'm not sure flat out denying all
IRQs to these cpus is a good option. What about the case where we want
to service just this one specific IRQ on this CPU and no others?

Can't this be done by userspace irq routing as used by irqbalanced?

[PATCH] [CPUISOL] Support for workqueue isolation
 
 The thing about workqueues is that they should only be woken on a CPU if
 something on that CPU accessed them. IOW, the workqueue on a CPU handles
 work that was called by something on that CPU. Which means that
 something that high prio task did triggered a workqueue to do some work.
 But this can also be triggered by interrupts, so by keeping interrupts
 off the CPU no workqueue should be activated.

Quite so, if nobody uses it, there is no harm in having them around. If
they are used, its by someone already allowed on the cpu.

[PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine
 
 This I find very dangerous. We are making an assumption that tasks on an
 isolated CPU wont be doing things that stopmachine requires. What stops
 a task on an isolated CPU from calling something into the kernel that
 stop_machine requires to halt?

Very dangerous indeed!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:
 Just this patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
 can't do that.
 For example I have separate tasks with hard deadlines that must be enforced 
 in 50usec kind 
 of range and basically no idle time whatsoever. Just to give more background 
 it's a wireless
 basestation with SW MAC/Scheduler. Another requirement is for the SW to know 
 precise timing
 because SW. For example there is no way we can do predictable 1-2 usec 
 sleeps. 
 So I wrote a user-space engine that does all this, it requires full control 
 of the CPU ie minimal
 overhead from the kernel, just IPIs for memory management and that's 
 basically it. When my legal 
 department lets me I'll do a presentation on this stuff at Linux RT 
 conference or something. 

What kind of hardware are you doing this on? Also I should note there is
HRT (High resolution timers) which provides microsecond level
granularity ..

Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Daniel Walker wrote:

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:

Just this patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai 
can't do that.
For example I have separate tasks with hard deadlines that must be enforced in 50usec kind 
of range and basically no idle time whatsoever. Just to give more background it's a wireless

basestation with SW MAC/Scheduler. Another requirement is for the SW to know 
precise timing
because SW. For example there is no way we can do predictable 1-2 usec sleeps. 
So I wrote a user-space engine that does all this, it requires full control of the CPU ie minimal
overhead from the kernel, just IPIs for memory management and that's basically it. When my legal 
department lets me I'll do a presentation on this stuff at Linux RT conference or something. 


What kind of hardware are you doing this on? 

All kinds of HW. I mentioned it in the intro email.
Here are the highlights:
	HP XW9300 (Dual Opteron NUMA box) and XW9400 (Dual Core Opteron)
	HP DL145 G2 (Dual Opteron) and G3 (Dual Core Opteron)
	Dell Precision workstations (Core2 Duo and Quad)
	Various Core2 Duo based systems
uTCA boards:
	Mercury AXA110 (1.5Ghz)
	Concurrent Tech AM110 (2.1Ghz)

This scheme should work on anything that lets you disable SMI on the isolated
core(s).


Also I should note there is HRT (High resolution timers) which provided 
microsecond level
granularity ..
Not accurate enough and way too much overhead for what I need. I know at this point it
probably sounds like I'm talking BS :). I wish I'd released the engine and examples by
now. Anyway let me just say that the SW MAC has crazy tight deadlines with lots of small
tasks. Using nanosleep() / gettimeofday() is simply not practical. So it's all TSC based
with clever time sync logic between HW and SW.
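For flavour, the kind of primitive such an engine is built on (illustrative only:
x86, assumes a constant-rate TSC, and tsc_hz would have to be calibrated against the
chosen time source):

	#include <stdint.h>

	/* read the time stamp counter */
	static inline uint64_t rdtsc(void)
	{
		uint32_t lo, hi;
		asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
		return ((uint64_t)hi << 32) | lo;
	}

	/* busy-wait on the isolated CPU until an absolute TSC deadline */
	static void spin_until(uint64_t deadline_tsc)
	{
		while (rdtsc() < deadline_tsc)
			;	/* nothing else is allowed to run here anyway */
	}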

Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:

 Not accurate enough and way too much overhead for what I need. I know at this 
 point it probably 
 sounds like I'm talking BS :). I wish I've released the engine and examples 
 by now. Anyway let 
 me just say that SW MAC has crazy tight deadlines with lots of small tasks. 
 Using nanosleep()  
 gettimeofday() is simply not practical. So it's all TSC based with clever 
 time sync logic between
 HW and SW.

I don't know if it's BS or not, you clearly fixed your own problem which
is good .. Although when you say "RT patches cannot achieve what I
needed. Even RTAI/Xenomai can't do that.", and HRT is "Not accurate
enough and way too much overhead" .. Given the hardware you're using,
that's all difficult to believe.. You also said this code has been
running on production systems for two years, which means it's at least
two years old .. There's been some good sized leaps in real time linux
in the past two years ..

Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Peter Zijlstra
[ You really ought to CC people :-) ]

On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
 Following patch series extends CPU isolation support. Yes, most people want 
 to virtuallize 
 CPUs these days and I want to isolate them :).
 The primary idea here is to be able to use some CPU cores as dedicated 
 engines for running
 user-space code with minimal kernel overhead/intervention, think of it as an 
 SPE in the 
 Cell processor.
 
 We've had scheduler support for CPU isolation ever since O(1) scheduler went 
 it. 
 I'd like to extend it further to avoid kernel activity on those CPUs as much 
 as possible.
 In fact that the primary distinction that I'm making between say CPU sets 
 and 
 CPU isolation. CPU sets let you manage user-space load while CPU 
 isolation provides
 a way to isolate a CPU as much as possible (including kernel activities).

Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

 I'm personally using this for hard realtime purposes. With CPU isolation it's 
 very easy to 
 achieve single digit usec worst case and around 200 nsec average response 
 times on off-the-shelf
 multi- processor/core systems under exteme system load. I'm working with 
 legal folks on releasing 
 hard RT user-space framework for that.
 I can also see other application like simulators and stuff that can benefit 
 from this.

have you been using just this, or in combination with the -rt effort?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt


On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
[PATCH] [CPUISOL] Support for workqueue isolation
 
  The thing about workqueues is that they should only be woken on a CPU if
  something on that CPU accessed them. IOW, the workqueue on a CPU handles
  work that was called by something on that CPU. Which means that
  something that high prio task did triggered a workqueue to do some work.
  But this can also be triggered by interrupts, so by keeping interrupts
  off the CPU no workqueue should be activated.

 No no no. That's what I though too ;-). The problem is that things like NFS 
 and friends
 expect _all_ their workqueue threads to report back when they do certain 
 things like
 flushing buffers and stuff. The reason I added this is because my machines 
 were getting
 stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even 
 though no IRQs
 or other things are running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on the behalf of whatever is running on
that CPU, including those tasks that are running on an isolcpu.



[PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine
 
  This I find very dangerous. We are making an assumption that tasks on an
  isolated CPU wont be doing things that stopmachine requires. What stops
  a task on an isolated CPU from calling something into the kernel that
  stop_machine requires to halt?

 I agree in general. The thing is though that stop machine just kills any kind 
 of latency
 guaranties. Without the patch the machine just hangs waiting for the 
 stop-machine to run
 when module is inserted/removed. And running without dynamic module loading 
 is not very
 practical on general purpose machines. So I'd rather have an option with a 
 big red warning
 than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Steven Rostedt
On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
 Thanks for the CC, Peter.

Thanks from me too.

 Max wrote:
  We've had scheduler support for CPU isolation ever since O(1) scheduler 
  went it. 
  I'd like to extend it further to avoid kernel activity on those CPUs as 
  much as possible.
 
 I recently added the per-cpuset flag 'sched_load_balance' for some
 other realtime folks, so that they can disable the kernel scheduler
 load balancing on isolated CPUs.  It essentially allows for dynamic
 control of which CPUs are isolated by the scheduler, using the cpuset
 hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
 'isolated_cpus' mask remained a minimal kernel boottime parameter.
 I believe this went to Linus's tree about Oct 2007.
 
 It looks like you have three additional tweaks for realtime in this
 patch set, with your patches:
 
   [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

   [PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

   [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU won't be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?

-- Steve


 
 It would be interesting to see a patchset with the above three realtime
 tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
 than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
 run realtime and cpuset-intensive loads on the same system, so like to
 have those two capabilities co-operate with each other.
 
 Ingo - what's your sense of the value of the above three realtime tweaks
(the last three patches in Max's patch set)?
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Steven Rostedt wrote:

On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:

Thanks for the CC, Peter.


Thanks from me too.


Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.

I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot


I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.

Also note that it's just a convenience feature. In other words it's not that with
this patch we'll never route IRQs to those CPUs. They can still be explicitly routed
by writing to /proc/irq/N/smp_affinity.



  [PATCH] [CPUISOL] Support for workqueue isolation


The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.

No no no. That's what I thought too ;-). The problem is that things like NFS and
friends expect _all_ their workqueue threads to report back when they do certain
things like flushing buffers and stuff. The reason I added this is because my
machines were getting stuck because CPU0 was waiting for CPU1 to run NFS work queue
threads even though no IRQs or other things are running on it.


  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine


This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?
I agree in general. The thing is though that stop machine just kills any kind of
latency guarantees. Without the patch the machine just hangs waiting for the
stop-machine to run when a module is inserted or removed. And running without dynamic
module loading is not very practical on general purpose machines. So I'd rather have
an option with a big red warning than no option at all :).


Thanx
Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Paul Jackson wrote:

Thanks for the CC, Peter.

  Ingo - see question at end of message.

Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.


I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs.  It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask.   That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

  [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
  [PATCH] [CPUISOL] Support for workqueue isolation
  [PATCH] [CPUISOL] Isolated CPUs should be ignored by the stop machine

It would be interesting to see a patchset with the above three realtime
tweaks, layered on this new cpuset 'sched_load_balance' apparatus, rather
than layered on changes to make 'isolated_cpus' more dynamic.  Some of us
run realtime and cpuset-intensive loads on the same system, so like to
have those two capabilities co-operate with each other.

I'll definitely take a look. So far it seems that extending cpu_isolated_map is a
more natural way of propagating this notion to the rest of the kernel, since it's
very similar to the cpu_online_map concept and easy to integrate with the code that
already uses it.
Anyway, I'll take a look at the cpuset flag that you mentioned and report back.


Thanx
Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Max Krasnyanskiy

Hi Peter,

Peter Zijlstra wrote:

[ You really ought to CC people :-) ]

I was not sure who, though :)
Do we have a mailing list for scheduler development btw ?
Or is it just the folks that you included in CC ?
Some of the latest scheduler patches break things that I'm doing and I'd like to make
them configurable (RT watchdog, etc).


On Sun, 2008-01-27 at 20:09 -0800, [EMAIL PROTECTED] wrote:
Following patch series extends CPU isolation support. Yes, most people want to virtuallize 
CPUs these days and I want to isolate them :).

The primary idea here is to be able to use some CPU cores as dedicated engines 
for running
user-space code with minimal kernel overhead/intervention, think of it as an SPE in the 
Cell processor.


We've had scheduler support for CPU isolation ever since O(1) scheduler went it. 
I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
In fact that the primary distinction that I'm making between say CPU sets and 
CPU isolation. CPU sets let you manage user-space load while CPU isolation provides

a way to isolate a CPU as much as possible (including kernel activities).


Ok, so you're aware of CPU sets, miss a feature, but instead of
extending it to cover your needs you build something new entirely?

It's not really new. The CPU isolation bits just have not been exported before, that's
all. Also "CPU sets" seem to mostly deal with the scheduler domains. I'll reply to
Paul's proposal to use that instead.


I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
multi- processor/core systems under exteme system load. I'm working with legal folks on releasing 
hard RT user-space framework for that.

I can also see other application like simulators and stuff that can benefit 
from this.


have you been using just this, or in combination with the -rt effort?

Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai can't
do that. For example I have separate tasks with hard deadlines that must be enforced
in a 50usec kind of range and basically no idle time whatsoever. Just to give more
background, it's a wireless basestation with SW MAC/Scheduler. Another requirement is
for the SW to know precise timing; for example there is no way we can do predictable
1-2 usec sleeps. So I wrote a user-space engine that does all this. It requires full
control of the CPU, ie minimal overhead from the kernel, just IPIs for memory
management and that's basically it. When my legal department lets me I'll do a
presentation on this stuff at a Linux RT conference or something.


Max
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[CPUISOL] CPU isolation extensions

2008-01-27 Thread maxk

The following patch series extends CPU isolation support. Yes, most people want to
virtualize CPUs these days and I want to isolate them :).
The primary idea here is to be able to use some CPU cores as dedicated engines for
running user-space code with minimal kernel overhead/intervention; think of it as an
SPE in the Cell processor.

We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
I'd like to extend it further to avoid kernel activity on those CPUs as much as
possible. In fact that's the primary distinction that I'm making between say "CPU
sets" and "CPU isolation". "CPU sets" let you manage user-space load while "CPU
isolation" provides a way to isolate a CPU as much as possible (including kernel
activities).

I'm personally using this for hard realtime purposes. With CPU isolation it's very
easy to achieve single digit usec worst case and around 200 nsec average response
times on off-the-shelf multi-processor/core systems under extreme system load. I'm
working with the legal folks on releasing a hard RT user-space framework for that.
I can also see other applications, like simulators and stuff, that can benefit from
this.

I've been maintaining this stuff since around 2.6.18 and it's been running in a
production environment for a couple of years now. It's been tested on all kinds of
machines, from NUMA boxes like the HP xw9300/9400 to tiny uTCA boards like the
Mercury AXA110.
The messiest part used to be the SLAB garbage collector changes. With the new SLUB
all that mess goes away (ie no changes necessary). Also CFS seems to handle CPU
hotplug much better than O(1) did (ie domains are recomputed dynamically), so
isolation can be done at any time (via sysfs).
So this seems like a good time to merge.

Anyway. The patchset consists of 5 patches. The first three are very simple and non-controversial.
They simply make "CPU isolation" a configurable feature, export cpu_isolated_map and provide
some helper functions to access it (just like cpu_online() and friends).
The last two patches add support for isolating CPUs from running workqueues and stop machine.
More details are in the individual patch descriptions.
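
For reference, the exported interface is roughly of the following shape, by analogy
with cpu_online() and friends. This is only a sketch: the config symbol and exact
spelling here are assumptions, so see the patches for the real definitions.

    /* Sketch by analogy with cpu_online()/cpu_online_map; the CONFIG_CPUISOL
     * symbol and macro name are assumptions for illustration. */
    extern cpumask_t cpu_isolated_map;

    #ifdef CONFIG_CPUISOL
    #define cpu_isolated(cpu)   cpu_isset((cpu), cpu_isolated_map)
    #else
    #define cpu_isolated(cpu)   0
    #endif

    /* Callers such as the workqueue code can then skip isolated CPUs: */
    static void example_skip_isolated(void)
    {
        int cpu;

        for_each_online_cpu(cpu) {
            if (cpu_isolated(cpu))
                continue;
            /* ... do per-CPU work that should avoid isolated CPUs ... */
        }
    }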

Ideally I'd like all of this to go in during this merge window. If people think it's acceptable,
Linus or Andrew (or whoever is more appropriate, Ingo maybe) can pull this patch set from
git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git

That tree is rebased against latest (as of yesterday) Linus' tree.

Thanx
Max

 arch/x86/Kconfig                  |    1 
 arch/x86/kernel/genapic_flat_64.c |    5 ++--
 drivers/base/cpu.c                |   47 ++
 include/linux/cpumask.h           |    3 ++
 kernel/Kconfig.cpuisol            |   25 +++-
 kernel/sched.c                    |   13 ++
 kernel/stop_machine.c             |    3 --
 kernel/workqueue.c                |   31 ++---
 8 files changed, 110 insertions(+), 18 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

