Re: Tiny cpusets -- cpusets for small systems?

2008-02-25 Thread Max Krasnyanskiy

Paul Jackson wrote:

>> So. I see cpusets as a higher level API/mechanism and cpu_isolated_map as lower
>> level mechanism that actually makes kernel aware of what's isolated what's not.
>> Kind of like sched domain/cpuset relationship. ie cpusets affect sched domains
>> but scheduler does not use cpusets directly.


> One could use cpusets to control the setting of cpu_isolated_map,
> separate from the code such as your select_irq_affinity() that
> uses it.
Yes, that's what I proposed too, in one of the CPU isolation threads with
Peter. The only issue is that you need to simulate a CPU_DOWN hotplug event in
order to clean up what's already running on those CPUs.
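
Just to make that concrete, here is a hedged user-space approximation of the
cleanup step (this is not code from my patches): walk /proc and try to push
every task off the CPU with sched_setaffinity().  Timers, workqueues and
per-cpu kernel threads can't be dealt with this way, which is exactly why a
CPU_DOWN-style pass inside the kernel is needed.

#define _GNU_SOURCE
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Try to move each process's main thread off 'cpu'.  A real tool would also
 * walk /proc/<pid>/task to catch secondary threads. */
static void evacuate_cpu(int cpu)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	while (proc && (de = readdir(proc)) != NULL) {
		pid_t pid = (pid_t)atoi(de->d_name);
		cpu_set_t mask;

		if (pid <= 0 || sched_getaffinity(pid, sizeof(mask), &mask))
			continue;	/* not a pid, or it already exited */
		if (!CPU_ISSET(cpu, &mask))
			continue;	/* never runs on this cpu anyway */
		CPU_CLR(cpu, &mask);
		/* Fails (EINVAL) if 'cpu' was the only CPU the task may use. */
		if (sched_setaffinity(pid, sizeof(mask), &mask))
			fprintf(stderr, "pid %d stays on cpu %d\n", pid, cpu);
	}
	if (proc)
		closedir(proc);
}

int main(int argc, char **argv)
{
	evacuate_cpu(argc > 1 ? atoi(argv[1]) : 1);
	return 0;
}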



>> In a foreseeable future 2-8 cores will be most common configuration.
>> Do you think that cpusets are needed/useful for those machines ?
>> The reason I'm asking is because given the restrictions you mentioned
>> above it seems that you might as well just do
>>   taskset -c 1,2,3 app1
>>   taskset -c 3,4,5 app2


> People tend to manage the CPU and memory placement of the threads
> and processes within a single co-operating job using taskset
> (sched_setaffinity) and numactl (mbind, set_mempolicy.)
>
> They tend to manage the placement of multiple unrelated jobs onto
> a single system, whether on separate or shared CPUs and nodes,
> using cpusets.
>
> Something like cpu_isolated_map looks to me like a system-wide
> mechanism, which should, like sched_domains, be managed system-wide.
> Managing it with a mechanism that encourages each thread to update
> it directly, as if that thread owned the system, will break down,
> resulting in conflicting updates, as multiple, insufficiently
> co-operating threads issue conflicting settings.
I'm not sure how to interpret that. I think you might have mixed a couple of
things I asked about into one reply ;-).
The question was: given the restrictions you described when you explained the
tiny-cpusets functionality, how much does one gain from using them compared to
taskset/numactl? ie On machines with 2-8 cores it's fairly easy to manage cpus
with simple affinity masks.


The second part of your reply seems to imply that I somehow made you think
that I suggested cpu_isolated_map be managed per thread. That is of course not
the case. It's definitely a system-wide mechanism, and individual threads have
nothing to do with it.
btw, I just re-read my previous reply; I definitely did not say anything about
threads managing cpu_isolated_map :).



>> Stuff that I'm working on this days (wireless basestations) is designed
>> with the following model:
>>   cpuN - runs soft-RT networking and management code
>>   cpuN+1 to cpuN+x - are used as dedicated engines
>> ie Simplest example would be
>>   cpu0 - runs IP, L2 and control plane
>>   cpu1 - runs hard-RT MAC
>>
>> So if CPU isolation is implemented on top of the cpusets what kind of API do
>> you envision for such an app ?


> That depends on what more API is needed.  Do we need to place
> irqs better ... cpusets might not be a natural for that use.
> Aren't irqs directed to specific CPUs, not to hierarchically
> nested subsets of CPUs.


You clipped the part where I elaborated, which was:

>> So if CPU isolation is implemented on top of the cpusets what kind of API do
>> you envision for such an app ? I mean currently cpusets seem to be mostly
>> dealing with entire processes, whereas in this case we're really dealing with
>> threads. ie Different threads of the same process require different policies:
>> some must run on isolated cpus and some must not. I guess one could write a
>> thread's pid into the cpusets fs, but that's not very convenient.
>> pthread_set_affinity() is exactly what's needed.

In other words, how would an app place its individual threads into the
different cpusets?
The IRQ stuff is separate; like we said above, cpusets could simply update
cpu_isolated_map, which would take care of IRQs. I was talking specifically
about thread management.
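
Just to illustrate the kind of per-thread placement I mean, here is a minimal
sketch (the CPU numbers are made up, and pthread_setaffinity_np() is the
existing call I'm loosely referring to as pthread_set_affinity()):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin one thread to one CPU. */
static void pin_thread(pthread_t t, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (pthread_setaffinity_np(t, sizeof(set), &set))
		fprintf(stderr, "pthread_setaffinity_np failed for cpu %d\n", cpu);
}

static void *mac_thread(void *arg)
{
	/* ... hard-RT MAC processing would live here ... */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pin_thread(pthread_self(), 0);	/* control plane stays on cpu0 */

	pthread_create(&t, NULL, mac_thread, NULL);
	pin_thread(t, 1);		/* hard-RT thread goes to the isolated cpu1 */

	pthread_join(t, NULL);
	return 0;
}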



> Separate question:
>   Is it desired that the dedicated CPUs cpuN+1 ... cpuN+x even appear
>   as general purpose systems running a Linux kernel in your systems?
>   These dedicated engines seem more like intelligent devices to me,
>   such as disk controllers, which the kernel controls via device
>   drivers, not by loading itself on them too.
We still want to be able to run normal threads on them, which means IPIs,
memory management, etc are still needed. So yes, they had better show up as
normal CPUs :)
Also, with dynamic isolation you can for example un-isolate a cpu while you're
compiling stuff on the machine and then isolate it when you're running the
special app(s).


Max


Re: Tiny cpusets -- cpusets for small systems?

2008-02-25 Thread Paul Jackson
> So. I see cpusets as a higher level API/mechanism and cpu_isolated_map as
> lower level mechanism that actually makes kernel aware of what's isolated
> what's not.
> Kind of like sched domain/cpuset relationship. ie cpusets affect sched domains
> but scheduler does not use cpusets directly.

One could use cpusets to control the setting of cpu_isolated_map,
separate from the code such as your select_irq_affinity() that
uses it.


> In a foreseeable future 2-8 cores will be most common configuration.
> Do you think that cpusets are needed/useful for those machines ?
> The reason I'm asking is because given the restrictions you mentioned
> above it seems that you might as well just do
>   taskset -c 1,2,3 app1
>   taskset -c 3,4,5 app2 

People tend to manage the CPU and memory placement of the threads
and processes within a single co-operating job using taskset
(sched_setaffinity) and numactl (mbind, set_mempolicy.)

They tend to manage the placement of multiple unrelated jobs onto
a single system, whether on separate or shared CPUs and nodes,
using cpusets.
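
A minimal sketch of that second, system-wide style of placement, assuming the
cpuset pseudo file system is mounted at /dev/cpuset with its usual cpus, mems
and tasks files (the cpuset name, CPU list and memory node are illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror(path);
		return -1;
	}
	if (write(fd, val, strlen(val)) < 0)
		perror(path);
	close(fd);
	return 0;
}

int main(void)
{
	char pid[16];

	/* Carve out one flat child cpuset for a job ... */
	mkdir("/dev/cpuset/app1", 0755);
	write_str("/dev/cpuset/app1/cpus", "1-3");	/* CPUs the job may use */
	write_str("/dev/cpuset/app1/mems", "0");	/* memory nodes it may use */

	/* ... and move the current process into it. */
	snprintf(pid, sizeof(pid), "%d", getpid());
	write_str("/dev/cpuset/app1/tasks", pid);
	return 0;
}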

Something like cpu_isolated_map looks to me like a system-wide
mechanism, which should, like sched_domains, be managed system-wide.
Managing it with a mechanism that encourages each thread to update
it directly, as if that thread owned the system, will break down,
resulting in conflicting updates, as multiple, insufficiently
co-operating threads issue conflicting settings.


> Stuff that I'm working on this days (wireless basestations) is designed
> with the  following model:
>   cpuN - runs soft-RT networking and management code
>   cpuN+1 to cpuN+x - are used as dedicated engines
> ie Simplest example would be
>   cpu0 - runs IP, L2 and control plane
>   cpu1 - runs hard-RT MAC 
> 
> So if CPU isolation is implemented on top of the cpusets what kind of API do 
> you envision for such an app ?

That depends on what more API is needed.  Do we need to place
irqs better ... cpusets might not be a natural for that use.
Aren't irqs directed to specific CPUs, not to hierarchically
nested subsets of CPUs?
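
For reference, the "directed to specific CPUs" part is just the flat per-irq
affinity mask, /proc/irq/<n>/smp_affinity; there is no notion of nested sets
there.  A hedged sketch (the irq number and mask are made up):

#include <stdio.h>

int main(void)
{
	/* Steer irq 24 onto cpu0 only; the mask is a plain CPU bitmask. */
	FILE *f = fopen("/proc/irq/24/smp_affinity", "w");

	if (!f) {
		perror("smp_affinity");
		return 1;
	}
	fprintf(f, "%x\n", 0x1);
	fclose(f);
	return 0;
}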

Separate question:
  Is it desired that the dedicated CPUs cpuN+1 ... cpuN+x even appear
  as general purpose systems running a Linux kernel in your systems?
  These dedicated engines seem more like intelligent devices to me,
  such as disk controllers, which the kernel controls via device
  drivers, not by loading itself on them too.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Tiny cpusets -- cpusets for small systems?

2008-02-24 Thread KOSAKI Motohiro
Hi Paul,

> Looking at some IA64 sn2 config builds I have laying about, I see the
> following text sizes for a couple of versions, showing the growth of
> the cpuset/cgroup apparatus over time:
> 
>   25933   2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006)
> vs.
>   37823   2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008)
>   19558   2.6.25-rc2-mm1/kernel/cpuset.o
> 
> So the total has grown from 25933 to 57381 text bytes (note that
> this is IA64 arch; most arch's will have proportionately smaller
> text sizes.)

Hm, interesting.
But unfortunately cpusets have more dependencies than that (i.e. CONFIG_SMP).

Worse, some embedded CPUs have poor or no atomic instruction support;
on those, turning on CONFIG_SMP becomes a large performance regression ;)


I am no longer an embedded engineer, so I might be mistaken.
(BTW: I am a large-server engineer now.)

But not thinking about the dependencies is probably wrong.


Pavel, what do you think?

- kosaki


Re: Tiny cpusets -- cpusets for small systems?

2008-02-23 Thread Max Krasnyansky
Hi Paul,

> A couple of proposals have been made recently by people working Linux
> on smaller systems, for improving realtime isolation and memory
> pressure handling:
> 
> (1) cpu isolation for hard(er) realtime
>   http://lkml.org/lkml/2008/2/21/517
>   Max Krasnyanskiy <[EMAIL PROTECTED]>
>   [PATCH sched-devel 0/7] CPU isolation extensions
> 
> (2) notify user space of tight memory
>   http://lkml.org/lkml/2008/2/9/144
>   KOSAKI Motohiro <[EMAIL PROTECTED]>
>   [PATCH 0/8][for -mm] mem_notify v6
> 
> In both cases, some of us have responded "why not use cpusets", and the
> original submitters have replied "cpusets are too fat"  (well, they
> were more diplomatic than that, but I guess I can say that ;)

My primary issue with cpusets (from the CPU isolation perspective, that is) was
not the fatness. I did make a couple of comments like "on a dual-cpu box I do
not need cpusets to manage the CPUs", but that's not directly related to CPU
isolation.
For CPU isolation in particular I need code like this:

int select_irq_affinity(unsigned int irq)
{
	cpumask_t usable_cpus;

	/* Let the irq land anywhere except on the isolated CPUs. */
	cpus_andnot(usable_cpus, cpu_online_map, cpu_isolated_map);
	irq_desc[irq].affinity = usable_cpus;
	irq_desc[irq].chip->set_affinity(irq, usable_cpus);
	return 0;
}

How would you implement that with cpusets?
I haven't seen your patches, but I'd imagine that they will still need locks
and iterators for the "is CPU N isolated" functionality.
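
With cpu_isolated_map that check is just a bit test, no locks or iterators
needed.  A sketch using the existing cpumask helpers:

extern cpumask_t cpu_isolated_map;

/* Is 'cpu' one of the isolated CPUs? */
static inline int cpu_isolated(int cpu)
{
	return cpu_isset(cpu, cpu_isolated_map);
}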

So: I see cpusets as a higher-level API/mechanism and cpu_isolated_map as a
lower-level mechanism that actually makes the kernel aware of what's isolated
and what's not. Kind of like the sched domain/cpuset relationship, ie cpusets
affect sched domains but the scheduler does not use cpusets directly.

> I wonder if there might be room for a "tiny cpusets" configuration option:
>   * provide the same hooks to the rest of the kernel, and
>   * provide the same syntactic interface to user space, but
>   * with more limited semantics.
> 
> The primary semantic limit I'd suggest would be supporting exactly
> one layer depth of cpusets, not a full hierarchy.  So one could still
> successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
> to do 'mkdir /dev/cpuset/foo/bar' would fail.  This reminds me of
> very early FAT file systems, which had just a single, fixed size
> root directory ;).  There might even be a configurable fixed upper
> limit on how many /dev/cpuset/* directories were allowed, further
> simplifying the locking and dynamic memory behavior of this apparatus.
In the foreseeable future 2-8 cores will be the most common configuration.
Do you think that cpusets are needed/useful for those machines?
The reason I'm asking is that, given the restrictions you mentioned above, it
seems that you might as well just do
	taskset -c 1,2,3 app1
	taskset -c 3,4,5 app2
Yes, it's not quite the same of course, but imo it covers most cases. That's
what we do on 2-4 cores these days, and we are quite happy with that. ie We
either let the specialized apps manage their thread affinities themselves or
use "taskset" to manage the apps.

> User space would see the same API, except that some valid operations
> on full cpusets, such as a nested mkdir, would fail on tiny cpusets.
Speaking of the user-space API: I guess this is not directly related to the
tiny-cpusets proposal but rather to cpusets in general.
Stuff that I'm working on these days (wireless basestations) is designed with
the following model:
	cpuN - runs soft-RT networking and management code
	cpuN+1 to cpuN+x - are used as dedicated engines
ie The simplest example would be
	cpu0 - runs IP, L2 and control plane
	cpu1 - runs hard-RT MAC

So if CPU isolation is implemented on top of cpusets, what kind of API do you
envision for such an app? I mean, currently cpusets seem to deal mostly with
entire processes, whereas in this case we're really dealing with threads.
ie Different threads of the same process require different policies: some must
run on isolated cpus and some must not. I guess one could write a thread's pid
into the cpusets fs, but that's not very convenient. pthread_set_affinity() is
exactly what's needed.
Personally I do not see much use for cpusets for those kinds of designs. But
maybe I'm missing something. I got really excited when cpusets were first
merged into mainline, but after looking closer I could not really find a use
for them, at least not for our apps.

Max


Re: Tiny cpusets -- cpusets for small systems?

2008-02-23 Thread Paul Jackson
Paul M wrote:
> I'm don't think that either of these would be enough to justify big
> changes to cpusets or cgroups, although eliminating bloat is always a
> good thing.

My "tiny cpuset" idea doesn't so much eliminate bloat, as provide a
thin alternative, alongside the existing fat alternative.  So
far as kernel source goes, it would get bigger, not smaller, with now
two CONFIG choices for cpusets, fat or tiny.

The odds are, however, given that one of us has just promised not to
code this, and the other of us doesn't figure it's worth it, this
idea will not live long.  Someone would have to step up from the
embedded side with a coded version that saved a nice chunk of memory
(from their perspective) to get this off the ground, and no telling
whether even that would meet with a warm reception.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: Tiny cpusets -- cpusets for small systems?

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson <[EMAIL PROTECTED]> wrote:
> A couple of proposals have been made recently by people working Linux
>  on smaller systems, for improving realtime isolation and memory
>  pressure handling:
>
>  (1) cpu isolation for hard(er) realtime
> http://lkml.org/lkml/2008/2/21/517
> Max Krasnyanskiy <[EMAIL PROTECTED]>
> [PATCH sched-devel 0/7] CPU isolation extensions
>
>  (2) notify user space of tight memory
> http://lkml.org/lkml/2008/2/9/144
> KOSAKI Motohiro <[EMAIL PROTECTED]>
> [PATCH 0/8][for -mm] mem_notify v6
>
>  In both cases, some of us have responded "why not use cpusets", and the
>  original submitters have replied "cpusets are too fat"  (well, they
>  were more diplomatic than that, but I guess I can say that ;)

Having read those threads, it looks to me as though:

- the parts of Max's problem that would be solved by cpusets can be
mostly accomplished just via sched_setaffinity()

- Motohiro wants to add a new system-wide API that you would also like
to have available on a per-cpuset basis. (Why not just add two access
points for the same feature?)

I don't think that either of these would be enough to justify big
changes to cpusets or cgroups, although eliminating bloat is always a
good thing.

>  The primary semantic limit I'd suggest would be supporting exactly
>  one layer depth of cpusets, not a full hierarchy.  So one could still
>  successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
>  to do 'mkdir /dev/cpuset/foo/bar' would fail.  This reminds me of
>  very early FAT file systems, which had just a single, fixed size
>  root directory ;).  There might even be a configurable fixed upper
>  limit on how many /dev/cpuset/* directories were allowed, further
>  simplifying the locking and dynamic memory behavior of this apparatus.

I'm not sure that either of these would make much difference to the
overall footprint.

A single layer of cpusets would allow you to simplify
validate_change() but not much else.

I don't see how a fixed upper limit on the number of cpusets makes the
locking sufficiently simpler to save much code.

>
>  How this extends to cgroups I don't know; for now I suspect that most
>  cgroup module development is motivated by the needs of larger systems,
>  not smaller systems.  However, cpusets is now a module client of
>  cgroups, and it is cgroups that now provides cpusets with its interface
>  to the vfs infrastructure.  It would seem unfortunate if this relation
>  was not continued with tiny cpusets.  Perhaps someone can imagine a tiny
>  cgroups?  This might be the most difficult part of this proposal.

If we wanted to go this way, I can imagine a cgroups config option
that forces just a single hierarchy, which would allow a bunch of
simplifications that would save plenty of text.

>
>  Looking at some IA64 sn2 config builds I have laying about, I see the
>  following text sizes for a couple of versions, showing the growth of
>  the cpuset/cgroup apparatus over time:
>
> 25933   2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006)
>  vs.
> 37823   2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008)
> 19558   2.6.25-rc2-mm1/kernel/cpuset.o
>
>  So the total has grown from 25933 to 57381 text bytes (note that
>  this is IA64 arch; most arch's will have proportionately smaller
>  text sizes.)

On x86_64 they're:

cgroup.o: 17348
cpuset.o: 8533

Paul


Tiny cpusets -- cpusets for small systems?

2008-02-23 Thread Paul Jackson
A couple of proposals have been made recently by people working Linux
on smaller systems, for improving realtime isolation and memory
pressure handling:

(1) cpu isolation for hard(er) realtime
http://lkml.org/lkml/2008/2/21/517
Max Krasnyanskiy <[EMAIL PROTECTED]>
[PATCH sched-devel 0/7] CPU isolation extensions

(2) notify user space of tight memory
http://lkml.org/lkml/2008/2/9/144
KOSAKI Motohiro <[EMAIL PROTECTED]>
[PATCH 0/8][for -mm] mem_notify v6

In both cases, some of us have responded "why not use cpusets", and the
original submitters have replied "cpusets are too fat"  (well, they
were more diplomatic than that, but I guess I can say that ;)

I wonder if there might be room for a "tiny cpusets" configuration
option:
  * provide the same hooks to the rest of the kernel, and
  * provide the same syntactic interface to user space, but
  * with more limited semantics.

The primary semantic limit I'd suggest would be supporting exactly
one layer depth of cpusets, not a full hierarchy.  So one could still
successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
to do 'mkdir /dev/cpuset/foo/bar' would fail.  This reminds me of
very early FAT file systems, which had just a single, fixed size
root directory ;).  There might even be a configurable fixed upper
limit on how many /dev/cpuset/* directories were allowed, further
simplifying the locking and dynamic memory behavior of this apparatus.

Some other features that aren't so easy to implement, and which have
less value on small systems, such as notify_on_release, could also be
stubbed out and always disabled, simply returning error if requested
to be enabled from user space.  The recent, chunky piece of code
needed to compute dynamic sched domains from the cpuset hierarchy
probably admits of a simpler variant in the tiny cpuset configuration.

I suppose it would still be a vfs-based pseudo file system (even
embedded Linux still has that infrastructure), except that the vfs
operator functions could be simpler, as this would really be just
a flat set of cpumask_t's and nodemask_t's at the core of the
implementation, not an arbitrarily nested hierarchy of them.  See
further my comments on cgroups, below.

The rest of the kernel would see no difference ... except that some
of the cpuset_*() hooks would return more quickly.  This tiny cpuset
option would provide the same kernel hooks as are now provided by
the defines and inline stubs, in the "#else" to "#endif" half of the
"#ifdef CONFIG_CPUSETS" code lines in linux/cpuset.h.

User space would see the same API, except that some valid operations
on full cpusets, such as a nested mkdir, would fail on tiny cpusets.
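
To make the user-visible difference concrete, a small sketch (the error a
nested mkdir would return is a guess; none of this is from an actual patch):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
	/* Depth one: allowed on both full and tiny cpusets. */
	if (mkdir("/dev/cpuset/foo", 0755))
		perror("mkdir /dev/cpuset/foo");

	/* Depth two: fine on full cpusets, would fail on tiny cpusets. */
	if (mkdir("/dev/cpuset/foo/bar", 0755))
		printf("nested mkdir rejected: %s\n", strerror(errno));
	return 0;
}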

How this extends to cgroups I don't know; for now I suspect that most
cgroup module development is motivated by the needs of larger systems,
not smaller systems.  However, cpusets is now a module client of
cgroups, and it is cgroups that now provides cpusets with its interface
to the vfs infrastructure.  It would seem unfortunate if this relation
was not continued with tiny cpusets.  Perhaps someone can imagine a tiny
cgroups?  This might be the most difficult part of this proposal.

Looking at some IA64 sn2 config builds I have laying about, I see the
following text sizes for a couple of versions, showing the growth of
the cpuset/cgroup apparatus over time:

25933   2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006)
vs.
37823   2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008)
19558   2.6.25-rc2-mm1/kernel/cpuset.o

So the total has grown from 25933 to 57381 text bytes (note that
this is IA64 arch; most arch's will have proportionately smaller
text sizes.)

Unfortunately, ideas without code are usually met with the sound of
silence, as well they should be.  Furthermore, I can promise that I
have no time to design or develop this myself; my good employer is
quite focused on the other end of things - the big honkin NUMA and
cluster systems.


-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214

