Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-28 Thread Paul Jackson
Thanks for your well-worded response, Shailabh.

Others will have to make further comments and
decisions here.  You have understood what I had
to say, and responded well.  I have nothing to
add at this point that would help further.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-28 Thread Shailabh Nagar



Sorry for the late response - I just saw this note.

Paul Jackson wrote:
> Shailabh wrote:
> > So if the current CPU controller
> > implementation is considered too intrusive/unacceptable, it can be
> > reworked or (and we certainly hope not) even rejected in perpetuity.
>
> It is certainly reasonable that you would hope such.
>
> But this hypothetical possibility concerns me a little.  Where would
> that leave CKRM, if it was in the mainline kernel, but there was no CPU
> controller in the mainline kernel?


It would be unfortunate indeed since CPU is the first resource that 
people want to try and control.


However, I feel the following are also true:

1. It is still better to have CKRM with the I/O, memory, network, 
and forkrate controllers than to have nothing just because the CPU 
controller is unacceptable. Each controller is useful in its own right. 
It may not be enough to justify the framework all by itself but together 
with others (and the possibility of future controllers and per-class 
metrics), it is sufficient.


2. A CPU controller which is acceptable can be developed. It may not 
work as well because of the need to keep it simple and not affect the 
non-CKRM user path, but it will be better than not having anything. 
Years ago, people said a low-overhead SMP scheduler couldn't be written 
and they were proved wrong. Currently Ingo is hard at work to make 
acceptable-impact real time scheduling happen. So why should we rule out 
the possibility of someone being able to develop a CKRM CPU controller 
with acceptable impact?


Basically, I'm pointing out that there is no reason to hold the 
acceptance of the CKRM framework + other controllers hostage to its 
current CPU controller implementation (or any one controller's 
implementation for that matter).




> Wouldn't that be a rather serious
> problem for many users of CKRM if they wanted to work on mainline
> kernels?


Yes it would. And one could say that it's one of the features of the 
Linux kernel community that they would have to learn to accept. Just 
like the embedded folks who have been rooting for realtime enhancements 
to be made mainstream for years now, like the RAS folks who have been 
making a case for better dump/probe tools, and you, who's tried in the 
past to get the community to accept PAGG/CSA :-)


But I don't think we need to be resigned to a CPU controller-less 
existence quite yet.  Using the examples given earlier, realtime is 
being discussed seriously now and RAS features are getting acceptance. 
So why should one rule out the possibility of an acceptable CPU 
controller for CKRM being developed?


We, the current developers of CKRM, hope our current design can be a 
basis for the "one controller to rule them all" ! But if there are other 
ways of doing it, or people can point out what's wrong with the 
implementation, it can be reworked or rewritten from scratch.


The important thing, as Andrew said, is to get real feedback about what 
is unacceptable in the current implementation and any ideas on how it 
can be done better. But let's start off with what has been put out there 
in -mm rather than getting stuck on discussing something that hasn't 
even been put out yet?



--Shailabh



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-23 Thread Mark Hahn
> > if CKRM is just extensions, I think it should be an external patch.
> > if it provides a path towards unifying the many disparate RM mechanisms
> > already in the kernel, great!
> 
> OK, so if it provides a path towards unifying these, what should happen
> to the old interfaces when they conflict with those offered by CKRM?

I don't think the name matters, as long as the RM code is simplified/unified.
that is, the only difference at first would be a change in name - 
same behavior.

> For instance, I'm considering how a per-class (re)nice setting would
> work. What should happen when the user (re)nices a process to a
> different value than the nice of the process' class? Should CKRM:

it has to behave as it does now, unless the admin has imposed some 
class structure other than the normal POSIX one (ie, nice pertains 
only to a process and is inherited by future children.)
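
a minimal user-space sketch of that default scope (nothing CKRM-specific
here, just standard setpriority(2)/getpriority(2) and fork(2)):

/* demo: nice attaches to a single process and is inherited
 * by future children across fork() */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	if (setpriority(PRIO_PROCESS, 0, 10) != 0) { /* nicing up needs no privilege */
		perror("setpriority");
		return 1;
	}
	if (fork() == 0) {
		/* the child starts with the parent's nice value */
		printf("child nice:  %d\n", getpriority(PRIO_PROCESS, 0));
		_exit(0);
	}
	wait(NULL);
	printf("parent nice: %d\n", getpriority(PRIO_PROCESS, 0));
	return 0;
}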

> a) disable the old interface by
>   i) removing it
>   ii) return an error when CKRM is active
>   iii) return an error when CKRM has specified a nice value for the
> process via membership in a class
>   iv) return an error when the (re)nice value is inconsistent with the
> nice value assigned to the class

some interfaces must remain (renice), and if their behavior is implemented
via CKRM, it must, by default, act as before.  other interfaces (say 
overcommit_ratio) probably don't need to remain.

> b) trust the user, ignore the class nice value, and allow the new nice
> value

users can only nice up, and that policy needs to remain, obviously.
you appear to be asking what happens when the scope of the old mechanism
conflicts with the scope determined by admin-set CKRM classes.  I'd 
say that nicing a single process should change the nice of the whole 
class that the process is in, if any.  otherwise, it acts to rip that 
process out of the class, which is probably even less 'least surprise'.

>   This sort of question would probably come up for any other CKRM
> "embraced-and-extended" tunables. Should they use the answer to this
> one, or would it go on a case-by-case basis?

I don't see that CKRM should play by rules different from other 
kernel improvements - preserve standard/former behavior when that 
behavior is documented (certainly nice is).  in the absence of admin-set
classes, nice would behave the same. 

all CKRM is doing here is providing a broader framework to hang the tunables
on.  it should be able to express all existing tunables in scope.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Matthew Helsley
On Fri, 2005-07-22 at 20:23 -0400, Mark Hahn wrote:
> > > actually, let me also say that CKRM is on a continuum that includes 
> > > current (global) /proc tuning for various subsystems, ulimits, and 
> > > at the other end, Xen/VMM's.  it's conceivable that CKRM could wind up
> > > being useful and fast enough to subsume the current global and per-proc
> > > tunables.  after all, there are MANY places where the kernel tries to 
> > > maintain some sort of context to allow it to tune/throttle/readahead
> > > based on some process-linked context.  "embracing and extending"
> > > those could make CKRM attractive to people outside the mainframe market.
> > 
> > Seems like an excellent suggestion to me! Yeah, it may be possible to
> > maintain the context the kernel keeps on a per-class basis instead of
> > globally or per-process. 
> 
> right, but are the CKRM people ready to take this on?  for instance,
> I just grepped 'throttle' in kernel/mm and found a per-task RM in 
> page-writeback.c.  it even has a vaguely class-oriented logic, since
> it exempts RT tasks.  if CKRM can become a way to make this stuff 
> cleaner and more effective (again, for normal tasks), then great.
> but bolting on a big new different, intrusive mechanism that slows
> down all normal jobs by 3% just so someone can run 10K mostly-idle
> guests on a giant Power box, well, that's gross.
> 
> > The real question is what constitutes a useful
> > "extension" :).
> 
> if CKRM is just extensions, I think it should be an external patch.
> if it provides a path towards unifying the many disparate RM mechanisms
> already in the kernel, great!

OK, so if it provides a path towards unifying these, what should happen
to the old interfaces when they conflict with those offered by CKRM?

For instance, I'm considering how a per-class (re)nice setting would
work. What should happen when the user (re)nices a process to a
different value than the nice of the process' class? Should CKRM:

a) disable the old interface by
   i) removing it
   ii) return an error when CKRM is active
   iii) return an error when CKRM has specified a nice value for the
      process via membership in a class
   iv) return an error when the (re)nice value is inconsistent with the
      nice value assigned to the class

b) trust the user, ignore the class nice value, and allow the new nice
   value

I'd be tempted to do a.iv but it would require some modifications to a
system call. b probably wouldn't require any modifications to non-CKRM
files/dirs. 
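
For concreteness, a.iv amounts to a small consistency check in the
renice path. A hypothetical sketch - ckrm_task_class(), nice_set and
the struct layout are invented for illustration, not CKRM's actual
interface:

#include <errno.h>
#include <stddef.h>

struct task_struct;			/* opaque here */

struct ckrm_class {
	int nice;	/* nice value the administrator gave the class */
	int nice_set;	/* nonzero once a class nice has been assigned */
};

/* assumed lookup: returns NULL when the task belongs to no class */
extern struct ckrm_class *ckrm_task_class(struct task_struct *p);

/* option a.iv: refuse a renice that disagrees with the class value */
int ckrm_check_renice(struct task_struct *p, int nice)
{
	struct ckrm_class *cls = ckrm_task_class(p);

	if (cls && cls->nice_set && nice != cls->nice)
		return -EPERM;	/* inconsistent with the class setting */
	return 0;		/* no class constraint: normal rules apply */
}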

This sort of question would probably come up for any other CKRM
"embraced-and-extended" tunables. Should they use the answer to this
one, or would it go on a case-by-case basis?

Thanks,
-Matt Helsley

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Mark Hahn
> > actually, let me also say that CKRM is on a continuum that includes 
> > current (global) /proc tuning for various subsystems, ulimits, and 
> > at the other end, Xen/VMM's.  it's conceivable that CKRM could wind up
> > being useful and fast enough to subsume the current global and per-proc
> > tunables.  after all, there are MANY places where the kernel tries to 
> > maintain some sort of context to allow it to tune/throttle/readahead
> > based on some process-linked context.  "embracing and extending"
> > those could make CKRM attractive to people outside the mainframe market.
> 
>   Seems like an excellent suggestion to me! Yeah, it may be possible to
> maintain the context the kernel keeps on a per-class basis instead of
> globally or per-process. 

right, but are the CKRM people ready to take this on?  for instance,
I just grepped 'throttle' in kernel/mm and found a per-task RM in 
page-writeback.c.  it even has a vaguely class-oriented logic, since
it exempts RT tasks.  if CKRM can become a way to make this stuff 
cleaner and more effective (again, for normal tasks), then great.
but bolting on a big new different, intrusive mechanism that slows
down all normal jobs by 3% just so someone can run 10K mostly-idle
guests on a giant Power box, well, that's gross.
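
(For reference, the page-writeback logic referred to above has roughly
this shape - a from-memory paraphrase of the 2.6-era throttle, not the
verbatim kernel code: writers are throttled past a dirty-memory
threshold, and realtime tasks get extra headroom, in effect a
hard-coded two-class policy.)

struct task_struct;				/* opaque here */
extern int rt_task(struct task_struct *tsk);	/* assumed predicate */

/* paraphrase of the per-task throttle in mm/page-writeback.c */
long task_dirty_limit(struct task_struct *tsk, long dirty_limit)
{
	/* realtime tasks may dirty more memory before being
	 * throttled -- a crude, fixed "class" distinction */
	if (rt_task(tsk))
		dirty_limit += dirty_limit / 4;
	return dirty_limit;
}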

> The real question is what constitutes a useful
> "extension" :).

if CKRM is just extensions, I think it should be an external patch.
if it provides a path towards unifying the many disparate RM mechanisms
already in the kernel, great!

>   I was thinking that per-class nice values might be a good place to
> start as well. One advantage of per-class as opposed to per-process nice
> is the class is less transient than the process since its lifetime is
> determined solely by the system administrator.

but the Linux RM needs to subsume traditional Unix process groups,
and inherited nice/sched class, and even CAP_ stuff.  I think CKRM
could start to do this, since classes are very general.
but merely adding a new, incompatible feature is just Not A Good Idea.

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Matthew Helsley
On Fri, 2005-07-22 at 12:35 -0400, Mark Hahn wrote:


> actually, let me also say that CKRM is on a continuum that includes 
> current (global) /proc tuning for various subsystems, ulimits, and 
> at the other end, Xen/VMM's.  it's conceivable that CKRM could wind up
> being useful and fast enough to subsume the current global and per-proc
> tunables.  after all, there are MANY places where the kernel tries to 
> maintain some sort of context to allow it to tune/throttle/readahead
> based on some process-linked context.  "embracing and extending"
> those could make CKRM attractive to people outside the mainframe market.

Seems like an excellent suggestion to me! Yeah, it may be possible to
maintain the context the kernel keeps on a per-class basis instead of
globally or per-process. The real question is what constitutes a useful
"extension" :).

I was thinking that per-class nice values might be a good place to
start as well. One advantage of per-class as opposed to per-process nice
is the class is less transient than the process since its lifetime is
determined solely by the system administrator.

CKRM calls this kind of module a "resource controller". There's a small
HOWTO on writing resource controllers here:
http://ckrm.sourceforge.net/ckrm-controller-howto.txt
If anyone wants to investigate writing such a controller please feel
free to ask questions or send HOWTO feedback on the CKRM-Tech mailing
list at ckrm-tech@lists.sourceforge.net.
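
In outline, such a controller follows the usual registration pattern:
the framework keeps a table of callbacks and invokes them on class
events. A generic sketch with invented names (see the HOWTO above for
CKRM's real interface):

#include <stddef.h>

/* generic controller-registration sketch; names are illustrative,
 * not CKRM's actual API */
struct res_controller {
	const char *name;
	void *(*class_alloc)(void);			/* a class was created   */
	void  (*class_free)(void *state);		/* a class was destroyed */
	int   (*set_share)(void *state, int shares);	/* admin set a share     */
};

#define MAX_CTLRS 8
static struct res_controller *registered[MAX_CTLRS];

int register_controller(struct res_controller *ctlr)
{
	for (int i = 0; i < MAX_CTLRS; i++)
		if (!registered[i]) {
			registered[i] = ctlr;	/* core calls back on class events */
			return 0;
		}
	return -1;	/* table full */
}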

Thanks,
-Matt Helsley

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Paul Jackson
Shailabh wrote:
> So if the current CPU controller 
>   implementation is considered too intrusive/unacceptable, it can be 
> reworked or (and we certainly hope not) even rejected in perpetuity. 

It is certainly reasonable that you would hope such.

But this hypothetical possibility concerns me a little.  Where would
that leave CKRM, if it was in the mainline kernel, but there was no CPU
controller in the mainline kernel?  Wouldn't that be a rather serious
problem for many users of CKRM if they wanted to work on mainline
kernels?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Alan Cox
On Gwe, 2005-07-22 at 12:35 -0400, Mark Hahn wrote:
> I imagine you, like me, are currently sitting in the Xen talk,

Out by a few thousand miles ;)

> and I don't believe they are or will do anything so dumb as to throw away
> or lose information.  yes, in principle, the logic will need to be 

They don't have it in the first place. 

> somewhere, and I'm suggesting that the virtualization logic should
> be in VMM-only code so it has literally zero effect on host-native 
> processes.  *or* the host-native fast-path.

I don't see why you are concerned. If the CKRM=n path is zero impact
then it's irrelevant to you. It's more expensive to do a lot of resource
management at the VMM level because the virtualisation engine doesn't
know anything but it's getting indications someone wants to be
bigger/smaller.


> but to really do CKRM, you are going to want quite extensive interaction with
> the scheduler, VM page replacement policies, etc.  all incredibly
> performance-sensitive areas.

Bingo - and areas the virtualiser can't see into, at least not unless it
uses the same hooks CKRM uses.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Mark Hahn
> > > the fast path slower and less maintainable.  if you are really concerned
> > > about isolating many competing servers on a single piece of hardware, then
> > > run separate virtualized environments, each with its own user-space.
> > 
> > And the virtualisation layer has to do the same job with less
> > information. That to me implies that the virtualisation case is likely
> > to be materially less efficient, it's just the inefficiency you are
> > worried about is hidden in a different piece of code.

I imagine you, like me, are currently sitting in the Xen talk,
and I don't believe they are or will do anything so dumb as to throw away
or lose information.  yes, in principle, the logic will need to be 
somewhere, and I'm suggesting that the virtualization logic should
be in VMM-only code so it has literally zero effect on host-native 
processes.  *or* the host-native fast-path.

> > Secondly a lot of this doesn't matter if CKRM=n compiles to no code
> > anyway.
> 
> I'm actually trying to keep the impact of CKRM=y to near-zero, ergo
> only an impact if you create classes.  And even then, the goal is to
> keep that impact pretty small as well.

but to really do CKRM, you are going to want quite extensive interaction with
the scheduler, VM page replacement policies, etc.  all incredibly
performance-sensitive areas.

actually, let me also say that CKRM is on a continuum that includes 
current (global) /proc tuning for various subsystems, ulimits, and 
at the other end, Xen/VMM's.  it's conceivable that CKRM could wind up
being useful and fast enough to subsume the current global and per-proc
tunables.  after all, there are MANY places where the kernel tries to 
maintain some sort of context to allow it to tune/throttle/readahead
based on some process-linked context.  "embracing and extending"
those could make CKRM attractive to people outside the mainframe market.


> Plus you won't have to manage each operating system instance which
> can grow into a pain under virtualization.  But I still maintain that
> both have their place.

CKRM may have its place in an externally-maintained patch ;)

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 15:53:55 BST, Alan Cox wrote:
> On Gwe, 2005-07-22 at 00:53 -0400, Mark Hahn wrote:
> > the fast path slower and less maintainable.  if you are really concerned
> > about isolating many competing servers on a single piece of hardware, then
> > run separate virtualized environments, each with its own user-space.
> 
> And the virtualisation layer has to do the same job with less
> information. That to me implies that the virtualisation case is likely
> to be materially less efficient, it's just the inefficiency you are
> worried about is hidden in a different piece of code.
> 
> Secondly a lot of this doesn't matter if CKRM=n compiles to no code
> anyway.

I'm actually trying to keep the impact of CKRM=y to near-zero, ergo
only an impact if you create classes.  And even then, the goal is to
keep that impact pretty small as well.

And yes, a hypervisor does have a lot more overhead in many forms.
Something like an overall 2-3% everywhere, where the CKRM impact is
likely to be so small as to be hard to measure in the individual
subsystems, and overall performance impact should be even smaller.
Plus you won't have to manage each operating system instance which
can grow into a pain under virtualization.  But I still maintain that
both have their place.

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Alan Cox
On Gwe, 2005-07-22 at 00:53 -0400, Mark Hahn wrote:
> the fast path slower and less maintainable.  if you are really concerned
> about isolating many competing servers on a single piece of hardware, then
> run separate virtualized environments, each with its own user-space.

And the virtualisation layer has to do the same job with less
information. That to me implies that the virtualisation case is likely
to be materially less efficient, it's just the inefficiency you are
worried about is hidden in a different piece of code.

Secondly a lot of this doesn't matter if CKRM=n compiles to no code
anyway.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Mark Hahn
> of the various environments.  I don't think you are one of those end
> users, though.  I don't think I'm required to make everyone happy all
> the time.  ;)

the issue is whether CKRM (in its real form, not this thin edge)
will noticeably hurt Linux's fast-path.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 00:53:58 EDT, Mark Hahn wrote:
> > > > yes, that's the crux.  CKRM is all about resolving conflicting resource 
> > > > demands in a multi-user, multi-server, multi-purpose machine.  this is a 
> > > > huge undertaking, and I'd argue that it's completely inappropriate for 
> > > > *most* servers.  that is, computers are generally so damn cheap that 
> > > > the clear trend is towards dedicating a machine to a specific purpose, 
> > > > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single 
> > > > machine.  
> >  
> > This is a big NAK - if computers are so damn cheap, why is virtualization
> > and consolidation such a big deal?  Well, the answer is actually that
> 
> yes, you did miss my point.  I'm actually arguing that it's bad design
> to attempt to arbitrate within a single shared user-space.  you make 
> the fast path slower and less maintainable.  if you are really concerned
> about isolating many competing servers on a single piece of hardware, then
> run separate virtualized environments, each with its own user-space.

I'm willing to agree to disagree.  I'm in favor of full virtualization
as well, as it is appropriate to certain styles of workloads.  I also
have enough end users who also want to share user level, share tasks,
yet also have some level of balancing between the resource consumption
of the various environments.  I don't think you are one of those end
users, though.  I don't think I'm required to make everyone happy all
the time.  ;)

BTW, does your mailer purposefully remove cc:'s?  Seems like that is
normally considered impolite.

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Mark Hahn
> > > yes, that's the crux.  CKRM is all about resolving conflicting resource 
> > > demands in a multi-user, multi-server, multi-purpose machine.  this is a 
> > > huge undertaking, and I'd argue that it's completely inappropriate for 
> > > *most* servers.  that is, computers are generally so damn cheap that 
> > > the clear trend is towards dedicating a machine to a specific purpose, 
> > > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.  
>  
> This is a big NAK - if computers are so damn cheap, why is virtualization
> and consolidation such a big deal?  Well, the answer is actually that

yes, you did miss my point.  I'm actually arguing that it's bad design
to attempt to arbitrate within a single shared user-space.  you make 
the fast path slower and less maintainable.  if you are really concerned
about isolating many competing servers on a single piece of hardware, then
run separate virtualized environments, each with its own user-space.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

Sorry - I didn't see Mark's original comment, so I'm replying to
a reply which I did get.  ;-)

On Thu, 21 Jul 2005 23:59:09 EDT, Shailabh Nagar wrote:
> Mark Hahn wrote:
> >>I suspect that the main problem is that this patch is not a mainstream
> >>kernel feature that will gain multiple uses, but rather provides
> >>support for a specific vendor middleware product used by that
> >>vendor and a few closely allied vendors.  If it were smaller or
> >>less intrusive, such as a driver, this would not be a big problem.
> >>That's not the case.
> > 
> > 
> > yes, that's the crux.  CKRM is all about resolving conflicting resource 
> > demands in a multi-user, multi-server, multi-purpose machine.  this is a 
> > huge undertaking, and I'd argue that it's completely inappropriate for 
> > *most* servers.  that is, computers are generally so damn cheap that 
> > the clear trend is towards dedicating a machine to a specific purpose, 
> > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.  
 
This is a big NAK - if computers are so damn cheap, why is virtualization
and consolidation such a big deal?  Well, the answer is actually that
floor space, heat, and power are also continuing to be very important
in the overall equation.  And, buying machines which are dedicated but
often 80-99% idle occasionally bothers people who are concerned about
wasting planetary resources for no good reason.  Yeah, we can stamp out
thousands of metal boxes, but if just a couple can do the same work,
well, let's consolidate.  Less wasted metal, less wasted heat, less
wasted power, less air conditioning, wow, we are now part of the
eco-computing movement!  ;-)

> > this is *directly* in conflict with certain prominent products, such as 
> > the Altix and various less-prominent Linux-based mainframes.  they're all
> > about partitioning/virtualization - the big-iron aesthetic of splitting up 
> > a single machine.  note that it's not just about "big", since cluster-based 
> > approaches can clearly scale far past big-iron, and are in effect statically
> > partitioned.  yes, buying a hideously expensive single box, and then 
> > chopping 
> > it into little pieces is more than a little bizarre, and is mainly based
> > on a couple assumptions:

Well, yeah IBM has been doing this virtualization & partitioning stuff
for ages at lots of different levels for lots of reasons.  If we are
in such direct conflict with Altix, aren't we also in conflict with our
own lines of business which do the same thing?  But, well, we aren't
in conflict - this is a complementary part of our overall capabilities.

> > - that clusters are hard.  really, they aren't.  they are not 
> > necessarily higher-maintenance, can be far more robust, usually
> > do cost less.  just about the only bad thing about clusters is 
> > that they tend to be somewhat larger in size.

This is orthogonal to clusters.  Or, well, we are even using CKRM today
in some grid/cluster style applications.  But that has no bearing on
whether or not clusters are useful.

> > - that partitioning actually makes sense.  the appeal is that if 
> > you have a partition to yourself, you can only hurt yourself.
> > but it also follows that burstiness in resource demand cannot be 
> > overlapped without either constantly tuning the partitions or 
> > infringing on the guarantee.
 
Well, if you don't think it makes sense, don't buy one.  And stay away
from Xen, VMware, VirtualIron, PowerPC/pSeries hardware, Mainframes,
Altix, IA64 platforms, Intel VT, AMD Pacifica, and, well, anyone else
that is working to support virtualization, which is one key level of
partitioning.

I'm sorry but I'm not buying your argument here at all - it just has
no relationship to what's going on at the user side as near as I can
tell.

> > CKRM is one of those things that could be done to Linux, and will benefit a
> > few, but which will almost certainly hurt *most* of the community.
> > 
> > let me say that the CKRM design is actually quite good.  the issue is whether 
> > the extensive hooks it requires can be done (at all) in a way which does 
> > not disproportionately hurt maintainability or efficiency.
 
Can you be more clear on how this will hurt *most* of the community?
CKRM when not in use is not in any way intrusive.  Can you take a look
at the patch again and point out the "extensive" hooks for me?  I've
looked at "all" of them and I have trouble calling a couple of callbacks
"extensive hooks".

> > CKRM requires hooks into every resource-allocation decision fastpath:
> > - if CKRM is not CONFIG, the only overhead is software maintenance.
> > - if CKRM is CONFIG but not loaded, the overhead is a pointer check.
> > - if CKRM is CONFIG and loaded, the overhead is a pointer check
> > and a nontrivial callback.

You left out a case here:  CKRM is CONFIG and loaded and classes are
defined.

In all of the cases that you mentioned, if there are no 

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Shailabh Nagar

Paul Jackson wrote:
> Martin wrote:
> > No offense, but I really don't see why this matters at all ... the stuff
> > in -mm is what's under consideration for merging - what's in SuSE is ...
>
> Yes - what's in SuSE doesn't matter, at least not directly.
>
> No - we are not just considering the CKRM that is in *-mm now, but also
> what can be expected to be proposed as part of CKRM in the future.
>
> If the CPU controller is not in *-mm now, but if one might reasonably
> expect it to be proposed as part of CKRM in the future, then we need to
> understand that.  This is perhaps especially important in this case,
> where there is some reason to suspect that this additional piece is
> both non-trivial and essential to CKRM's purpose.



The CKRM design explicitly considered this problem of some controllers 
being more unacceptable than the rest and part of the indirections 
introduced in CKRM are to allow the kernel community the flexibility of 
cherry-picking acceptable controllers. So if the current CPU controller 
implementation is considered too intrusive/unacceptable, it can be 
reworked or (and we certainly hope not) even rejected in perpetuity. 
Same for the other controllers as and when they're introduced and 
proposed for inclusion.



-- Shailabh




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Shailabh Nagar

Mark Hahn wrote:
> > I suspect that the main problem is that this patch is not a mainstream
> > kernel feature that will gain multiple uses, but rather provides
> > support for a specific vendor middleware product used by that
> > vendor and a few closely allied vendors.  If it were smaller or
> > less intrusive, such as a driver, this would not be a big problem.
> > That's not the case.
>
> yes, that's the crux.  CKRM is all about resolving conflicting resource
> demands in a multi-user, multi-server, multi-purpose machine.  this is a
> huge undertaking, and I'd argue that it's completely inappropriate for
> *most* servers.  that is, computers are generally so damn cheap that
> the clear trend is towards dedicating a machine to a specific purpose,
> rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.


The argument about scale-up vs. scale-out is nowhere close to being 
resolved. To argue that any support for performance partitioning (which 
CKRM does) is in support of a lost cause is premature to say the least.


> this is *directly* in conflict with certain prominent products, such as
> the Altix and various less-prominent Linux-based mainframes.  they're all
> about partitioning/virtualization - the big-iron aesthetic of splitting up
> a single machine.  note that it's not just about "big", since cluster-based
> approaches can clearly scale far past big-iron, and are in effect statically
> partitioned.  yes, buying a hideously expensive single box, and then chopping
> it into little pieces is more than a little bizarre, and is mainly based
> on a couple assumptions:
>
> 	- that clusters are hard.  really, they aren't.  they are not 
> 	necessarily higher-maintenance, can be far more robust, usually
> 	do cost less.  just about the only bad thing about clusters is 
> 	that they tend to be somewhat larger in size.
>
> 	- that partitioning actually makes sense.  the appeal is that if 
> 	you have a partition to yourself, you can only hurt yourself.
> 	but it also follows that burstiness in resource demand cannot be 
> 	overlapped without either constantly tuning the partitions or 
> 	infringing on the guarantee.


"constantly tuning the partitions" is effectively whats done by workload 
managers. But our earlier presentations and papers have made the case 
that this is not the only utility for performance isolation - simple 
needs like isolating one user from another on a general purpose server 
is also a need that cannot be met by any existing or proposed Linux 
kernel mechanisms today.


If partitioning made so little sense and the case for clusters was that 
obvious, one would be hard put to explain why server consolidation is 
being actively pursued by so many firms, why Solaris is bothering to 
come up with Containers, and why Xen/VMware are getting all this attention.

I don't think the concept of partitioning can be dismissed so easily.

Of course, it must be noted that CKRM only provides performance 
isolation, not fault isolation. But there is a need for that. Whether 
Linux chooses to let this need influence its design is another matter 
(which I hope we'll also discuss besides the implementation issues).



> CKRM is one of those things that could be done to Linux, and will benefit a
> few, but which will almost certainly hurt *most* of the community.
>
> let me say that the CKRM design is actually quite good.  the issue is whether
> the extensive hooks it requires can be done (at all) in a way which does
> not disproportionately hurt maintainability or efficiency.


If there are suggestions on implementing this better, they'll certainly 
be very welcome.




CKRM requires hooks into every resource-allocation decision fastpath:
- if CKRM is not CONFIG, the only overhead is software maintenance.
- if CKRM is CONFIG but not loaded, the overhead is a pointer check.
- if CKRM is CONFIG and loaded, the overhead is a pointer check
and a nontrivial callback.

but really, this is only for CKRM-enforced limits.  CKRM really wants to
change behavior in a more "weighted" way, not just causing an
allocation/fork/packet to fail.  a really meaningful CKRM needs to 
be tightly integrated into each resource manager - affecting each scheduler
(process, memory, IO, net).  I don't really see how full-on CKRM can be 
compiled out, unless these schedulers are made fully pluggable.


This is a valid point for the CPU, memory and network controllers (I/O 
can be made pluggable quite easily). For the CPU controller in SuSE, the 
CKRM CPU controller can be turned on and off dynamically at runtime. 
A similar option for memory and network (incurring only a 
pointer check) could be explored. Keeping the overhead close to zero for 
kernel users not interested in CKRM is certainly one of our objectives.
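
To make the cost cases above concrete, here is a minimal sketch of such a 
gated hook. The names (ckrm_hook_fork, ckrm_fork_callback) are illustrative 
only, not the actual CKRM symbols:

/* Illustrative sketch only - not the actual CKRM code or symbols. */
struct task_struct;

#ifdef CONFIG_CKRM
/* NULL until a controller registers itself at runtime. */
extern void (*ckrm_fork_callback)(struct task_struct *tsk);

static inline void ckrm_hook_fork(struct task_struct *tsk)
{
        /* configured in but nothing loaded: one pointer check */
        if (ckrm_fork_callback)
                ckrm_fork_callback(tsk);  /* loaded: nontrivial callback */
}
#else
/* configured out: compiles to nothing; only the maintenance cost remains */
static inline void ckrm_hook_fork(struct task_struct *tsk) { }
#endif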


finally, I observe that pluggable, class-based resource _limits_ could 
probably be done without callbacks and potentially with low overhead.
but mere limits don't meet CKRM's goal of flexible, wide-spread resource 
partitioning 

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 13:46:37 +1000, Peter Williams wrote:
> Gerrit Huizenga wrote:
> >>I imagine that the cpu controller is missing from this version of CKRM 
> >>because the bugs introduced to the cpu controller during upgrading from 
> >>2.6.5 to 2.6.10 version have not yet been resolved.
> > 
> > 
> >  I don't know what bugs you are referring to here.  I don't think we
> >  have any open defects with SuSE on the CPU scheduler in their releases.
> >  And that is not at all related to the reason for not having a CPU
> >  controller in the current patch set.
> 
> The bugs were in the patches for the 2.6.10 kernel not SuSE's 2.6.5 
> kernel.  I reported some of them to the ckrm-tech mailing list at the 
> time.  There were changes to the vanilla scheduler between 2.6.5 and 
> 2.6.10 that were not handled properly when the CKRM scheduler was 
> upgraded to the 2.6.10 kernel.

Ah - okay - that makes sense.  Those patches haven't gone through my
review yet and I'm not directly tracking their status until I figure
out what the right direction is with respect to a fair share style
scheduler of some sort.  I'm not convinced that the current one is
ready for mainline or is necessarily the right answer at this point.
But we do need to figure out something that will provide
some level of CPU allocation minima & maxima for a class, where that
solution will work well on a laptop or a huge server.

Ideas in that space are welcome - I know of several proposed ideas
in progress - the scheduler in SuSE and the forward port to 2.6.10
that you referred to; an idea for building a very simple interface
on top of sched_domains for SMP systems (no fairness within a
single CPU) and a proposal for timeslice manipulation that might
provide some fairness that the Fujitsu folks are thinking about.
There are probably others and, honestly, I don't have any clue yet as
to what the right long-term/mainline direction should be here.

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Paul Jackson
Martin wrote:
> No offense, but I really don't see why this matters at all ... the stuff
> in -mm is what's under consideration for merging - what's in SuSE is ...

Yes - what's in SuSE doesn't matter, at least not directly.

No - we are not just considering the CKRM that is in *-mm now, but also
what can be expected to be proposed as part of CKRM in the future.

If the CPU controller is not in *-mm now, but if one might reasonably
expect it to be proposed as part of CKRM in the future, then we need to
understand that.  This is perhaps especially important in this case,
where there is some reason to suspect that this additional piece is
both non-trivial and essential to CKRM's purpose.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Peter Williams

Gerrit Huizenga wrote:

On Fri, 22 Jul 2005 11:06:14 +1000, Peter Williams wrote:


Paul Jackson wrote:


Matthew wrote:



I don't see the large ifdefs you're referring to in -mm's
kernel/sched.c.



Perhaps someone who knows CKRM better than I can explain why the CKRM
version in some SuSE releases based on 2.6.5 kernels has substantial
code and some large ifdef's in sched.c, but the CKRM in *-mm doesn't.
Or perhaps I'm confused.  There's a good chance that this represents
ongoing improvements that CKRM is making to reduce their footprint
in core kernel code.  Or perhaps there is a more sophisticated cpu
controller in the SuSE kernel.


As there is NO CKRM cpu controller in 2.6.13-rc3-mm1 (that I can see) 
the one in 2.6.5 is certainly more sophisticated :-).  So the reason 
that the considerable mangling of sched.c evident in SuSE's 2.6.5 kernel 
source is not present is that the cpu controller is not included in 
these patches.


 
 Yeah - I don't really consider the current CPU controller code something
 ready for consideration yet for mainline merging.  That doesn't mean
 we don't want a CPU controller for CKRM - just that what we have
 doesn't integrate cleanly/nicely with mainline.


I imagine that the cpu controller is missing from this version of CKRM 
because the bugs introduced to the cpu controller during upgrading from 
2.6.5 to 2.6.10 version have not yet been resolved.



 I don't know what bugs you are referring to here.  I don't think we
 have any open defects with SuSE on the CPU scheduler in their releases.
 And that is not at all related to the reason for not having a CPU
 controller in the current patch set.


The bugs were in the patches for the 2.6.10 kernel not SuSE's 2.6.5 
kernel.  I reported some of them to the ckrm-tech mailing list at the 
time.  There were changes to the vanilla scheduler between 2.6.5 and 
2.6.10 that were not handled properly when the CKRM scheduler was 
upgraded to the 2.6.10 kernel.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 11:06:14 +1000, Peter Williams wrote:
> Paul Jackson wrote:
> > Matthew wrote:
> > 
> >>I don't see the large ifdefs you're referring to in -mm's
> >>kernel/sched.c.
> > 
> > 
> > Perhaps someone who knows CKRM better than I can explain why the CKRM
> > version in some SuSE releases based on 2.6.5 kernels has substantial
> > code and some large ifdef's in sched.c, but the CKRM in *-mm doesn't.
> > Or perhaps I'm confused.  There's a good chance that this represents
> > ongoing improvements that CKRM is making to reduce their footprint
> > in core kernel code.  Or perhaps there is a more sophisticated cpu
> > controller in the SuSE kernel.
> 
> As there is NO CKRM cpu controller in 2.6.13-rc3-mm1 (that I can see) 
> the one in 2.6.5 is certainly more sophisticated :-).  So the reason 
> that the considerable mangling of sched.c evident in SuSE's 2.6.5 kernel 
> source is not present is that the cpu controller is not included in 
> these patches.
 
 Yeah - I don't really consider the current CPU controller code something
 ready for consideration yet for mainline merging.  That doesn't mean
 we don't want a CPU controller for CKRM - just that what we have
 doesn't integrate cleanly/nicely with mainline.

> I imagine that the cpu controller is missing from this version of CKRM 
> because the bugs introduced to the cpu controller during upgrading from 
> 2.6.5 to 2.6.10 version have not yet been resolved.

 I don't know what bugs you are referring to here.  I don't think we
 have any open defects with SuSE on the CPU scheduler in their releases.
 And that is not at all related to the reason for not having a CPU
 controller in the current patch set.

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Peter Williams

Paul Jackson wrote:

Matthew wrote:


I don't see the large ifdefs you're referring to in -mm's
kernel/sched.c.



Perhaps someone who knows CKRM better than I can explain why the CKRM
version in some SuSE releases based on 2.6.5 kernels has substantial
code and some large ifdef's in sched.c, but the CKRM in *-mm doesn't.
Or perhaps I'm confused.  There's a good chance that this represents
ongoing improvements that CKRM is making to reduce their footprint
in core kernel code.  Or perhaps there is a more sophisticated cpu
controller in the SuSE kernel.


As there is NO CKRM cpu controller in 2.6.13-rc3-mm1 (that I can see) 
the one in 2.6.5 is certainly more sophisticated :-).  So the reason 
that the considerable mangling of sched.c evident in SuSE's 2.6.5 kernel 
source is not present is that the cpu controller is not included in 
these patches.


I imagine that the cpu controller is missing from this version of CKRM 
because the bugs introduced to the cpu controller during upgrading from 
2.6.5 to 2.6.10 version have not yet been resolved.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Martin J. Bligh

Paul Jackson wrote:


Matthew wrote:
 


Perhaps someone who knows CKRM better than I can explain why the CKRM
version in some SuSE releases based on 2.6.5 kernels has substantial
code and some large ifdef's in sched.c, but the CKRM in *-mm doesn't.
Or perhaps I'm confused.  There's a good chance that this represents
ongoing improvements that CKRM is making to reduce their footprint
in core kernel code.  Or perhaps there is a more sophisticated cpu
controller in the SuSE kernel.
 



No offense, but I really don't see why this matters at all ... the stuff
in -mm is what's under consideration for merging - what's in SuSE is
wholly irrelevant? One obvious thing is that that codebase will be
much older ... would be useful if people can direct critiques at the
current codebase ;-)

M.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Paul Jackson
Matthew wrote:
> I don't see the large ifdefs you're referring to in -mm's
> kernel/sched.c.

Perhaps someone who knows CKRM better than I can explain why the CKRM
version in some SuSE releases based on 2.6.5 kernels has substantial
code and some large ifdef's in sched.c, but the CKRM in *-mm doesn't.
Or perhaps I'm confused.  There's a good chance that this represents
ongoing improvements that CKRM is making to reduce their footprint
in core kernel code.  Or perhaps there is a more sophisticated cpu
controller in the SuSE kernel.


> Have you looked at more
> recent benchmarks posted on CKRM-Tech around April 15th 2005?
> ...
> http://ckrm.sourceforge.net/downloads/ckrm-ols04-slides.pdf 

I had not seen these before.  Thanks for the pointer.


> The Rule-Based Classification Engine (RBCE) makes CKRM useful
> without middleware.

I'd be encouraged more if this went one step further, past pointing
out that the API can be manipulated from the shell without requiring C
code, to providing examples of who intends to _directly_ use this
interface. The issue is perhaps less whether it's API is naturally C or
shell code, or more of how many actual, independent, uses of this API
are known to the community.  A non-trivial API and mechanism that
is de facto captive to a single middleware implementation (which
may or may not apply here - I don't know) creates an additional review
burden, because some of the natural forces that guide us to healthy
long lasting interfaces are missing.  If that concern applies here,
it's certainly not insurmountable - but it should in my view raise the
review barrier to acceptance.  If other middleware or direct users
are not essentially performing some of the review for us, we have to do
it here with greater thoroughness.


> If you could be more specific I'd be able to
> respond in less general and abstract terms.

Good come back <grin>.

I made an effort along these lines last year, in the thread
I referenced a few days ago:

Classes: 1) what are they, 2) what is their name?

http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191

I doubt that I have much more to contribute along
these lines now.

Sorry.

> I haven't seen this limitation [128 cpus] ...

Good - I presume that there is no longer, if there ever was, such a
limitation.

Thanks for your reply.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Matthew Helsley
On Sun, 2005-07-17 at 08:20 -0700, Paul Jackson wrote:


> It is somewhat intrusive in the areas it controls, such as some large
> ifdef's in kernel/sched.c.

I don't see the large ifdefs you're referring to in -mm's
kernel/sched.c.

> The sched hooks may well impact the cost of maintaining the sched code,
> which is always a hotbed of Linux kernel development.  However others
> who work in that area will have to speak to that concern.

I don't see the hooks you're referring to in the -mm scheduler.

> I tried just now to read through the ckrm hooks in fork, to see
> what sort of impact they might have on scalability on large systems.
> But I gave up after a couple layers of indirection.  I saw several
> atomic counters and a couple of spinlocks that I suspect (not at all
> sure) lay on the fork main code path.  I'd be surprised if this didn't
> impact scalability.  Earlier, according to my notes, I saw mention of
> lmbench results in the OLS 2004 slides, indicating a several percent
> cost of available cpu cycles.

The OLS2004 slides are roughly 1 year old. Have you looked at more
recent benchmarks posted on CKRM-Tech around April 15th 2005? They
should be available in the CKRM-Tech archives on SourceForge at
http://sourceforge.net/mailarchive/forum.php?thread_id=7025751&forum_id=35191

(OLS 2004 Slide 24 of
http://ckrm.sourceforge.net/downloads/ckrm-ols04-slides.pdf )

The OLS slide indicates that the overhead is generally less than
0.5usec compared to a total context switch time of anywhere from 2 to
5.5usec. There appears to be little difference in scalability since the
overhead appears to oscillate around a constant.



> vendor has a serious middleware software product that provides full
> CKRM support.  Acceptance of CKRM would be easier if multiple competing
> middleware vendors were using it.  It is also a concern that CKRM
> is not really usable for its primary intended purpose except if it
> is accompanied by this corresponding middleware, which I presume is

The Rule-Based Classification Engine (RBCE) makes CKRM useful without
middleware. It uses a table of rules to classify tasks. For example
rules that would classify shells:

echo 'path=/bin/bash,class=/rcfs/taskclass/shells' > /rcfs/ce/rules/classify_bash_shells
echo 'path=/bin/tcsh,class=/rcfs/taskclass/shells' > /rcfs/ce/rules/classify_tcsh_shells
..

And class shares would control the fork rate of those shells:

echo 'res=numtasks,forkrate=1,forkrate_interval=1' > '/rcfs/taskclass/config'
echo 'res=numtasks,guarantee=1000,limit=5000' > '/rcfs/taskclass/shells'

No middleware necessary.

 

> CKRM is in part a generalization and descendent of what I call fair
> share schedulers.  For example, the fork hooks for CKRM include a
> forkrates controller, to slow down the rate of forking of tasks using
> too much resources.
> 
> No doubt the CKRM experts are already familiar with these, but for
> the possible benefit of other readers:
> 
>   UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
>   
> http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883
> 
>   SHARE II -- A User Administration and Resource Control System for UNIX
>   http://www.c-side.com/c/papers/lisa-91.html
> 
>   Solaris Resource Manager White Paper
>   http://wwws.sun.com/software/resourcemgr/wp-mixed/
> 
>   ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
>   http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm
> 
>   A Fair Share Scheduler, J. Kay and P. Lauder
>   Communications of the ACM, January 1988, Volume 31, Number 1, pp 44-55.
> 
> The documentation that I've noticed (likely I've missed something)
> doesn't do an adequate job of making the case - providing the
> motivation and context essential to understanding this patch set.

The choice of algorithm is entirely up to the scheduler, memory
allocator, etc. CKRM currently provides an interface for reading share
values and does not impose any meaning on those shares -- that is the
role of the scheduler.
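
As a sketch of that division of labor (the struct and function names below
are hypothetical, not CKRM's actual interface): the framework only stores
and reports the numbers, and each controller decides what they mean:

/* Hypothetical share values as a controller might read them. */
struct example_shares {
        int my_guarantee;       /* promised minimum, in parent's units  */
        int my_limit;           /* allowed maximum, in parent's units   */
        int total_units;        /* denominator for the two fields above */
};

/* A CPU controller might map the guarantee onto timeslices ... */
static unsigned long guaranteed_timeslice(unsigned long base_ts,
                                          const struct example_shares *s)
{
        return base_ts * s->my_guarantee / s->total_units;
}

/* ... while the numtasks controller above could read the same numbers as
 * a task-count ceiling. The framework imposes neither interpretation. */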

> Because CKRM provides an infrastructure for multiple controllers
> (limiting forks, memory allocation and network rates) and multiple
> classifiers and policies, its critical interfaces have rather
> generic and abstract names.  This makes it difficult for others to
> approach CKRM, reducing the rate of peer review by other Linux kernel
> developers, which is perhaps the key impediment to acceptance of CKRM.
> If anything, CKRM tends to be a little too abstract.

Generic and abstract names are appropriate for infrastructure that is
not tied to hardware. If you could be more specific I'd be able to
respond in less general and abstract terms.



> My notes from many months ago indicate something about a 128 CPU
> limit in CKRM.  I don't know why, nor if it still applies.  It is
> certainly a smaller limit than the systems I care about.

I haven't seen this limitation in the CKRM patches that went into -mm
and I'd like to look into this. Where did you see this limit?


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Shailabh Nagar

Paul Jackson wrote:

Martin wrote:


No offense, but I really don't see why this matters at all ... the stuff
in -mm is what's under consideration for merging - what's in SuSE is ...



Yes - what's in SuSE doesn't matter, at least not directly.

No - we are not just considering the CKRM that is in *-mm now, but also
what can be expected to be proposed as part of CKRM in the future.

If the CPU controller is not in *-mm now, but if one might reasonably
expect it to be proposed as part of CKRM in the future, then we need to
understand that.  This is perhaps especially important in this case,
where there is some reason to suspect that this additional piece is
both non-trivial and essential to CKRM's purpose.



The CKRM design explicitly considered this problem of some controllers 
being more unacceptable than the rest and part of the indirections 
introduced in CKRM are to allow the kernel community the flexibility of 
cherry-picking acceptable controllers. So if the current CPU controller 
implementation is considered too intrusive/unacceptable, it can be 
reworked or (and we certainly hope not) even rejected in perpetuity. 
Same for the other controllers as and when they're introduced and 
proposed for inclusion.



-- Shailabh




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

Sorry - I didn't see Mark's original comment, so I'm replying to
a reply which I did get.  ;-)

On Thu, 21 Jul 2005 23:59:09 EDT, Shailabh Nagar wrote:
 Mark Hahn wrote:
 I suspect that the main problem is that this patch is not a mainstream
 kernel feature that will gain multiple uses, but rather provides
 support for a specific vendor middleware product used by that
 vendor and a few closely allied vendors.  If it were smaller or
 less intrusive, such as a driver, this would not be a big problem.
 That's not the case.
  
  
  yes, that's the crux.  CKRM is all about resolving conflicting resource 
  demands in a multi-user, multi-server, multi-purpose machine.  this is a 
  huge undertaking, and I'd argue that it's completely inappropriate for 
  *most* servers.  that is, computers are generally so damn cheap that 
  the clear trend is towards dedicating a machine to a specific purpose, 
  rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.  
 
This is a big NAK - if computers are so damn cheap, why is virtualization
and consolidation such a big deal?  Well, the answer is actually that
floor space, heat, and power are also continuing to be very important
in the overall equation.  And, buying machines which are dedicated but
often 80-99% idle occasionally bothers people who are concerned about
wasting planetary resources for no good reason.  Yeah, we can stamp out
thousands of metal boxes, but if just a couple can do the same work,
well, let's consolidate.  Less wasted metal, less wasted heat, less
wasted power, less air conditioning, wow, we are now part of the
eco-computing movement!  ;-)

  this is *directly* in conflict with certain prominent products, such as 
  the Altix and various less-prominent Linux-based mainframes.  they're all
  about partitioning/virtualization - the big-iron aesthetic of splitting up 
  a single machine.  note that it's not just about big, since cluster-based 
  approaches can clearly scale far past big-iron, and are in effect statically
  partitioned.  yes, buying a hideously expensive single box, and then 
  chopping 
  it into little pieces is more than a little bizarre, and is mainly based
  on a couple assumptions:

Well, yeah IBM has been doing this virtualization & partitioning stuff
for ages at lots of different levels for lots of reasons.  If we are
in such direct conflict with Altix, aren't we also in conflict with our
own lines of business which do the same thing?  But, well, we aren't
in conflict - this is a complementary part of our overall capabilities.

  - that clusters are hard.  really, they aren't.  they are not 
  necessarily higher-maintenance, can be far more robust, usually
  do cost less.  just about the only bad thing about clusters is 
  that they tend to be somewhat larger in size.

This is orthogonal to clusters.  Or, well, we are even using CKRM today
in some grid/cluster style applications.  But that has no bearing on
whether or not clusters are useful.

  - that partitioning actually makes sense.  the appeal is that if 
  you have a partition to yourself, you can only hurt yourself.
  but it also follows that burstiness in resource demand cannot be 
  overlapped without either constantly tuning the partitions or 
  infringing on the guarantee.
 
Well, if you don't think it makes sense, don't buy one.  And stay away
from Xen, VMware, VirtualIron, PowerPC/pSeries hardware, Mainframes,
Altix, IA64 platforms, Intel VT, AMD Pacifica, and, well, anyone else
that is working to support virtualization, which is one key level of
partitioning.

I'm sorry but I'm not buying your argument here at all - it just has
no relationship to what's going on at the user side as near as I can
tell.

  CKRM is one of those things that could be done to Linux, and will benefit a
  few, but which will almost certainly hurt *most* of the community.
  
  let me say that the CKRM design is actually quite good.  the issue is 
  whether 
  the extensive hooks it requires can be done (at all) in a way which does 
  not disporportionately hurt maintainability or efficiency.
 
Can you be more clear on how this will hurt *most* of the community?
CKRM when not in use is not in any way intrusive.  Can you take a look
at the patch again and point out the extensive hooks for me?  I've
looked at all of them and I have trouble calling a couple of callbacks
"extensive hooks".

  CKRM requires hooks into every resource-allocation decision fastpath:
  - if CKRM is not CONFIG, the only overhead is software maintenance.
  - if CKRM is CONFIG but not loaded, the overhead is a pointer check.
  - if CKRM is CONFIG and loaded, the overhead is a pointer check
  and a nontrivial callback.

You left out a case here:  CKRM is CONFIG and loaded and classes are
defined.
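
A sketch of why that added case can still be cheap (illustrative code, not
the actual CKRM callback): with no classes defined, every task remains in
the default class, so the callback can bail out after one comparison:

/* Illustrative only - not the real CKRM callback. */
struct task_struct;
struct example_class;

extern struct example_class *example_default_class;
extern struct example_class *task_class(struct task_struct *tsk);
extern void charge_fork(struct example_class *cls);

void example_fork_callback(struct task_struct *tsk)
{
        struct example_class *cls = task_class(tsk);

        if (cls == example_default_class)       /* no classes defined */
                return;                         /* one compare and a return */

        charge_fork(cls);                       /* the real accounting path */
}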

In all of the cases that you mentioned, if there are no classes
defined, the overhead is still unmeasurable for any real workload.
Refer to the archives 

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Mark Hahn
   yes, that's the crux.  CKRM is all about resolving conflicting resource 
   demands in a multi-user, multi-server, multi-purpose machine.  this is a 
   huge undertaking, and I'd argue that it's completely inappropriate for 
   *most* servers.  that is, computers are generally so damn cheap that 
   the clear trend is towards dedicating a machine to a specific purpose, 
   rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.  
  
 This is a big NAK - if computers are so damn cheap, why is virtualization
 and consolidation such a big deal?  Well, the answer is actually that

yes, you did miss my point.  I'm actually arguing that it's bad design
to attempt to arbitrate within a single shared user-space.  you make 
the fast path slower and less maintainable.  if you are really concerned
about isolating many competing servers on a single piece of hardware, then
run separate virtualized environments, each with its own user-space.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 00:53:58 EDT, Mark Hahn wrote:
yes, that's the crux.  CKRM is all about resolving conflicting resource 
demands in a multi-user, multi-server, multi-purpose machine.  this is 
a 
huge undertaking, and I'd argue that it's completely inappropriate for 
*most* servers.  that is, computers are generally so damn cheap that 
the clear trend is towards dedicating a machine to a specific purpose, 
rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single 
machine.  
   
  This is a big NAK - if computers are so damn cheap, why is virtualization
  and consolidation such a big deal?  Well, the answer is actually that
 
 yes, you did miss my point.  I'm actually arguing that it's bad design
 to attempt to arbitrate within a single shared user-space.  you make 
 the fast path slower and less maintainable.  if you are really concerned
 about isolating many competing servers on a single piece of hardware, then
 run separate virtualized environments, each with its own user-space.

I'm willing to agree to disagree.  I'm in favor of full virtualization
as well, as it is appropriate to certain styles of workloads.  I also
have enough end users who want to share user level and share tasks,
yet still have some level of balancing between the resource consumption
of the various environments.  I don't think you are one of those end
users, though.  I don't think I'm required to make everyone happy all
the time.  ;)

BTW, does your mailer purposefully remove cc:'s?  Seems like that is
normally considered impolite.

gerrit
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Mark Hahn
 of the various environments.  I don't think you are one of those end
 users, though.  I don't think I'm required to make everyone happy all
 the time.  ;)

the issue is whether CKRM (in its real form, not this thin edge)
will noticeably hurt Linux's fast-path.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-20 Thread Paul Jackson
Well said, Mark.  Thanks.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-18 Thread Hirokazu Takahashi
Hi,

> > What, in your opinion, makes it "obviously unmergeable"?

Controlling resource assignment - I think that concept is good.
But the design is another matter; it seems somewhat overdone
in the current CKRM.

> I suspect that the main problem is that this patch is not a mainstream
> kernel feature that will gain multiple uses, but rather provides
> support for a specific vendor middleware product used by that
> vendor and a few closely allied vendors.  If it were smaller or
> less intrusive, such as a driver, this would not be a big problem.
> That's not the case.

I believe this feature would also make desktop users happier -- controlling
X-server, mpeg player, video capturing and all that -- if the code
becomes much simpler and easier to use.

> A major restructuring of this patch set could be considered.  This
> might involve making the metric tools (that monitor memory, fork
> and network usage rates per task) separate patches useful for other
> purposes.  It might also make the rate limiters in fork, alloc and
> network i/o separately useful patches.  I mean here genuinely useful
> and understandable in their own right, independent of some abstract
> CKRM framework.

That makes sense.

> Though hints have been dropped, I have not seen any public effort to
> integrate CKRM with either cpusets or scheduler domains or process
> accounting.  By this I don't mean recoding cpusets using the CKRM
> infrastructure; that proposal received _extensive_ consideration
> earlier, and I am as certain as ever that it made no sense.  Rather I
> could imagine the CKRM folks extending cpusets to manage resources
> on a per-cpuset basis, not just on a per-task or task class basis.
> Similarly, it might make sense to use CKRM to manage resources on
> a per-sched domain basis, and to integrate the resource tracking
> of CKRM with the resource tracking needs of system accounting.

From a standpoint of the users, CKRM and CPUSETS should be managed
seamlessly through the same interface, though I'm not sure whether
your idea is the best yet.


Thanks,
Hirokazu Takahashi.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-17 Thread Mark Hahn
> I suspect that the main problem is that this patch is not a mainstream
> kernel feature that will gain multiple uses, but rather provides
> support for a specific vendor middleware product used by that
> vendor and a few closely allied vendors.  If it were smaller or
> less intrusive, such as a driver, this would not be a big problem.
> That's not the case.

yes, that's the crux.  CKRM is all about resolving conflicting resource 
demands in a multi-user, multi-server, multi-purpose machine.  this is a 
huge undertaking, and I'd argue that it's completely inappropriate for 
*most* servers.  that is, computers are generally so damn cheap that 
the clear trend is towards dedicating a machine to a specific purpose, 
rather than running, e.g., shell/MUA/MTA/FS/DB/etc. all on a single machine.

this is *directly* in conflict with certain prominent products, such as 
the Altix and various less-prominent Linux-based mainframes.  they're all
about partitioning/virtualization - the big-iron aesthetic of splitting up 
a single machine.  note that it's not just about "big", since cluster-based 
approaches can clearly scale far past big-iron, and are in effect statically
partitioned.  yes, buying a hideously expensive single box, and then chopping 
it into little pieces is more than a little bizarre, and is mainly based
on a couple of assumptions:

- that clusters are hard.  really, they aren't.  they are not 
necessarily higher-maintenance, can be far more robust, usually
do cost less.  just about the only bad thing about clusters is 
that they tend to be somewhat larger in size.

- that partitioning actually makes sense.  the appeal is that if 
you have a partition to yourself, you can only hurt yourself.
but it also follows that burstiness in resource demand cannot be 
overlapped without either constantly tuning the partitions or 
infringing on the guarantee.

CKRM is one of those things that could be done to Linux, and will benefit a
few, but which will almost certainly hurt *most* of the community.

let me say that the CKRM design is actually quite good.  the issue is whether 
the extensive hooks it requires can be done (at all) in a way which does 
not disproportionately hurt maintainability or efficiency.

CKRM requires hooks into every resource-allocation decision fastpath:
- if CKRM is not CONFIG, the only overhead is software maintenance.
- if CKRM is CONFIG but not loaded, the overhead is a pointer check.
- if CKRM is CONFIG and loaded, the overhead is a pointer check
and a nontrivial callback.
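
For illustration, here is a minimal C sketch of those three cases
(with invented names; this is not the actual CKRM code, just the
general shape of such a conditional hook):

#ifdef CONFIG_CKRM
/* hypothetical controller ops table; NULL until a controller loads */
struct ckrm_ops {
        int (*alloc_check)(void *task, int order);  /* nontrivial callback */
};
static struct ckrm_ops *ckrm_ops;

static inline int ckrm_alloc_check(void *task, int order)
{
        if (ckrm_ops)                       /* loaded: pointer check... */
                return ckrm_ops->alloc_check(task, order); /* ...plus callback */
        return 0;                           /* built in but not loaded */
}
#else
static inline int ckrm_alloc_check(void *task, int order)
{
        (void)task;
        (void)order;
        return 0;       /* not configured: the hook compiles away entirely */
}
#endif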

but really, this is only for CKRM-enforced limits.  CKRM really wants to
change behavior in a more "weighted" way, not just causing an
allocation/fork/packet to fail.  a really meaningful CKRM needs to 
be tightly integrated into each resource manager - affecting each scheduler
(process, memory, IO, net).  I don't really see how full-on CKRM can be 
compiled out, unless these schedulers are made fully pluggable.

finally, I observe that pluggable, class-based resource _limits_ could 
probably be done without callbacks and potentially with low overhead.
but mere limits don't meet CKRM's goal of flexible, widespread resource
partitioning within a large, shared machine.
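
as a rough sketch of that last idea (invented names again, and only a
guess at how such a scheme might look), the fastpath would do a plain
load-and-compare against a per-class limit, with no indirect call at all:

/* hypothetical callback-free, class-based limit check */
struct res_class {
        long nr_tasks;          /* current usage by this class */
        long max_tasks;         /* hard limit for this class */
};

static inline int class_may_fork(const struct res_class *cls)
{
        /* one load and one compare on the fastpath; no callback */
        return !cls || cls->nr_tasks < cls->max_tasks;
}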

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-17 Thread Paul Jackson
Andrew, replying to Christoph, about CKRM:
> What, in your opinion, makes it "obviously unmergeable"?

Thanks to some earlier discussions on the relation of CKRM with
cpusets, I've spent some time looking at CKRM.  I'm not Christoph,
but perhaps my notes will be of some use in this matter.

CKRM is big, it's difficult for us mere mortals to understand, and it
has attracted only limited review - inadequate review in proportion
to its size and impact.  I tried, and failed, sometime last year to
explain some of what I found difficult to grasp of CKRM to the folks
doing it.  See further an email thread entitled:

Classes: 1) what are they, 2) what is their name?

http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191

on the ckrm-tech@lists.sourceforge.net email list between Aug 14 and
Aug 27, 2004.

As to its size, CKRM is in a 2.6.5 variant of SuSE that I happen to be
building just now for other reasons.  The source files that have 'ckrm'
in the pathname, _not_ counting Doc files, total 13044 lines of text.
The CONFIG_CKRM* config options add 144 Kbytes to the kernel text.

The CKRM patches in 2.6.13-rc3-mm1 are similar in size.  These patch
files total 14367 lines of text.

It is somewhat intrusive in the areas it controls, such as some large
ifdef's in kernel/sched.c.

The sched hooks may well impact the cost of maintaining the sched code,
which is always a hotbed of Linux kernel development.  However, others
who work in that area will have to speak to that concern.

I tried just now to read through the ckrm hooks in fork, to see
what sort of impact they might have on scalability on large systems.
But I gave up after a couple of layers of indirection.  I saw several
atomic counters and a couple of spinlocks that I suspect (not at all
sure) lie on the main fork code path.  I'd be surprised if this didn't
impact scalability.  Earlier, according to my notes, I saw mention of
lmbench results in the OLS 2004 slides, indicating a cost of several
percent of the available cpu cycles.

A feature of this size and impact needs to attract a fair bit of
discussion, whether because it is essential to a variety of people or
because it is intriguing in some other way.

I suspect that the main problem is that this patch is not a mainstream
kernel feature that will gain multiple uses, but rather provides
support for a specific vendor middleware product used by that
vendor and a few closely allied vendors.  If it were smaller or
less intrusive, such as a driver, this would not be a big problem.
That's not the case.

The threshold of what is sufficient review needs to be set rather high
for such a patch, quite a bit higher than I believe it has obtained
so far.  It will not be easy for them to obtain that level of review,
until they get better at arousing the sustained interest of other
kernel developers.

There may well be multiple end users and applications depending on
CKRM, but I have not been able to identify how many separate vendors
provide middleware that depends on CKRM.  I am guessing that only one
vendor has a serious middleware software product that provides full
CKRM support.  Acceptance of CKRM would be easier if multiple competing
middleware vendors were using it.  It is also a concern that CKRM
is not really usable for its primary intended purpose unless it is
accompanied by this corresponding middleware, which I presume is
proprietary code.  I'd like to see a persuasive case that CKRM is
useful and used on production systems not running substantial
sole-sourced proprietary middleware.

The development and maintenance costs so far of CKRM appear (to
this outsider) to have been substantial, which suggests that the
maintenance costs of CKRM once in the kernel would be non-trivial.
Given the size of the project, its impact on kernel code, and the
rather limited degree to which developers outside of the CKRM project
have participated in CKRM's development or review, this could either
leave the Linux kernel overly dependent on one vendor for maintaining
CKRM, or place an undue maintenance burden on other kernel developers.

CKRM is in part a generalization and descendant of what I call fair
share schedulers.  For example, the fork hooks for CKRM include a
forkrates controller, to slow down the rate of forking of tasks using
too many resources.
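
To make the idea concrete, here is a minimal token-bucket sketch of
what a fork-rate throttle could look like (hypothetical names and
logic; not the actual forkrates controller):

#include <time.h>

/* per-class bucket: refilled at 'rate' tokens/sec, one token per fork */
struct fork_bucket {
        double tokens;          /* forks currently permitted */
        double rate;            /* tokens refilled per second */
        double burst;           /* bucket capacity */
        struct timespec last;   /* time of the last refill */
};

static int fork_allowed(struct fork_bucket *b)
{
        struct timespec now;
        double elapsed;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (now.tv_sec - b->last.tv_sec) +
                  (now.tv_nsec - b->last.tv_nsec) / 1e9;
        b->last = now;

        b->tokens += elapsed * b->rate;
        if (b->tokens > b->burst)
                b->tokens = b->burst;
        if (b->tokens < 1.0)
                return 0;       /* over the class's fork rate: deny or delay */
        b->tokens -= 1.0;
        return 1;
}

A class judged to be using too many resources would simply get a low
rate, slowing its forking rather than failing every fork outright.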

No doubt the CKRM experts are already familiar with these, but for
the possible benefit of other readers:

  UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
  
http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883

  SHARE II -- A User Administration and Resource Control System for UNIX
  http://www.c-side.com/c/papers/lisa-91.html

  Solaris Resource Manager White Paper
  http://wwws.sun.com/software/resourcemgr/wp-mixed/

  ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
  http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm

  A Fair Share Scheduler, J. Kay and P. Lauder
Communications of the ACM, January 1988.


Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-15 Thread Andrew Morton
Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> On Fri, Jul 15, 2005 at 01:36:53AM -0700, Andrew Morton wrote:
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm1/
> > 
> > (http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.13-rc3-mm1.gz until
> > kernel.org syncs up)
> > 
> > 
> > - Added the CKRM patches.  This is just here for people to look at at this
> >   stage.
> 
> Andrew, do we really need to add every piece of crap lying on the street
> to -mm?  It's far away from mainline enough already without adding obviously
> unmergeable stuff like this.

My gut reaction to ckrm is the same as yours.  But there's been a lot of
work put into this and if we're to flatly reject the feature then the
developers are owed a much better reason than "eww yuk".

Otherwise, if there are certain specific problems in the code then it's
best that they be pointed out now rather than later on.

What, in your opinion, makes it "obviously unmergeable"?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

