from:"Gerrit Huizenga"

Re: Documentation of kernel messages (Summary)

2007-07-09 Thread Gerrit Huizenga

On Mon, 09 Jul 2007 12:48:24 EDT, Rob Landley wrote:
> In regard to translating kernel messages:
> 
> On Monday 09 July 2007 01:36:31 H. Peter Anvin wrote:
> > Kunai, Takashi wrote:
> > > (1) Your kernel development proposal will be greatly supported by
> > > Japanese vendor community. At the same time, it needs support from the
> > > kernel communities, as well.
> >
> > There is a very strong reason for the kernel community to NOT support
> > this: it makes it much harder to deal with bug reports.
> 
> Agreed.

[...]

> 
> As for the documentation translations themselves, I note that Eric Raymond 
> dissuaded me from actually hosting translations of kernel documentation on 
> http://kernel.org/doc due to his experience with translations of his own 
> writings.  If he hosts the translations on his website, they never get 
> updated again.  But if he makes the contributor host them on their own 
> website, then they sometimes get updated.
> 
> For my part, I can't _tell_ when a given translation is out of date because I 
> can't read it, and I certainly can't update it.  So I agree with Eric and I'm 
> linking to sites hosting kernel documentation in other languages rather than 
> hosting them myself.  I've got a good link to a Japanese site, need to ping 
> the contributors of the chinese documentation and see if they have a site for 
> it...

Yeah, but it seems like having a translations directory in the kernel
avoids that problem - anyone can update, it is a single source, no digging
for sites that aren't tied to the kernel, available in the distros
directly, etc.

I'm not sure that the web site hosting argument is a good one.  Arch
maintainers and Language Maintainers have some similarities.  Also, being
able to clearly mark which version of a document was last translated
would help a bit with the fact that most of us aren't proficient in all
of the world's languages.  But, knowing who the translation maintainer is,
and when the translation was last updated allows both the original doc
maintainer and the translation document maintainer to see when a document
likely needs to be updated.  And that is probably good enough.

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/PATCH] Documentation of kernel messages

2007-06-18 Thread Gerrit Huizenga

On Mon, 18 Jun 2007 17:13:19 PDT, Tim Bird wrote:
> Gerrit Huizenga wrote:
> > Further, yet another kernel config option could allow distros to output
> > the calculated MD5 sum to be printed, much like we do with timestamps
> > today.
> 
> > Comments?
> 
> Would the compiled-in text then also become replaceable?
> Or is the MD5 sum output expected to be in addition to
> the regular English message?

 The MD5 sum would be calculated at print time, no proposal in
 here to replace the in-kernel text.  And, I'm not sure that replacing
 it with an MD5 sum would ever be significantly shorter, ergo
 I don't think this helps your problem.

 The methods of post-processing to "shrink" the kernel text here
 seem too ugly to ponder.  And I just pondered a few that made my
 head hurt (sort the MD5 sums, re-insert the number of the MD5 sum
 as an integer instead of the original text, and, beyond being gross,
 what do you do with the formatting info about the args?).

> If message replacement at compile-time is supported, this
> could allow for creating "short" versions of the messages,
> which could have a beneficial impact on kernel size.
> 
> Right now, it is possible to completely disable printks
> and re-claim about 100K of memory.  However, in some
> embedded configurations, even if you are space-constrained
> it's desirable to retain some of the printks, for in-field
> debugging.  Thus not very many embedded developers
> disable printks completely, even though the option has
> been supported for a while.  (That, and many aren't caught
> up to the kernel version where it was introduced (2.6.10) :-)

 Which ones matter?  Odds are you could use the KERNEL_LOGLEVEL or
 such to mark those, then compile out the rest.

> But compressed messages (shortened text through abbreviations,
> or just outputting the ID alone, etc.) could save SOME
> of the space, in trade for less readability.  Heck, just
> removing all vowels would probably save 10k, and not
> hurt readability that much.

 Ick.  Are you serious?  How about running them through a valley
 girl filter and then converting to high school text messaging?

> Finally, for testing, it's handy to also
> have automatic translation generators.
> At a former company I worked for, they had translators
> that would output:
>  * all messages shortened by 20%
>  * all messages lengthened by 20%
>  * every message converted to pig-latin

 Double yucko.

> These were mostly used for testing if the strings broke
> screen real-estate constraints (which don't apply to
> kernel messages).  But the automatic translators
> would sometimes catch messages with weird attributes.

  I don't think people are worried about the correctness of
  the messages and message formats - much of that can actually
  be detected by simple tools.  The goal here was to at least
  allow people that support operating systems and for whom
  English (and English collation sequences) is not a primary
  language to do some initial system diagnosis.

  I think compiling out the messages of something below some LOGLEVEL
  is a lot more practical.

gerrit

Re: [RFC/PATCH] Documentation of kernel messages

2007-06-15 Thread Gerrit Huizenga

On Fri, 15 Jun 2007 11:51:51 PDT, Randy Dunlap wrote:
> 
> For those of us who have not been in on these meetings, I think that
> some serious justifications are needed.  The last paragraph began to
> go in that direction, but it needs to be more detailed and convincing.
> And "for debugging" doesn't cut it IMO.
> 
> -- 
> ~Randy

Fair point, Randy.

This came up this time because people in Japan supporting Linux customers
are not always familiar with Linux kernel internals and thus debugging
problems based on a terse, English language message is just painful.  Also,
any given message doesn't really provide any input about what *really*
failed, what should be done about it, how to correct this issue, or what
parts to pull out of your machine and replace with properly working parts.

So, the Message Pedia is basically a Japanese language wiki today where
people doing support can annotate the messages with real life experience on
what to *do* about the error.

Part of the problem is that the error messages when extracted don't
stand alone and don't easily allow tracing back to the real source.  When
the problem is handed from the second line support folks deeper into the
support or development organization, the message has to be found in the
source tree in some non-ambiguous fashion.

The other problem that comes up in other, non-English locales is that
debugging your Linux box when you are a non-native English speaker is
just plain impossible.  Providing a means for getting localized kernel
messages allows more end users to at least reasonably debug their own
machine.  Nearly every user-level project today allows localization directly,
but the kernel has never had any reasonable mechanism for localization.

Now, the proposal here doesn't make the kernel "as good as" user-level
applications, but it would at least provide a searchable key to help
people get localized guidance on problems.

I'm sure there are other benefits that most can see based on just simple
debuggability.  But the Japanese goal is really to construct a database
of debugging info & context based on their experiences.

And, I'm sure that others can speak for themselves.  I'm mostly playing
scribe on this one - we abandoned our error and event subsystem work years
ago, even though this was one of the biggest weaknesses we saw in customer
support 3-5 years ago, simply because our changes were still viewed as
too invasive for mainline kernel.  These proposed changes help a lot, while
remaining almost completely invisible to most developers.

gerrit

Re: [RFC/PATCH] Documentation of kernel messages

2007-06-15 Thread Gerrit Huizenga

On Thu, 14 Jun 2007 12:38:53 +0200, holzheu wrote:
> On Thu, 2007-06-14 at 11:41 +0200, Jan Kara wrote:
> >   
> > 
> > > Your proposal is similar to one I made to some Japanese developers
> > > earlier this year.  I was more modest, proposing that we
> > > 
> > > - add an enhanced printk
> > > 
> > >   xxprintk(msgid, KERN_ERR "some text %d\n", some_number);
> >   Maybe a stupid idea but why do we want to assign these numbers by hand?
> > I can imagine it could introduce collisions when merging tons of patches
> > with new messages... Wouldn't it be better to compute say, 8-byte hash
> > from the message and use it as its identifier? We could do this
> > automagically at compile time.
> 
> Of course automatically generated message numbers would be great and
> something like:
> 
> hub.4a5bcd77: Detected some problem.
> 
> looks acceptable for me.
> 
> We could generate the hash using the format string of the printk. Since
> we specify the format string also in KMSG_DOC, the hash for the
> KMSG_DOC and the printk should match and we have the required link
> between printk and description.
> 
> So technically that's probably doable.
> 
> Problems are:
> 
> * hashes are not unique
> * We need an additional preprocessor step
> * There might be people who find 8-character hash values ugly in printks
> 
> The big advantage is, that we do not need to maintain message numbers.
> 
> > I know it also has its problems - you
> > fix a spelling and the message gets a different id and you have to
> > update translation/documentation catalogue but maybe that could be
> > solved too...
> 
> Since in our approach the message catalog is created automatically for
> exactly one kernel and the message catalog belongs therefore to exactly
> one kernel, I think the problem of changing error numbers is not too
> big.

We just had a meeting with the Japanese and several other participants
from the vendor and community side and came up with a potential proposal
that is similar to many things discussed here.  It has the benefit that
it seems implementable and low/no overhead on most kernel developers.

The basic proposal is to use a tool, run by the kernel Makefile, to
extract kernel messages from either the kernel source or the .i files
(both have advantages, although I prefer the source to the .i file
since it 1) gets all messages and 2) is probably a little quicker, with
less impact on the standard kernel make).

These messages would be stored in a file in the source tree, e.g.
usr/src/linux/Translations/English.  As each message is added to that
file, we calculate, say, an MD5 sum of the printk (dev_printk, sdev_printk,
etc.) string, and the text file ultimately contains:

MD5 checksum of text, the printk text itself, the file name, the line number.

The checksum is run over just the printk.  We definitely would not include
the line number since the line number is too volatile.  Including the
file name in the hash *might* help disambiguate the hash a bit better in
the case of duplicates, but there was some debate that duplicates might
be better handled in other ways.
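
A minimal sketch of what one catalog entry might look like, assuming the
key is the MD5 of the format string alone.  The function names, the tab
separator, the JSON-style escaping of the text, and the file name and line
numbers in the example are all assumptions made for illustration; the
proposal itself only names the four fields.

```python
import hashlib
import json


def message_key(fmt: str) -> str:
    """Return the MD5 hex digest of the printk format string alone."""
    return hashlib.md5(fmt.encode("utf-8")).hexdigest()


def catalog_line(fmt: str, filename: str, lineno: int) -> str:
    """Build one catalog entry: key, message text, file name, line number.

    The file name and line number are carried as information only; they
    are deliberately not fed into the hash.
    """
    return "\t".join([message_key(fmt), json.dumps(fmt), filename, str(lineno)])


if __name__ == "__main__":
    # Hypothetical call site; the key stays the same when the line moves.
    print(catalog_line("some text %d\n", "drivers/usb/core/hub.c", 120))
    print(catalog_line("some text %d\n", "drivers/usb/core/hub.c", 134))
```

Because the line number stays out of the hash, moving a printk within a
file changes only the informational fields, not the key.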

Andrew mentioned a mechanism for adding a subsystem tag or other tag
which helps disambiguate the message, either in the message file or in
the end user documentation (e.g. the Message Pedia/mPedia that the Japanese
have already created with ~350 messages, and a total of ~700 targeted
by the end of the year).

That tag could be appended to the beginning of the printk, to the end of
the printk, or even in a formatted comment at the end of the printk that
the tool could extract.

Then, the translations could be managed by anyone outside of the normal/
core kernel community, by simply creating a translation file, e.g.
usr/src/linux/Translations/Japanese, which contained the MD5 sum, the
translated message, the file name and line number (the last two redundant
perhaps but informational, and automatically generated if possible).

The files in the Translations directory could be used as the unique
keys for an external database (such as the Message Pedia, vendors' or
distributions' help pages, etc.) to help look up and explain the root cause
of a problem.  The key property here is that the MD5 sum becomes the
lookup key for every database entry about that message.
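
As a rough illustration of the lookup side, here is a sketch that loads a
translation file into a table keyed by MD5 sum and falls back to the
original English text when no entry exists.  The tab-separated layout and
the names load_translations and localize are hypothetical, not part of
the proposal.

```python
import hashlib


def message_key(fmt: str) -> str:
    """Return the MD5 hex digest of the printk format string alone."""
    return hashlib.md5(fmt.encode("utf-8")).hexdigest()


def load_translations(lines):
    """Parse 'key<TAB>translated text[<TAB>file<TAB>line]' lines into a dict."""
    table = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed entries rather than guess at them
        table[fields[0]] = fields[1]
    return table


def localize(fmt: str, table: dict) -> str:
    """Return the translated text for fmt, or fmt itself if none exists."""
    return table.get(message_key(fmt), fmt)
```

The same key could just as well index a wiki page or a vendor help
database; the dictionary here only stands in for whatever store is used.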

Further, yet another kernel config option could allow distros to output
the calculated MD5 sum to be printed, much like we do with timestamps
today.

End result is that these in-kernel message catalogs for translation are
updated automatically (mostly no kernel developer changes needed) and
the translations can be maintained by anyone who is interested.

On the topic of MD5 collisions, using a disambiguating tag would be a
simple addition for the few cases where that happens.  The tool could
be educated to use that tag in the calculation of the MD5 sum, and we
have a 98% solution which impacts <1% of the kernel developers.
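
One way the tag could enter the calculation, sketched under the assumption
that a "collision" in practice mostly means the same message text
appearing in more than one file.  The NUL separator between tag and text,
the function names, and the sample tags are illustrative choices, not
anything agreed in the meeting.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def message_key(fmt: str, tag: Optional[str] = None) -> str:
    """MD5 of the format string, optionally prefixed by a subsystem tag."""
    h = hashlib.md5()
    if tag:
        # The NUL byte keeps ("ab", "c") distinct from ("a", "bc").
        h.update(tag.encode("utf-8") + b"\0")
    h.update(fmt.encode("utf-8"))
    return h.hexdigest()


def ambiguous_keys(messages):
    """Report keys that are shared by messages in more than one file.

    messages is an iterable of (fmt, filename, tag) tuples; the result
    maps each ambiguous key to the sorted list of files that use it.
    """
    sites = defaultdict(set)
    for fmt, filename, tag in messages:
        sites[message_key(fmt, tag)].add(filename)
    return {key: sorted(files) for key, files in sites.items() if len(files) > 1}
```

Untagged messages hash exactly as before, so only the few call sites that
actually clash would need to be touched.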

Folks present for this discussion included Ted Ts'o, James Bottomley,
several of the key Japanese folks interested in using this for debugging,
and reps from several [...]

Re: [Ksummit-2007-discuss] Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-31 Thread Gerrit Huizenga

On Wed, 31 Jan 2007 15:30:43 PST, "H. Peter Anvin" wrote:
> Gerrit Huizenga wrote:
> > Don't confuse KS with a conference;
> > it is a workshop for a very, very large, very very active project.
> 
> ... and *growing*, which is the real issue I think.
> 
> Something that might make sense for KS is to have multiple sessions 
> (perhaps replacing some or all of the "mini-summits" that have cropped 
> up) combined with some bigger, overall sessions.  At least that way 
> there would be more cross-pollination between the various groups than if 
> we eventually end up meeting everywhere.
> 
> That's of course only practical if KS is separated from any other 
> conference (like OLS.)

Are you thinking something like "core VM/scheduler/locking/etc." as one
not-quite-so-mini-summit, "block IO/storage drivers/filesystems" as
another, "arch maintainers" as another, and "all the nutty drivers and
their writers" as perhaps a fourth?  In other words, some semi-logical
grouping of issues, each as more free-floating meetings?  Or did I miss
your suggestion?

Easy on the judgement on practicality, btw.  For instance, FAST is going
to try to do some part of one of these - possibly larger than a networking
mini-summit in scope but otherwise with similar goals.

I think there are some options to consider for hosting some targeted
working meetings in some of these areas, including the examples already
given for some mini-summits.  Some sponsors might help set up mini-summits
(and some have in the past), including considering the Linux Foundation as
they do with the Desktop Architects Meeting (my favorite DAM meeting!).

The challenge is to figure out what people want to have happen, then see if
we can make it happen.

gerrit

Re: [Ksummit-2007-discuss] Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-31 Thread Gerrit Huizenga

On Wed, 31 Jan 2007 23:49:11 +0100, Jes Sorensen wrote:
> 
> Gerrit mentioned that half the committee shows up to be dead weight when
> it comes down to the crunch at the end, so if this is the case, does it
> really make sense to keep said members on the committee? LCA had how
> many proposals? they handled it with a 7-8 member group I believe, and
> yes I know Rusty did bitch about having to read a couple of hundred
> papers, but they did pretty darn well.

I believe in that same post, I pointed out that throughout the prep
period, all members *did* have a valuable contribution.  Don't use half
the info to make a point, please.

And for paper & proposal reviews, also having been on the OLS program
committee for several years, I can guarantee you that these are two different
birds.  Paper proposals are more static, have a more or less intrinsic
value that you can assess at a single reading.  KS is *much* more dynamic,
and would be just another conference if it weren't.  KS is about current
issues, and actions to address those issues.  The *actions* part is a lot
harder than the paper reading portion.  Don't confuse KS with a conference;
it is a workshop for a very, very large, very very active project.

gerrit

Re: [Ksummit-2007-discuss] Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-31 Thread Gerrit Huizenga


On Wed, 31 Jan 2007 15:30:43 PST, H. Peter Anvin wrote:
> Gerrit Huizenga wrote:
> > Don't confuse KS with a conference;
> > it is a workshop for a very, very large, very very active project.
>
> ... and *growing*, which is the real issue I think.
>
> Something that might make sense for KS is to have multiple sessions
> (perhaps replacing some or all of the mini-summits that have cropped
> up) combined with some bigger, overall sessions.  At least that way
> there would be more cross-pollination between the various groups than if
> we eventually end up meeting everywhere.
>
> That's of course only practical if KS is separated from any other
> conference (like OLS.)

Are you thinking something like core VM/scheduler/locking/etc. as one
not-quite-so-mini-summit, block IO/storage drivers/filesystems as another,
arch maintainers as another, and all the nutty drivers and their writers as
perhaps a fourth?  In other words, some semi-logical grouping of issues,
each as a more free-floating meeting?  Or did I miss your suggestion?

Easy on the judgement on practicality, btw.  For instance, FAST is going
to try to do some part of one of these - possibly larger than a networking
mini-summit in scope but otherwise with similar goals.

I think there are some options to consider for hosting some targeted
working meetings in some of these areas, including the examples already
given for some mini-summits.  Some sponsors might help set up mini-summits
(and some have in the past), including considering the Linux Foundation as
they do with the Desktop Architects Meeting (my favorite DAM meeting!).

The challenge is to figure out what people want to have happen, then see if
we can make it happen.

gerrit

Re: O_DIRECT question

2007-01-11 Thread Gerrit Huizenga


On Wed, 10 Jan 2007 20:51:57 PST, Andrew Morton wrote:
> On Thu, 11 Jan 2007 10:57:06 +0800
> Aubrey <[EMAIL PROTECTED]> wrote:
> 
> > Hi all,
> >
> > Opening file with O_DIRECT flag can do the un-buffered read/write access.
> > So if I need un-buffered access, I have to change all of my
> > applications to add this flag. What's more, some scripts like "cp
> > oldfile newfile" still use pagecache and buffer.
> > Now, my question is, is there an existing way to mount a filesystem
> > with the O_DIRECT flag, so that I don't need to change anything in my
> > system? If there is no option so far, what is the right way to achieve
> > my purpose?
> 
> Not possible, basically.
> 
> O_DIRECT reads and writes must be aligned to the device's block size
> (usually 512 bytes) in memory addresses, file offsets and read/write request
> sizes.  Very few applications will bother to do that and will hence fail if
> their files are automagically opened with O_DIRECT.

Actually, technically possible.  We heard from some application people
that Sun/Solaris has this option.  Good if the application is the only
one using the filesystem.  Supposedly there were large apps which used
lots of filesystems more or less exclusively and this option made people
happy.

Although before Linus says it, I guess crack makes people happy, too.  ;)
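
For reference, the alignment rule Andrew describes (memory address, file
offset and request size all multiples of the block size) can be sketched in
C.  This is only an illustration, not code from this thread: the helper
names dio_aligned() and dio_read_block() are made up, and the block size is
assumed to be supplied by the caller rather than queried from the device.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if the buffer address, file offset and
 * length are all multiples of blk. */
int dio_aligned(const void *buf, off_t offset, size_t len, size_t blk)
{
    if (blk == 0 || offset < 0)
        return 0;
    return (uintptr_t)buf % blk == 0 &&
           (uintmax_t)offset % blk == 0 &&
           len % blk == 0;
}

/* Hypothetical helper: read len bytes at offset from path using O_DIRECT.
 * On success returns the byte count and stores an aligned buffer in *out
 * (the caller frees it).  Returns -1 with errno set on failure. */
ssize_t dio_read_block(const char *path, off_t offset, size_t len,
                       size_t blk, void **out)
{
    void *buf = NULL;
    ssize_t n;
    int fd, err;

    /* posix_memalign() returns an error number instead of setting errno. */
    err = posix_memalign(&buf, blk, len);
    if (err != 0) {
        errno = err;
        return -1;
    }
    /* Refuse requests that would violate the alignment rule. */
    if (!dio_aligned(buf, offset, len, blk)) {
        free(buf);
        errno = EINVAL;
        return -1;
    }
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        err = errno;
        free(buf);
        errno = err;
        return -1;
    }
    n = pread(fd, buf, len, offset);
    err = errno;
    close(fd);
    if (n < 0) {
        free(buf);
        errno = err;
        return -1;
    }
    *out = buf;
    return n;
}
```

An application that passes an ordinary malloc'd buffer or an odd request
size fails the dio_aligned() check, which is the failure mode Andrew expects
if files were opened with O_DIRECT behind the application's back.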

gerrit

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 15:53:55 BST, Alan Cox wrote:
> On Gwe, 2005-07-22 at 00:53 -0400, Mark Hahn wrote:
> > the fast path slower and less maintainable.  if you are really concerned
> > about isolating many competing servers on a single piece of hardware, then
> > run separate virtualized environments, each with its own user-space.
> 
> And the virtualisation layer has to do the same job with less
> information. That to me implies that the virtualisation case is likely
> to be materially less efficient, it's just that the inefficiency you are
> worried about is hidden in a different piece of code.
> 
> Secondly a lot of this doesn't matter if CKRM=n compiles to no code
> anyway

I'm actually trying to keep the impact of CKRM=y to near-zero, ergo
only an impact if you create classes.  And even then, the goal is to
keep that impact pretty small as well.

And yes, a hypervisor does have a lot more overhead in many forms.
Something like an overall 2-3% everywhere, where the CKRM impact is
likely to be so small as to be hard to measure in the individual
subsystems, and overall performance impact should be even smaller.
Plus you won't have to manage each operating system instance which
can grow into a pain under virtualization.  But I still maintain that
both have their place.

gerrit

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga


On Fri, 22 Jul 2005 00:53:58 EDT, Mark Hahn wrote:
> > > > yes, that's the crux.  CKRM is all about resolving conflicting resource
> > > > demands in a multi-user, multi-server, multi-purpose machine.  this is
> > > > a huge undertaking, and I'd argue that it's completely inappropriate for
> > > > *most* servers.  that is, computers are generally so damn cheap that
> > > > the clear trend is towards dedicating a machine to a specific purpose,
> > > > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single
> > > > machine.
> >  
> > This is a big NAK - if computers are so damn cheap, why is virtualization
> > and consolidation such a big deal?  Well, the answer is actually that
> 
> yes, you did miss my point.  I'm actually arguing that it's bad design
> to attempt to arbitrate within a single shared user-space.  you make 
> the fast path slower and less maintainable.  if you are really concerned
> about isolating many competing servers on a single piece of hardware, then
> run separate virtualized environments, each with its own user-space.

I'm willing to agree to disagree.  I'm in favor of full virtualization
as well, as it is appropriate to certain styles of workloads.  I also
have enough end users who also want to share user level, share tasks,
yet also have some level of balancing between the resource consumption
of the various environments.  I don't think you are one of those end
users, though.  I don't think I'm required to make everyone happy all
the time.  ;)

BTW, does your mailer purposefully remove cc:'s?  Seems like that is
normally considered impolite.

gerrit

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga

On Fri, 22 Jul 2005 13:46:37 +1000, Peter Williams wrote:
> Gerrit Huizenga wrote:
> >>I imagine that the cpu controller is missing from this version of CKRM 
> >>because the bugs introduced to the cpu controller during upgrading from 
> >>2.6.5 to 2.6.10 version have not yet been resolved.
> > 
> > 
> >  I don't know what bugs you are referring to here.  I don't think we
> >  have any open defects with SuSE on the CPU scheduler in their releases.
> >  And that is not at all related to the reason for not having a CPU
> >  controller in the current patch set.
> 
> The bugs were in the patches for the 2.6.10 kernel not SuSE's 2.6.5 
> kernel.  I reported some of them to the ckrm-tech mailing list at the 
> time.  There were changes to the vanilla scheduler between 2.6.5 and 
> 2.6.10 that were not handled properly when the CKRM scheduler was 
> upgraded to the 2.6.10 kernel.

Ah - okay - that makes sense.  Those patches haven't gone through my
review yet and I'm not directly tracking their status until I figure
out what the right direction is with respect to a fair share style
scheduler of some sort.  I'm not convinced that the current one is
something that is ready for mainline or is necessarily the right answer
currently.  But we do need to figure out something that will provide
some level of CPU allocation minima & maxima for a class, where that
solution will work well on a laptop or a huge server.

Ideas in that space are welcome - I know of several proposed ideas
in progress - the scheduler in SuSE and the forward port to 2.6.10
that you referred to; an idea for building a very simple interface
on top of sched_domains for SMP systems (no fairness within a
single CPU) and a proposal for timeslice manipulation that might
provide some fairness that the Fujitsu folks are thinking about.
There are probably others and honestly, I don't have any clue yet as
to what the right long term/mainline direction should be here.

gerrit

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga


On Fri, 22 Jul 2005 11:06:14 +1000, Peter Williams wrote:
> Paul Jackson wrote:
> > Matthew wrote:
> > 
> >>I don't see the large ifdefs you're referring to in -mm's
> >>kernel/sched.c.
> > 
> > 
> > Perhaps someone who knows CKRM better than I can explain why the CKRM
> > version in some SuSE releases based on 2.6.5 kernels has substantial
> > code and some large ifdef's in sched.c, but the CKRM in *-mm doesn't.
> > Or perhaps I'm confused.  There's a good chance that this represents
> > ongoing improvements that CKRM is making to reduce their footprint
> > in core kernel code.  Or perhaps there is a more sophisticated cpu
> > controller in the SuSE kernel.
> 
> As there is NO CKRM cpu controller in 2.6.13-rc3-mm1 (that I can see) 
> the one in 2.6.5 is certainly more sophisticated :-).  So the reason 
> that the considerable mangling of sched.c evident in SuSE's 2.6.5 kernel 
> source is not present is that the cpu controller is not included in 
> these patches.
 
 Yeah - I don't really consider the current CPU controller code something
 ready for consideration yet for mainline merging.  That doesn't mean
 we don't want a CPU controller for CKRM - just that what we have
 doesn't integrate cleanly/nicely with mainline.

> I imagine that the cpu controller is missing from this version of CKRM 
> because the bugs introduced to the cpu controller during upgrading from 
> 2.6.5 to 2.6.10 version have not yet been resolved.

 I don't know what bugs you are referring to here.  I don't think we
 have any open defects with SuSE on the CPU scheduler in their releases.
 And that is not at all related to the reason for not having a CPU
 controller in the current patch set.

gerrit

Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Gerrit Huizenga


Sorry - I didn't see Mark's original comment, so I'm replying to
a reply which I did get.  ;-)

On Thu, 21 Jul 2005 23:59:09 EDT, Shailabh Nagar wrote:
> Mark Hahn wrote:
> >>I suspect that the main problem is that this patch is not a mainstream
> >>kernel feature that will gain multiple uses, but rather provides
> >>support for a specific vendor middleware product used by that
> >>vendor and a few closely allied vendors.  If it were smaller or
> >>less intrusive, such as a driver, this would not be a big problem.
> >>That's not the case.
> > 
> > 
> > yes, that's the crux.  CKRM is all about resolving conflicting resource
> > demands in a multi-user, multi-server, multi-purpose machine.  this is a
> > huge undertaking, and I'd argue that it's completely inappropriate for
> > *most* servers.  that is, computers are generally so damn cheap that
> > the clear trend is towards dedicating a machine to a specific purpose,
> > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.
 
This is a big NAK - if computers are so damn cheap, why is virtualization
and consolidation such a big deal?  Well, the answer is actually that
floor space, heat, and power are also continuing to be very important
in the overall equation.  And, buying machines which are dedicated but
often 80-99% idle occasionally bothers people who are concerned about
wasting planetary resources for no good reason.  Yeah, we can stamp out
thousands of metal boxes, but if just a couple can do the same work,
well, let's consolidate.  Less wasted metal, less wasted heat, less
wasted power, less air conditioning, wow, we are now part of the
eco-computing movement!  ;-)

> > this is *directly* in conflict with certain prominent products, such as
> > the Altix and various less-prominent Linux-based mainframes.  they're all
> > about partitioning/virtualization - the big-iron aesthetic of splitting up
> > a single machine.  note that it's not just about "big", since cluster-based
> > approaches can clearly scale far past big-iron, and are in effect statically
> > partitioned.  yes, buying a hideously expensive single box, and then
> > chopping it into little pieces is more than a little bizarre, and is mainly
> > based on a couple assumptions:

Well, yeah IBM has been doing this virtualization & partitioning stuff
for ages at lots of different levels for lots of reasons.  If we are
in such direct conflict with Altix, aren't we also in conflict with our
own lines of business which do the same thing?  But, well, we aren't
in conflict - this is a complementary part of our overall capabilities.

> > - that clusters are hard.  really, they aren't.  they are not
> > necessarily higher-maintenance, can be far more robust, usually
> > do cost less.  just about the only bad thing about clusters is
> > that they tend to be somewhat larger in size.

This is orthogonal to clusters.  Or, well, we are even using CKRM today
in some grid/cluster style applications.  But that has no bearing on
whether or not clusters are useful.

> > - that partitioning actually makes sense.  the appeal is that if
> > you have a partition to yourself, you can only hurt yourself.
> > but it also follows that burstiness in resource demand cannot be
> > overlapped without either constantly tuning the partitions or
> > infringing on the guarantee.
 
Well, if you don't think it makes sense, don't buy one.  And stay away
from Xen, VMware, VirtualIron, PowerPC/pSeries hardware, Mainframes,
Altix, IA64 platforms, Intel VT, AMD Pacifica, and, well, anyone else
that is working to support virtualization, which is one key level of
partitioning.

I'm sorry but I'm not buying your argument here at all - it just has
no relationship to what's going on at the user side as near as I can
tell.

> > CKRM is one of those things that could be done to Linux, and will benefit a
> > few, but which will almost certainly hurt *most* of the community.
> > 
> > let me say that the CKRM design is actually quite good.  the issue is
> > whether the extensive hooks it requires can be done (at all) in a way which
> > does not disproportionately hurt maintainability or efficiency.
 
Can you be more clear on how this will hurt *most* of the community?
CKRM when not in use is not in any way intrusive.  Can you take a look
at the patch again and point out the "extensive" hooks for me?  I've
looked at "all" of them and I have trouble calling a couple of callbacks
"extensive hooks".

> > CKRM requires hooks into every resource-allocation decision fastpath:
> > - if CKRM is not CONFIG, the only overhead is software maintenance.
> > - if CKRM is CONFIG but not loaded, the overhead is a pointer check.
> > - if CKRM is CONFIG and loaded, the overhead is a pointer check
> > and a nontrivial callback.

You left out a case here:  CKRM is CONFIG and loaded and classes are
defined.
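
To make the cases above concrete, here is a rough C sketch of the general
pattern being argued about: a hook that compiles away when not configured,
costs one pointer check when nothing is registered, and calls back when
something is.  This is not the CKRM code; every name in it
(CONFIG_RES_HOOKS, res_hook_alloc, res_register, demo_on_alloc, nr_classes)
is invented for illustration, and the "no classes defined" early return is
just one plausible way a callback could stay cheap in that fourth case.

```c
#include <stddef.h>

#define CONFIG_RES_HOOKS 1      /* hypothetical build option; remove to compile hooks out */

/* Callback table a resource-management module would register. */
struct res_callbacks {
    void (*on_alloc)(int task_id, size_t amount);
};

#ifdef CONFIG_RES_HOOKS
const struct res_callbacks *res_cb;     /* NULL while nothing is loaded */

void res_register(const struct res_callbacks *cb)
{
    res_cb = cb;
}

/* Fast-path hook: a single pointer check when nothing is loaded. */
static inline void res_hook_alloc(int task_id, size_t amount)
{
    if (res_cb && res_cb->on_alloc)
        res_cb->on_alloc(task_id, amount);
}
#else
/* Not configured: the hook is an empty inline function. */
static inline void res_hook_alloc(int task_id, size_t amount)
{
    (void)task_id;
    (void)amount;
}
#endif

/* Demo "module": only does accounting work once classes exist. */
int nr_classes;                 /* classes defined so far */
size_t charged;                 /* total amount charged to classes */

void demo_on_alloc(int task_id, size_t amount)
{
    (void)task_id;
    if (nr_classes == 0)        /* loaded, but no classes defined: return early */
        return;
    charged += amount;
}

const struct res_callbacks demo_callbacks = { .on_alloc = demo_on_alloc };
```

Calling res_hook_alloc() before res_register(&demo_callbacks) does nothing
beyond the pointer check; after registration the callback runs, but it only
charges anything once nr_classes is nonzero.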

In all of the cases that you mentioned, if there are no classes
defined, the overhead is still unmeasurable for any real workload.
Refer to the archives

Re: [PATCH] add a clear_pages function to clear pages of higher order

2005-04-05 Thread Gerrit Huizenga


On Tue, 05 Apr 2005 21:48:22 PDT, David Mosberger wrote:
> > On Tue, 5 Apr 2005 17:33:59 -0700 (PDT), Christoph Lameter <[EMAIL PROTECTED]> said:
> 
>   Christoph> Which benchmark would you recommend for this?
> 
> I don't know about "recommend", but I think SPECweb, SPECjbb,
> the-UNIX-multi-user-benchmark-whose-name-I-keep-forgetting, and in
> general anything that involves process-activity and/or large working
> sets might be interesting (in other words: anything but
> microbenchmarks; I'm afraid).

SpecSDET, Aim7 or ReAim from OSDL are probably what you are thinking
of.

gerrit

Re: [patch 0/8] CKRM: Core patch set

2005-03-30 Thread Gerrit Huizenga

On Wed, 30 Mar 2005 17:32:32 PST, Paul Jackson wrote:
> A question for the CKRM developers:
> 
> What middleware packages, outside the kernel, exist or are
> in the works that will rely on CKRM?

 Primarily, CKRM classes can be instantiated today with simple
 echo commands into the /rcfs filesystem.  There isn't a big need for
 a complex middleware package to set up and use CKRM.

 However, there are some tools under way to provide a small CLI
 to help with the administration for those who want it.  There
 are also some pretty minimal rc scripts underway to ensure that
 classes are configured at boot time and/or saved and restored
 across reboots and a simple config file used by that rc script.

> CKRM (like another project near and dear to me, cpusets)
> strikes me as a "middleware foundation" facility, intended
> to provide the essential kernel support required for some
> serious enterprise software.  So perhaps in addition to
> asking what end-users (of a combined kernel-middleware
> platform) exist, we should also be asking who will be
> directly using CKRM - directly layering middleware on top
> of it.

 I'm sure you could plug this into some existing workload management
 tools - lots of companies have them for managing other OS's.  Getting
 them to manage Linux with CKRM should be pretty simple for any of
 them if you really want that sort of thing.

> The details don't matter much and may have to remain
> obscured in the competitive fog.  But the presence of
> multiple groups lobbying for the same kernel infrastructure,
> as an apparent basis for competing middleware products,
> would I think weigh in CKRM's favor.

> My impression, which may not align with how the CKRM developers view
> things, is that CKRM is descendent from what have been called fair-share
> schedulers.  The following comes from the above email thread.

 CKRM is about ways of managing kernel resources - CPU would just be
 one of these.  Fairshare scheduling is similar in some respects to
 what a scheduler might need to do for such a capability.  But that
 isn't part of the code being put forward now or the set that is
 getting finalized on ckrm-tech for mainline right now.  Definitely
 useful, but a bit more challenging for getting a mainline mergeable
 version.

 BTW, one of your comments was that the word "class" was confusing.
 This may stem from the fact that there have been two approaches
 with the word "class" in them in CKRM.

 The first was that a class would be a set of resource upper/lower limits
 such as CPU, memory, number of tasks, getrlimit style resource limits,
 IO bandwidth, network connections, etc. that would be applied to some
 set of tasks.

 At last year's kernel summit, Linus suggested that classes should
 be unique to each resource, e.g. a task could be a member of a
 memory class, mem-A; a CPU resource class cpu-B, an IO resource
 class io-C.  So, now a class is specific to a resource and a task
 is effectively a member of a number of distinct and otherwise
 independent resource classes.

 The current code embodies the second definition of class, which
 provides some more useful independence of resources (they don't all
 need to tie into a common class infrastructure, which made the code
 a little more intertangled).
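
 The second definition can be pictured as a small data structure.  This
 is only a sketch under stated assumptions, not the CKRM code: the names
 demo_resource, demo_class and demo_task are hypothetical, and the three
 resource kinds are just the ones used in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource kinds, taken from the mem-A/cpu-B/io-C example. */
enum demo_resource { DEMO_RES_CPU, DEMO_RES_MEM, DEMO_RES_IO, DEMO_RES_COUNT };

/* Under the second definition a class belongs to exactly one resource. */
struct demo_class {
    enum demo_resource res;
    const char *name;
};

/* A task holds one independent class pointer per resource. */
struct demo_task {
    const struct demo_class *cls[DEMO_RES_COUNT];
};

/* Attach a task to a class; only the slot for that resource changes. */
void demo_set_class(struct demo_task *t, const struct demo_class *c)
{
    assert(t != NULL && c != NULL);
    t->cls[c->res] = c;
}

/* Return the class name for a resource, or NULL if none is assigned. */
const char *demo_class_name(const struct demo_task *t, enum demo_resource res)
{
    assert(t != NULL);
    return t->cls[res] ? t->cls[res]->name : NULL;
}
```

 Moving a task from mem-A to some other memory class leaves its CPU and
 IO classes untouched, which is the independence described above.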

 With the current core code, a task is put into a particular resource
 class simply by echoes in the corresponding rcfs directory structure
 for that resource.
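
 For anyone driving this from a program rather than a shell, the echo
 amounts to a plain file write.  A minimal sketch, assuming nothing more
 than an ordinary writable file; demo_echo is a hypothetical helper and
 is not part of CKRM or rcfs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write value plus a newline to path, like `echo value > path`.
 * Returns 0 on success, -1 on failure.
 */
int demo_echo(const char *path, const char *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%s\n", value) >= 0;
    if (fclose(f) != 0)     /* buffered write errors surface at close */
        ok = 0;

    return ok ? 0 : -1;
}
```

 A call might look like demo_echo("/rcfs/taskclass/mem-A/target", "1234").
 That directory layout is an assumption made for illustration, as is the
 idea that a pid is what gets written; check the rcfs documentation in
 patch 8/8 for the real paths and file formats.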

 An updated patch, coming soon, provides a simple and more
 interesting classification engine which allows you to specify rules
 about which processes are associated with which resource classes.
 E.g. all tasks with a particular uid can be put in the
 "oracle_mem_pig" class or all tasks with a particular gid may be
 put into the "video" scheduler class.  The classification engine allows
 for some more complex rules which are applied at task creation
 time, or at a few other points such as a change of real or effective
 uid/gid.
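
 The uid/gid rules can be sketched as a first-match rule table.  This is
 an illustration only, not the classification engine itself: the types,
 the first-match policy and the numeric ids are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rule kinds: match on uid or on gid. */
enum demo_match { DEMO_MATCH_UID, DEMO_MATCH_GID };

struct demo_rule {
    enum demo_match kind;
    unsigned int id;          /* uid or gid to match */
    const char *class_name;   /* class the task is placed in */
};

/*
 * Return the class of the first matching rule, or fallback if no rule
 * matches.  First-match-wins is an assumption of this sketch.
 */
const char *demo_classify(const struct demo_rule *rules, size_t n,
                          unsigned int uid, unsigned int gid,
                          const char *fallback)
{
    size_t i;

    assert(rules != NULL || n == 0);

    for (i = 0; i < n; i++) {
        unsigned int have = (rules[i].kind == DEMO_MATCH_UID) ? uid : gid;

        if (have == rules[i].id)
            return rules[i].class_name;
    }
    return fallback;
}
```

 In this picture the function would run at task creation and again on a
 uid/gid change, which are the points named above.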

 In some respects, this provides for a *very* lightweight form of
 virtualization, by restricting a working set of tasks to a limited
 set of resources, without the hard boundaries of a UML or Xen style
 virtual machine.  This also allows protection for some workloads
 in the face of bursty traffic or workloads which are otherwise content
 to consume your entire machine, to the exclusion of all other activities
 on the machine.

gerrit


Re: [patch 0/8] CKRM: Core patch set

2005-03-30 Thread Gerrit Huizenga

On Wed, 30 Mar 2005 22:55:05 +0200, Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <[EMAIL PROTECTED]> wrote:
> 
> 
> > worth having.  I for one am a CKRM skeptic, so won't be much help to you
> > in that quest.  Good luck.
> >
> > I don't see any performance numbers, either on small systems, or
> > scalability on large systems.  Certainly this patch does not fall under
> > the "obviously no performance impact" exclusion.
> 
> I'm one of those people who also thinks that CKRM tries to do too much
> things, and although my opinion doesn't counts a lot, I'll try to
> explain myself anyway :)
>
> One of the things I personally don't like about CKRM its how it handles
> "CPU resources". The goal of CKRM seems to be "control how much % a
> process can get get", but the amount of concepts created to achieve that
> is too huge and too complex. For the "CPU resources", I think that
> there're much simpler and better solutions. For example, instead what
> CRKM proposes I propose a simpler concept: "attaching" GIDs to a
> niceness level.

Well, the current code and the stacked up patch sets don't currently
include a CPU resource controller, although the SuSE distro version does.
We've pulled back on that for the time being since the scheduler has
been under so much revision lately.  However, resource utilization at the
priority level does not allow you to say "OpenOffice can have up to 30%
of my CPU, my email client is guaranteed to get at least 5%, and Firefox +
Java apps get no more than 50% of my machine, and my CD player gets 10%".
Niceness levels provide none of that level of resource control.  Also,
GID's have no utility on a desktop machine, other than to separate
possibly background tasks like updatedb vs. all my real time apps.
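
 To make the "at least" versus "no more than" distinction concrete, here
 is a small sketch of a consistency check on such percentages.  It is
 not CKRM's share model: the demo_share type is hypothetical, and the
 rule that guarantees must sum to at most 100 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-application CPU share, in percent of the machine. */
struct demo_share {
    const char *name;
    unsigned int guarantee;   /* "at least" this much, 0 if none */
    unsigned int limit;       /* "no more than" this much, 100 if none */
};

/*
 * Return 1 if the set can be honored: every limit is at most 100, no
 * guarantee exceeds its own limit, and the guarantees together do not
 * promise more than the whole machine.  Return 0 otherwise.
 */
int demo_shares_valid(const struct demo_share *s, size_t n)
{
    unsigned int total = 0;
    size_t i;

    assert(s != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (s[i].limit > 100 || s[i].guarantee > s[i].limit)
            return 0;
        total += s[i].guarantee;
        if (total > 100)
            return 0;
    }
    return 1;
}
```

 A single nice value per task cannot encode either column, let alone
 both at once.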

> Say, we "attach" group foo to nice level -5. All users who belong to
> group foo will have permissions to renice themselves to nice -5. If
> instead of that, group foo has been attached at nice level 15, all
> processes from users who belong to foo will be run at 15, and they
> won't be able to renice themselves even to the default priority (0)

 Again, great for multiuser systems if you just want people to be
 prioritized as opposed to work.  But more often on larger multiuser
 systems, you want various services to have priorities.  For instance,
 a web server may be allowed some rate of incoming connections or some
 amount of CPU bandwidth; a database may have memory limits, CPU limits
 (or an "at least" percentage, possibly combined with a limit that keeps
 it from taking over the entire machine), and IO limits in terms of the
 amount of disk traffic.  These limits may allow various clients or web
 servers to make progress without getting drowned out by some large
 server which wants to consume 100% of the CPU or all of available memory.

> This should be very easy to implement, and what's more important, it'd
> probably have zero performance impact at runtime - CRKM touches hot
> paths in the scheduler I think, this would just touch a few
> non-critical places - because we'd just use a existing concept.

 Not currently in the patches being brought forward to LKML.

> Sure, this can't guarantee that a group will get reserved exactly 57%
> of the CPU, but I think that such level of detail is unnecesary -
> instead we let the kernel uses the standard internal mechanisms to do
> the dirty job based in the distinction between standard nice levels.
> (And we could get that level of detail just by modifying the scheduler
> algorithm and adding a range of -50...0...50 nice levels ;)

 Also, with various implementations of the scheduler, the nice levels
 have been either studiously ignored or, at the other extreme, given a
 clearer stairstepping.  Relying on predictability here based on the
 current algorithm is not a great formula for success, nor does it
 address the needs of most desktop or server users in any simple,
 easy-to-use way.

> For the CPU resources, we already have nice levels. The existing
> algorithms can already handle priorities with them. CKRM alternative
> seems to be to add a second scheduling algorithm which in super-hot
> paths like the ones from sched.c are, it will probably have a
> performance impact. In my very humble opinion, I think we should reuse
> existing UNIX concepts and combine them to achieve some of the goals
> CKRM tries to achieve in a much simpler (unixy ;) way.

 I'd love to see patches which could be validated by folks like the
 PlanetLab folks, for instance.  I don't believe it is possible to get
 the level of machine partitioning/virtualization that CKRM provides
 with this overly simple prioritization scheme.

gerrit

Re: [patch 0/8] CKRM: Core patch set

2005-03-30 Thread Gerrit Huizenga

On Wed, 30 Mar 2005 08:53:19 PST, Dave Hansen wrote:
> On Tue, 2005-03-29 at 23:03 -0800, Gerrit Huizenga wrote:
> > The code provides a fairly simple mechanism for adding controllers for
> > any resource type
> 
> Last time I saw the memory controller, it was 3000 lines.  Doesn't seem
> too simple to me. :)

 Chandra, Dave's suggestions for the memory controller make a lot of
 sense.  Can you post the current code, ported to the patch set that
 I just posted, to linux-mm for comment?

> Can you post some of the additional controllers that you've been working
> on to the appropriate mailing lists, like linux-mm?  If the subject
> experts get a good look at the controllers, it's quite possible that
> some comments will cascade back to the core, don't you think?

 You can access the various current controllers via the ckrm-tech
 archives from sf.net/projects/ckrm today.

 However, if there are additional changes to the core, I'd like to
 see them as patches built on top of this core set.  Resending the
 modified core each time makes it hard for people to see what has
 changed from release to release, where individual patches will help
 track modifications better.

gerrit

Re: [patch 0/8] CKRM: Core patch set

2005-03-29 Thread Gerrit Huizenga

On Tue, 29 Mar 2005 22:05:30 PST, Paul Jackson wrote:
> gerrit wrote:
> > This is the core patch set for CKRM
> 
> Welcome.

 Hi Paul.

> Newcomers to CKRM might want to start reading these patches with "[patch
> 8/8] CKRM:  Documentation".  Starting with patch 0/8 or 1/8 will be
> difficult, at least if you're as dimm witted as I am.
> 
> Even the documentation included in patch 8/8 is missing the motivation
> and context essential to understanding this patch set.  It might have
> helped if the Introduction text at http://ckrm.sourceforge.net/ had been
> included in some form, as part of patch 0/8.  I'm just a little penguin
> here (lkml), but from what I can tell by watching how things work,
> you're going to have to "make the case" -- explain what this is, how
> it's put togeher, and why it's needed.  This is a sizable patch, in
> lines of code, in hooks in critical places, and in amount of "new
> concepts."  I presume (unless you've managed to bribe or blackmail some
> big penguin) you're going to have convince some others that this is
> worth having.  I for one am a CKRM skeptic, so won't be much help to you
> in that quest.  Good luck.

 Good point on including the pointer to the web site.  As you probably
 noticed, there is a history of the design, papers presented, etc.
 Also, Jonathan Corbet did a nice write up from the discussion at the
 2004 Kernel summit which is archived here: http://lwn.net/Articles/94573/
 which may be of use.

 The OLS and LinuxTag papers are archived at the site that you pointed
 to and there will be a tutorial on configuring, using and writing
 controllers for CKRM at OLS this year.  You may also want to see the
 previous postings of this code to LKML for more background.

 In short, CKRM provides very basic desktop to server workload management
 capabilities similar to those provided by most of the old fashioned
 operating systems.  The code provides a fairly simple mechanism for
 adding controllers for any resource type.  It is currently widely
 deployed by PlanetLab and is part of Novell/SuSE's distro, and the
 capabilities are requested by a fair number of Linux users and
 customers.

> I don't see any performance numbers, either on small systems, or
> scalability on large systems.  Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.

 Fair point.  We have been running some of the smaller benchmarks but
 have not yet had a chance to do any kind of performance comparison
 based on the current code.  However, when configured out, it will
 have zero impact.  We do have some performance analysis planned for
 the very near future, with CONFIG_CKRM set to y but no rules
 configured.

> A couple of nits:
> 
>  1) Instead of disabling routines with #defines:
>  #define numtasks_put_ref(core_class)  do {} while (0)
> one can do it with static inlines, preserving more compiler
> checking.

 Yeah - that works well in some cases but it turns out to not do so
 well when an argument to a function refers to a structure element
 which is not configured in.  In that case, the compiler emits a
 reference to an undefined structure value in the case of the static
 inline, where otherwise the entire set of code is pre-processed
 away.  I think we've gone through the code and used the correct
 balance of static inlines and #define constructs as appropriate.
 If we've missed any, I'm more than willing to accept a patch to
 correct a specific instance.
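
 A stripped-down illustration of the problem case, as a sketch only:
 CONFIG_DEMO_NUMTASKS, demo_task, demo_put_ref and demo_release are all
 hypothetical names, not anything from the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch; left undefined here to model "configured out". */
/* #define CONFIG_DEMO_NUMTASKS 1 */

struct demo_task {
    int pid;
#ifdef CONFIG_DEMO_NUMTASKS
    int class_refs;   /* exists only when the feature is configured in */
#endif
};

#ifdef CONFIG_DEMO_NUMTASKS
static inline void demo_put_ref(int *refs)
{
    (*refs)--;
}
#else
/*
 * Macro stub: the preprocessor discards the argument tokens, so the
 * caller's reference to the missing member is never compiled.  An empty
 * static inline stub here would make the caller below fail to build,
 * because &t->class_refs would still have to be a valid expression.
 */
#define demo_put_ref(refs) do { } while (0)
#endif

/* Caller is written once and builds with the feature on or off. */
int demo_release(struct demo_task *t)
{
    assert(t != NULL);
    demo_put_ref(&t->class_refs);
    return t->pid;
}
```

 Where the arguments are valid in both configurations, the static inline
 form is the better choice since it keeps the type checking.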

>  2) I take it that the following constitutes the 'documentation'
> for what is in /proc//delay.  Perhaps I missed something.
> 
>   +   res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
>   +  (unsigned int) get_delay(task,runs),
>   +  (uint64_t) get_delay(task,runcpu_total),
>   +  (uint64_t) get_delay(task,waitcpu_total),
>   +  (unsigned int) get_delay(task,num_iowaits),
>   +  (uint64_t) get_delay(task,iowait_total),
>   +  (unsigned int) get_delay(task,num_memwaits),
>   +  (uint64_t) get_delay(task,mem_iowait_total)

 The code is the documentation?  :)

 There is probably some documentation on /proc// in general and
 we'll see if we can get it updated appropriately.  Vivek?
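
 Until that is written up, the format string in the quoted sprintf is
 enough to read the file from user space.  A minimal sketch follows; the
 field names are taken from the get_delay() arguments above, while the
 demo_ names are hypothetical and the units of the totals are not stated
 anywhere in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* One line of delay output, fields in the order the sprintf emits them. */
struct demo_delay {
    unsigned int runs;
    unsigned long long runcpu_total;
    unsigned long long waitcpu_total;
    unsigned int num_iowaits;
    unsigned long long iowait_total;
    unsigned int num_memwaits;
    unsigned long long mem_iowait_total;
};

/*
 * Parse a line in the "%u %llu %llu %u %llu %u %llu" layout.
 * Returns 0 if all seven fields were read, -1 otherwise.
 */
int demo_parse_delay(const char *line, struct demo_delay *d)
{
    int n;

    assert(line != NULL && d != NULL);

    n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
               &d->runs, &d->runcpu_total, &d->waitcpu_total,
               &d->num_iowaits, &d->iowait_total,
               &d->num_memwaits, &d->mem_iowait_total);

    return n == 7 ? 0 : -1;
}
```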

>  3) Typo in init/Kconfig "atleast":
> 
> If you say Y here, enable the Resource Class File System and atleast

 Got it - thanks!  Someone liked the new word "atleast" - at least
 three occurrences removed.

 Oh - and uniformly updated diffstats - I probably missed some when
 I was playing with quilt originally.

gerrit

[patch 5/8] CKRM: Task Class Controller

2005-03-29 Thread Gerrit Huizenga


 This patch provides the extensions for CKRM to track task classes.
 This is the base to enable task class based resource control for
 cpu, memory and disk I/O.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>


Index: linux-2.6.12-rc1/fs/rcfs/Makefile
===
--- linux-2.6.12-rc1.orig/fs/rcfs/Makefile  2005-03-18 15:16:29.721772974 -0800
+++ linux-2.6.12-rc1/fs/rcfs/Makefile   2005-03-18 15:16:33.370482769 -0800
@@ -5,3 +5,4 @@
 obj-$(CONFIG_RCFS_FS) += rcfs.o 
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
+rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
Index: linux-2.6.12-rc1/fs/rcfs/rootdir.c
===
--- linux-2.6.12-rc1.orig/fs/rcfs/rootdir.c 2005-03-18 15:16:29.721772974 -0800
+++ linux-2.6.12-rc1/fs/rcfs/rootdir.c  2005-03-18 15:16:33.372482610 -0800
@@ -58,7 +58,7 @@ int rcfs_unregister_engine(struct rbce_e
return 0;
 }
 
-EXPORT_SYMBOL(rcfs_unregister_engine);
+EXPORT_SYMBOL_GPL(rcfs_unregister_engine);
 
 /*
  * rcfs_mkroot
@@ -183,6 +183,10 @@ int rcfs_deregister_classtype(struct ckr
 
 EXPORT_SYMBOL_GPL(rcfs_deregister_classtype);
 
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+extern struct rcfs_mfdesc tc_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -193,6 +197,10 @@ EXPORT_SYMBOL_GPL(rcfs_deregister_classt
  * table to initialize their magf entries. 
  */
 
-struct rcfs_mfdesc *genmfdesc[] = {
+struct rcfs_mfdesc *genmfdesc[CKRM_MAX_CLASSTYPES] = {
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+   _mfdesc,
+#else
NULL,
+#endif
 };
Index: linux-2.6.12-rc1/fs/rcfs/tc_magic.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/fs/rcfs/tc_magic.c 2005-03-18 15:16:33.373482530 -0800
@@ -0,0 +1,93 @@
+/* 
+ * fs/rcfs/tc_magic.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   (C) Vivek Kashyap,   IBM Corp. 2004
+ *   (C) Chandra Seetharaman, IBM Corp. 2004
+ *   (C) Hubertus Franke, IBM Corp. 2004
+ *   
+ * define magic fileops for taskclass classtype
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+
+/*
+ * Taskclass general
+ *
+ * Define structures for taskclass root directory and its magic files 
+ * In taskclasses, there is one set of magic files, created automatically under
+ * the taskclass root (upon classtype registration) and each directory (class) 
+ * created subsequently. However, classtypes can also choose to have different 
+ * sets of magic files created under their root and other directories under 
+ * root using their mkdir function. RCFS only provides helper functions for 
+ * creating the root directory and its magic files
+ * 
+ */
+
+#define TC_FILE_MODE (S_IFREG | S_IRUGO | S_IWUSR)
+
+#define NR_TCROOTMF  7
+struct rcfs_magf tc_rootdesc[NR_TCROOTMF] = {
+   /* First entry must be root */
+   {
+   /* .name = should not be set, copy from classtype name */
+.mode = RCFS_DEFAULT_DIR_MODE,
+.i_op = &rcfs_dir_inode_operations,
+.i_fop = &simple_dir_operations,
+},
+   /* Rest are root's magic files */
+   {
+.name = "target",
+.mode = TC_FILE_MODE,
+.i_fop = &target_fileops,
+.i_op = &rcfs_file_inode_operations,
+},
+   {
+.name = "members",
+.mode = TC_FILE_MODE,
+.i_fop = &members_fileops,
+.i_op = &rcfs_file_inode_operations,
+},
+   {
+.name = "stats",
+.mode = TC_FILE_MODE,
+.i_fop = &stats_fileops,
+.i_op = &rcfs_file_inode_operations,
+},
+   {
+.name = "shares",
+.mode = TC_FILE_MODE,
+.i_fop = &shares_fileops,
+.i_op = &rcfs_file_inode_operations,
+},
+   /*
+* Reclassify and Config should be made available only at the 
+* root level. Make sure they are the last two entries, as 
+* rcfs_mkdir depends on it.
+*/
+   {
+.name = "reclassify",
+.mode = TC_FILE_MODE,
+.i_fop = &reclassify_fileops,
+.i_op = &rcfs_file_inode_operations,
+},
+   {
+.name = "config"

[patch 4/8] CKRM: Resource Control File System (rcfs)

2005-03-29 Thread Gerrit Huizenga


Updates CKRM Resource Control Filesystem (rcfs) to include full
directory structure support.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

Index: linux-2.6.12-rc1/fs/Makefile
===
--- linux-2.6.12-rc1.orig/fs/Makefile   2005-03-17 17:34:17.0 -0800
+++ linux-2.6.12-rc1/fs/Makefile2005-03-18 15:16:29.717773292 -0800
@@ -92,6 +92,7 @@ obj-$(CONFIG_JFS_FS)  += jfs/
 obj-$(CONFIG_XFS_FS)   += xfs/
 obj-$(CONFIG_AFS_FS)   += afs/
 obj-$(CONFIG_BEFS_FS)  += befs/
+obj-$(CONFIG_RCFS_FS)  += rcfs/
 obj-$(CONFIG_HOSTFS)   += hostfs/
 obj-$(CONFIG_HPPFS)+= hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
Index: linux-2.6.12-rc1/fs/rcfs/dir.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/fs/rcfs/dir.c  2005-03-18 15:16:29.718773213 -0800
@@ -0,0 +1,220 @@
+/* 
+ * fs/rcfs/dir.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   Vivek Kashyap,   IBM Corp. 2004
+ *   
+ * 
+ * Directory operations for rcfs
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ */
+
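+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/namespace.h>
+#include <linux/dcache.h>
+#include <linux/seq_file.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/smp_lock.h>
+#include <linux/backing-dev.h>
+#include <linux/parser.h>
+#include <linux/rcfs.h>
+#include <asm/uaccess.h>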
+
+#define rcfs_positive(dentry)  ((dentry)->d_inode && !d_unhashed((dentry)))
+
+int rcfs_empty(struct dentry *dentry)
+{
+   struct dentry *child;
+   int ret = 0;
+
+   spin_lock(&dcache_lock);
+   list_for_each_entry(child, &dentry->d_subdirs, d_child)
+   if (!rcfs_is_magic(child) && rcfs_positive(child))
+   goto out;
+   ret = 1;
+out:
+   spin_unlock(&dcache_lock);
+   return ret;
+}
+
+/* Directory inode operations */
+
+int rcfs_create_coredir(struct inode *dir, struct dentry *dentry)
+{
+
+   struct rcfs_inode_info *ripar, *ridir;
+   int sz;
+
+   ripar = rcfs_get_inode_info(dir);
+   ridir = rcfs_get_inode_info(dentry->d_inode);
+   /* Inform resource controllers - do Core operations */
+   if (ckrm_is_core_valid(ripar->core)) {
+   sz = strlen(ripar->name) + strlen(dentry->d_name.name) + 2;
+   ridir->name = kmalloc(sz, GFP_KERNEL);
+   if (!ridir->name) {
+   return -ENOMEM;
+   }
+   snprintf(ridir->name, sz, "%s/%s", ripar->name,
+dentry->d_name.name);
+   ridir->core = (*(ripar->core->classtype->alloc))
+   (ripar->core, ridir->name);
+   } else {
+   printk(KERN_ERR "rcfs_mkdir: Invalid parent core %p\n",
+  ripar->core);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+int rcfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+
+   int retval = 0;
+   struct ckrm_classtype *clstype;
+
+   if (rcfs_mknod(dir, dentry, mode | S_IFDIR, 0)) {
+   printk(KERN_ERR "rcfs_mkdir: error in rcfs_mknod\n");
+   return retval;
+   }
+   dir->i_nlink++;
+   /* Inherit parent's ops since rcfs_mknod assigns noperm ops. */
+   dentry->d_inode->i_op = dir->i_op;
+   dentry->d_inode->i_fop = dir->i_fop;
+   retval = rcfs_create_coredir(dir, dentry);
+   if (retval) {
+   simple_rmdir(dir, dentry);
+   return retval;
+   }
+   /* create the default set of magic files */
+   clstype = (rcfs_get_inode_info(dentry->d_inode))->core->classtype;
+   rcfs_create_magic(dentry, &(((struct rcfs_magf *)clstype->mfdesc)[1]),
+ clstype->mfcount - 3);
+   return retval;
+}
+
+int rcfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+   struct rcfs_inode_info *ri = rcfs_get_inode_info(dentry->d_inode);
+
+   if (!rcfs_empty(dentry)) {
+   printk(KERN_ERR "rcfs_rmdir: directory not empty\n");
+   return -ENOTEMPTY;
+   }
+   /* Core class removal  */
+
+   if (ri->core == NULL) {
+   printk(KERN_ERR "rcfs_rmdir: core==NULL\n");
+   /* likely a race condition */
+   return 0;
+   }
+
+   if ((*(ri->core->classtype->free)) (ri->core

Re: [patch 0/8] CKRM: Core patch set

2005-03-29 Thread Gerrit Huizenga


On Tue, 29 Mar 2005 22:05:30 PST, Paul Jackson wrote:
> gerrit wrote:
> > This is the core patch set for CKRM
> 
> Welcome.

Hi Paul.

> Newcomers to CKRM might want to start reading these patches with [patch
> 8/8] CKRM:  Documentation.  Starting with patch 0/8 or 1/8 will be
> difficult, at least if you're as dim witted as I am.
> 
> Even the documentation included in patch 8/8 is missing the motivation
> and context essential to understanding this patch set.  It might have
> helped if the Introduction text at http://ckrm.sourceforge.net/ had been
> included in some form, as part of patch 0/8.  I'm just a little penguin
> here (lkml), but from what I can tell by watching how things work,
> you're going to have to make the case -- explain what this is, how
> it's put together, and why it's needed.  This is a sizable patch, in
> lines of code, in hooks in critical places, and in amount of new
> concepts.  I presume (unless you've managed to bribe or blackmail some
> big penguin) you're going to have to convince some others that this is
> worth having.  I for one am a CKRM skeptic, so won't be much help to you
> in that quest.  Good luck.

Good point on including the pointer to the web site.  As you probably
noticed, there is a history of the design, papers presented, etc.
Also, Jonathan Corbet did a nice write up from the discussion at the
2004 Kernel summit which is archived here: http://lwn.net/Articles/94573/
which may be of use.

The OLS and LinuxTag papers are archived at the site that you pointed
to and there will be a tutorial on configuring, using and writing
controllers for CKRM at OLS this year.  You may also want to see the
previous postings of this code to LKML for more background.

In short, CKRM provides very basic desktop to server workload management
capabilities similar to those provided by most of the old fashioned
operating systems.  The code provides a fairly simple mechanism for
adding controllers for any resource type and the code is currently
widely deployed by PlanetLab, is a part of Novell/SuSE's distro, and
the capabilities are requested by a fair number of Linux users and
customers.

> I don't see any performance numbers, either on small systems, or
> scalability on large systems.  Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.

Fair point.  We have been running some of the smaller benchmarks but
have not yet had a chance to do any kind of performance comparison
based on the current code.  However, when configured out, it will
have zero impact.  We do have some performance analysis of the code
with CONFIG_CKRM set to "y" but no rules configured, planned for the
very near future.

> A couple of nits:
> 
>  1) Instead of disabling routines with #defines:
>       #define numtasks_put_ref(core_class)  do {} while (0)
>     one can do it with static inlines, preserving more compiler
>     checking.

Yeah - that works well in some cases but it turns out to not do so
well when an argument to a function refers to a structure element
which is not configured in.  In that case, the compiler emits a
reference to an undefined structure value in the case of the static
inline, where otherwise the entire set of code is pre-processed
away.  I think we've gone through the code and used the correct
balance of static inlines and #define constructs as appropriate.
If we've missed any, I'm more than willing to accept a patch to
correct a specific instance.
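
The trade-off described in the reply above can be sketched in a few
lines of C.  This is only an illustration under stated assumptions, not
code from the CKRM patches: DEMO_NUMTASKS, struct demo_task, struct
demo_class and the demo_put_ref* names are all hypothetical stand-ins.
With the option left undefined, the structure member does not exist, so
only the macro stub can be handed an expression that names it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: left undefined here to model "configured out". */
/* #define DEMO_NUMTASKS 1 */

struct demo_class {
	int refcnt;
};

struct demo_task {
	int pid;
#ifdef DEMO_NUMTASKS
	struct demo_class *numtasks_class;	/* present only when configured in */
#endif
};

/*
 * Macro stub: the preprocessor discards the argument text, so a caller
 * may pass t->numtasks_class even though that member is absent.
 */
#define demo_put_ref(cls)	do { } while (0)

/*
 * Inline stub: the argument is still compiled and type-checked, so it
 * only suits callers whose argument is valid in every configuration.
 */
static inline void demo_put_ref_checked(struct demo_class *cls)
{
	(void)cls;
}

int demo_exit_task(struct demo_task *t)
{
	assert(t != NULL);

	/* Compiles in both configurations because of the macro stub. */
	demo_put_ref(t->numtasks_class);

	/* Fine: NULL is a valid argument whatever the configuration. */
	demo_put_ref_checked(NULL);

	/*
	 * With DEMO_NUMTASKS undefined the next line would not compile,
	 * because struct demo_task has no member named numtasks_class:
	 *
	 *	demo_put_ref_checked(t->numtasks_class);
	 */
	return t->pid;
}
```

The inline form keeps type checking on every build, which is the
benefit Paul points at; the macro form is the fallback for call sites
whose arguments only exist when the feature is configured in.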

>  2) I take it that the following constitutes the 'documentation'
>     for what is in /proc/pid/delay.  Perhaps I missed something.
> 
>     > +   res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
>     > +  (unsigned int) get_delay(task,runs),
>     > +  (uint64_t) get_delay(task,runcpu_total),
>     > +  (uint64_t) get_delay(task,waitcpu_total),
>     > +  (unsigned int) get_delay(task,num_iowaits),
>     > +  (uint64_t) get_delay(task,iowait_total),
>     > +  (unsigned int) get_delay(task,num_memwaits),
>     > +  (uint64_t) get_delay(task,mem_iowait_total)

The code is the documentation?  :)

There is probably some documentation on /proc/pid/ in general and
we'll see if we can get it updated appropriately.  Vivek?

>  3) Typo in init/Kconfig "atleast":
> 
>     If you say Y here, enable the Resource Class File System and atleast

Got it - thanks!  Someone liked the new word "atleast" - at least
three occurrences removed.

Oh - and uniformly updated diffstats - I probably missed some when
I was playing with quilt originally.

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ckrm-tech] Re: [PATCH] CKRM: 4/10 CKRM: Full rcfs support

2005-02-24 Thread Gerrit Huizenga

On Thu, 24 Feb 2005 17:25:28 CST, Chris Friesen wrote:
> Shailabh Nagar wrote:
> 
> > Sounds like a case is being made to make CONFIG_RCFS a "y" and eliminate
> > the possibility of it being a loadable module ?
> 
> No, I believe the idea was to make CONFIG_RCFS be automatically set to 
> the same as CKRM.

Right, but CONFIG_CKRM is a Y/N, rcfs can be a module which is loaded
or not, depending on whether someone actually wants to *use* classes
in CKRM.

In theory, distros could build with CKRM set to "Y" but leave RCFS
as a module to simplify testing.  It doesn't matter too much to me but
it seems like having the flexibility of leaving rcfs as a module
is a nice capability.

I'm willing to hear all comments.  ;-)

gerrit

Re: [PATCH] CKRM: 3/10 CKRM: Core ckrm, rcfs

2005-02-24 Thread Gerrit Huizenga


On Thu, 24 Feb 2005 13:11:08 PST, Greg KH wrote:
> On Thu, Feb 24, 2005 at 12:54:17PM -0800, Gerrit Huizenga wrote:
> > On Thu, 24 Feb 2005 09:52:23 PST, Greg KH wrote:
> > > On Thu, Feb 24, 2005 at 01:33:12AM -0800, Gerrit Huizenga wrote:
> > > > On Mon, 29 Nov 2004 14:00:47 PST, Greg KH wrote:
> > > > > On Mon, Nov 29, 2004 at 10:47:32AM -0800, Gerrit Huizenga wrote:
> > > > > > +typedef void *(*ce_classify_fct_t) (enum ckrm_event event, void 
> > > > > > *obj, ...);
> > > > > > +typedef void (*ce_notify_fct_t) (enum ckrm_event event, void 
> > > > > > *classobj,
> > > > > > +void *obj);
> > > > > 
> > > > > Ick.  Don't put a _t at the end of a typedef.  Wrong OS style guide.
> > > >  
> > > > Fixed.  Although this isn't an OS style guide thing - it is a Posix
> > > > driven convention whereby any header file defined in the standard
> > > > automatically has _t suffixed variables reserved to the implementation,
> > > > e.g. no application is define variables using _t.  This header file 
> > > > isn't
> > > > being used by user level applications so it doesn't matter.
> > > 
> > > But Linux kernel internals are not driven by Posix "conventions", hence,
> > > my objection.
> >  
> > So what is the recommended way of making header files safe for both
> > kernel and user level consumption when a header file contains
> > structure definitions suitable for user/kernel communication?
> 
> Right now the way is, "Don't do it."  Write separate header files for
> userspace.  See the lkml archives for details as to what the proposed
> way to do this is, but I don't think anyone has started working on it
> yet.
 
Yeah - I've seen that.  Doesn't help for new projects yet since the
approach is not well fleshed out yet.

> > > > > > +#define ckrm_get_res_class(rescls, resid, type) \
> > > > > > +   ((type*) (((resid != -1) && ((rescls) != NULL) \
> > > > > > +  && ((rescls) != (void *)-1)) ? \
> > > > > > +((struct ckrm_core_class *)(rescls))->res_class[resid] : NULL))
> > > > > 
> > > > > What exactly are you trying to do with this macro?  Cast to see if a
> > > > > pointer is not -1?  That doesn't sound very safe...
> > > > 
> > > > This needs to be fixed and better commented.  Basically, when a task
> > > > is exiting, it's class can be set to -1 (-1 in a pointer is, uh, icky).
> > > > But when uninitialized, it is set to NULL.  We need to come up with
> > > > a better fix for this one.
> > > 
> > > Setting a pointer to -1 is, uh, wrong.  Please fix this, as it's just
> > > broken.
> >  
> > Yes - I have the patch at hand to fix this, just need to merge it in.
> > It will be included in the next release.
> 
> Just curious, what is your level of involvement in this project?  Are
> you just merging other developer's patches, or are you writing any of
> the changes yourself?  Isn't a maintainer of a kernel subsystem supposed
> to be one of the primary developers?
 
I'm the person who will ensure that it is maintained.  There are quite
a few developers who have been involved, and those have changed a
bit over time and will continue to change.  However, I'll be sticking
around to make sure that the kernel side is cleaned up and remains
maintainable.  Some of the areas have more specific owners but some
also are supporting distros, research activities, etc.

Of the cleanups, I've done most of them myself but have had some help
and will continue to have help from several of the authors as we carry
forward.  I also did a fair share of cleanups prior to the first posting;
I'm not sure what you would have thought of the first few iterations of
the code.  ;-)

> > > > > > +/*
> > > > > > + * Registering a callback structure by the classification engine.
> > > > > > + *
> > > > > > + * Returns typeId of class on success -errno for failure.
> > > > > > + */
> > > > > > +int ckrm_register_engine(const char *typename, ckrm_eng_callback_t 
> > > > > > * ecbs)
> > > > > > +{
> > > > > > +   struct ckrm_classtype *ctype;
> > > > > > +
> > > > > > +   ctype = ckrm_find_classtype_by_name(typename);
> > > > > > +   if (ctype == NULL)
> > > > > > +   re

Re: [PATCH] CKRM: 3/10 CKRM: Core ckrm, rcfs

2005-02-24 Thread Gerrit Huizenga

On Thu, 24 Feb 2005 09:52:23 PST, Greg KH wrote:
> On Thu, Feb 24, 2005 at 01:33:12AM -0800, Gerrit Huizenga wrote:
> > On Mon, 29 Nov 2004 14:00:47 PST, Greg KH wrote:
> > > On Mon, Nov 29, 2004 at 10:47:32AM -0800, Gerrit Huizenga wrote:
> > > > +typedef void *(*ce_classify_fct_t) (enum ckrm_event event, void *obj, 
> > > > ...);
> > > > +typedef void (*ce_notify_fct_t) (enum ckrm_event event, void *classobj,
> > > > +void *obj);
> > > 
> > > Ick.  Don't put a _t at the end of a typedef.  Wrong OS style guide.
> >  
> > Fixed.  Although this isn't an OS style guide thing - it is a Posix
> > driven convention whereby any header file defined in the standard
> > automatically has _t suffixed variables reserved to the implementation,
> > e.g. no application is define variables using _t.  This header file isn't
> > being used by user level applications so it doesn't matter.
> 
> But Linux kernel internals are not driven by Posix "conventions", hence,
> my objection.

So what is the recommended way of making header files safe for both
kernel and user level consumption when a header file contains
structure definitions suitable for user/kernel communication?

I don't like the way the current CKRM code mixes kernel
and user content, beyond just the things that are user level accessible.

However, it was pointed out to me that some of the CKRM files define
things which are intended to be part of the interface between user and
kernel.  That is also where some header files are defined as LGPL.

Now, I believe the contents of those header files should be clearly
separated into user/kernel API/structure files and kernel-only headers.

How do you recommend that that usually be done without polluting the
application's C namespace?  This gets right back to the problem with
replicating everything for glibc under a new license, which is really
quite a crock but just the way things are today.  I'd rather start out
with something involving a bit less redundant code.

> > > Again with the unneeded typedef.  Come on Gerrit, you should know
> > > better...
> >  
> > Sorry, years of implementing Posix conformant OS's and system header
> > files make this very common for anyone (including several of the
> > CKRM developers).  Specifically because of user level name space
> > collision avoidance issues (e.g. think preserving backwards compatibility
> > for user level apps).  It is the primary mechanism for simplifying the
> > #ifdef __KERNEL__ crap used in most OS's.
> 
> If you are going to write Linux kernel code, use the proper style rules.
> No matter how many years working on other oses, it doesn't matter, you
> know better than to try to bring up that kind of objection...

The question above still stands.  Linus has mentioned the value of
__KERNEL__ in the past to help avoid the application name space
pollution issue as well, but _t also is an internationally accepted
convention among application programmers and system providers.  I'm
not as convinced that this is a case where Linux being different adds
any value to anyone, and actually makes it tougher to define header
files which can preserve an application/kernel API.

I'm trying to figure out the "right way" of solving the issue of
allowing user apps that happen to be mostly Posix conformant use
CKRM without polluting their namespace.  Separate headers will do that,
at the minor annoyance of a proliferation of header files.
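
To make the header split concrete, here is a rough sketch in C of how
the user/kernel ABI part could be kept apart from the kernel-only part.
Everything in it is assumed for illustration: the ckrm_demo_ names, the
field layout and the idea of a single shared ABI header are not taken
from the CKRM patches.  It uses <stdint.h> so that it builds as a plain
userspace file; a real kernel header would use the kernel's own
fixed-width types instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Part 1: what would live in a shared user/kernel ABI header.
 * Only prefixed struct tags and macros, and no typedefs ending in _t,
 * so nothing here intrudes on the application's namespace.
 */
#define CKRM_DEMO_NAME_MAX	16

struct ckrm_demo_shares {
	uint32_t guarantee;			/* share guaranteed to the class */
	uint32_t limit;				/* upper bound for the class */
	char     name[CKRM_DEMO_NAME_MAX];	/* class name, NUL terminated */
};

/*
 * Part 2: what would live in a kernel-only header.  Applications never
 * see this, so it is free to change between kernel versions.
 */
#ifdef __KERNEL__
struct ckrm_demo_class {
	struct ckrm_demo_shares shares;
	int refcnt;
};
#endif

/* Userspace-side helper that relies only on the shared ABI structure. */
int ckrm_demo_shares_valid(const struct ckrm_demo_shares *s)
{
	assert(s != NULL);
	return s->guarantee <= s->limit;
}
```

The cost is the one already mentioned: every structure that crosses the
boundary ends up in its own small header, and the kernel-only
definitions need a second file.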

> > > > +#define ckrm_get_res_class(rescls, resid, type) \
> > > > +   ((type*) (((resid != -1) && ((rescls) != NULL) \
> > > > +  && ((rescls) != (void *)-1)) ? \
> > > > +((struct ckrm_core_class *)(rescls))->res_class[resid] : NULL))
> > > 
> > > What exactly are you trying to do with this macro?  Cast to see if a
> > > pointer is not -1?  That doesn't sound very safe...
> > 
> > This needs to be fixed and better commented.  Basically, when a task
> > is exiting, it's class can be set to -1 (-1 in a pointer is, uh, icky).
> > But when uninitialized, it is set to NULL.  We need to come up with
> > a better fix for this one.
> 
> Setting a pointer to -1 is, uh, wrong.  Please fix this, as it's just
> broken.

Yes - I have the patch at hand to fix this, just need to merge it in.
It will be included in the next release.
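
For readers following along, one common way to get rid of a (void *)-1
sentinel is to record the "exiting" condition in an explicit state field
and keep the pointer either valid or NULL.  The sketch below is an
assumption about what such a cleanup could look like, not the patch
referred to above; the demo_ names, the state enum and DEMO_MAX_RES are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RES	4

struct demo_core_class {
	void *res_class[DEMO_MAX_RES];
};

/* Explicit lifecycle state instead of encoding it in the pointer value. */
enum demo_class_state {
	DEMO_CLASS_UNSET,	/* never assigned: core is NULL */
	DEMO_CLASS_ACTIVE,	/* core points at a live class */
	DEMO_CLASS_EXITING,	/* task is exiting: core must not be used */
};

struct demo_task {
	enum demo_class_state state;
	struct demo_core_class *core;
};

/*
 * Function form of the lookup: returns the resource class for resid,
 * or NULL when the task has no usable class or resid is out of range.
 */
void *demo_get_res_class(const struct demo_task *t, int resid)
{
	assert(t != NULL);

	if (t->state != DEMO_CLASS_ACTIVE || t->core == NULL)
		return NULL;
	if (resid < 0 || resid >= DEMO_MAX_RES)
		return NULL;
	return t->core->res_class[resid];
}
```

Writing the lookup as a function rather than a macro also means resid
is evaluated once and range-checked, which the original macro did not
do.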

> > > > +static inline void ckrm_core_grab(struct ckrm_core_class *core)
> > > > +{
> > > > +   if (core)
> > > > +   atomic_inc(&core->refcnt);
> > > > +}
> > > 
> > > Please just use kref, don't invent your own ref

Re: [PATCH] CKRM [7/8] Resource controller for number of tasks per class

2005-02-24 Thread Gerrit Huizenga


On Thu, 24 Feb 2005 10:00:39 PST, Greg KH wrote:
> On Thu, Feb 24, 2005 at 01:34:38AM -0800, Gerrit Huizenga wrote:
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> 
> What was that response you gave me about the fact that you fixed up the
> proper ordering of #include files...
 
Doh - missed that one.  :(

Fixed now.

gerrit

Re: [PATCH] CKRM: 5/10 CKRM: Task based management for CPU, memory and Disk I/O.

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 14:23:23 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:49:09AM -0800, Gerrit Huizenga wrote:
> > +#define TC_DEBUG(fmt, args...) do { \
> > +/* printk("%s: " fmt, __FUNCTION__ , ## args); */ } while (0)
> 
> Again with the new debug macro :(
> 
> > +static struct ckrm_task_class taskclass_dflt_class = {
> > +};
> 
> Empty structure?  Why?
 
Initialized definition, not declaration.  Although with no initializer
which was a bit odd.  struct ckrm_task_class is defined in ckrm_tc.h.

> > +// Hubertus .. following functions should move to ckrm_rc.h
> 
> Why haven't they moved :)

Because we aren't done yet.  ;-)

> > +static inline void ckrm_task_lock(struct task_struct *tsk)
> > +{
> > +   spin_lock(&tsk->ckrm_tsklock);
> > +}
> 
> Just lock (or unlock) the lock, don't wrap a lock in a function.
 
Yep.  Done.

> > +DECLARE_MUTEX(async_serializer);   // serialize all async functions
> 
> Should this really be global?  The code says otherwise :)
 
Not any more.

> > +   printk(".. Initializing ClassType<%s> \n",
> > +  CT_taskclass.name);
> 
> What a pretty log message.  Unfortunately it's wrong (me hears the
> growing mumblings of the kernel janitor mob...)
 
Yep - fixed.

> > +#if 0
> > +
> > +/**
> > + * Debugging Task Classes:  Utility functions
> > + 
> > **/
> 
> Then remove the code, if it's not needed.
 
Okay.  I can easily carry a debug patch later.  Should have done that
sooner...

> > +EXPORT_SYMBOL(tcp_v4_lookup_listener);
> 
> Not EXPORT_SYMBOL_GPL()?
 
Currently makes it just like all the others.  I'll let the networking
folks chime in on how they want that exported when this patch gets
cross posted to netdev.

thanks,

gerrit

Re: [ckrm-tech] [PATCH] CKRM: 6/10 CKRM: Resource controller for sockets

2005-02-24 Thread Gerrit Huizenga


On Tue, 30 Nov 2004 11:43:11 EST, James Morris wrote:
> On Mon, 29 Nov 2004, Gerrit Huizenga wrote:
> 
> > +int sock_mkdir(struct inode *, struct dentry *, int mode);
> > +int sock_rmdir(struct inode *, struct dentry *);
> > +
> > +int sock_create_noperm(struct inode *, struct dentry *, int,
> > +  struct nameidata *);
> > +int sock_unlink_noperm(struct inode *, struct dentry *);
> > +int sock_mkdir_noperm(struct inode *, struct dentry *, int);
> > +int sock_rmdir_noperm(struct inode *, struct dentry *);
> > +int sock_mknod_noperm(struct inode *, struct dentry *, int, dev_t);
> > 
> 
> The sock_ namespace belongs to core networking.  Use rcfs_sock_ or 
> something.

Very good point.  Global search and destroy, er, replace applied.

gerrit

[PATCH] CKRM [0/8] Long overdue response to initial review

2005-02-24 Thread Gerrit Huizenga

This is a long overdue response to the many code review comments
that came in during the last posting of the CKRM core code.   While
CKRM has not by any means been inactive, a variety of other deliverables
have taken precedence until recently.

However, the following set of postings is a step towards
rectifying that delinquency, including a refresh to 2.6.11-rc5.  While
testing has been going on over the past couple of weeks on a set of patches
very close to this, a large number of cleanups have happened in the past
couple of days and testing is not complete on those.  In particular,
I know of a couple of batches of warnings that need to be cleaned up
and I have a strong suspicion that building with at least one and maybe
two particular CKRM_* config options set to Y may fail to compile at
the moment.

Also, since the last submission, a couple of the patches have
been removed from the set that I'm including now.  One of them
needs a few updates and some air time on ckrm-tech because of some
slight networking related changes; the other was just too darn big
of a patch and is being broken into more reasonable sized pieces.

I was not able to make all changes requested by review comments thus
far; however, the ones that I did not get to have been added to
a TODO file in the Documentation directory for ckrm.

The following postings will contain the updated patches for
these components of CKRM:

01-diff_ckrm_events:
Base CKRM events, mods to existing kernel code

02-diff_delay_acct:
More accurate accounting for CPU scheduling, IO scheduling

03-diff_ckrm_core:
Main/core CKRM code, beginnings of Resource Control Filesystem

04-diff_rcfs:
Full directory support for rcfs

05-diff_taskclass:
Task based management for CPU, memory and Disk I/O.

06-diff_sockclass:
CKRM tracking for socket classes for inbound connection control,
bandwidth control, etc.

07-diff_numtasks:
Resource controller for number of tasks per class.

10-diff_docs
CKRM documentation.

Please send comments to ckrm-tech@lists.sourceforge.net

thanks,

gerrit

[PATCH] CKRM [1/8] Base CKRM events, mods to existing kernel code

2005-02-24 Thread Gerrit Huizenga

Core CKRM Event Callbacks.

On exec, fork, exit, real/effective gid/uid, use CKRM to associate
tasks with appropriate class.
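
As a rough illustration of the callback idea (not the code in this patch):
each event keeps a chain of hooks, and the hook point in the kernel path
walks that chain.  Everything below, including ev_register(), ev_invoke()
and the EV_* ids, is a hypothetical userspace stand-in.  The real interface
is ckrm_register_event_set() and ckrm_invoke_event_cb_chain() in
ckrm_events.h and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event ids; the real enum ckrm_event has more entries. */
enum ev { EV_FORK, EV_EXIT, EV_EXEC, EV_NUM };

typedef void (*ev_fn)(void *arg);

struct ev_hook {
	ev_fn fn;
	struct ev_hook *next;
};

/* One singly linked callback chain per event. */
static struct ev_hook *ev_chain[EV_NUM];

/* Add a hook to the front of the chain for event e. */
static void ev_register(enum ev e, struct ev_hook *h)
{
	assert(e < EV_NUM);
	h->next = ev_chain[e];
	ev_chain[e] = h;
}

/* Call every hook registered for event e; returns how many ran. */
static int ev_invoke(enum ev e, void *arg)
{
	int n = 0;
	struct ev_hook *h;

	assert(e < EV_NUM);
	for (h = ev_chain[e]; h != NULL; h = h->next) {
		h->fn(arg);
		n++;
	}
	return n;
}

/* Example hook: bump a counter each time the event fires. */
static void count_fork(void *arg)
{
	(*(int *)arg)++;
}
```

In this sketch a class type would call ev_register(EV_FORK, &hook) once at
init time, and the fork path would call ev_invoke(EV_FORK, task).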

Addressed review comments:

Sam Ravnborg:  Use Makefile syntax correctly
Dave Hansen:  Use of ## is annoying
Greg KH:  Remove Changelogs;
Use __KERNEL__ correctly (if at all);
Consolidate CONFIG_ sections in header files;
Fix extern int get_exe_path_name().
Remove unused DEBUG code 
Convert enum to typedef in prep for sparse __bitwise use

Not yet Addressed:

Greg KH:
Use of __bitwise and sparse in enum's
Use of kernel list type


Signed-off-by:  Shailabh Nagar <[EMAIL PROTECTED]>
Signed-off-by:  Hubertus Franke <[EMAIL PROTECTED]>
Signed-off-by:  Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-off-by:  Gerrit Huizenga <[EMAIL PROTECTED]>


 fs/exec.c   |2 
 include/linux/ckrm_events.h |  190 
 include/linux/sched.h   |1 
 init/Kconfig|   16 +++
 kernel/Makefile |2 
 kernel/ckrm/Makefile|7 +
 kernel/ckrm/ckrm_events.c   |   97 ++
 kernel/exit.c   |3 
 kernel/fork.c   |4 
 kernel/sys.c|   10 ++
 10 files changed, 331 insertions(+), 1 deletion(-)

Index: linux-2.6.11-rc5/fs/exec.c
===
--- linux-2.6.11-rc5.orig/fs/exec.c 2005-02-23 20:02:37.0 -0800
+++ linux-2.6.11-rc5/fs/exec.c  2005-02-24 00:54:50.529799288 -0800
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1085,6 +1086,7 @@
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
+   ckrm_cb_exec(bprm->filename);
return retval;
}
read_lock(_lock);
Index: linux-2.6.11-rc5/include/linux/ckrm_events.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/include/linux/ckrm_events.h2005-02-24 
00:54:50.530799168 -0800
@@ -0,0 +1,192 @@
+/*
+ * ckrm_events.h - Class-based Kernel Resource Management (CKRM)
+ * event handling
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003,2004
+ *   (C) Shailabh Nagar,  IBM Corp. 2003
+ *   (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * 
+ * Provides a base header file including macros and basic data structures.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_EVENTS_H
+#define _LINUX_CKRM_EVENTS_H
+
+#ifdef CONFIG_CKRM
+
+/*
+ * Data structure and function to get the list of registered 
+ * resource controllers.
+ */
+
+/*
+ * CKRM defines a set of events at particular points in the kernel
+ * at which callbacks registered by various class types are called
+ */
+
+enum ckrm_event {
+   /*
+* we distinguish these events types:
+*
+* (a) CKRM_LATCHABLE_EVENTS
+*  events can be latched for event callbacks by classtypes
+*
+* (b) CKRM_NONLATCHABLE_EVENTS
+* events can not be latched but can be used to call classification
+* 
+* (c) events that are used for notification purposes
+* range: [ CKRM_EVENT_CANNOT_CLASSIFY .. )
+*/
+
+   /* events (a) */
+
+   CKRM_LATCHABLE_EVENTS,
+
+   CKRM_EVENT_NEWTASK = CKRM_LATCHABLE_EVENTS,
+   CKRM_EVENT_FORK,
+   CKRM_EVENT_EXIT,
+   CKRM_EVENT_EXEC,
+   CKRM_EVENT_UID,
+   CKRM_EVENT_GID,
+   CKRM_EVENT_LOGIN,
+   CKRM_EVENT_USERADD,
+   CKRM_EVENT_USERDEL,
+   CKRM_EVENT_LISTEN_START,
+   CKRM_EVENT_LISTEN_STOP,
+   CKRM_EVENT_APPTAG,
+
+   /* events (b) */
+
+   CKRM_NONLATCHABLE_EVENTS,
+
+   CKRM_EVENT_RECLASSIFY = CKRM_NONLATCHABLE_EVENTS,
+
+   /* events (c) */
+
+   CKRM_NOTCLASSIFY_EVENTS,
+
+   CKRM_EVENT_MANUAL = CKRM_NOTCLASSIFY_EVENTS,
+
+   CKRM_NUM_EVENTS
+};
+
+/*
+ * CKRM event callback specification for the classtypes or resource controllers
+ *   typically an array is specified using CKRM_EVENT_SPEC terminated with 
+ *   CKRM_EVENT_SPEC_LAST and then that array is registered using
+ *   ckrm_register_event_set.
+ *   Individual registration of event_cb is also possible
+ */
+
+struct ckrm_hoo

[PATCH] CKRM [2/8] More accurate account for CPU & IO scheduling

2005-02-24 Thread Gerrit Huizenga

CKRM processor scheduling delay accounting - provides a mechanism
to account for the delays a task experiences.  In addition to counting
frequency, the total delay in ns is also recorded.  CPU delays are
specified as cpu-wait and cpu-run.  I/O delays are recorded for memory
and regular I/O.  Information is accessible through /proc/<pid>/delay.
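
For anyone wanting to consume the per-task delay file from userspace, a
minimal parsing sketch follows.  The field order is taken from the
sprintf() format in proc_pid_delay() in this patch; the struct and
function names (delay_stats, parse_delay_line) are made up for the
example, and the "ns" unit on the totals is assumed from the description
above.

```c
#include <stdio.h>

/* Field names follow the get_delay() arguments in proc_pid_delay(). */
struct delay_stats {
	unsigned int runs;
	unsigned long long runcpu_total;	/* assumed ns */
	unsigned long long waitcpu_total;	/* assumed ns */
	unsigned int num_iowaits;
	unsigned long long iowait_total;	/* assumed ns */
	unsigned int num_memwaits;
	unsigned long long mem_iowait_total;	/* assumed ns */
};

/* Parse one line of delay output; returns 0 on success, -1 on bad input. */
static int parse_delay_line(const char *line, struct delay_stats *d)
{
	int n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
		       &d->runs, &d->runcpu_total, &d->waitcpu_total,
		       &d->num_iowaits, &d->iowait_total,
		       &d->num_memwaits, &d->mem_iowait_total);

	return n == 7 ? 0 : -1;
}
```

A caller would read one line from the file with fgets() and hand it to
parse_delay_line(); a short or malformed line is rejected rather than
partially filled in.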

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>

 fs/proc/array.c|   18 +
 fs/proc/base.c |   18 +
 include/linux/sched.h  |   86 +
 include/linux/taskdelays.h |   45 +++
 init/Kconfig   |8 
 kernel/fork.c  |1 
 kernel/sched.c |   17 
 mm/memory.c|9 +++-
 8 files changed, 200 insertions(+), 2 deletions(-)

Index: linux-2.6.11-rc5/fs/proc/array.c
===
--- linux-2.6.11-rc5.orig/fs/proc/array.c   2005-02-23 20:03:03.0 
-0800
+++ linux-2.6.11-rc5/fs/proc/array.c2005-02-24 00:54:56.449085584 -0800
@@ -473,3 +473,21 @@
return sprintf(buffer,"%d %d %d %d %d %d %d\n",
   size, resident, shared, text, lib, data, 0);
 }
+
+
+int proc_pid_delay(struct task_struct *task, char * buffer)
+{
+   int res;
+
+   res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
+  (unsigned int) get_delay(task,runs),
+  (uint64_t) get_delay(task,runcpu_total),
+  (uint64_t) get_delay(task,waitcpu_total),
+  (unsigned int) get_delay(task,num_iowaits),
+  (uint64_t) get_delay(task,iowait_total),
+  (unsigned int) get_delay(task,num_memwaits),
+  (uint64_t) get_delay(task,mem_iowait_total)
+   );
+   return res;
+}
+
Index: linux-2.6.11-rc5/fs/proc/base.c
===
--- linux-2.6.11-rc5.orig/fs/proc/base.c2005-02-23 20:03:04.0 
-0800
+++ linux-2.6.11-rc5/fs/proc/base.c 2005-02-24 00:54:56.451085343 -0800
@@ -105,6 +105,10 @@
 #ifdef CONFIG_AUDITSYSCALL
PROC_TID_LOGINUID,
 #endif
+#ifdef CONFIG_DELAY_ACCT
+PROC_TID_DELAY_ACCT,
+PROC_TGID_DELAY_ACCT,
+#endif
PROC_TID_FD_DIR = 0x8000,   /* 0x8000-0x */
PROC_TID_OOM_SCORE,
PROC_TID_OOM_ADJUST,
@@ -137,6 +141,9 @@
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   E(PROC_TGID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 #ifdef CONFIG_KALLSYMS
E(PROC_TGID_WCHAN, "wchan",   S_IFREG|S_IRUGO),
 #endif
@@ -167,6 +174,9 @@
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   E(PROC_TID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 #ifdef CONFIG_KALLSYMS
E(PROC_TID_WCHAN,  "wchan",   S_IFREG|S_IRUGO),
 #endif
@@ -1476,6 +1486,13 @@
ei->op.proc_read = proc_pid_wchan;
break;
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   case PROC_TID_DELAY_ACCT:
+   case PROC_TGID_DELAY_ACCT:
+   inode->i_fop = &proc_info_file_operations;
+   ei->op.proc_read = proc_pid_delay;
+   break;
+#endif
 #ifdef CONFIG_SCHEDSTATS
case PROC_TID_SCHEDSTAT:
case PROC_TGID_SCHEDSTAT:
Index: linux-2.6.11-rc5/include/linux/sched.h
===
--- linux-2.6.11-rc5.orig/include/linux/sched.h 2005-02-23 20:02:21.0 
-0800
+++ linux-2.6.11-rc5/include/linux/sched.h  2005-02-24 00:54:56.482081606 
-0800
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct exec_domain;
 
@@ -685,6 +686,9 @@
struct mempolicy *mempolicy;
short il_next;
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   struct task_delay_info delays;
+#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
@@ -980,6 +984,9 @@
 extern void set_task_comm(struct task_struct *tsk, char *from);
 extern void get_task_comm(char *to, struct task_struct *tsk);
 
+#define PF_MEMIO   0x0040  /* I am potentially doing I/O for mem */
+#define PF_IOWAIT  0x0080  /* I am waiting on disk I/O */
+
 #ifdef CONFIG_SMP
 extern void wait_task_inactive(task_t * p);
 #else
@@ -1214,6 +1221,88 @@
return 0;
 }
 #endif /* CONFIG_PM */
+
+/* API for registering delay info */
+#ifdef CONFIG_DELAY_ACCT
+
+#define test_delay_flag(tsk,flg)   ((tsk)->flags & (flg))
+#define set_d

[PATCH] CKRM [3/8] Main/core CKRM code, beginning of RCFS

2005-02-24 Thread Gerrit Huizenga

Main code for CKRM default classification engine.  Adds Resource
Control (rc) filesystem as mechanism for setting policies for
class assignments in CKRM.
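
To give a feel for how a classification engine plugs in, here is a small
userspace sketch.  It borrows only the idea of an interest bitmask
(c_interest) and a classify callback from struct ckrm_eng_callback in
ckrm_ce.h.  The names engine_register(), core_classify() and by_name(),
the single-engine restriction, and the string class names are all
assumptions made for the example, not the CKRM API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event ids standing in for enum ckrm_event. */
enum ev { EV_FORK, EV_EXEC, EV_UID, EV_NUM };

typedef const char *(*classify_fn)(enum ev event, const char *task_name);

struct engine {
	unsigned long c_interest;	/* bitmask of events to classify on */
	classify_fn classify;
};

static struct engine *active_engine;

/* Install an engine; this sketch allows only one at a time. */
static int engine_register(struct engine *e)
{
	if (active_engine != NULL)
		return -1;
	active_engine = e;
	return 0;
}

/* Ask the engine for a class, but only for events it subscribed to. */
static const char *core_classify(enum ev event, const char *task_name)
{
	struct engine *e = active_engine;

	if (e == NULL || !(e->c_interest & (1UL << event)))
		return NULL;
	return e->classify(event, task_name);
}

/* Example policy: classify purely by executable name. */
static const char *by_name(enum ev event, const char *task_name)
{
	(void)event;
	return strcmp(task_name, "httpd") == 0 ? "web" : "default";
}
```

The interest mask keeps the engine off the hot path for events it does
not care about, which is the point of having the core filter first.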

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>


 include/linux/ckrm_ce.h |  108 +
 include/linux/ckrm_events.h |8 
 include/linux/ckrm_rc.h |  355 
 include/linux/rcfs.h|   96 
 include/linux/sched.h   |6 
 init/main.c |2 
 kernel/ckrm/Makefile|2 
 kernel/ckrm/ckrm.c  |  927 
 kernel/ckrm/ckrmutils.c |  195 +
 9 files changed, 1694 insertions(+), 5 deletions(-)

Index: linux-2.6.11-rc5/include/linux/ckrm_ce.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/include/linux/ckrm_ce.h2005-02-24 00:55:01.390489786 
-0800
@@ -0,0 +1,95 @@
+/*
+ *  ckrm_ce.h - Header file to be used by Classification Engine of CKRM
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003
+ *   (C) Shailabh Nagar,  IBM Corp. 2003
+ *   (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * Provides data structures, macros and kernel API of CKRM for 
+ * classification engine.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_CE_H
+#define _LINUX_CKRM_CE_H
+
+#ifdef CONFIG_CKRM
+
+#include 
+
+/*
+ * Action parameters identifying the cause of a task<->class notify callback 
+ * these can percolate up to user daemon consuming records sent by the 
+ * classification engine
+ */
+
+typedef void *(*ce_classify_fct) (enum ckrm_event event, void *obj, ...);
+typedef void (*ce_notify_fct) (enum ckrm_event event, void *classobj,
+void *obj);
+
+struct ckrm_eng_callback {
+   /* general state information */
+   int always_callback;/* set if CE should always be called back 
+  regardless of numclasses */
+
+   /* callbacks which are called without holding locks */
+
+   unsigned long c_interest;   /* set of classification events of 
+* interest to CE 
+*/
+
+   /* generic classify */
+   ce_classify_fct classify;
+
+   /* class added */
+   void (*class_add) (const char *name, void *core, int classtype);
+
+   /* class deleted */
+   void (*class_delete) (const char *name, void *core, int classtype);
+
+   /* callbacks which are called while holding task_lock(tsk) */
+   unsigned long n_interest;   /* set of notification events of 
+*  interest to CE 
+*/
+   /* notify on class switch */
+   ce_notify_fct notify;   
+};
+
+struct inode;
+struct dentry;
+
+struct rbce_eng_callback {
+   int (*mkdir) (struct inode *, struct dentry *, int);/* mkdir */
+   int (*rmdir) (struct inode *, struct dentry *); /* rmdir */
+   int (*mnt) (void);
+   int (*umnt) (void);
+};
+
+extern int ckrm_register_engine(const char *name, struct ckrm_eng_callback *);
+extern int ckrm_unregister_engine(const char *name);
+
+extern void *ckrm_classobj(char *, int *classtype);
+
+extern int rcfs_register_engine(struct rbce_eng_callback *);
+extern int rcfs_unregister_engine(struct rbce_eng_callback *);
+
+extern int ckrm_reclassify(int pid);
+
+#ifndef _LINUX_CKRM_RC_H
+
+extern void ckrm_core_grab(struct ckrm_core_class *core);
+extern void ckrm_core_drop(struct ckrm_core_class *core);
+#endif
+
+#endif /* CONFIG_CKRM */
+#endif /* _LINUX_CKRM_CE_H */
Index: linux-2.6.11-rc5/include/linux/ckrm_events.h
===
--- linux-2.6.11-rc5.orig/include/linux/ckrm_events.h   2005-02-24 
00:54:50.530799168 -0800
+++ linux-2.6.11-rc5/include/linux/ckrm_events.h2005-02-24 
00:55:01.391489666 -0800
@@ -108,70 +108,78 @@
 extern void ckrm_invoke_event_cb_chain(enum ckrm_event ev, void *arg);
 
 /* forward declarations for function arguments */
-struct task_struct;
+
+#include/* for task_struct */
+
 struct sock;
 struct user_struct;
 
 static inline void ckrm_cb_fork(struct task_struct *p)
 {
- ckrm_i

[PATCH] CKRM [4/8] Full directory support for rcfs

2005-02-24 Thread Gerrit Huizenga

Index: linux-2.6.11-rc5/fs/Makefile
===
--- linux-2.6.11-rc5.orig/fs/Makefile   2005-02-23 20:03:03.0 -0800
+++ linux-2.6.11-rc5/fs/Makefile2005-02-24 00:55:06.483875663 -0800
@@ -92,6 +92,7 @@
 obj-$(CONFIG_XFS_FS)   += xfs/
 obj-$(CONFIG_AFS_FS)   += afs/
 obj-$(CONFIG_BEFS_FS)  += befs/
+obj-$(CONFIG_RCFS_FS)  += rcfs/
 obj-$(CONFIG_HOSTFS)   += hostfs/
 obj-$(CONFIG_HPPFS)+= hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
Index: linux-2.6.11-rc5/fs/rcfs/dir.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/fs/rcfs/dir.c  2005-02-24 00:55:06.484875543 -0800
@@ -0,0 +1,292 @@
+/* 
+ * fs/rcfs/dir.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   Vivek Kashyap,   IBM Corp. 2004
+ *   
+ * 
+ * Directory operations for rcfs
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define rcfs_positive(dentry)  ((dentry)->d_inode && !d_unhashed((dentry)))
+
+int rcfs_empty(struct dentry *dentry)
+{
+   struct dentry *child;
+   int ret = 0;
+
+   spin_lock(&dcache_lock);
+   list_for_each_entry(child, &dentry->d_subdirs, d_child)
+   if (!rcfs_is_magic(child) && rcfs_positive(child))
+   goto out;
+   ret = 1;
+out:
+   spin_unlock(&dcache_lock);
+   return ret;
+}
+
+/* Directory inode operations */
+
+int
+rcfs_create(struct inode *dir, struct dentry *dentry, int mode,
+   struct nameidata *nd)
+{
+   return rcfs_mknod(dir, dentry, mode | S_IFREG, 0);
+}
+
+EXPORT_SYMBOL_GPL(rcfs_create);
+
+/* Symlinks permitted ?? */
+int rcfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+{
+   struct inode *inode;
+   int error = -ENOSPC;
+
+   inode = rcfs_get_inode(dir->i_sb, S_IFLNK | S_IRWXUGO, 0);
+   if (inode) {
+   int l = strlen(symname) + 1;
+   error = page_symlink(inode, symname, l);
+   if (!error) {
+   if (dir->i_mode & S_ISGID)
+   inode->i_gid = dir->i_gid;
+   d_instantiate(dentry, inode);
+   dget(dentry);
+   } else
+   iput(inode);
+   }
+   return error;
+}
+
+EXPORT_SYMBOL_GPL(rcfs_symlink);
+
+int rcfs_create_coredir(struct inode *dir, struct dentry *dentry)
+{
+
+   struct rcfs_inode_info *ripar, *ridir;
+   int sz;
+
+   ripar = rcfs_get_inode_info(dir);
+   ridir = rcfs_get_inode_info(dentry->d_inode);
+   /* Inform resource controllers - do Core operations */
+   if (ckrm_is_core_valid(ripar->core)) {
+   sz = strlen(ripar->name) + strlen(dentry->d_name.name) + 2;
+   ridir->name = kmalloc(sz, GFP_KERNEL);
+   if (!ridir->name) {
+   return -ENOMEM;
+   }
+   snprintf(ridir->name, sz, "%s/%s", ripar->name,
+dentry->d_name.name);
+   ridir->core = (*(ripar->core->classtype->alloc))
+   (ripar->core, ridir->name);
+   } else {
+   printk(KERN_ERR "rcfs_mkdir: Invalid parent core %p\n",
+  ripar->core);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+EXPORT_SYMBOL_GPL(rcfs_create_coredir);
+
+int rcfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+
+   int retval = 0;
+   struct ckrm_classtype *clstype;
+
+#if 0
+   struct dentry *pd = list_entry(dir->i_dentry.next, struct dentry,
+  d_alias);
+   if ((!strcmp(pd->d_name.name, "/") &&
+!strcmp(dentry->d_name.name, "ce"))) {
+   /* Call CE's mkdir if it has registered, else fail. */
+   if (rcfs_eng_callbacks.mkdir) {
+   return (*rcfs_eng_callbacks.mkdir) (dir, dentry, mode);
+   } else {
+   return -EINVAL;
+   }
+   }
+#endif
+   if (_rcfs_mknod(dir, dentry, mode | S_IFDIR, 0)) {
+   printk(KERN_ERR "rcfs_mkdir: error in _rcfs_mknod\n");
+   return retval;
+   }
+   dir->i_nlink++;
+   /* Inherit parent's ops since _rcfs_mknod assigns noperm ops. */
+   dentry->d_inode->i_op = dir->i_op;
+   dentry->d_inode->i_fop = dir->i_fop;
+   retval = rcfs_create_coredir(dir, dentry);
+   if (retval) {
+   simple_rmdir(dir, dentry);
+

[PATCH] CKRM [5/8] task based management for CPU, memory & disk IO

2005-02-24 Thread Gerrit Huizenga

 This patch provides the extensions for CKRM to track task classes.
 This is the base to enable task class based resource control for
 cpu, memory and disk I/O.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>

Index: linux-2.6.11-rc5/fs/rcfs/Makefile
===
--- linux-2.6.11-rc5.orig/fs/rcfs/Makefile  2005-02-24 00:55:06.487875181 
-0800
+++ linux-2.6.11-rc5/fs/rcfs/Makefile   2005-02-24 00:55:10.938338577 -0800
@@ -5,3 +5,4 @@
 obj-$(CONFIG_RCFS_FS) += rcfs.o 
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
+rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
Index: linux-2.6.11-rc5/fs/rcfs/rootdir.c
===
--- linux-2.6.11-rc5.orig/fs/rcfs/rootdir.c 2005-02-24 00:55:06.487875181 
-0800
+++ linux-2.6.11-rc5/fs/rcfs/rootdir.c  2005-02-24 00:55:10.938338577 -0800
@@ -58,7 +58,7 @@
return 0;
 }
 
-EXPORT_SYMBOL(rcfs_unregister_engine);
+EXPORT_SYMBOL_GPL(rcfs_unregister_engine);
 
 /*
  * rcfs_mkroot
@@ -183,6 +183,10 @@
 
 EXPORT_SYMBOL_GPL(rcfs_deregister_classtype);
 
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+extern struct rcfs_mfdesc tc_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -193,6 +197,10 @@
  * table to initialize their magf entries. 
  */
 
-struct rcfs_mfdesc *genmfdesc[] = {
+struct rcfs_mfdesc *genmfdesc[CKRM_MAX_CLASSTYPES] = {
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+   &tc_mfdesc,
+#else
NULL,
+#endif
 };
Index: linux-2.6.11-rc5/fs/rcfs/tc_magic.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/fs/rcfs/tc_magic.c 2005-02-24 00:55:10.939338456 -0800
@@ -0,0 +1,93 @@
+/* 
+ * fs/rcfs/tc_magic.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   (C) Vivek Kashyap,   IBM Corp. 2004
+ *   (C) Chandra Seetharaman, IBM Corp. 2004
+ *   (C) Hubertus Franke, IBM Corp. 2004
+ *   
+ * define magic fileops for taskclass classtype
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+
+/*
+ * Taskclass general
+ *
+ * Define structures for taskclass root directory and its magic files 
+ * In taskclasses, there is one set of magic files, created automatically under
+ * the taskclass root (upon classtype registration) and each directory (class) 
+ * created subsequently. However, classtypes can also choose to have different 
+ * sets of magic files created under their root and other directories under 
+ * root using their mkdir function. RCFS only provides helper functions for 
+ * creating the root directory and its magic files
+ * 
+ */
+
+#define TC_FILE_MODE (S_IFREG | S_IRUGO | S_IWUSR)
+
+#define NR_TCROOTMF  7
+struct rcfs_magf tc_rootdesc[NR_TCROOTMF] = {
+   /* First entry must be root */
+   {
+   /* .name = should not be set, copy from classtype name */
+.mode = RCFS_DEFAULT_DIR_MODE,
+.i_op = _dir_inode_operations,
+.i_fop = _dir_operations,
+},
+   /* Rest are root's magic files */
+   {
+.name = "target",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "members",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "stats",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "shares",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   /*
+* Reclassify and Config should be made available only at the 
+* root level. Make sure they are the last two entries, as 
+* rcfs_mkdir depends on it.
+*/
+   {
+.name = "reclassify",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "config",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+};
+
+struct rcfs_mfdesc tc_mfdesc = {
+   .r

[PATCH] CKRM [6/8] CKRM tracking for socket classes

2005-02-24 Thread Gerrit Huizenga

This patch provides the extensions for CKRM to track per socket classes.
This is the base to enable socket based resource control for inbound
connection control, bandwidth control etc.

Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>


Index: linux-2.6.11-rc5-ckrm01/fs/rcfs/Makefile
===
--- linux-2.6.11-rc5-ckrm01.orig/fs/rcfs/Makefile   2005-02-24 
01:09:01.190232913 -0800
+++ linux-2.6.11-rc5-ckrm01/fs/rcfs/Makefile2005-02-24 01:09:01.408206631 
-0800
@@ -6,3 +6,4 @@
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
 rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
+rcfs-$(CONFIG_CKRM_TYPE_SOCKETCLASS) += socket_fs.o
Index: linux-2.6.11-rc5-ckrm01/fs/rcfs/rootdir.c
===
--- linux-2.6.11-rc5-ckrm01.orig/fs/rcfs/rootdir.c  2005-02-24 
01:09:01.191232792 -0800
+++ linux-2.6.11-rc5-ckrm01/fs/rcfs/rootdir.c   2005-02-24 01:09:34.051270771 
-0800
@@ -187,6 +187,10 @@
 extern struct rcfs_mfdesc tc_mfdesc;
 #endif
 
+#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS
+extern struct rcfs_mfdesc rcfs_sock_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -203,4 +207,10 @@
 #else
NULL,
 #endif
+#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS
+   &rcfs_sock_mfdesc,
+#else
+   NULL,
+#endif
+
 };
Index: linux-2.6.11-rc5-ckrm01/fs/rcfs/socket_fs.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/fs/rcfs/socket_fs.c 2005-02-24 01:09:01.410206390 
-0800
@@ -0,0 +1,308 @@
+/* ckrm_socketaq.c 
+ *
+ * Copyright (C) Vivek Kashyap,  IBM Corp. 2004
+ * 
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+/***
+ *  Socket class type
+ *   
+ * Defines the root structure for socket based classes. Currently only inbound
+ * connection control is supported based on prioritized accept queues. 
+ **/
+
+#include 
+#include 
+
+extern int rcfs_create(struct inode *, struct dentry *, int,
+  struct nameidata *);
+extern int rcfs_unlink(struct inode *, struct dentry *);
+extern int rcfs_symlink(struct inode *, struct dentry *, const char *);
+extern int rcfs_mknod(struct inode *, struct dentry *, int mode, dev_t);
+extern int rcfs_mkdir(struct inode *, struct dentry *, int);
+extern int rcfs_rmdir(struct inode *, struct dentry *);
+extern int rcfs_rename(struct inode *, struct dentry *, struct inode *,
+  struct dentry *);
+
+extern int rcfs_create_coredir(struct inode *, struct dentry *);
+int rcfs_sock_mkdir(struct inode *, struct dentry *, int mode);
+int rcfs_sock_rmdir(struct inode *, struct dentry *);
+
+int rcfs_sock_create_noperm(struct inode *, struct dentry *, int,
+  struct nameidata *);
+int rcfs_sock_unlink_noperm(struct inode *, struct dentry *);
+int rcfs_sock_mkdir_noperm(struct inode *, struct dentry *, int);
+int rcfs_sock_rmdir_noperm(struct inode *, struct dentry *);
+int rcfs_sock_mknod_noperm(struct inode *, struct dentry *, int, dev_t);
+
+void rcfs_sock_set_directory(void);
+
+extern struct file_operations config_fileops,
+members_fileops, shares_fileops, stats_fileops, target_fileops;
+
+struct inode_operations my_iops = {
+   .create = rcfs_create,
+   .lookup = simple_lookup,
+   .link = simple_link,
+   .unlink = rcfs_unlink,
+   .symlink = rcfs_symlink,
+   .mkdir = rcfs_sock_mkdir,
+   .rmdir = rcfs_sock_rmdir,
+   .mknod = rcfs_mknod,
+   .rename = rcfs_rename,
+};
+
+struct inode_operations class_iops = {
+   .create = rcfs_sock_create_noperm,
+   .lookup = simple_lookup,
+   .link = simple_link,
+   .unlink = rcfs_sock_unlink_noperm,
+   .symlink = rcfs_symlink,
+   .mkdir = rcfs_sock_mkdir_noperm,
+   .rmdir = rcfs_sock_rmdir_noperm,
+   .mknod = rcfs_sock_mknod_noperm,
+   .rename = rcfs_rename,
+};
+
+struct inode_operations sub_iops = {
+   .create = rcfs_sock_create_noperm,
+   .lookup = simple_lookup,
+   .link = simple_link,
+   .unlink = rcfs_sock_unlink_noperm,
+   .symlink = rcfs_symlink,
+   .mkdir = rcfs_sock_mkdir_noperm,
+   .rmdir = rcfs_sock_rmdir_noperm,
+   .mknod = rcfs_sock_mknod_noperm,
+   .rename = rcfs_rename,
+};
+
+struct rcfs_m

[PATCH] CKRM [7/8] Resource controller for number of tasks per class

2005-02-24 Thread Gerrit Huizenga

This patch provides a resource controller for limiting the number
of tasks per class in CKRM.
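
The accounting idea can be sketched in a few lines of userspace C.  This
is a simplified guess at the scheme based on the field comments in struct
ckrm_numtasks (limit, unused share, own allocation, borrowed from parent).
The names tclass, tclass_get_ref() and tclass_put_ref() are hypothetical,
and the sketch leaves out the spinlock and atomics the real controller
needs.

```c
#include <stddef.h>

struct tclass {
	struct tclass *parent;	/* NULL for the root class */
	int limit;		/* no tasks over this limit */
	int unused;		/* has to borrow if more than this is needed */
	int cur_alloc;		/* tasks charged to this class itself */
	int borrowed;		/* tasks charged to an ancestor */
};

/* Try to account one new task to c; returns 1 on success, 0 if refused. */
static int tclass_get_ref(struct tclass *c)
{
	if (c == NULL)
		return 0;	/* ran out of ancestors to borrow from */
	if (c->cur_alloc + c->borrowed >= c->limit)
		return 0;	/* request would exceed the limit */
	if (c->cur_alloc < c->unused) {
		c->cur_alloc++;	/* fits in our own share */
		return 1;
	}
	if (tclass_get_ref(c->parent)) {
		c->borrowed++;	/* charged against the parent */
		return 1;
	}
	return 0;
}

/* Release one task, handing borrowed slots back to the parent first. */
static void tclass_put_ref(struct tclass *c)
{
	if (c == NULL)
		return;
	if (c->borrowed > 0) {
		c->borrowed--;
		tclass_put_ref(c->parent);
	} else if (c->cur_alloc > 0) {
		c->cur_alloc--;
	}
}
```

With a child class whose own share is 1 and whose limit is 2, the first
task is charged locally, the second is borrowed from the parent, and the
third is refused.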

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>


Index: linux-2.6.11-rc5-ckrm01/include/linux/ckrm_tsk.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/include/linux/ckrm_tsk.h2005-02-24 
01:09:42.896204314 -0800
@@ -0,0 +1,35 @@
+/* ckrm_tsk.h - No. of tasks resource controller for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * Provides No. of tasks resource controller for CKRM
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef _LINUX_CKRM_TSK_H
+#define _LINUX_CKRM_TSK_H
+
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+#include 
+
+typedef int (*get_ref_t) (struct ckrm_core_class *, int);
+typedef void (*put_ref_t) (struct ckrm_core_class *);
+
+extern int numtasks_get_ref(struct ckrm_core_class *, int);
+extern void numtasks_put_ref(struct ckrm_core_class *);
+extern void ckrm_numtasks_register(get_ref_t, put_ref_t);
+
+#else /* CONFIG_CKRM_TYPE_TASKCLASS */
+
+#define numtasks_get_ref(core_class, ref) (1)
+#define numtasks_put_ref(core_class)  do {} while (0)
+
+#endif /* CONFIG_CKRM_TYPE_TASKCLASS */
+#endif /* _LINUX_CKRM_TSK_H */
Index: linux-2.6.11-rc5-ckrm01/init/Kconfig
===
--- linux-2.6.11-rc5-ckrm01.orig/init/Kconfig   2005-02-24 01:09:01.423204823 
-0800
+++ linux-2.6.11-rc5-ckrm01/init/Kconfig2005-02-24 01:09:42.897204193 
-0800
@@ -183,6 +183,15 @@

  Say N if unsure.  
 
+config CKRM_RES_NUMTASKS
+   tristate "Number of Tasks Resource Manager"
+   depends on CKRM_TYPE_TASKCLASS
+   default m
+   help
+ Provides a Resource Controller for CKRM that allows limiting no of
+ tasks a task class can have.
+   
+ Say N if unsure, Y to use the feature.
 endmenu
 
 config SYSCTL
Index: linux-2.6.11-rc5-ckrm01/kernel/ckrm/ckrm_numtasks.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/kernel/ckrm/ckrm_numtasks.c 2005-02-24 
01:09:42.898204073 -0800
@@ -0,0 +1,522 @@
+/* ckrm_numtasks.c - "Number of tasks" resource controller for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman,  IBM Corp. 2003
+ * 
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+/*
+ * CKRM Resource controller for tracking number of tasks in a class.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TOTAL_NUM_TASKS (131072)   /* 128 K */
+#define NUMTASKS_DEBUG
+#define NUMTASKS_NAME "numtasks"
+
+struct ckrm_numtasks {
+   struct ckrm_core_class *core;   /* the core i am part of... */
+   struct ckrm_core_class *parent; /* parent of the core above. */
+   struct ckrm_shares shares;
+   spinlock_t cnt_lock;/* always grab parent's lock before child's */
+   int cnt_guarantee;  /* num_tasks guarantee in local units */
+   int cnt_unused; /* has to borrow if more than this is needed */
+   int cnt_limit;  /* no tasks over this limit. */
+   atomic_t cnt_cur_alloc; /* current alloc from self */
+   atomic_t cnt_borrowed;  /* borrowed from the parent */
+
+   int over_guarantee; /* turn on/off when cur_alloc goes  */
+   /* over/under guarantee */
+
+   /* internally maintained statistics to compare with max numbers */
+   int limit_failures; /* # failures as request was over the limit */
+   int borrow_sucesses;/* # successful borrows */
+   int borrow_failures;/* # borrow failures */
+
+   /* Maximum each specific statistic has reached. */
+   int max_limit_failures;
+   int max_borrow_sucesses;
+   int max_borrow_failures;
+
+   /* Total number of specific statistics */
+   int tot_limit_failures;
+   int tot_borrow_sucesses;
+   int tot_borrow_failures;
+};
+
+struct ckrm_res_ctlr numtasks_rcbs;
+
+/* Initialize rescls values
+ * May be called on each rcfs unmount or as part of error recover

[PATCH] CKRM [8/8] CKRM Documentation

2005-02-24 Thread Gerrit Huizenga

This patch adds all current documentation on CKRM.

Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>

Index: linux-2.6.11-rc5-ckrm01/Documentation/ckrm/ckrm_basics
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/Documentation/ckrm/ckrm_basics  2005-02-24 01:09:47.024706529 -0800
@@ -0,0 +1,66 @@
+CKRM Basics
+-----------
+A brief review of CKRM concepts and terminology will help make installation
+and testing easier. For more details, please visit http://ckrm.sf.net. 
+
+Currently there are two class types, taskclass and socketclass for grouping,
+regulating and monitoring tasks and sockets respectively.
+
+To avoid repeating instructions for each classtype, this document assumes a
+task to be the kernel object being grouped. By and large, one can replace task
+with socket and taskclass with socketclass.
+
+RCFS depicts a CKRM class as a directory. A hierarchy of classes can be
+created in which the children of a class share the resources allotted to
+the parent. Tasks can be classified to any class at any level.
+There is no correlation between parent-child relationship of tasks and
+the parent-child relationship of classes they belong to.
+
+Without a Classification Engine, a task inherits its class. A privileged
+user can reassign a task to a class as described below, after which all
+the child tasks under that task will be assigned to that class, unless the
+user reassigns any of them.
+
+A Classification Engine (CE), if one exists, will be used by CKRM to
+classify a task to a class. The rule-based classification engine uses some
+of the attributes of the task to classify it. When a CE is present,
+a task does not inherit its class.
+
+Characteristics of a class can be accessed/changed through the following magic
+files under the directory representing the class:
+
+shares:  allows changing the shares of the different resources managed by
+ the class
+stats:   shows the statistics associated with each resource managed
+ by the class
+target:  allows assigning a task to a class. If a CE is present, assigning
+ a task to a class through this interface will prevent the CE from
+ reassigning the task to any class during reclassification.
+members: shows which tasks have been assigned to a class
+config:  allows viewing and modifying configuration information of the
+ different resources in a class.
+
+Resource allocations for a class are controlled by the parameters:
+
+guarantee: specifies how much of a resource is guaranteed to a class. A
+   special value DONT_CARE(-2) means that no specific guarantee
+   of a resource is specified; this class may not get
+   any resource if the system is running short of resources.
+limit: specifies the maximum amount of resource that is allowed to be
+   allocated by a class. A special value DONT_CARE(-2) means that
+   no specific limit is specified; this class can get all
+   the resources available.
+total_guarantee: total guarantee that is allowed among the children of this
+   class. In other words, the sum of "guarantee"s of all children
+   of this class cannot exceed this number.
+max_limit: maximum "limit" allowed for any of this class's children. In
+   other words, the "limit" of any child of this class cannot exceed
+   this value.
+
+None of these parameters are absolute or have any units associated with
+them. They are just numbers (relative to the parent's) that are
+used to calculate the absolute amount of a resource available to a specific
+class.
+
+Note: The root class has an absolute number of resource units associated with it.
+
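
The share arithmetic described in ckrm_basics can be sketched in a few lines
of userspace C. This is only an illustration under stated assumptions, not
CKRM code: the helper name abs_guarantee is hypothetical, and the proportional
rule (parent's absolute amount times the child's guarantee, divided by the
parent's total_guarantee) is inferred from the text rather than taken from
the implementation.

```c
#define CKRM_SHARE_DONTCARE (-2)

/*
 * Hypothetical helper, not part of CKRM: convert a child's relative
 * "guarantee" into absolute units, given the parent's absolute amount
 * and the parent's total_guarantee.
 */
static long abs_guarantee(long parent_abs, int parent_total_guarantee,
                          int child_guarantee)
{
    if (child_guarantee == CKRM_SHARE_DONTCARE)
        return 0;   /* no specific guarantee was requested */
    if (parent_total_guarantee <= 0)
        return 0;   /* nothing to divide among the children */
    return parent_abs * child_guarantee / parent_total_guarantee;
}
```

With a root class holding 131072 task slots and total_guarantee=100, a child
with guarantee=25 would be entitled to a quarter of the root's slots under
this assumed rule.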
Index: linux-2.6.11-rc5-ckrm01/Documentation/ckrm/core_usage
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/Documentation/ckrm/core_usage   2005-02-24 01:09:47.025706409 -0800
@@ -0,0 +1,72 @@
+Usage of CKRM without a classification engine
+---------------------------------------------
+
+1. Create a class
+
+   # mkdir /rcfs/taskclass/c1
+   creates a taskclass named c1, while
+   # mkdir /rcfs/socket_class/s1
+   creates a socketclass named s1 
+
+The newly created class directory is automatically populated by magic files
+shares, stats, members, target and config.
+
+2. View default shares 
+
+   # cat /rcfs/taskclass/c1/shares
+
+   "guarantee=-2,limit=-2,total_guarantee=100,max_limit=100" is the default
+   value set for resources that have controllers registered with CKRM.
+
+3.

Re: [PATCH] CKRM: 4/10 CKRM: Full rcfs support

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 14:15:48 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:48:24AM -0800, Gerrit Huizenga wrote:
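> > +#include <linux/module.h>
> > +#include <linux/list.h>
> > +#include <linux/fs.h>
> > +#include <linux/namei.h>
> > +#include <linux/namespace.h>
> > +#include <linux/dcache.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/highmem.h>
> > +#include <linux/init.h>
> > +#include <linux/string.h>
> > +#include <linux/smp_lock.h>
> > +#include <linux/backing-dev.h>
> > +#include <linux/parser.h>
> > +#include <asm/uaccess.h>
> > +
> > +#include <linux/rcfs.h>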
> 
> asm last please.
 
Fixed.

> > +/*
> > + * Address of variable used as flag to indicate a magic file, 
> > + * value unimportant
> > + */ 
> > +int RCFS_IS_MAGIC;
> 
> Shouldn't this be static?
 
Nope - used across files.

> And what is a "magic" file used for?  I see where you set something to
> point to this, but no where do you check for it.  What's the use of it?
 
I believe that these are auto-created file entries which are instantiated
when a class is created, hence they "magically" appear.  They are also
special in the sense that they are tied to the life of the class, unlike
other files in the class directories.  The MAGIC value is used to help
distinguish these auto-created entries from other entries in a directory.
This is a little bit like "." and ".." but specific to the class creation.
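
The "address as a flag" idea described above can be sketched in userspace C.
The structure and helper names here (demo_entry, mark_magic, is_magic) are
hypothetical and stand in for whatever rcfs actually tags; only the
RCFS_IS_MAGIC variable comes from the quoted patch.

```c
#include <stddef.h>

/* Only the address matters; the value is never read. */
int RCFS_IS_MAGIC;

/* Hypothetical stand-in for a directory entry; not the rcfs structure. */
struct demo_entry {
    const char *name;
    void *tag;      /* points at RCFS_IS_MAGIC for auto-created files */
};

/* Tag an entry as auto-created by storing the sentinel address. */
static void mark_magic(struct demo_entry *e)
{
    e->tag = &RCFS_IS_MAGIC;
}

/* An entry is "magic" exactly when its tag is the sentinel address. */
static int is_magic(const struct demo_entry *e)
{
    return e->tag == &RCFS_IS_MAGIC;
}
```

Because the address of a global is unique within the program, no other
pointer stored in the tag field can be mistaken for the marker.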

> > +int _rcfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> > +{
> > +   struct inode *inode;
> > +   int error = -EPERM;
> > +
> > +   if (dentry->d_inode)
> > +   return -EEXIST;
> > +   inode = rcfs_get_inode(dir->i_sb, mode, dev);
> > +   if (inode) {
> > +   if (dir->i_mode & S_ISGID) {
> > +   inode->i_gid = dir->i_gid;
> > +   if (S_ISDIR(mode))
> > +   inode->i_mode |= S_ISGID;
> > +   }
> > +   d_instantiate(dentry, inode);
> > +   dget(dentry);
> > +   error = 0;
> > +   }
> > +   return error;
> > +}
> > +
> > +EXPORT_SYMBOL_GPL(_rcfs_mknod);
> > +
> > +int rcfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> > +{
> > +   /* User can only create directories, not files */
> > +   if ((mode & S_IFMT) != S_IFDIR)
> > +   return -EINVAL;
> > +
> > +   return dir->i_op->mkdir(dir, dentry, mode);
> > +}
> > +
> > +EXPORT_SYMBOL_GPL(rcfs_mknod);
> 
> Why 2 mknod functions?  Do they both really need to be exported?
 
I believe they are both exported so resource controllers can create
class specific directories (including corresponding magic files)
in the case of _rcfs_mknod, and rcfs_mknod is the exported fs op
which allows a restricted set of standard user filesystem operations
within the created directory.  So, yes.

> > +
> > +#define MAGIC_SHOW(FUNC)   \
> > +static int \
> 
> You mix tabs and spaces in your #defines in this file, please just use
> tabs properly.
 
Fixed.

> > +static ssize_t
> > +target_reclassify_write(struct file *file, const char __user * buf,
> > +   size_t count, loff_t * ppos, int manual)
> > +{
> > +   struct rcfs_inode_info *ri = RCFS_I(file->f_dentry->d_inode);
> > +   char *optbuf;
> > +   int rc = -EINVAL;
> > +   ckrm_classtype_t *clstype;
> > +
> > +   if ((ssize_t) count < 0 || (ssize_t) count > TARGET_MAX_INPUT_SIZE)
> > +   return -EINVAL;
> 
> But count is an unsigned variable, right?  How could it ever be
> negative?
 
Yep.  But see how those nice casts covered up all the warnings?  ;-)
(Fixed!)

> > +   if (!access_ok(VERIFY_READ, buf, count))
> > +   return -EFAULT;
> > +   down(&(ri->vfs_inode.i_sem));
> > +   optbuf = kmalloc(TARGET_MAX_INPUT_SIZE, GFP_KERNEL);
> 
> kmalloc with a lock held?  Is that a good idea?

Lock?  Or sema?  Sema should be okay here, right?

> You also don't check the return value of kmalloc, that's a bad idea.
 
Yep - good catch.  Fixed.

> > +   __copy_from_user(optbuf, buf, count);
> > +   if (optbuf[count - 1] == '\n')
> > +   optbuf[count - 1] = '\0';
> 
> Stripping off a single trailing \n character?  Why?
 
I believe this is the "echo value > /rcfs/class/magic_file".  If
there is a newline, it would show up as an extra newline during
an ls.  Of course, Shailabh can correct me if I'm wrong on this one.
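
A userspace sketch of the length check and newline stripping discussed above
follows. It is an assumption about what a cleaned-up version could look like,
not the fix that was actually applied: DEMO_MAX_INPUT and copy_and_strip are
hypothetical names, and memcpy stands in for the kernel's copy from user
space. Note that the quoted code reads optbuf[count - 1], which indexes far
out of bounds when count is 0, so the sketch rejects that case explicitly.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_INPUT 100  /* hypothetical; stands in for TARGET_MAX_INPUT_SIZE */

/*
 * Copy count bytes of input into out (which must hold DEMO_MAX_INPUT + 1
 * bytes), NUL-terminate it, and drop a single trailing newline.
 * Returns 0 on success, -1 if count is out of range.
 */
static int copy_and_strip(char *out, const char *in, size_t count)
{
    /* size_t is unsigned, so only zero and the upper bound need checking. */
    if (count == 0 || count > DEMO_MAX_INPUT)
        return -1;
    memcpy(out, in, count);
    out[count] = '\0';
    if (out[count - 1] == '\n')
        out[count - 1] = '\0';
    return 0;
}
```

Input such as "1234\n" (what echo writes) and "1234" (what printf without a
newline writes) both end up stored as "1234".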

> > +inline struct rcfs_inode_info *RCFS_I(struct inode *inode)
> > +{
> > +   return container_of(inode, struct rcfs_inode_inf

Re: [PATCH] CKRM: 7/10 CKRM: Resource controller for number of tasks

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 15:01:48 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:50:39AM -0800, Gerrit Huizenga wrote:
> > +static spinlock_t stub_lock = SPIN_LOCK_UNLOCKED;
> > +
> > +static get_ref_t real_get_ref = NULL;
> > +static put_ref_t real_put_ref = NULL;
> > +
> > +void ckrm_numtasks_register(get_ref_t gr, put_ref_t pr)
> > +{
> > +   spin_lock(&stub_lock);
> > +   real_get_ref = gr;
> > +   real_put_ref = pr;
> > +   spin_unlock(&stub_lock);
> > +}
> > +
> > +int numtasks_get_ref(void *arg, int force)
> > +{
> > +   int ret = 1;
> > +   spin_lock(&stub_lock);
> > +   if (real_get_ref) {
> > +   ret = (*real_get_ref) (arg, force);
> > +   }
> > +   spin_unlock(&stub_lock);
> > +   return ret;
> > +}
> > +
> > +void numtasks_put_ref(void *arg)
> > +{
> > +   spin_lock(&stub_lock);
> > +   if (real_put_ref) {
> > +   (*real_put_ref) (arg);
> > +   }
> > +   spin_unlock(&stub_lock);
> > +}
> > +
> > +EXPORT_SYMBOL(ckrm_numtasks_register);
> > +EXPORT_SYMBOL(numtasks_get_ref);
> > +EXPORT_SYMBOL(numtasks_put_ref);
> 
> Why are these functions used instead of calling the real functions?
> They are only ever used to register a single set of functions anyway.
 
The real functions are dummies by default and can be loaded by
a module.  
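
The stub-plus-registration pattern being defended here can be sketched in
userspace C. Everything prefixed demo_ or limited_ is hypothetical, the
struct demo_class argument is an assumed replacement for the quoted void *,
and the spinlock is omitted, so this shows only the dispatch logic and not
the locking.

```c
#include <stddef.h>

/* Hypothetical class object; a typed argument replaces the quoted void *. */
struct demo_class { int ntasks; };

typedef int (*get_ref_t)(struct demo_class *, int);
typedef void (*put_ref_t)(struct demo_class *);

static get_ref_t real_get_ref;  /* NULL until a controller registers */
static put_ref_t real_put_ref;

/* Called by the controller (for example from module init) to hook in. */
void demo_register(get_ref_t gr, put_ref_t pr)
{
    real_get_ref = gr;
    real_put_ref = pr;
}

int demo_get_ref(struct demo_class *cls, int force)
{
    /* With nothing registered, behave like the dummy and allow the task. */
    return real_get_ref ? real_get_ref(cls, force) : 1;
}

void demo_put_ref(struct demo_class *cls)
{
    if (real_put_ref)
        real_put_ref(cls);
}

/* Example controller: refuse new tasks once two are counted, unless forced. */
static int limited_get_ref(struct demo_class *cls, int force)
{
    if (!force && cls->ntasks >= 2)
        return 0;
    cls->ntasks++;
    return 1;
}

static void limited_put_ref(struct demo_class *cls)
{
    cls->ntasks--;
}
```

Callers always go through demo_get_ref and demo_put_ref, so the core code
does not need to know whether the controller is built in, loaded as a module,
or absent.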

> Oh, and void * is to be avoided at all costs...
 
Fixed.

thanks,

gerrit
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] CKRM: 3/10 CKRM: Core ckrm, rcfs

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 14:00:47 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:47:32AM -0800, Gerrit Huizenga wrote:
> > +/* Changes
> > + *
> > + * 12 Nov 2003
> > + *Created.
> > + * 22 Apr 2004
> > + *Adopted to classtypes
> > + */
> 
> Ok, I'm not going to say this for every future file... :)
 
Good - I won't have to say "globally fixed" for each one, then.  ;)

> > +#ifdef __KERNEL__
> 
> Not needed.

Ditto.

> > +typedef void *(*ce_classify_fct_t) (enum ckrm_event event, void *obj, ...);
> > +typedef void (*ce_notify_fct_t) (enum ckrm_event event, void *classobj,
> > +void *obj);
> 
> Ick.  Don't put a _t at the end of a typedef.  Wrong OS style guide.
 
Fixed.  Although this isn't an OS style guide thing - it is a Posix
driven convention whereby any header file defined in the standard
automatically has _t suffixed names reserved to the implementation,
i.e. no application may define names ending in _t.  This header file isn't
being used by user level applications so it doesn't matter.

> > +typedef struct ckrm_eng_callback {
> 
> no typedef.

Fixed (globally).

> > +   /* general state information */
> > +   int always_callback;/* set if CE should always be called back 
> > +  regardless of numclasses */
> > +
> > +   /* callbacks which are called without holding locks */
> > +
> > +   unsigned long c_interest;   /* set of classification events of 
> > +* interest to CE 
> > +*/
> > +
> > +   /* generic classify */
> > +   ce_classify_fct_t classify;
> > +
> > +   /* class added */
> > +   void (*class_add) (const char *name, void *core, int classtype);
> > +
> > +   /* class deleted */
> > +   void (*class_delete) (const char *name, void *core, int classtype);
> > +
> > +   /* callbacks which are called while holding task_lock(tsk) */
> > +   unsigned long n_interest;   /* set of notification events of 
> > +*  interest to CE 
> > +*/
> > +   /* notify on class switch */
> > +   ce_notify_fct_t notify; 
> > +} ckrm_eng_callback_t;
> 
> Especially one that ends in _t again :(
 
Fixed (globally).

> > +struct inode;
> > +struct dentry;
> > +
> > +typedef struct rbce_eng_callback {
> > +   int (*mkdir) (struct inode *, struct dentry *, int);/* mkdir */
> > +   int (*rmdir) (struct inode *, struct dentry *); /* rmdir */
> > +   int (*mnt) (void);
> > +   int (*umnt) (void);
> > +} rbce_eng_callback_t;
> 
> Again with the unneeded typedef.  Come on Gerrit, you should know
> better...
 
Sorry, years of implementing Posix conformant OS's and system header
files make this very common for anyone (including several of the
CKRM developers).  Specifically because of user level name space
collision avoidance issues (e.g. think preserving backwards compatibility
for user level apps).  It is the primary mechanism for simplifying the
#ifdef __KERNEL__ crap used in most OS's.

> > +extern int ckrm_register_engine(const char *name, ckrm_eng_callback_t *);
> > +extern int ckrm_unregister_engine(const char *name);
> > +
> > +extern void *ckrm_classobj(char *, int *classtype);
> > +extern int get_exe_path_name(struct task_struct *t, char *filename,
> > +int max_size);
> 
> Wasn't this function in some other header file already?
 
And equally unnecessary in the current code.  Fixed.

> > +
> > +extern int rcfs_register_engine(rbce_eng_callback_t *);
> > +extern int rcfs_unregister_engine(rbce_eng_callback_t *);
> > +
> > +extern int ckrm_reclassify(int pid);
> > +
> > +#ifndef _LINUX_CKRM_RC_H
> > +
> > +extern void ckrm_core_grab(void *);
> > +extern void ckrm_core_drop(void *);
> 
> void *?  You can't use a proper type?
 
That was odd - definition was correct, declaration was silly.  Fixed.

> > +typedef struct ckrm_shares {
> > +   int my_guarantee;
> > +   int my_limit;
> > +   int total_guarantee;
> > +   int max_limit;
> > +   int unused_guarantee;   /* not used as parameters */
> > +   int cur_max_limit;  /* not used as parameters */
> > +} ckrm_shares_t;
> 
> Consider this the last of the "no more typedefs except for function
> pointers" reminders for the rest of the code base.
 
Good enough.  All applied.
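
As a sketch of the style being asked for: the struct is declared and used by
its tag, and a typedef is kept only for a function pointer. The field names
follow the quoted struct ckrm_shares; share_check_fn, guarantee_within_limit
and run_check are hypothetical names for illustration, and the check itself
is an assumed example rather than a rule taken from CKRM.

```c
/* Declared by tag only; users write "struct ckrm_shares". */
struct ckrm_shares {
    int my_guarantee;
    int my_limit;
    int total_guarantee;
    int max_limit;
};

/* Function-pointer typedefs remain acceptable; this name is hypothetical. */
typedef int (*share_check_fn)(const struct ckrm_shares *);

/* Assumed example check: a guarantee should not exceed the limit. */
static int guarantee_within_limit(const struct ckrm_shares *s)
{
    return s->my_guarantee <= s->my_limit;
}

/* Apply any check through the function-pointer type. */
static int run_check(share_check_fn fn, const struct ckrm_shares *s)
{
    return fn(s);
}
```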

> > +
> > +#define CKRM_SHARE_UNCHANGED (-1)  
> > +#define CKRM_SHARE_DONTCARE  (-2)  
> > +#define

Re: [PATCH] CKRM: 2/10 CKRM: Accurate delay accounting

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 13:38:25 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:46:53AM -0800, Gerrit Huizenga wrote:
> > @@ -912,6 +915,9 @@
> >  extern void set_task_comm(struct task_struct *tsk, char *from);
> >  extern void get_task_comm(char *to, struct task_struct *tsk);
> >  
> > +#define PF_MEMIO   0x0040  /* I am  potentially doing I/O for mem */
> > +#define PF_IOWAIT   0x0080  /* I am waiting on disk I/O */
> > +
> 
> Mix of tabs and spaces :(
 
Fixed.

> >  #ifdef CONFIG_SMP
> >  extern void wait_task_inactive(task_t * p);
> >  #else
> > @@ -,6 +1117,86 @@
> >  
> >  #endif
> >  
> > +/* API for registering delay info */
> > +#ifdef CONFIG_DELAY_ACCT
> > +
> > +#define test_delay_flag(tsk,flg)((tsk)->flags & (flg))
> > +#define set_delay_flag(tsk,flg) ((tsk)->flags |= (flg))
> > +#define clear_delay_flag(tsk,flg)   ((tsk)->flags &= ~(flg))
> > +
> > +#define def_delay_var(var) unsigned long long var
> > +#define get_delay(tsk,field)((tsk)->delays.field)
> > +
> > +#define start_delay(var)((var) = sched_clock())
> > +#define start_delay_set(var,flg) (set_delay_flag(current,flg),(var) = sched_clock())
> 
> You mixed tabs and spaces here.  Just use tabs please.
 
Fixed.

> > +#define add_delay_clear(tsk,field,start_ts,flg)\
> > +   do {   \
> > +   unsigned long long now = sched_clock();\
> > +   add_delay_ts(tsk,field,start_ts,now);  \
> > +   clear_delay_flag(tsk,flg); \
> > +} while (0)
> 
> -ENOTABS
 
Fixed.

> > +#else
> > +
> > +#define test_delay_flag(tsk,flg)(0)
> > +#define set_delay_flag(tsk,flg) do { } while (0)
> > +#define clear_delay_flag(tsk,flg)   do { } while (0)
> > +
> > +#define def_delay_var(var)   
> > +#define get_delay(tsk,field)(0)
> > +
> > +#define start_delay(var)do { } while (0)
> > +#define start_delay_set(var,flg)do { } while (0)
> > +
> > +#define inc_delay(tsk,field)do { } while (0)
> > +#define add_delay_ts(tsk,field,start_ts,now)do { } while (0)
> > +#define add_delay_clear(tsk,field,start_ts,flg) do { } while (0)
> > +#define add_io_delay(dstart)   do { } while (0) 
> > +#define init_delays(tsk)do { } while (0)
> > +#endif
> > +
> 
> It's that key over there on the left hand side of the keyboard...
 
Found it.  Thanks!

> > +/* Changes
> > + *
> > + * 24 Aug 2003
> > + *Created.
> > + */
> 
> No changelogs in files again please.
 
Globally gone.

> > +
> > +#ifndef _LINUX_TASKDELAYS_H
> > +#define _LINUX_TASKDELAYS_H
> > +
> > +#include <linux/config.h>
> > +#include <linux/types.h>
> > +
> > +struct task_delay_info {
> > +#if defined CONFIG_DELAY_ACCT 
> > +   /* delay statistics in usecs */
> > +   uint64_t waitcpu_total;
> > +   uint64_t runcpu_total;
> > +   uint64_t iowait_total;
> > +   uint64_t mem_iowait_total;
> > +   uint32_t runs;
> > +   uint32_t num_iowaits;
> > +   uint32_t num_memwaits;
> > +#endif 
> > +};
> 
> A null structure otherwise?  Why?
 
Bizarro world.  Fixed.

> > +#ifdef CONFIG_DELAY_ACCT
> > +int task_running_sys(struct task_struct *p)
> > +{
> > +   return task_is_running(p);
> > +}
> > +EXPORT_SYMBOL_GPL(task_running_sys);
> > +#endif
> 
> So LGPL code can use EXPORT_SYMBOL_GPL?

You lost me on this one - what LGPL code is using task_running_sys?
(and actually, I've held off on resubmitting the RBCE code for a few
days which is the code that uses this while we get some air time on
the broken up patch).

gerrit

Re: [PATCH] CKRM: 3/10 CKRM: Core ckrm, rcfs

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 14:00:47 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:47:32AM -0800, Gerrit Huizenga wrote:

[...]

> > +
> > +#define CKRM_SHARE_UNCHANGED (-1)  
> > +#define CKRM_SHARE_DONTCARE  (-2)  
> > +#define CKRM_SHARE_DFLT_TOTAL_GUARANTEE (100) 
> > +#define CKRM_SHARE_DFLT_MAX_LIMIT (100)  
> 
> Trailing whitespace that is a tab, but yet, no tab within the define
> itself.  Odd creature.
 
Yeah, I'm not sure what some of the original authors used for editors
or if they just had big thumbs resting on the space bar.  Fixed.

> > +#define CKRM_CORE_MAGIC 0xBADCAFFE
> 
> Magic checks should not be needed at all.  Please drop them all.
 
I'd like to leave them in while we are testing with -mm to help
tracking down any potential problems.  Prior to going to Linus',
yes, I think it makes sense to get rid of these.

> > +typedef struct ckrm_hnode {
> > +   struct ckrm_core_class *parent;
> > +   struct

Re: [PATCH] CKRM: 4/10 CKRM: Full rcfs support

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 14:15:48 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:48:24AM -0800, Gerrit Huizenga wrote:
  +#include linux/module.h
  +#include linux/list.h
  +#include linux/fs.h
  +#include linux/namei.h
  +#include linux/namespace.h
  +#include linux/dcache.h
  +#include linux/seq_file.h
  +#include linux/pagemap.h
  +#include linux/highmem.h
  +#include linux/init.h
  +#include linux/string.h
  +#include linux/smp_lock.h
  +#include linux/backing-dev.h
  +#include linux/parser.h
  +#include asm/uaccess.h
  +
  +#include linux/rcfs.h
 
 asm last please.
 
Fixed.

  +/*
  + * Address of variable used as flag to indicate a magic file, 
  + * value unimportant
  + */ 
  +int RCFS_IS_MAGIC;
 
 Shouldn't this be static?
 
Nope - used across files.

 And what is a magic file used for?  I see where you set something to
 point to this, but no where do you check for it.  What's the use of it?
 
I believe that these are auto-created file entries which are instantiated
when a class is created, hence they magically appear.  They are also
special in the sense that they are tied to the life of the class, unlike
other files in the class directories.  The MAGIC value is used to help
distinguish these auto-created entries from other entries in a directory.
This is a little bit like . and .. but specific to the class creation.

  +int _rcfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t 
  dev)
  +{
  +   struct inode *inode;
  +   int error = -EPERM;
  +
  +   if (dentry-d_inode)
  +   return -EEXIST;
  +   inode = rcfs_get_inode(dir-i_sb, mode, dev);
  +   if (inode) {
  +   if (dir-i_mode  S_ISGID) {
  +   inode-i_gid = dir-i_gid;
  +   if (S_ISDIR(mode))
  +   inode-i_mode |= S_ISGID;
  +   }
  +   d_instantiate(dentry, inode);
  +   dget(dentry);
  +   error = 0;
  +   }
  +   return error;
  +}
  +
  +EXPORT_SYMBOL_GPL(_rcfs_mknod);
  +
  +int rcfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t 
  dev)
  +{
  +   /* User can only create directories, not files */
  +   if ((mode  S_IFMT) != S_IFDIR)
  +   return -EINVAL;
  +
  +   return dir-i_op-mkdir(dir, dentry, mode);
  +}
  +
  +EXPORT_SYMBOL_GPL(rcfs_mknod);
 
 Why 2 mknod functions?  Do they both really need to be exported?
 
I believe they are both exported so resource controllers can create
class specific directories (including corresponding magic files)
in the case of _rcfs_mknod, and rcfs_mknod is the exported fs op
which allows a restricted set of standard user filesystem operations
within the created directory.  So, yes.

  +
  +#define MAGIC_SHOW(FUNC)   \
  +static int \
 
 You mix tabs and spaces in your #defines in this file, please just use
 tabs properly.
 
Fixed.

  +static ssize_t
  +target_reclassify_write(struct file *file, const char __user * buf,
  +			size_t count, loff_t * ppos, int manual)
  +{
  +	struct rcfs_inode_info *ri = RCFS_I(file->f_dentry->d_inode);
  +	char *optbuf;
  +	int rc = -EINVAL;
  +	ckrm_classtype_t *clstype;
  +
  +	if ((ssize_t) count < 0 || (ssize_t) count > TARGET_MAX_INPUT_SIZE)
  +		return -EINVAL;
 
 But count is an unsigned variable, right?  How could it ever be
 negative?
 
Yep.  But see how those nice casts covered up all the warnings?  ;-)
(Fixed!)
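
A minimal sketch of what the check can look like without the casts, assuming count stays a size_t. The value of TARGET_MAX_INPUT_SIZE below is a placeholder, check_count is a made-up helper name, and rejecting a zero count is my own addition (the later code reads optbuf[count - 1]); none of this is the actual fix that went into the patch.

```c
#include <stddef.h>
#include <errno.h>

/* Placeholder value for illustration; the real constant is defined in the patch. */
#define TARGET_MAX_INPUT_SIZE 100

/*
 * count is unsigned, so it can never be negative and only the upper
 * bound needs checking.  Zero is rejected too (an assumption of this
 * sketch) because the caller later indexes buf[count - 1].
 */
static int check_count(size_t count)
{
	if (count == 0 || count > TARGET_MAX_INPUT_SIZE)
		return -EINVAL;
	return 0;
}
```

A "negative" length passed in from userspace shows up as a huge unsigned value, so the upper bound catches it.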

  +	if (!access_ok(VERIFY_READ, buf, count))
  +		return -EFAULT;
  +	down(&(ri->vfs_inode.i_sem));
  +	optbuf = kmalloc(TARGET_MAX_INPUT_SIZE, GFP_KERNEL);
 
 kmalloc with a lock held?  Is that a good idea?

Lock?  Or sema?  Sema should be okay here, right?

 You also don't check the return value of kmalloc, that's a bad idea.
 
Yep - good catch.  Fixed.

  +   __copy_from_user(optbuf, buf, count);
  +   if (optbuf[count - 1] == '\n')
  +   optbuf[count - 1] = '\0';
 
 Stripping off a single trailing \n character?  Why?
 
I believe this handles the "echo value > /rcfs/class/magic_file" case.
echo appends a newline, and if it were kept it would show up as an extra
newline during an ls.  Of course, Shailabh can correct me if I'm wrong on
this one.
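
As a sketch of the stripping step in isolation (userspace C, with strip_newline being a made-up helper name rather than anything in the patch):

```c
#include <stddef.h>

/*
 * Strip a single trailing '\n' from a buffer holding 'count' bytes,
 * if one is present.  Returns the resulting length.  A zero-length
 * buffer is left untouched so buf[count - 1] is never read out of bounds.
 */
static size_t strip_newline(char *buf, size_t count)
{
	if (count > 0 && buf[count - 1] == '\n') {
		buf[count - 1] = '\0';
		return count - 1;
	}
	return count;
}
```

With this, "c1\n" written by echo and "c1" written by a program that does not append a newline end up as the same name.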

  +inline struct rcfs_inode_info *RCFS_I(struct inode *inode)
  +{
  +   return container_of(inode, struct rcfs_inode_info, vfs_inode);
  +}
  +
  +EXPORT_SYMBOL_GPL(RCFS_I);
 
 This should be named something sane, and just use a #define for it like
 most other container_of() users.

Stupid name gone.  I didn't grok the need for the #define though?
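
My guess at what is being asked for, as a userspace sketch: the accessor becomes a macro wrapping container_of(), so there is no out-of-line function and nothing to export.  The container_of() below is a simplified stand-in for the kernel macro, the structures are cut down to two fields, and rcfs_inode_to_info is a made-up name.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-ins for the kernel structures. */
struct inode {
	int i_ino;
};

struct rcfs_inode_info {
	int core;
	struct inode vfs_inode;	/* embedded, not a pointer */
};

/* Accessor as a macro: maps an embedded inode back to its container. */
#define rcfs_inode_to_info(inode) \
	container_of(inode, struct rcfs_inode_info, vfs_inode)
```

Since the macro expands at each call site, modules can use it without an EXPORT_SYMBOL_GPL() for the accessor.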

  +void rcfs_destroy_inodecache(void)
  +{
  +	printk(KERN_WARNING "destroy inodecache was called\n");
 
 Do you really want to print this out in production code?
 
Nope.  Fixed.

  +	if (kmem_cache_destroy(rcfs_inode_cachep))
  +		printk(KERN_INFO
  +		       "rcfs_inode_cache: not all structures were freed\n");
 
 Shouldn't this really be INFO level?  What is a user going to do

[PATCH] CKRM [8/8] CKRM Documentation

2005-02-24 Thread Gerrit Huizenga

This patch adds all current documentation on CKRM.

Signed-Off-By: Hubertus Franke [EMAIL PROTECTED]
Signed-Off-By: Chandra Seetharaman [EMAIL PROTECTED]
Signed-Off-By: Shailabh Nagar [EMAIL PROTECTED]
Signed-Off-By: Vivek Kashyap [EMAIL PROTECTED]
Signed-Off-By: Gerrit Huizenga [EMAIL PROTECTED]

Index: linux-2.6.11-rc5-ckrm01/Documentation/ckrm/ckrm_basics
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/Documentation/ckrm/ckrm_basics  2005-02-24 01:09:47.024706529 -0800
@@ -0,0 +1,66 @@
+CKRM Basics
+-
+A brief review of CKRM concepts and terminology will help make installation
+and testing easier. For more details, please visit http://ckrm.sf.net. 
+
+Currently there are two class types, taskclass and socketclass for grouping,
+regulating and monitoring tasks and sockets respectively.
+
+To avoid repeating instructions for each classtype, this document assumes a
+task to be the kernel object being grouped. By and large, one can replace task
+with socket and taskclass with socketclass.
+
+RCFS depicts a CKRM class as a directory. Hierarchy of classes can be
+created in which children of a class share resources allotted to
+the parent. Tasks can be classified to any class which is at any level.
+There is no correlation between parent-child relationship of tasks and
+the parent-child relationship of classes they belong to.
+
+Without a Classification Engine, class is inherited by a task. A privileged
+user can reassign a task to a class as described below, after which all
+the child tasks under that task will be assigned to that class, unless the
+user reassigns any of them.
+
+A Classification Engine, if one exists, will be used by CKRM to
+classify a task to a class. The Rule based classification engine uses some
+of the attributes of the task to classify a task. When a CE is present,
+class is not inherited by a task.
+
+Characteristics of a class can be accessed/changed through the following magic
+files under the directory representing the class:
+
+shares:  allows changing the shares of different resources managed by the
+         class
+stats:   shows the statistics associated with each resource managed
+         by the class
+target:  allows assigning a task to a class. If a CE is present, assigning
+         a task to a class through this interface will prevent the CE from
+         reassigning the task to any class during reclassification.
+members: shows which tasks have been assigned to a class
+config:  allows viewing and modifying configuration information of different
+         resources in a class.
+
+Resource allocation for a class is controlled by these parameters:
+
+guarantee: specifies how much of a resource is guaranteed to a class. A
+           special value DONT_CARE(-2) means that no specific guarantee
+           of the resource is specified; this class may not get any of
+           the resource if the system is running short of resources.
+limit: specifies the maximum amount of a resource that is allowed to be
+       allocated by a class. A special value DONT_CARE(-2) means that
+       no specific limit is specified; this class can get all
+       the resources available.
+total_guarantee: total guarantee that is allowed among the children of this
+       class. In other words, the sum of guarantees of all children
+       of this class cannot exceed this number.
+max_limit: maximum limit allowed for any of this class's children. In
+       other words, the limit of any child of this class cannot exceed
+       this value.
+
+None of these parameters are absolute or have any units associated with
+them. They are just numbers (relative to the parent's) that are
+used to calculate the absolute amount of a resource available to a specific
+class.
+
+Note: The root class has an absolute number of resource units associated with it.
+
Index: linux-2.6.11-rc5-ckrm01/Documentation/ckrm/core_usage
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/Documentation/ckrm/core_usage   2005-02-24 01:09:47.025706409 -0800
@@ -0,0 +1,72 @@
+Usage of CKRM without a classification engine
+---
+
+1. Create a class
+
+   # mkdir /rcfs/taskclass/c1
+   creates a taskclass named c1 , while
+   # mkdir /rcfs/socket_class/s1
+   creates a socketclass named s1 
+
+The newly created class directory is automatically populated by magic files
+shares, stats, members, target and config.
+
+2. View default shares 
+
+   # cat /rcfs/taskclass/c1/shares
+
+   guarantee=-2,limit=-2,total_guarantee=100,max_limit=100 is the default
+   value set for resources that have controllers registered with CKRM.
+
+3. change shares of a class
+
+   One or more of the following fields can/must be specified
+   res

[PATCH] CKRM [7/8] Resource controller for number of tasks per class

2005-02-24 Thread Gerrit Huizenga

This patch provides a resource controller for limiting the number
of tasks per class in CKRM.
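
To give a feel for the idea, here is a single-threaded userspace sketch of charging tasks against a class limit and borrowing from the parent when the class's own allocation is used up.  The field names loosely follow struct ckrm_numtasks in the patch, but the borrowing policy shown is an assumption made for illustration and not necessarily what ckrm_numtasks.c does; locking and statistics are left out, and get_ref/put_ref here are simplified stand-ins for numtasks_get_ref/numtasks_put_ref.

```c
#include <stddef.h>

/* Simplified stand-in for struct ckrm_numtasks (no locks, no stats). */
struct numtasks {
	struct numtasks *parent;
	int cnt_limit;		/* no tasks over this limit */
	int cnt_unused;		/* tasks this class may hold without borrowing */
	int cnt_cur_alloc;	/* currently charged to this class itself */
	int cnt_borrowed;	/* currently borrowed from the parent */
};

/* Try to charge one task to class c; returns 1 on success, 0 on failure. */
static int get_ref(struct numtasks *c)
{
	if (c->cnt_cur_alloc + c->cnt_borrowed >= c->cnt_limit)
		return 0;		/* request would go over the limit */
	if (c->cnt_cur_alloc < c->cnt_unused) {
		c->cnt_cur_alloc++;	/* fits in the class's own allocation */
		return 1;
	}
	if (c->parent && get_ref(c->parent)) {
		c->cnt_borrowed++;	/* charged against the parent instead */
		return 1;
	}
	return 0;			/* nothing left to borrow */
}

/* Release one task from class c, giving back borrowed units first. */
static void put_ref(struct numtasks *c)
{
	if (c->cnt_borrowed > 0) {
		c->cnt_borrowed--;
		put_ref(c->parent);
	} else if (c->cnt_cur_alloc > 0) {
		c->cnt_cur_alloc--;
	}
}
```

In this sketch a fork would call get_ref() on the task's class and fail if it returns 0, and an exit would call put_ref().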

Signed-Off-By: Chandra Seetharaman [EMAIL PROTECTED]
Signed-Off-By: Hubertus Franke [EMAIL PROTECTED]
Signed-Off-By: Shailabh Nagar [EMAIL PROTECTED]
Signed-Off-By: Gerrit Huizenga [EMAIL PROTECTED]


Index: linux-2.6.11-rc5-ckrm01/include/linux/ckrm_tsk.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/include/linux/ckrm_tsk.h  2005-02-24 01:09:42.896204314 -0800
@@ -0,0 +1,35 @@
+/* ckrm_tsk.h - No. of tasks resource controller for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * Provides No. of tasks resource controller for CKRM
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef _LINUX_CKRM_TSK_H
+#define _LINUX_CKRM_TSK_H
+
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+#include <linux/ckrm_rc.h>
+
+typedef int (*get_ref_t) (struct ckrm_core_class *, int);
+typedef void (*put_ref_t) (struct ckrm_core_class *);
+
+extern int numtasks_get_ref(struct ckrm_core_class *, int);
+extern void numtasks_put_ref(struct ckrm_core_class *);
+extern void ckrm_numtasks_register(get_ref_t, put_ref_t);
+
+#else /* CONFIG_CKRM_TYPE_TASKCLASS */
+
+#define numtasks_get_ref(core_class, ref) (1)
+#define numtasks_put_ref(core_class)  do {} while (0)
+
+#endif /* CONFIG_CKRM_TYPE_TASKCLASS */
+#endif /* _LINUX_CKRM_TSK_H */
Index: linux-2.6.11-rc5-ckrm01/init/Kconfig
===
--- linux-2.6.11-rc5-ckrm01.orig/init/Kconfig   2005-02-24 01:09:01.423204823 -0800
+++ linux-2.6.11-rc5-ckrm01/init/Kconfig  2005-02-24 01:09:42.897204193 -0800
@@ -183,6 +183,15 @@

  Say N if unsure.  
 
+config CKRM_RES_NUMTASKS
+	tristate "Number of Tasks Resource Manager"
+	depends on CKRM_TYPE_TASKCLASS
+	default m
+	help
+	  Provides a Resource Controller for CKRM that allows limiting the
+	  number of tasks a task class can have.
+
+	  Say N if unsure, Y to use the feature.
 endmenu
 
 config SYSCTL
Index: linux-2.6.11-rc5-ckrm01/kernel/ckrm/ckrm_numtasks.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/kernel/ckrm/ckrm_numtasks.c 2005-02-24 01:09:42.898204073 -0800
@@ -0,0 +1,522 @@
+/* ckrm_numtasks.c - Number of tasks resource controller for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman,  IBM Corp. 2003
+ * 
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+/*
+ * CKRM Resource controller for tracking number of tasks in a class.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <asm/errno.h>
+#include <asm/div64.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/ckrm_rc.h>
+#include <linux/ckrm_tc.h>
+#include <linux/ckrm_tsk.h>
+
+#define TOTAL_NUM_TASKS (131072)   /* 128 K */
+#define NUMTASKS_DEBUG
+#define NUMTASKS_NAME "numtasks"
+
+struct ckrm_numtasks {
+   struct ckrm_core_class *core;   /* the core i am part of... */
+   struct ckrm_core_class *parent; /* parent of the core above. */
+   struct ckrm_shares shares;
+   spinlock_t cnt_lock;/* always grab parent's lock before child's */
+   int cnt_guarantee;  /* num_tasks guarantee in local units */
+   int cnt_unused; /* has to borrow if more than this is needed */
+   int cnt_limit;  /* no tasks over this limit. */
+   atomic_t cnt_cur_alloc; /* current alloc from self */
+   atomic_t cnt_borrowed;  /* borrowed from the parent */
+
+   int over_guarantee; /* turn on/off when cur_alloc goes  */
+   /* over/under guarantee */
+
+	/* internally maintained statistics to compare with max numbers */
+   int limit_failures; /* # failures as request was over the limit */
+   int borrow_sucesses;/* # successful borrows */
+   int borrow_failures;/* # borrow failures */
+
+	/* Maximum each specific statistic has reached. */
+   int max_limit_failures;
+   int max_borrow_sucesses;
+   int max_borrow_failures;
+
+   /* Total number of specific statistics */
+   int tot_limit_failures;
+   int tot_borrow_sucesses;
+   int tot_borrow_failures;
+};
+
+struct ckrm_res_ctlr numtasks_rcbs;
+
+/* Initialize rescls

[PATCH] CKRM [6/8] CKRM tracking for socket classes

2005-02-24 Thread Gerrit Huizenga

This patch provides the extensions for CKRM to track per socket classes.
This is the base to enable socket based resource control for inbound
connection control, bandwidth control etc.

Signed-Off-By: Vivek Kashyap [EMAIL PROTECTED]
Signed-Off-By: Gerrit Huizenga [EMAIL PROTECTED]


Index: linux-2.6.11-rc5-ckrm01/fs/rcfs/Makefile
===
--- linux-2.6.11-rc5-ckrm01.orig/fs/rcfs/Makefile   2005-02-24 01:09:01.190232913 -0800
+++ linux-2.6.11-rc5-ckrm01/fs/rcfs/Makefile  2005-02-24 01:09:01.408206631 -0800
@@ -6,3 +6,4 @@
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
 rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
+rcfs-$(CONFIG_CKRM_TYPE_SOCKETCLASS) += socket_fs.o
Index: linux-2.6.11-rc5-ckrm01/fs/rcfs/rootdir.c
===
--- linux-2.6.11-rc5-ckrm01.orig/fs/rcfs/rootdir.c  2005-02-24 01:09:01.191232792 -0800
+++ linux-2.6.11-rc5-ckrm01/fs/rcfs/rootdir.c   2005-02-24 01:09:34.051270771 -0800
@@ -187,6 +187,10 @@
 extern struct rcfs_mfdesc tc_mfdesc;
 #endif
 
+#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS
+extern struct rcfs_mfdesc rcfs_sock_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -203,4 +207,10 @@
 #else
NULL,
 #endif
+#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS
+	&rcfs_sock_mfdesc,
+#else
+   NULL,
+#endif
+
 };
Index: linux-2.6.11-rc5-ckrm01/fs/rcfs/socket_fs.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5-ckrm01/fs/rcfs/socket_fs.c 2005-02-24 01:09:01.410206390 -0800
@@ -0,0 +1,308 @@
+/* ckrm_socketaq.c 
+ *
+ * Copyright (C) Vivek Kashyap,  IBM Corp. 2004
+ * 
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+/***
+ *  Socket class type
+ *   
+ * Defines the root structure for socket based classes. Currently only inbound
+ * connection control is supported based on prioritized accept queues. 
+ **/
+
+#include <linux/rcfs.h>
+#include <net/tcp.h>
+
+extern int rcfs_create(struct inode *, struct dentry *, int,
+  struct nameidata *);
+extern int rcfs_unlink(struct inode *, struct dentry *);
+extern int rcfs_symlink(struct inode *, struct dentry *, const char *);
+extern int rcfs_mknod(struct inode *, struct dentry *, int mode, dev_t);
+extern int rcfs_mkdir(struct inode *, struct dentry *, int);
+extern int rcfs_rmdir(struct inode *, struct dentry *);
+extern int rcfs_rename(struct inode *, struct dentry *, struct inode *,
+  struct dentry *);
+
+extern int rcfs_create_coredir(struct inode *, struct dentry *);
+int rcfs_sock_mkdir(struct inode *, struct dentry *, int mode);
+int rcfs_sock_rmdir(struct inode *, struct dentry *);
+
+int rcfs_sock_create_noperm(struct inode *, struct dentry *, int,
+  struct nameidata *);
+int rcfs_sock_unlink_noperm(struct inode *, struct dentry *);
+int rcfs_sock_mkdir_noperm(struct inode *, struct dentry *, int);
+int rcfs_sock_rmdir_noperm(struct inode *, struct dentry *);
+int rcfs_sock_mknod_noperm(struct inode *, struct dentry *, int, dev_t);
+
+void rcfs_sock_set_directory(void);
+
+extern struct file_operations config_fileops,
+members_fileops, shares_fileops, stats_fileops, target_fileops;
+
+struct inode_operations my_iops = {
+   .create = rcfs_create,
+   .lookup = simple_lookup,
+   .link = simple_link,
+   .unlink = rcfs_unlink,
+   .symlink = rcfs_symlink,
+   .mkdir = rcfs_sock_mkdir,
+   .rmdir = rcfs_sock_rmdir,
+   .mknod = rcfs_mknod,
+   .rename = rcfs_rename,
+};
+
+struct inode_operations class_iops = {
+   .create = rcfs_sock_create_noperm,
+   .lookup = simple_lookup,
+   .link = simple_link,
+   .unlink = rcfs_sock_unlink_noperm,
+   .symlink = rcfs_symlink,
+   .mkdir = rcfs_sock_mkdir_noperm,
+   .rmdir = rcfs_sock_rmdir_noperm,
+   .mknod = rcfs_sock_mknod_noperm,
+   .rename = rcfs_rename,
+};
+
+struct inode_operations sub_iops = {
+   .create = rcfs_sock_create_noperm,
+   .lookup = simple_lookup,
+   .link = simple_link,
+   .unlink = rcfs_sock_unlink_noperm,
+   .symlink = rcfs_symlink,
+   .mkdir = rcfs_sock_mkdir_noperm,
+   .rmdir = rcfs_sock_rmdir_noperm,
+   .mknod = rcfs_sock_mknod_noperm,
+   .rename = rcfs_rename,
+};
+
+struct

[PATCH] CKRM [5/8] task based management for CPU, memory & disk IO

2005-02-24 Thread Gerrit Huizenga

 This patch provides the extensions for CKRM to track task classes.
 This is the base to enable task class based resource control for
 cpu, memory and disk I/O.

Signed-Off-By: Chandra Seetharaman [EMAIL PROTECTED]
Signed-Off-By: Hubertus Franke [EMAIL PROTECTED]
Signed-Off-By: Shailabh Nagar [EMAIL PROTECTED]
Signed-Off-By: Vivek Kashyap [EMAIL PROTECTED]
Signed-Off-By: Gerrit Huizenga [EMAIL PROTECTED]

Index: linux-2.6.11-rc5/fs/rcfs/Makefile
===
--- linux-2.6.11-rc5.orig/fs/rcfs/Makefile  2005-02-24 00:55:06.487875181 -0800
+++ linux-2.6.11-rc5/fs/rcfs/Makefile   2005-02-24 00:55:10.938338577 -0800
@@ -5,3 +5,4 @@
 obj-$(CONFIG_RCFS_FS) += rcfs.o 
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
+rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
Index: linux-2.6.11-rc5/fs/rcfs/rootdir.c
===
--- linux-2.6.11-rc5.orig/fs/rcfs/rootdir.c 2005-02-24 00:55:06.487875181 -0800
+++ linux-2.6.11-rc5/fs/rcfs/rootdir.c  2005-02-24 00:55:10.938338577 -0800
@@ -58,7 +58,7 @@
return 0;
 }
 
-EXPORT_SYMBOL(rcfs_unregister_engine);
+EXPORT_SYMBOL_GPL(rcfs_unregister_engine);
 
 /*
  * rcfs_mkroot
@@ -183,6 +183,10 @@
 
 EXPORT_SYMBOL_GPL(rcfs_deregister_classtype);
 
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+extern struct rcfs_mfdesc tc_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -193,6 +197,10 @@
  * table to initialize their magf entries. 
  */
 
-struct rcfs_mfdesc *genmfdesc[] = {
+struct rcfs_mfdesc *genmfdesc[CKRM_MAX_CLASSTYPES] = {
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+	&tc_mfdesc,
+#else
NULL,
+#endif
 };
Index: linux-2.6.11-rc5/fs/rcfs/tc_magic.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/fs/rcfs/tc_magic.c 2005-02-24 00:55:10.939338456 -0800
@@ -0,0 +1,93 @@
+/* 
+ * fs/rcfs/tc_magic.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   (C) Vivek Kashyap,   IBM Corp. 2004
+ *   (C) Chandra Seetharaman, IBM Corp. 2004
+ *   (C) Hubertus Franke, IBM Corp. 2004
+ *   
+ * define magic fileops for taskclass classtype
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/rcfs.h>
+#include <linux/ckrm_tc.h>
+
+/*
+ * Taskclass general
+ *
+ * Define structures for taskclass root directory and its magic files 
+ * In taskclasses, there is one set of magic files, created automatically under
+ * the taskclass root (upon classtype registration) and each directory (class) 
+ * created subsequently. However, classtypes can also choose to have different 
+ * sets of magic files created under their root and other directories under 
+ * root using their mkdir function. RCFS only provides helper functions for 
+ * creating the root directory and its magic files
+ * 
+ */
+
+#define TC_FILE_MODE (S_IFREG | S_IRUGO | S_IWUSR)
+
+#define NR_TCROOTMF  7
+struct rcfs_magf tc_rootdesc[NR_TCROOTMF] = {
+	/* First entry must be root */
+	{
+	 /* .name = should not be set, copy from classtype name */
+	 .mode = RCFS_DEFAULT_DIR_MODE,
+	 .i_op = &rcfs_dir_inode_operations,
+	 .i_fop = &simple_dir_operations,
+	 },
+	/* Rest are root's magic files */
+	{
+	 .name = "target",
+	 .mode = TC_FILE_MODE,
+	 .i_fop = &target_fileops,
+	 .i_op = &rcfs_file_inode_operations,
+	 },
+	{
+	 .name = "members",
+	 .mode = TC_FILE_MODE,
+	 .i_fop = &members_fileops,
+	 .i_op = &rcfs_file_inode_operations,
+	 },
+	{
+	 .name = "stats",
+	 .mode = TC_FILE_MODE,
+	 .i_fop = &stats_fileops,
+	 .i_op = &rcfs_file_inode_operations,
+	 },
+	{
+	 .name = "shares",
+	 .mode = TC_FILE_MODE,
+	 .i_fop = &shares_fileops,
+	 .i_op = &rcfs_file_inode_operations,
+	 },
+	/*
+	 * Reclassify and Config should be made available only at the
+	 * root level. Make sure they are the last two entries, as
+	 * rcfs_mkdir depends on it.
+	 */
+	{
+	 .name = "reclassify",
+	 .mode = TC_FILE_MODE,
+	 .i_fop = &reclassify_fileops,
+	 .i_op = &rcfs_file_inode_operations,
+	 },
+	{
+	 .name = "config",
+	 .mode = TC_FILE_MODE,
+	 .i_fop = &config_fileops,
+	 .i_op = &rcfs_file_inode_operations,
+	 },
+};
+
+struct rcfs_mfdesc tc_mfdesc = {
+   .rootmf

[PATCH] CKRM [4/8] Full directory support for rcfs

2005-02-24 Thread Gerrit Huizenga

Index: linux-2.6.11-rc5/fs/Makefile
===
--- linux-2.6.11-rc5.orig/fs/Makefile   2005-02-23 20:03:03.0 -0800
+++ linux-2.6.11-rc5/fs/Makefile2005-02-24 00:55:06.483875663 -0800
@@ -92,6 +92,7 @@
 obj-$(CONFIG_XFS_FS)   += xfs/
 obj-$(CONFIG_AFS_FS)   += afs/
 obj-$(CONFIG_BEFS_FS)  += befs/
+obj-$(CONFIG_RCFS_FS)  += rcfs/
 obj-$(CONFIG_HOSTFS)   += hostfs/
 obj-$(CONFIG_HPPFS)+= hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
Index: linux-2.6.11-rc5/fs/rcfs/dir.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/fs/rcfs/dir.c  2005-02-24 00:55:06.484875543 -0800
@@ -0,0 +1,292 @@
+/* 
+ * fs/rcfs/dir.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   Vivek Kashyap,   IBM Corp. 2004
+ *   
+ * 
+ * Directory operations for rcfs
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/namespace.h>
+#include <linux/dcache.h>
+#include <linux/seq_file.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/smp_lock.h>
+#include <linux/backing-dev.h>
+#include <linux/parser.h>
+#include <linux/rcfs.h>
+#include <asm/uaccess.h>
+
+#define rcfs_positive(dentry)  ((dentry)->d_inode && !d_unhashed((dentry)))
+
+int rcfs_empty(struct dentry *dentry)
+{
+	struct dentry *child;
+	int ret = 0;
+
+	spin_lock(&dcache_lock);
+	list_for_each_entry(child, &dentry->d_subdirs, d_child)
+		if (!rcfs_is_magic(child) && rcfs_positive(child))
+			goto out;
+	ret = 1;
+out:
+	spin_unlock(&dcache_lock);
+	return ret;
+}
+
+/* Directory inode operations */
+
+int
+rcfs_create(struct inode *dir, struct dentry *dentry, int mode,
+   struct nameidata *nd)
+{
+   return rcfs_mknod(dir, dentry, mode | S_IFREG, 0);
+}
+
+EXPORT_SYMBOL_GPL(rcfs_create);
+
+/* Symlinks permitted ?? */
+int rcfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+{
+	struct inode *inode;
+	int error = -ENOSPC;
+
+	inode = rcfs_get_inode(dir->i_sb, S_IFLNK | S_IRWXUGO, 0);
+	if (inode) {
+		int l = strlen(symname) + 1;
+		error = page_symlink(inode, symname, l);
+		if (!error) {
+			if (dir->i_mode & S_ISGID)
+				inode->i_gid = dir->i_gid;
+			d_instantiate(dentry, inode);
+			dget(dentry);
+		} else
+			iput(inode);
+	}
+	return error;
+}
+
+EXPORT_SYMBOL_GPL(rcfs_symlink);
+
+int rcfs_create_coredir(struct inode *dir, struct dentry *dentry)
+{
+
+	struct rcfs_inode_info *ripar, *ridir;
+	int sz;
+
+	ripar = rcfs_get_inode_info(dir);
+	ridir = rcfs_get_inode_info(dentry->d_inode);
+	/* Inform resource controllers - do Core operations */
+	if (ckrm_is_core_valid(ripar->core)) {
+		sz = strlen(ripar->name) + strlen(dentry->d_name.name) + 2;
+		ridir->name = kmalloc(sz, GFP_KERNEL);
+		if (!ridir->name) {
+			return -ENOMEM;
+		}
+		snprintf(ridir->name, sz, "%s/%s", ripar->name,
+			 dentry->d_name.name);
+		ridir->core = (*(ripar->core->classtype->alloc))
+		    (ripar->core, ridir->name);
+	} else {
+		printk(KERN_ERR "rcfs_mkdir: Invalid parent core %p\n",
+		       ripar->core);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+EXPORT_SYMBOL_GPL(rcfs_create_coredir);
+
+int rcfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+
+   int retval = 0;
+   struct ckrm_classtype *clstype;
+
+#if 0
+	struct dentry *pd = list_entry(dir->i_dentry.next, struct dentry,
+				       d_alias);
+	if ((!strcmp(pd->d_name.name, "/") &&
+	     !strcmp(dentry->d_name.name, "ce"))) {
+		/* Call CE's mkdir if it has registered, else fail. */
+		if (rcfs_eng_callbacks.mkdir) {
+			return (*rcfs_eng_callbacks.mkdir) (dir, dentry, mode);
+		} else {
+			return -EINVAL;
+		}
+	}
+#endif
+	if (_rcfs_mknod(dir, dentry, mode | S_IFDIR, 0)) {
+		printk(KERN_ERR "rcfs_mkdir: error in _rcfs_mknod\n");
+		return retval;
+	}
+	dir->i_nlink++;
+   /* Inherit parent's ops since _rcfs_mknod assigns noperm ops. */
+

[PATCH] CKRM [3/8] Main/core CKRM code, beginning of RCFS

2005-02-24 Thread Gerrit Huizenga

Main code for CKRM default classification engine.  Adds Resource
Control (rc) filesystem as mechanism for setting policies for
class assignments in CKRM.

Signed-Off-By: Chandra Seetharaman [EMAIL PROTECTED]
Signed-Off-By: Hubertus Franke [EMAIL PROTECTED]
Signed-Off-By: Shailabh Nagar [EMAIL PROTECTED]
Signed-Off-By: Gerrit Huizenga [EMAIL PROTECTED]
Signed-Off-By: Vivek Kashyap [EMAIL PROTECTED]


 include/linux/ckrm_ce.h |  108 +
 include/linux/ckrm_events.h |8 
 include/linux/ckrm_rc.h |  355 
 include/linux/rcfs.h|   96 
 include/linux/sched.h   |6 
 init/main.c |2 
 kernel/ckrm/Makefile|2 
 kernel/ckrm/ckrm.c  |  927 
 kernel/ckrm/ckrmutils.c |  195 +
 9 files changed, 1694 insertions(+), 5 deletions(-)

Index: linux-2.6.11-rc5/include/linux/ckrm_ce.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/include/linux/ckrm_ce.h  2005-02-24 00:55:01.390489786 -0800
@@ -0,0 +1,95 @@
+/*
+ *  ckrm_ce.h - Header file to be used by Classification Engine of CKRM
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003
+ *   (C) Shailabh Nagar,  IBM Corp. 2003
+ *   (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * Provides data structures, macros and kernel API of CKRM for 
+ * classification engine.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_CE_H
+#define _LINUX_CKRM_CE_H
+
+#ifdef CONFIG_CKRM
+
+#include <linux/ckrm_events.h>
+
+/*
+ * Action parameters identifying the cause of a task-class notify callback.
+ * These can percolate up to a user daemon consuming records sent by the
+ * classification engine.
+ */
+
+typedef void *(*ce_classify_fct) (enum ckrm_event event, void *obj, ...);
+typedef void (*ce_notify_fct) (enum ckrm_event event, void *classobj,
+void *obj);
+
+struct ckrm_eng_callback {
+   /* general state information */
+   int always_callback;/* set if CE should always be called back 
+  regardless of numclasses */
+
+   /* callbacks which are called without holding locks */
+
+   unsigned long c_interest;   /* set of classification events of 
+* interest to CE 
+*/
+
+   /* generic classify */
+   ce_classify_fct classify;
+
+   /* class added */
+   void (*class_add) (const char *name, void *core, int classtype);
+
+   /* class deleted */
+   void (*class_delete) (const char *name, void *core, int classtype);
+
+   /* callbacks which are called while holding task_lock(tsk) */
+   unsigned long n_interest;   /* set of notification events of 
+*  interest to CE 
+*/
+   /* notify on class switch */
+   ce_notify_fct notify;   
+};
+
+struct inode;
+struct dentry;
+
+struct rbce_eng_callback {
+   int (*mkdir) (struct inode *, struct dentry *, int);/* mkdir */
+   int (*rmdir) (struct inode *, struct dentry *); /* rmdir */
+   int (*mnt) (void);
+   int (*umnt) (void);
+};
+
+extern int ckrm_register_engine(const char *name, struct ckrm_eng_callback *);
+extern int ckrm_unregister_engine(const char *name);
+
+extern void *ckrm_classobj(char *, int *classtype);
+
+extern int rcfs_register_engine(struct rbce_eng_callback *);
+extern int rcfs_unregister_engine(struct rbce_eng_callback *);
+
+extern int ckrm_reclassify(int pid);
+
+#ifndef _LINUX_CKRM_RC_H
+
+extern void ckrm_core_grab(struct ckrm_core_class *core);
+extern void ckrm_core_drop(struct ckrm_core_class *core);
+#endif
+
+#endif /* CONFIG_CKRM */
+#endif /* _LINUX_CKRM_CE_H */
Index: linux-2.6.11-rc5/include/linux/ckrm_events.h
===
--- linux-2.6.11-rc5.orig/include/linux/ckrm_events.h   2005-02-24 00:54:50.530799168 -0800
+++ linux-2.6.11-rc5/include/linux/ckrm_events.h  2005-02-24 00:55:01.391489666 -0800
@@ -108,70 +108,78 @@
 extern void ckrm_invoke_event_cb_chain(enum ckrm_event ev, void *arg);
 
 /* forward declarations for function arguments */
-struct task_struct;
+
+#include <linux/sched.h>   /* for task_struct */
+
 struct sock;
 struct user_struct;
 
 static inline void ckrm_cb_fork(struct task_struct *p

[PATCH] CKRM [2/8] More accurate account for CPU & IO scheduling

2005-02-24 Thread Gerrit Huizenga

CKRM processor scheduling delay accounting.  In addition to counting
frequency, the total delay in ns is also recorded.  CPU delays are
specified as cpu-wait and cpu-run.  I/O delays are recorded for memory
and regular I/O.  Information is accessible through /proc/<pid>/delay.
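
As a rough illustration of consuming that file from userspace, the sketch below parses one line in the format proc_pid_delay() prints further down in this patch (seven space-separated counters).  The struct and function names (task_delays, parse_delay_line) are made up for the example; the field order is taken from the sprintf() call in the patch.

```c
#include <stdio.h>

/* Fields in the order proc_pid_delay() prints them. */
struct task_delays {
	unsigned int runs;
	unsigned long long runcpu_total;	/* ns */
	unsigned long long waitcpu_total;	/* ns */
	unsigned int num_iowaits;
	unsigned long long iowait_total;	/* ns */
	unsigned int num_memwaits;
	unsigned long long mem_iowait_total;	/* ns */
};

/*
 * Parse one "%u %llu %llu %u %llu %u %llu" line.
 * Returns 0 on success, -1 if the line does not hold all seven fields.
 */
static int parse_delay_line(const char *line, struct task_delays *d)
{
	int n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
		       &d->runs, &d->runcpu_total, &d->waitcpu_total,
		       &d->num_iowaits, &d->iowait_total,
		       &d->num_memwaits, &d->mem_iowait_total);

	return n == 7 ? 0 : -1;
}
```

A monitoring tool could read the file with fgets() and pass the line to this parser, then divide a total by its count to get an average delay per event.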

Signed-Off-By: Chandra Seetharaman [EMAIL PROTECTED]
Signed-Off-By: Hubertus Franke [EMAIL PROTECTED]
Signed-Off-By: Shailabh Nagar [EMAIL PROTECTED]
Signed-Off-By: Gerrit Huizenga [EMAIL PROTECTED]

 fs/proc/array.c|   18 +
 fs/proc/base.c |   18 +
 include/linux/sched.h  |   86 +
 include/linux/taskdelays.h |   45 +++
 init/Kconfig   |8 
 kernel/fork.c  |1 
 kernel/sched.c |   17 
 mm/memory.c|9 +++-
 8 files changed, 200 insertions(+), 2 deletions(-)

Index: linux-2.6.11-rc5/fs/proc/array.c
===
--- linux-2.6.11-rc5.orig/fs/proc/array.c   2005-02-23 20:03:03.0 -0800
+++ linux-2.6.11-rc5/fs/proc/array.c  2005-02-24 00:54:56.449085584 -0800
@@ -473,3 +473,21 @@
 	return sprintf(buffer,"%d %d %d %d %d %d %d\n",
   size, resident, shared, text, lib, data, 0);
 }
+
+
+int proc_pid_delay(struct task_struct *task, char * buffer)
+{
+	int res;
+
+	res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
+		       (unsigned int) get_delay(task,runs),
+		       (uint64_t) get_delay(task,runcpu_total),
+		       (uint64_t) get_delay(task,waitcpu_total),
+		       (unsigned int) get_delay(task,num_iowaits),
+		       (uint64_t) get_delay(task,iowait_total),
+		       (unsigned int) get_delay(task,num_memwaits),
+		       (uint64_t) get_delay(task,mem_iowait_total)
+		);
+	return res;
+}
+
Index: linux-2.6.11-rc5/fs/proc/base.c
===
--- linux-2.6.11-rc5.orig/fs/proc/base.c  2005-02-23 20:03:04.0 -0800
+++ linux-2.6.11-rc5/fs/proc/base.c 2005-02-24 00:54:56.451085343 -0800
@@ -105,6 +105,10 @@
 #ifdef CONFIG_AUDITSYSCALL
PROC_TID_LOGINUID,
 #endif
+#ifdef CONFIG_DELAY_ACCT
+PROC_TID_DELAY_ACCT,
+PROC_TGID_DELAY_ACCT,
+#endif
PROC_TID_FD_DIR = 0x8000,   /* 0x8000-0x */
PROC_TID_OOM_SCORE,
PROC_TID_OOM_ADJUST,
@@ -137,6 +141,9 @@
 #ifdef CONFIG_SECURITY
 	E(PROC_TGID_ATTR,      "attr",    S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
+#ifdef CONFIG_DELAY_ACCT
+	E(PROC_TGID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 #ifdef CONFIG_KALLSYMS
 	E(PROC_TGID_WCHAN,     "wchan",   S_IFREG|S_IRUGO),
 #endif
@@ -167,6 +174,9 @@
 #ifdef CONFIG_SECURITY
 	E(PROC_TID_ATTR,       "attr",    S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
+#ifdef CONFIG_DELAY_ACCT
+	E(PROC_TGID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 #ifdef CONFIG_KALLSYMS
 	E(PROC_TID_WCHAN,      "wchan",   S_IFREG|S_IRUGO),
 #endif
@@ -1476,6 +1486,13 @@
 			ei->op.proc_read = proc_pid_wchan;
 			break;
 #endif
+#ifdef CONFIG_DELAY_ACCT
+		case PROC_TID_DELAY_ACCT:
+		case PROC_TGID_DELAY_ACCT:
+			inode->i_fop = &proc_info_file_operations;
+			ei->op.proc_read = proc_pid_delay;
+			break;
+#endif
 #ifdef CONFIG_SCHEDSTATS
case PROC_TID_SCHEDSTAT:
case PROC_TGID_SCHEDSTAT:
Index: linux-2.6.11-rc5/include/linux/sched.h
===
--- linux-2.6.11-rc5.orig/include/linux/sched.h 2005-02-23 20:02:21.0 -0800
+++ linux-2.6.11-rc5/include/linux/sched.h  2005-02-24 00:54:56.482081606 -0800
@@ -32,6 +32,7 @@
 #include <linux/pid.h>
 #include <linux/percpu.h>
 #include <linux/topology.h>
+#include <linux/taskdelays.h>
 
 struct exec_domain;
 
@@ -685,6 +686,9 @@
struct mempolicy *mempolicy;
short il_next;
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   struct task_delay_info delays;
+#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
@@ -980,6 +984,9 @@
 extern void set_task_comm(struct task_struct *tsk, char *from);
 extern void get_task_comm(char *to, struct task_struct *tsk);
 
+#define PF_MEMIO   0x0040  /* I am potentially doing I/O for mem */
+#define PF_IOWAIT  0x0080  /* I am waiting on disk I/O */
+
 #ifdef CONFIG_SMP
 extern void wait_task_inactive(task_t * p);
 #else
@@ -1214,6 +1221,88 @@
return 0;
 }
 #endif /* CONFIG_PM */
+
+/* API for registering delay info */
+#ifdef CONFIG_DELAY_ACCT
+
+#define test_delay_flag(tsk,flg)   ((tsk)->flags & (flg))
+#define set_delay_flag(tsk,flg)((tsk)->flags |= (flg))
+#define clear_delay_flag

[PATCH] CKRM [1/8] Base CKRM events, mods to existing kernel code

2005-02-24 Thread Gerrit Huizenga

Core CKRM Event Callbacks.

On exec, fork, exit, real/effective gid/uid, use CKRM to associate
tasks with appropriate class.

Addressed review comments:

Sam Ravnborg:  Use Makefile syntax correctly
Dave Hansen:  Use of ## is annoying
Greg KH:  Remove Changelogs;
Use __KERNEL__ correctly (if at all);
Consolidate CONFIG_ sections in header files;
Fix extern int get_exe_path_name().
Remove unused DEBUG code 
Convert enum to typedef in prep for sparse __bitwise use

Not yet Addressed:

Greg KH:
Use of __bitwise and sparse in enum's
Use of kernel list type


Signed-off-by:  Shailabh Nagar [EMAIL PROTECTED]
Signed-off-by:  Hubertus Franke [EMAIL PROTECTED]
Signed-off-by:  Chandra Seetharaman [EMAIL PROTECTED]
Signed-off-by:  Gerrit Huizenga [EMAIL PROTECTED]


 fs/exec.c   |2 
 include/linux/ckrm_events.h |  190 
 include/linux/sched.h   |1 
 init/Kconfig|   16 +++
 kernel/Makefile |2 
 kernel/ckrm/Makefile|7 +
 kernel/ckrm/ckrm_events.c   |   97 ++
 kernel/exit.c   |3 
 kernel/fork.c   |4 
 kernel/sys.c|   10 ++
 10 files changed, 331 insertions(+), 1 deletion(-)

Index: linux-2.6.11-rc5/fs/exec.c
===
--- linux-2.6.11-rc5.orig/fs/exec.c 2005-02-23 20:02:37.0 -0800
+++ linux-2.6.11-rc5/fs/exec.c  2005-02-24 00:54:50.529799288 -0800
@@ -48,6 +48,7 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
+#include <linux/ckrm_events.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1085,6 +1086,7 @@
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
+   ckrm_cb_exec(bprm->filename);
return retval;
}
read_lock(&binfmt_lock);
Index: linux-2.6.11-rc5/include/linux/ckrm_events.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.11-rc5/include/linux/ckrm_events.h   2005-02-24 00:54:50.530799168 -0800
@@ -0,0 +1,192 @@
+/*
+ * ckrm_events.h - Class-based Kernel Resource Management (CKRM)
+ * event handling
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003,2004
+ *   (C) Shailabh Nagar,  IBM Corp. 2003
+ *   (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * 
+ * Provides a base header file including macros and basic data structures.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_EVENTS_H
+#define _LINUX_CKRM_EVENTS_H
+
+#ifdef CONFIG_CKRM
+
+/*
+ * Data structure and function to get the list of registered 
+ * resource controllers.
+ */
+
+/*
+ * CKRM defines a set of events at particular points in the kernel
+ * at which callbacks registered by various class types are called
+ */
+
+enum ckrm_event {
+   /*
+* we distinguish these events types:
+*
+* (a) CKRM_LATCHABLE_EVENTS
+*  events can be latched for event callbacks by classtypes
+*
+* (b) CKRM_NONLATCHABLE_EVENTS
+* events can not be latched but can be used to call classification
+* 
+* (c) events that are used for notification purposes
+* range: [ CKRM_EVENT_CANNOT_CLASSIFY .. )
+*/
+
+   /* events (a) */
+
+   CKRM_LATCHABLE_EVENTS,
+
+   CKRM_EVENT_NEWTASK = CKRM_LATCHABLE_EVENTS,
+   CKRM_EVENT_FORK,
+   CKRM_EVENT_EXIT,
+   CKRM_EVENT_EXEC,
+   CKRM_EVENT_UID,
+   CKRM_EVENT_GID,
+   CKRM_EVENT_LOGIN,
+   CKRM_EVENT_USERADD,
+   CKRM_EVENT_USERDEL,
+   CKRM_EVENT_LISTEN_START,
+   CKRM_EVENT_LISTEN_STOP,
+   CKRM_EVENT_APPTAG,
+
+   /* events (b) */
+
+   CKRM_NONLATCHABLE_EVENTS,
+
+   CKRM_EVENT_RECLASSIFY = CKRM_NONLATCHABLE_EVENTS,
+
+   /* events (c) */
+
+   CKRM_NOTCLASSIFY_EVENTS,
+
+   CKRM_EVENT_MANUAL = CKRM_NOTCLASSIFY_EVENTS,
+
+   CKRM_NUM_EVENTS
+};
+
+/*
+ * CKRM event callback specification for the classtypes or resource controllers
+ *   typically an array is specified using CKRM_EVENT_SPEC terminated with 
+ *   CKRM_EVENT_SPEC_LAST and then that array is registered using
+ *   ckrm_register_event_set.
+ *   Individual registration

[PATCH] CKRM [0/8] Long overdue response to initial review

2005-02-24 Thread Gerrit Huizenga

This is a long overdue response to the many code review comments
that came in during the last posting of the CKRM core code.   While
CKRM has not by any means been inactive, a variety of other deliverables
have taken precedence until recently.

However, the following set of postings is a step towards starting to
rectify that delinquency, including a refresh to 2.6.11-rc5.  While
testing has been going on over the past couple of weeks on a set of patches
very close to this, a large number of cleanups have happened in the past
couple of days and testing is not complete on those.  In particular,
I know of a couple of batches of warnings that need to be cleaned up
and I have a strong suspicion that building with at least one and maybe
two particular CKRM_* config options set to Y may fail to compile at
the moment.

Also, since the last submission, a couple of the patches have
been removed from the set that I'm including now.  One of them
needs a few updates and some air time on ckrm-tech because of some
slight networking related changes; the other was just too darn big
of a patch and is being broken into more reasonable sized pieces.

I was not able to make all changes requested by review comments thus
far; however, the ones that I did not get to have been added to
a TODO file in the Documentation directory for ckrm.

The following postings will contain the updated patches for
these components of CKRM:

01-diff_ckrm_events:
Base CKRM events, mods to existing kernel code

02-diff_delay_acct:
More accurate accounting for CPU scheduling, IO scheduling

03-diff_ckrm_core:
Main/core CKRM code, beginnings of Resource Control Filesystem

04-diff_rcfs:
Full directory support for rcfs

05-diff_taskclass:
Task based management for CPU, memory and Disk I/O.

06-diff_sockclass:
CKRM tracking for socket classes for inbound connection control,
bandwidth control, etc.

07-diff_numtasks:
Resource controller for number of tasks per class.

10-diff_docs:
CKRM documentation.

Please send comments to ckrm-tech@lists.sourceforge.net

thanks,

gerrit
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ckrm-tech] [PATCH] CKRM: 6/10 CKRM: Resource controller for sockets

2005-02-24 Thread Gerrit Huizenga


On Tue, 30 Nov 2004 11:43:11 EST, James Morris wrote:
> On Mon, 29 Nov 2004, Gerrit Huizenga wrote:
>
> > +int sock_mkdir(struct inode *, struct dentry *, int mode);
> > +int sock_rmdir(struct inode *, struct dentry *);
> > +
> > +int sock_create_noperm(struct inode *, struct dentry *, int,
> > +  struct nameidata *);
> > +int sock_unlink_noperm(struct inode *, struct dentry *);
> > +int sock_mkdir_noperm(struct inode *, struct dentry *, int);
> > +int sock_rmdir_noperm(struct inode *, struct dentry *);
> > +int sock_mknod_noperm(struct inode *, struct dentry *, int, dev_t);
> >
>
> The sock_ namespace belongs to core networking.  Use rcfs_sock_ or
> something.

Very good point.  Global search and destroy, er, replace applied.

gerrit

Re: [PATCH] CKRM: 5/10 CKRM: Task based management for CPU, memory and Disk I/O.

2005-02-24 Thread Gerrit Huizenga


On Mon, 29 Nov 2004 14:23:23 PST, Greg KH wrote:
> On Mon, Nov 29, 2004 at 10:49:09AM -0800, Gerrit Huizenga wrote:
> > +#define TC_DEBUG(fmt, args...) do { \
> > +/* printk("%s: " fmt, __FUNCTION__ , ## args); */ } while (0)
>
> Again with the new debug macro :(
>
> > +static struct ckrm_task_class taskclass_dflt_class = {
> > +};
>
> Empty structure?  Why?
>
Initialized definition, not declaration.  Although with no initializer
which was a bit odd.  struct ckrm_task_class is defined in ckrm_tc.h.

> > +// Hubertus .. following functions should move to ckrm_rc.h
>
> Why haven't they moved :)

Because we aren't done yet.  ;-)

> > +static inline void ckrm_task_lock(struct task_struct *tsk)
> > +{
> > +   spin_lock(&tsk->ckrm_tsklock);
> > +}
>
> Just lock (or unlock) the lock, don't wrap a lock in a function.
>
Yep.  Done.

> > +DECLARE_MUTEX(async_serializer);   // serialize all async functions
>
> Should this really be global?  The code says otherwise :)
>
Not any more.

> > +   printk(".. Initializing ClassType%s \n",
> > +  CT_taskclass.name);
>
> What a pretty log message.  Unfortunately it's wrong (me hears the
> growing mumblings of the kernel janitor mob...)
>
Yep - fixed.

> > +#if 0
> > +
> > +/**
> > + * Debugging Task Classes:  Utility functions
> > + **/
>
> Then remove the code, if it's not needed.
>
Okay.  I can easily carry a debug patch later.  Should have done that
sooner...

> > +EXPORT_SYMBOL(tcp_v4_lookup_listener);
>
> Not EXPORT_SYMBOL_GPL()?
>
Currently makes it just like all the others.  I'll let the networking
folks chime in on how they want that exported when this patch gets
cross posted to netdev.

thanks,

gerrit

Re: [PATCH] CKRM [7/8] Resource controller for number of tasks per class

2005-02-24 Thread Gerrit Huizenga


On Thu, 24 Feb 2005 10:00:39 PST, Greg KH wrote:
> On Thu, Feb 24, 2005 at 01:34:38AM -0800, Gerrit Huizenga wrote:
> > +#include <linux/module.h>
> > +#include <linux/init.h>
> > +#include <linux/slab.h>
> > +#include <asm/errno.h>
> > +#include <asm/div64.h>
> > +#include <linux/list.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/ckrm_rc.h>
> > +#include <linux/ckrm_tc.h>
> > +#include <linux/ckrm_tsk.h>
>
> What was that response you gave me about the fact that you fixed up the
> proper ordering of #include files...
>
Doh - missed that one.  :(

Fixed now.

gerrit

Re: [PATCH] CKRM: 3/10 CKRM: Core ckrm, rcfs

2005-02-24 Thread Gerrit Huizenga


On Thu, 24 Feb 2005 09:52:23 PST, Greg KH wrote:
> On Thu, Feb 24, 2005 at 01:33:12AM -0800, Gerrit Huizenga wrote:
> > On Mon, 29 Nov 2004 14:00:47 PST, Greg KH wrote:
> > > On Mon, Nov 29, 2004 at 10:47:32AM -0800, Gerrit Huizenga wrote:
> > > > +typedef void *(*ce_classify_fct_t) (enum ckrm_event event, void *obj, ...);
> > > > +typedef void (*ce_notify_fct_t) (enum ckrm_event event, void *classobj,
> > > > +void *obj);
> > >
> > > Ick.  Don't put a _t at the end of a typedef.  Wrong OS style guide.
> > >
> > Fixed.  Although this isn't an OS style guide thing - it is a Posix
> > driven convention whereby any header file defined in the standard
> > automatically has _t suffixed variables reserved to the implementation,
> > e.g. no application is define variables using _t.  This header file isn't
> > being used by user level applications so it doesn't matter.
>
> But Linux kernel internals are not driven by Posix conventions, hence,
> my objection.
>
So what is the recommended way of making header files safe for both
kernel and user level consumption when a header file contains
structure definitions suitable for user/kernel communication?

Currently, I don't like the way the current CKRM code mixes kernel
and user content, beyond just the things that are user level accessible.

However, it was pointed out to me that some of the CKRM files define
things which are intended to be part of the interface between user and
kernel.  That is also where some header files are defined as LGPL.

Now, I believe the contents of those header files should be clearly
separated into user/kernel API/structure files and kernel-only headers.

How do you recommend that that usually be done without polluting the
applications C namespace?  This gets right back to the problem with
replicating everything for glibc under a new license, which is really
quite a crock but just the way things are today.  I'd rather start out
with something involving a bit less redundant code.

> > > Again with the unneeded typedef.  Come on Gerrit, you should know
> > > better...
> > >
> > Sorry, years of implementing Posix conformant OS's and system header
> > files make this very common for anyone (including several of the
> > CKRM developers).  Specifically because of user level name space
> > collision avoidance issues (e.g. think preserving backwards compatibility
> > for user level apps).  It is the primary mechanism for simplifying the
> > #ifdef __KERNEL__ crap used in most OS's.
>
> If you are going to write Linux kernel code, use the proper style rules.
> No matter how many years working on other oses, it doesn't matter, you
> know better than to try to bring up that kind of objection...

The question above still stands.  Linus has mentioned the value of
__KERNEL__ in the past to help avoid the application name space
pollution issue as well, but _t also is an internationally accepted
convention among application programmers and system providers.  I'm
not as convinced that this is a case where Linux being different adds
any value to anyone, and actually makes it tougher to define header
files which can preserve an application/kernel API.

I'm trying to figure out the right way of solving the issue of
allowing user apps that happen to be mostly Posix conformant to use
CKRM without polluting their namespace.  Separate headers will do that,
at the minor annoyance of a proliferation of header files.

> > > > +#define ckrm_get_res_class(rescls, resid, type) \
> > > > +   ((type*) (((resid != -1) && ((rescls) != NULL) \
> > > > +   && ((rescls) != (void *)-1)) ? \
> > > > +((struct ckrm_core_class *)(rescls))->res_class[resid] : NULL))
> > >
> > > What exactly are you trying to do with this macro?  Cast to see if a
> > > pointer is not -1?  That doesn't sound very safe...
> >
> > This needs to be fixed and better commented.  Basically, when a task
> > is exiting, it's class can be set to -1 (-1 in a pointer is, uh, icky).
> > But when uninitialized, it is set to NULL.  We need to come up with
> > a better fix for this one.
>
> Setting a pointer to -1 is, uh, wrong.  Please fix this, as it's just
> broken.
>
Yes - I have the patch at hand to fix this, just need to merge it in.
It will be included in the next release.
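
For illustration only, one way to avoid the -1 pointer is to keep the
"unset / assigned / exiting" state in its own field, so the class pointer
is only ever NULL or valid.  This is a user-space sketch, not the CKRM
patch referred to above; struct task, enum class_state, MAX_RES and
get_res_class() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 4			/* hypothetical number of resource slots */

/* Hypothetical: explicit state instead of encoding "exiting" as (void *)-1. */
enum class_state { CLASS_UNSET, CLASS_ASSIGNED, CLASS_EXITING };

struct res_class { int id; };

struct core_class {
	struct res_class *res_class[MAX_RES];
};

struct task {
	enum class_state state;
	struct core_class *cls;		/* meaningful only when CLASS_ASSIGNED */
};

/* Return the resource class for resid, or NULL if none can be looked up. */
static struct res_class *get_res_class(const struct task *t, int resid)
{
	assert(t != NULL);
	if (t->state != CLASS_ASSIGNED || t->cls == NULL)
		return NULL;
	if (resid < 0 || resid >= MAX_RES)
		return NULL;
	return t->cls->res_class[resid];
}
```

The lookup no longer has to compare a pointer against a magic integer,
and an exiting task is distinguishable from an uninitialized one without
casts.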

> > > > +static inline void ckrm_core_grab(struct ckrm_core_class *core)
> > > > +{
> > > > +   if (core)
> > > > +   atomic_inc(&core->refcnt);
> > > > +}
> > >
> > > Please just use kref, don't invent your own reference counting.
> > >
> > I agree with this but haven't gotten to it yet.  It will take
> > a bit more transformation since the current code is 0 based references
> > and kref_t's appear to be initialized to 1.  Also, the interactions with
> > freeing code will need just a little bit of thought.  So I'm deferring
> > this for the moment but not dropping it.
>
> It doesn't matter if kref (there is no kref_t, I don't know where you
> got that from) is initialized to 42.  The whole point is the reference
> counting is handled properly for you, and you don't care, or know, what
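
To make the point about hiding the counter concrete, here is a small
user-space sketch of the init/get/put-with-release pattern that kref
follows.  It is not the kernel's kref code: struct ref, ref_init(),
ref_get(), ref_put() and struct core_class are hypothetical stand-ins
built on C11 atomics.  Callers never look at the count; they only pair
gets with puts, and the release callback runs on the final put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct ref {
	atomic_int count;		/* private: callers never read this */
};

static void ref_init(struct ref *r) { atomic_init(&r->count, 1); }
static void ref_get(struct ref *r)  { atomic_fetch_add(&r->count, 1); }

/* Drop one reference; call release() and return 1 if it was the last. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	assert(release != NULL);
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical object embedding the reference counter. */
struct core_class {
	int id;
	struct ref ref;
};

static int releases;			/* counts frees, for demonstration */

static void core_class_release(struct ref *r)
{
	/* Recover the containing object from the embedded member. */
	struct core_class *core =
		(struct core_class *)((char *)r - offsetof(struct core_class, ref));

	free(core);
	releases++;
}

static struct core_class *core_class_alloc(void)
{
	struct core_class *core = malloc(sizeof(*core));

	if (core) {
		core->id = 0;
		ref_init(&core->ref);	/* creator holds the first reference */
	}
	return core;
}
```

Whether the counter starts at 0, 1 or anything else is invisible to the
callers, which is the argument being made in the reply above.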

Re: [PATCH] CKRM: 3/10 CKRM: Core ckrm, rcfs

2005-02-24 Thread Gerrit Huizenga


On Thu, 24 Feb 2005 13:11:08 PST, Greg KH wrote:
> On Thu, Feb 24, 2005 at 12:54:17PM -0800, Gerrit Huizenga wrote:
> > On Thu, 24 Feb 2005 09:52:23 PST, Greg KH wrote:
> > > On Thu, Feb 24, 2005 at 01:33:12AM -0800, Gerrit Huizenga wrote:
> > > > On Mon, 29 Nov 2004 14:00:47 PST, Greg KH wrote:
> > > > > On Mon, Nov 29, 2004 at 10:47:32AM -0800, Gerrit Huizenga wrote:
> > > > > > +typedef void *(*ce_classify_fct_t) (enum ckrm_event event, void *obj, ...);
> > > > > > +typedef void (*ce_notify_fct_t) (enum ckrm_event event, void *classobj,
> > > > > > +void *obj);
> > > > >
> > > > > Ick.  Don't put a _t at the end of a typedef.  Wrong OS style guide.
> > > > >
> > > > Fixed.  Although this isn't an OS style guide thing - it is a Posix
> > > > driven convention whereby any header file defined in the standard
> > > > automatically has _t suffixed variables reserved to the implementation,
> > > > e.g. no application is define variables using _t.  This header file isn't
> > > > being used by user level applications so it doesn't matter.
> > >
> > > But Linux kernel internals are not driven by Posix conventions, hence,
> > > my objection.
> > >
> > So what is the recommended way of making header files safe for both
> > kernel and user level consumption when a header file contains
> > structure definitions suitable for user/kernel communication?
>
> Right now the way is, Don't do it.  Write separate header files for
> userspace.  See the lkml archives for details as to what the proposed
> way to do this is, but I don't think anyone has started working on it
> yet.
>
Yeah - I've seen that.  Doesn't help for new projects yet since the
approach is not well fleshed out yet.

> > > > > > +#define ckrm_get_res_class(rescls, resid, type) \
> > > > > > +   ((type*) (((resid != -1) && ((rescls) != NULL) \
> > > > > > +   && ((rescls) != (void *)-1)) ? \
> > > > > > +((struct ckrm_core_class *)(rescls))->res_class[resid] : NULL))
> > > > >
> > > > > What exactly are you trying to do with this macro?  Cast to see if a
> > > > > pointer is not -1?  That doesn't sound very safe...
> > > >
> > > > This needs to be fixed and better commented.  Basically, when a task
> > > > is exiting, it's class can be set to -1 (-1 in a pointer is, uh, icky).
> > > > But when uninitialized, it is set to NULL.  We need to come up with
> > > > a better fix for this one.
> > >
> > > Setting a pointer to -1 is, uh, wrong.  Please fix this, as it's just
> > > broken.
> > >
> > Yes - I have the patch at hand to fix this, just need to merge it in.
> > It will be included in the next release.
>
> Just curious, what is your level of involvement in this project?  Are
> you just merging other developer's patches, or are you writing any of
> the changes yourself?  Isn't a maintainer of a kernel subsystem supposed
> to be one of the primary developers?
>
I'm the person who will ensure that it is maintained.  There are quite
a few developers who have been involved, and those have changed a
bit over time and will continue to change.  However, I'll be sticking
around to make sure that the kernel side is cleaned up and remains
maintainable.  Some of the areas have more specific owners but some
also are supporting distros, research activities, etc.

Of the cleanups, I've done most of them myself but have had some help
and will continue to have help from several of the authors as we carry
forward.  I also did a fair share of cleanups prior to the first posting;
I'm not sure what you would have thought of the first few iterations of
the code.  ;-)

> > > > > > +/*
> > > > > > + * Registering a callback structure by the classification engine.
> > > > > > + *
> > > > > > + * Returns typeId of class on success -errno for failure.
> > > > > > + */
> > > > > > +int ckrm_register_engine(const char *typename, ckrm_eng_callback_t * ecbs)
> > > > > > +{
> > > > > > +   struct ckrm_classtype *ctype;
> > > > > > +
> > > > > > +   ctype = ckrm_find_classtype_by_name(typename);
> > > > > > +   if (ctype == NULL)
> > > > > > +   return (-ENOENT);
> > > > > > +
> > > > > > +   atomic_inc(&ctype->ce_regd);
> > > > > > +
> > > > > > +   /* another engine registered or trying to register ? */
> > > > > > +   if (atomic_read(&ctype->ce_regd) != 1) {
> > > > > > +   atomic_dec(&ctype->ce_regd);
> > > > > > +   return (-EBUSY);
> > > > > > +   }
> > > > >
> > > > > Why not just use a lock if you are worried about this?
> > > > >
> > > > Wanted to avoid holding a lock while crossing the module boundary.
> > > > And, this is a very unlikely race.
> > >
> > > Crossing what module boundry?  This is at startup time, and you don't
> > > need to worry about lock speeds, right?  Please don't try to reinvent a
> > > lock with atomic values for something like this.
> >
> > The classification engines can be loadable modules.
>
> Then you have a race condition in the above code that needs to be fixed.
> And no, using an atomic_t is not the solution.

Why not?  This simply gives an EBUSY if someone tries to load multiple
classification engines in parallel - one wins, one loses.  I'm not sure
if there is a higher level mutex on module loading that might even prevent
this race although I wouldn't be surprised
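
As an aside on the race being argued about: with increment-then-read,
two callers arriving together can both increment (count becomes 2), both
read 2, and both back off with -EBUSY, so it is possible that neither
wins.  The sketch below is a user-space illustration, not CKRM code, of
claiming the slot in one indivisible step with a C11 compare-and-exchange;
struct classtype, register_engine() and unregister_engine() are
hypothetical names, and a plain mutex around the check would be the
other obvious option.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the class type; only the field used here. */
struct classtype {
	atomic_int ce_regd;	/* 0 = no engine registered, 1 = registered */
};

/* Claim the single engine slot: exactly one caller moves 0 -> 1. */
static int register_engine(struct classtype *ctype)
{
	int expected = 0;

	if (ctype == NULL)
		return -ENOENT;
	if (!atomic_compare_exchange_strong(&ctype->ce_regd, &expected, 1))
		return -EBUSY;	/* somebody else already holds the slot */
	return 0;
}

/* Give the slot back; only the current holder should call this. */
static void unregister_engine(struct classtype *ctype)
{
	assert(atomic_load(&ctype->ce_regd) == 1);
	atomic_store(&ctype->ce_regd, 0);
}
```

Because the test and the update happen together, a failed attempt never
disturbs the counter, so one of two simultaneous callers always succeeds.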

Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Gerrit . Huizenga

At Sequent, we found that there is a small set of processes which are
"critical" to the system's operation in that they should not be killed
on swap shortage, memory shortage, etc.  This included things like init,
potentially inetd, the swapper, page daemon, clusters heartbeat daemon,
and generally any core system service which had a user process component.
If there wasn't enough memory for those processes, or if those processes
weren't already responsible in their use of memory/resources, you were
already toast.

Anyway, there is/was an API in PTX to say (either from in-kernel or through
some user machinations) "I Am a System Process".  Turns on a bit in the
proc struct (task struct) that made it exempt from death from a variety
of sources, e.g. OOM, generic user signals, portions of system shutdown,
etc.

Then, the code looking for things to kill simply skips those that are
intelligently marked, taking most of the decision making/policy making
out of the scheduler/memory manager.
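
A rough sketch of that idea, in user-space C: the victim search skips
anything carrying the "system process" bit and otherwise picks the
largest memory user.  This is neither PTX nor Linux code; struct task,
TF_SYSTEM_PROCESS and pick_victim() are hypothetical names, and "largest
memory user" is just a placeholder policy.

```c
#include <assert.h>
#include <stddef.h>

#define TF_SYSTEM_PROCESS 0x1u	/* hypothetical "I Am a System Process" bit */

struct task {
	int pid;
	unsigned int flags;
	unsigned long mem_pages;
};

/* Pick the biggest unmarked task, or NULL if every task is exempt. */
static const struct task *pick_victim(const struct task *tasks, size_t n)
{
	const struct task *victim = NULL;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].flags & TF_SYSTEM_PROCESS)
			continue;	/* exempt: never a candidate */
		if (victim == NULL || tasks[i].mem_pages > victim->mem_pages)
			victim = &tasks[i];
	}
	return victim;
}
```

The exemption is a single flag test, so the selection policy itself can
stay simple.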

gerrit

> On Mon, 9 Oct 2000, Linus Torvalds wrote:
> > On Mon, 9 Oct 2000, Andi Kleen wrote:
> > > 
> > > netscape usually has child processes: the dns helper. 
> > 
> > Yeah.
> > 
> > One thing we _can_ (and probably should do) is to do a per-user
> > memory pressure thing - we have easy access to the "struct
> > user_struct" (every process has a direct pointer to it), and it
> > should not be too bad to maintain a per-user "VM pressure"
> > counter.
> > 
> > Then, instead of trying to use heuristics like "does this
> > process have children" etc, you'd have things like "is this user
> > a nasty user", which is a much more valid thing to do and can be
> > used to find people who fork tons of processes that are
> > mid-sized but use a lot of memory due to just being many..
> 
> Sure we could do all of this, but does OOM really happen that
> often that we want to make the algorithm this complex ?
> 
> The current algorithm seems to work quite well and is already
> at the limit of how complex I'd like to see it. Having a less
> complex OOM killer turned out to not work very well, but having
> a more complex one is - IMHO - probably overkill ...
> 
> regards,
> 
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>-- Miguel de Icaza, UKUUG 2000
> 
> http://www.conectiva.com/ http://www.surriel.com/
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [EMAIL PROTECTED]  For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
> 