Re: [PATCHSET] cpuset: decouple cpuset locking from cgroup core, take#2

2013-01-09 Thread Paul Menage
On Mon, Jan 7, 2013 at 5:31 PM, Li Zefan  wrote:
>
> I don't think Paul's still maintaining cpusets. Normally it's Andrew
> that picks up cpuset patches. It's fine you route it through cgroup
> tree.

Yes, I'm sorry - I should have handed on cpusets at the time I had to
hand on cgroups. I was only really ever the maintainer for cpusets
because Paul Jackson asked me to take it over when he retired, as I
understood the cgroups-related parts of it. I never really had a good
grasp of how the some of the lower-level parts of it interacted with
the rest of the system (e.g. offlining, CPUs, scheduler domains, etc)
anyway ...

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET] cpuset: decouple cpuset locking from cgroup core, take#2

2013-01-09 Thread Paul Menage
On Mon, Jan 7, 2013 at 5:31 PM, Li Zefan lize...@huawei.com wrote:

 I don't think Paul's still maintaining cpusets. Normally it's Andrew
 that picks up cpuset patches. It's fine you route it through cgroup
 tree.

Yes, I'm sorry - I should have handed on cpusets at the time I had to
hand on cgroups. I was only really ever the maintainer for cpusets
because Paul Jackson asked me to take it over when he retired, as I
understood the cgroups-related parts of it. I never really had a good
grasp of how the some of the lower-level parts of it interacted with
the rest of the system (e.g. offlining, CPUs, scheduler domains, etc)
anyway ...

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-26 Thread Paul Menage
On Mon, Feb 25, 2008 at 7:01 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
>  >
>  > - foo doesn't show up in /proc/cgroups
>
>  Or we can print out the disable flag, maybe this will be better?
>  Because we can distinguish from disabled and not compiled in from
>
> /proc/cgroups.

Certainly possible, if people felt it was useful.

>
>  > - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
>  > - foo isn't visible as an individually mountable subsystem
>
>  You mentioned in a previous mail if we mount a disabled subsystem we
>  will get an error. Here we just ignore the mount option. Which makes
>  more sense ?
>

No, we don't ignore the mount option - we give an error since it
doesn't refer to a valid subsystem. (And in the first case there is no
mount option).

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-26 Thread Paul Menage
On Mon, Feb 25, 2008 at 7:01 PM, Li Zefan [EMAIL PROTECTED] wrote:
  
   - foo doesn't show up in /proc/cgroups

  Or we can print out the disable flag, maybe this will be better?
  Because we can distinguish from disabled and not compiled in from

 /proc/cgroups.

Certainly possible, if people felt it was useful.


   - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
   - foo isn't visible as an individually mountable subsystem

  You mentioned in a previous mail if we mount a disabled subsystem we
  will get an error. Here we just ignore the mount option. Which makes
  more sense ?


No, we don't ignore the mount option - we give an error since it
doesn't refer to a valid subsystem. (And in the first case there is no
mount option).

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/10] CGroup API files: Various cleanup to CGroup control files

2008-02-25 Thread Paul Menage
On Mon, Feb 25, 2008 at 7:23 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
>
>  Should those pathces be rebased againt 2.6.25-rc3 ?
>

No, because they're against 2.6.25-rc2-mm1, which is already has (I
think) any of the new bits in 2.6.25-rc3 that would be affected by
these patches.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-25 Thread Paul Menage

I'll send out a prototype for comment.


Something like the patch below. The effects of cgroup_disable=foo are:

- foo doesn't show up in /proc/cgroups
- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem

As a result there will only ever be one call to foo->create(), at init time; 
all processes will stay in this group, and the group will never be mounted on a 
visible hierarchy. Any additional effects (e.g. not allocating metadata) are up to 
the foo subsystem.

This doesn't handle early_init subsystems (their "disabled" bit isn't set be, 
but it could easily be extended to do so if any of the early_init systems wanted it - I 
think it would just involve some nastier parameter processing since it would occur before 
the command-line argument parser had been run.

include/linux/cgroup.h |1 +
kernel/cgroup.c|   29 +++--
2 files changed, 28 insertions(+), 2 deletions(-)

Index: cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h
===
--- cgroup_disable-2.6.25-rc2-mm1.orig/include/linux/cgroup.h
+++ cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h
@@ -256,6 +256,7 @@ struct cgroup_subsys {
void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
int subsys_id;
int active;
+   int disabled;
int early_init;
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
Index: cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c
===
--- cgroup_disable-2.6.25-rc2-mm1.orig/kernel/cgroup.c
+++ cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c
@@ -790,7 +790,14 @@ static int parse_cgroupfs_options(char *
if (!*token)
return -EINVAL;
if (!strcmp(token, "all")) {
-   opts->subsys_bits = (1 << CGROUP_SUBSYS_COUNT) - 1;
+   /* Add all non-disabled subsystems */
+   int i;
+   opts->subsys_bits = 0;
+   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+   struct cgroup_subsys *ss = subsys[i];
+   if (!ss->disabled)
+   opts->subsys_bits |= 1ul << i;
+   }
} else if (!strcmp(token, "noprefix")) {
set_bit(ROOT_NOPREFIX, >flags);
} else if (!strncmp(token, "release_agent=", 14)) {
@@ -808,7 +815,8 @@ static int parse_cgroupfs_options(char *
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
ss = subsys[i];
if (!strcmp(token, ss->name)) {
-   set_bit(i, >subsys_bits);
+   if (!ss->disabled)
+   set_bit(i, >subsys_bits);
break;
}
}
@@ -2596,6 +2606,8 @@ static int proc_cgroupstats_show(struct 
	mutex_lock(_mutex);

for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+   if (ss->disabled)
+   continue;
seq_printf(m, "%s\t%lu\t%d\n",
   ss->name, ss->root->subsys_bits,
   ss->root->number_of_cgroups);
@@ -2991,3 +3003,16 @@ static void cgroup_release_agent(struct 
	spin_unlock(_list_lock);

mutex_unlock(_mutex);
}
+
+static int __init cgroup_disable(char *str)
+{
+   int i;
+   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+   struct cgroup_subsys *ss = subsys[i];
+   if (!strcmp(str, ss->name)) {
+   ss->disabled = 1;
+   break;
+   }
+   }
+}
+__setup("cgroup_disable=", cgroup_disable);




Sure thing, if css has the flag, then it would nice. Could you wrap it up to say
something like css_disabled(_cgroup_subsys)




It's the subsys object rather than the css (cgroup_subsys_state).

 We could have something like:

#define cgroup_subsys_disabled(_ss) ((ss_)->disabled)

but I don't see that 

 cgroup_subsys_disabled(_cgroup_subsys) 


is better than just putting

 mem_cgroup_subsys.disabled

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-25 Thread Paul Menage
On Mon, Feb 25, 2008 at 9:18 AM, Balbir Singh <[EMAIL PROTECTED]> wrote:
>
>  I thought about it, but it did not work out all that well. The reason being,
>  that the memory controller is called in from places besides cgroup.
>  mem_cgroup_charge_common() for example is called from several places in mm.
>  Calling into cgroups to check, enabled/disabled did not seem right.

You wouldn't need to call into cgroups - if it's a flag in the subsys
object (which is defined in memcontrol.c) you'd just say

if (mem_cgroup_subsys.disabled) {
...
}

I'll send out a prototype for comment.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-25 Thread Paul Menage
On Mon, Feb 25, 2008 at 3:55 AM, Balbir Singh <[EMAIL PROTECTED]> wrote:
>
>
>  A boot option for the memory controller was discussed on lkml. It is a good
>  idea to add it, since it saves memory for people who want to turn off the
>  memory controller.
>
>  By default the option is on for the following two reasons
>
>  1. It provides compatibility with the current scheme where the memory
>controller turns on if the config option is enabled
>  2. It allows for wider testing of the memory controller, once the config
>option is enabled
>
>  We still allow the create, destroy callbacks to succeed, since they are
>  not aware of boot options. We do not populate the directory will
>  memory resource controller specific files.

Would it make more sense to have a generic cgroups boot option for this?

Something like cgroup_disable=xxx, which would be parsed by cgroups
and would cause:

- a "disabled" flag to be set to true in the subsys object (you could
use this in place of the mem_cgroup_on flag)

- prevent the disabled cgroup from being bound to any mounted
hierarchy (so it would be ignored in a mount with no subsystem
options, and a mount with options that specifically pick that
subsystem would give an error)

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-25 Thread Paul Menage
On Mon, Feb 25, 2008 at 3:55 AM, Balbir Singh [EMAIL PROTECTED] wrote:


  A boot option for the memory controller was discussed on lkml. It is a good
  idea to add it, since it saves memory for people who want to turn off the
  memory controller.

  By default the option is on for the following two reasons

  1. It provides compatibility with the current scheme where the memory
controller turns on if the config option is enabled
  2. It allows for wider testing of the memory controller, once the config
option is enabled

  We still allow the create, destroy callbacks to succeed, since they are
  not aware of boot options. We do not populate the directory will
  memory resource controller specific files.

Would it make more sense to have a generic cgroups boot option for this?

Something like cgroup_disable=xxx, which would be parsed by cgroups
and would cause:

- a disabled flag to be set to true in the subsys object (you could
use this in place of the mem_cgroup_on flag)

- prevent the disabled cgroup from being bound to any mounted
hierarchy (so it would be ignored in a mount with no subsystem
options, and a mount with options that specifically pick that
subsystem would give an error)

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-25 Thread Paul Menage
On Mon, Feb 25, 2008 at 9:18 AM, Balbir Singh [EMAIL PROTECTED] wrote:

  I thought about it, but it did not work out all that well. The reason being,
  that the memory controller is called in from places besides cgroup.
  mem_cgroup_charge_common() for example is called from several places in mm.
  Calling into cgroups to check, enabled/disabled did not seem right.

You wouldn't need to call into cgroups - if it's a flag in the subsys
object (which is defined in memcontrol.c) you'd just say

if (mem_cgroup_subsys.disabled) {
...
}

I'll send out a prototype for comment.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Memory Resource Controller Add Boot Option

2008-02-25 Thread Paul Menage

I'll send out a prototype for comment.


Something like the patch below. The effects of cgroup_disable=foo are:

- foo doesn't show up in /proc/cgroups
- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem

As a result there will only ever be one call to foo-create(), at init time; 
all processes will stay in this group, and the group will never be mounted on a 
visible hierarchy. Any additional effects (e.g. not allocating metadata) are up to 
the foo subsystem.

This doesn't handle early_init subsystems (their disabled bit isn't set be, 
but it could easily be extended to do so if any of the early_init systems wanted it - I 
think it would just involve some nastier parameter processing since it would occur before 
the command-line argument parser had been run.

include/linux/cgroup.h |1 +
kernel/cgroup.c|   29 +++--
2 files changed, 28 insertions(+), 2 deletions(-)

Index: cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h
===
--- cgroup_disable-2.6.25-rc2-mm1.orig/include/linux/cgroup.h
+++ cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h
@@ -256,6 +256,7 @@ struct cgroup_subsys {
void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
int subsys_id;
int active;
+   int disabled;
int early_init;
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
Index: cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c
===
--- cgroup_disable-2.6.25-rc2-mm1.orig/kernel/cgroup.c
+++ cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c
@@ -790,7 +790,14 @@ static int parse_cgroupfs_options(char *
if (!*token)
return -EINVAL;
if (!strcmp(token, all)) {
-   opts-subsys_bits = (1  CGROUP_SUBSYS_COUNT) - 1;
+   /* Add all non-disabled subsystems */
+   int i;
+   opts-subsys_bits = 0;
+   for (i = 0; i  CGROUP_SUBSYS_COUNT; i++) {
+   struct cgroup_subsys *ss = subsys[i];
+   if (!ss-disabled)
+   opts-subsys_bits |= 1ul  i;
+   }
} else if (!strcmp(token, noprefix)) {
set_bit(ROOT_NOPREFIX, opts-flags);
} else if (!strncmp(token, release_agent=, 14)) {
@@ -808,7 +815,8 @@ static int parse_cgroupfs_options(char *
for (i = 0; i  CGROUP_SUBSYS_COUNT; i++) {
ss = subsys[i];
if (!strcmp(token, ss-name)) {
-   set_bit(i, opts-subsys_bits);
+   if (!ss-disabled)
+   set_bit(i, opts-subsys_bits);
break;
}
}
@@ -2596,6 +2606,8 @@ static int proc_cgroupstats_show(struct 
	mutex_lock(cgroup_mutex);

for (i = 0; i  CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+   if (ss-disabled)
+   continue;
seq_printf(m, %s\t%lu\t%d\n,
   ss-name, ss-root-subsys_bits,
   ss-root-number_of_cgroups);
@@ -2991,3 +3003,16 @@ static void cgroup_release_agent(struct 
	spin_unlock(release_list_lock);

mutex_unlock(cgroup_mutex);
}
+
+static int __init cgroup_disable(char *str)
+{
+   int i;
+   for (i = 0; i  CGROUP_SUBSYS_COUNT; i++) {
+   struct cgroup_subsys *ss = subsys[i];
+   if (!strcmp(str, ss-name)) {
+   ss-disabled = 1;
+   break;
+   }
+   }
+}
+__setup(cgroup_disable=, cgroup_disable);




Sure thing, if css has the flag, then it would nice. Could you wrap it up to say
something like css_disabled(mem_cgroup_subsys)




It's the subsys object rather than the css (cgroup_subsys_state).

 We could have something like:

#define cgroup_subsys_disabled(_ss) ((ss_)-disabled)

but I don't see that 

 cgroup_subsys_disabled(mem_cgroup_subsys) 


is better than just putting

 mem_cgroup_subsys.disabled

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/10] CGroup API files: Various cleanup to CGroup control files

2008-02-25 Thread Paul Menage
On Mon, Feb 25, 2008 at 7:23 PM, Li Zefan [EMAIL PROTECTED] wrote:

  Should those pathces be rebased againt 2.6.25-rc3 ?


No, because they're against 2.6.25-rc2-mm1, which is already has (I
think) any of the new bits in 2.6.25-rc3 that would be affected by
these patches.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cgroup: fix default notify_on_release setting

2008-02-24 Thread Paul Menage
On Sun, Feb 24, 2008 at 9:53 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> The documentation says the default value of notify_on_release of
>  a child cgroup is inherited from its parent, which is reasonable,
>  but the implementation just sets the flag disabled.
>
>  Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

Yes, I guess it makes sense to follow the original cpusets behaviour.
I think that got lost when the notify-on-release functionality was
temporarily removed during cgroups development.

>  ---
>   kernel/cgroup.c |4 +++-
>   1 files changed, 3 insertions(+), 1 deletions(-)
>
>  diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>  index d8abe99..e9c2fb0 100644
>  --- a/kernel/cgroup.c
>  +++ b/kernel/cgroup.c
>  @@ -2232,7 +2232,6 @@ static long cgroup_create(struct cgroup *parent, 
> struct dentry *dentry,
>
> mutex_lock(_mutex);
>
>  -   cgrp->flags = 0;
> INIT_LIST_HEAD(>sibling);
> INIT_LIST_HEAD(>children);
> INIT_LIST_HEAD(>css_sets);
>  @@ -2242,6 +2241,9 @@ static long cgroup_create(struct cgroup *parent, 
> struct dentry *dentry,
> cgrp->root = parent->root;
> cgrp->top_cgroup = parent->top_cgroup;
>
>  +   if (notify_on_release(parent))
>  +   set_bit(CGRP_NOTIFY_ON_RELEASE, >flags);
>  +
> for_each_subsys(root, ss) {
> struct cgroup_subsys_state *css = ss->create(ss, cgrp);
> if (IS_ERR(css)) {
>  --
>  1.5.4.rc3
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cgroup: fix default notify_on_release setting

2008-02-24 Thread Paul Menage
On Sun, Feb 24, 2008 at 9:53 PM, Li Zefan [EMAIL PROTECTED] wrote:
 The documentation says the default value of notify_on_release of
  a child cgroup is inherited from its parent, which is reasonable,
  but the implementation just sets the flag disabled.

  Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

Yes, I guess it makes sense to follow the original cpusets behaviour.
I think that got lost when the notify-on-release functionality was
temporarily removed during cgroups development.

  ---
   kernel/cgroup.c |4 +++-
   1 files changed, 3 insertions(+), 1 deletions(-)

  diff --git a/kernel/cgroup.c b/kernel/cgroup.c
  index d8abe99..e9c2fb0 100644
  --- a/kernel/cgroup.c
  +++ b/kernel/cgroup.c
  @@ -2232,7 +2232,6 @@ static long cgroup_create(struct cgroup *parent, 
 struct dentry *dentry,

 mutex_lock(cgroup_mutex);

  -   cgrp-flags = 0;
 INIT_LIST_HEAD(cgrp-sibling);
 INIT_LIST_HEAD(cgrp-children);
 INIT_LIST_HEAD(cgrp-css_sets);
  @@ -2242,6 +2241,9 @@ static long cgroup_create(struct cgroup *parent, 
 struct dentry *dentry,
 cgrp-root = parent-root;
 cgrp-top_cgroup = parent-top_cgroup;

  +   if (notify_on_release(parent))
  +   set_bit(CGRP_NOTIFY_ON_RELEASE, cgrp-flags);
  +
 for_each_subsys(root, ss) {
 struct cgroup_subsys_state *css = ss-create(ss, cgrp);
 if (IS_ERR(css)) {
  --
  1.5.4.rc3


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 6:47 PM, Balbir Singh <[EMAIL PROTECTED]> wrote:
>  >> res_counter_read_u64() I'd also want to rename all the other
>  >> *read_uint functions/fields to *read_u64 too. Can I do that in a
>  >> separate patch?
>  >>
>  >
>  > Sounds sensible to me.
>  >
>
>  Sure, fair enough.
>

Actually, since multiple people were asking for this change I did the
search/replace and sent it out already (as a precursor of the other
patches in the series that I sent today).

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:26 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>
>  > In that case I guess I'll have to add signed versions of the
>  > read_uint/write_uint methods.
>
>  Yes, I looked at that, I found the interface somewhat unfortunate, it
>  would mean growing the struct with two more function pointers.

Is that really a big deal? We're talking about a structure that has a
small number (<10 in the current tree) of instances per cgroup
subsystem.

> Perhaps a
>  read and write function with abstract data would be better suited. That
>  would allow for this and more. Sadly it looses type information.

If the size of the struct cftype really became a problem, I think the
cleanest way to fix it would be to have a union of the potential
function pointers, and add a field to specify which one is in use.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>  > If so, could we avoid that problem by using 0 rather than -1 as the
>  > "unlimited" value? It looks from what I've read in the Documentation
>  > changes as though 0 isn't really a meaningful value.
>
>  0 means no time, quite useful and clearly distinct from inf. time.
>

So a real-time task in a cgroup with a 0 rt_runtime can be in the R
state but never actually get to run? OK, if people need to be able to
do that then fair enough.

In that case I guess I'll have to add signed versions of the
read_uint/write_uint methods.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>  +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft,
>  +   struct file *file,
>  +   const char __user *userbuf,
>  +   size_t nbytes, loff_t *unused_ppos)
>  +{
>  +   char buffer[64];
>  +   int retval = 0;
>  +   s64 val;
>  +   char *end;
>  +
>  +   if (!nbytes)
>  +   return -EINVAL;
>  +   if (nbytes >= sizeof(buffer))
>  +   return -E2BIG;
>  +   if (copy_from_user(buffer, userbuf, nbytes))
>  +   return -EFAULT;
>  +
>  +   buffer[nbytes] = 0; /* nul-terminate */
>  +
>  +   /* strip newline if necessary */
>  +   if (nbytes && (buffer[nbytes-1] == '\n'))
>  +   buffer[nbytes-1] = 0;
>  +   val = simple_strtoll(buffer, , 0);
>  +   if (*end)
>  +   return -EINVAL;
>  +
>  +   /* Pass to subsystem */
>  +   retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val);
>  +   if (!retval)
>  +   retval = nbytes;
>  +   return retval;
>   }
>
>  -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft)
>  -{
>  -   struct task_group *tg = cgroup_tg(cgrp);
>  +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft,
>  +  struct file *file,
>  +  char __user *buf, size_t nbytes,
>  +  loff_t *ppos)
>  +{
>  +   char tmp[64];
>  +   long val = sched_group_rt_runtime(cgroup_tg(cgrp));
>  +   int len = sprintf(tmp, "%ld\n", val);
>
>  -   return (u64) tg->rt_ratio;
>  +   return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
>   }

What's the reason that you can't use the cgroup read_uint/write_uint
methods for this? Is it just because you have -1 as your "unlimited"
value.

If so, could we avoid that problem by using 0 rather than -1 as the
"unlimited" value? It looks from what I've read in the Documentation
changes as though 0 isn't really a meaningful value.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:06 AM, Andrew Morton
<[EMAIL PROTECTED]> wrote:
>
>  It is unclear to me what the relationship is between this and your other
>  cgroup pseudo-fs changes, but as this is fiddling with a userspace
>  interface we should get a wiggle on - we don't want to let things like this
>  slip out to 2.6.26.
>
>  So..  please resend everything?
>

Yes, will do.

But (given that no-one seems to like the proposed cgroup.api file)
these patches don't actually change the userspace API at all - it's
purely internal plumbing improvements.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cgroup: fix sparse warning of shadow symbol in cgroup.c

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 4:33 AM, Paul Jackson <[EMAIL PROTECTED]> wrote:
> From: Paul Jackson <[EMAIL PROTECTED]>
>
>  Fix a code warning: symbol 'p' shadows an earlier one
>
>  This is a reincarnation of Harvey Harrison's patch:
> cpuset: sparse warnings in cpuset.c
>
>  Independently, Cliff Wickman moved the affected code,
>  from kernel/cpuset.c to kernel/cgroup.c, in his patch:
> cpusets: update_cpumask revision
>
>  Signed-off-by: Paul Jackson <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

>  Cc: Harvey Harrison <[EMAIL PROTECTED]>
>  Cc: Cliff Wickman <[EMAIL PROTECTED]>
>
>  ---
>   kernel/cgroup.c |   10 +-
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
>  --- 2.6.25-rc2-mm1.orig/kernel/cgroup.c 2008-02-16 01:04:48.0 -0800
>  +++ 2.6.25-rc2-mm1/kernel/cgroup.c  2008-02-23 04:19:44.006614677 -0800
>  @@ -1897,14 +1897,14 @@ int cgroup_scan_tasks(struct cgroup_scan
>
> if (heap->size) {
> for (i = 0; i < heap->size; i++) {
>  -   struct task_struct *p = heap->ptrs[i];
>  +   struct task_struct *q = heap->ptrs[i];
> if (i == 0) {
>  -   latest_time = p->start_time;
>  -   latest_task = p;
>  +   latest_time = q->start_time;
>  +   latest_task = q;
> }
> /* Process the task per the caller's callback */
>  -   scan->process_task(p, scan);
>  -   put_task_struct(p);
>  +   scan->process_task(q, scan);
>  +   put_task_struct(q);
> }
> /*
>  * If we had to process any tasks at all, scan again
>
>  --
>   I won't rest till it's the best ...
>   Programmer, Linux Scalability
>   Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] cgroup map files: Add cgroup map data type

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:04 AM, Andrew Morton
<[EMAIL PROTECTED]> wrote:
>  > +static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 
> value)
>  > +{
>  > + struct seq_file *sf = cb->state;
>  > + return seq_printf(sf, "%s %llu\n", key, value);
>  > +}
>
>  We don't know what type the architecture uses to implement u64.  This will
>  warn on powerpc, sparc64, maybe others.

OK, I'll add an (unsigned long long) cast.

>
>
>  > +static int cgroup_seqfile_show(struct seq_file *m, void *arg)
>  > +{
>  > + struct cgroup_seqfile_state *state = m->private;
>  > + struct cftype *cft = state->cft;
>  > + if (cft->read_map) {
>  > + struct cgroup_map_cb cb = {
>  > + .fill = cgroup_map_add,
>  > + .state = m,
>  > + };
>  > + return cft->read_map(state->cgroup, cft, );
>  > + } else {
>  > + BUG();
>
>  That's not really needed.  Just call cft->read_map unconditionally.  if
>  it's zero we'll get a null-pointer deref which will have just the same
>  effect as a BUG.

OK. The long-term plan is to have other kinds of files also handled by
this function, so eventually it would look something like:

if (cft->read_map) {
...
} else if (cft->read_something_else) {
...
}
...
} else {
  BUG();
}

But I guess I can save that for the future.

>  >  static int cgroup_file_open(struct inode *inode, struct file *file)
>  >  {
>  >   int err;
>  > @@ -1499,7 +1539,18 @@ static int cgroup_file_open(struct inode
>  >   cft = __d_cft(file->f_dentry);
>  >   if (!cft)
>  >   return -ENODEV;
>  > - if (cft->open)
>  > + if (cft->read_map) {
>
>  But above a NULL value is illegal.  Why are we testing it here?
>
>

The existence of cft->read_map causes us to open a seq_file. Otherwise
we do nothing special and carry on down the normal open path.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Tiny cpusets -- cpusets for small systems?

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson <[EMAIL PROTECTED]> wrote:
> A couple of proposals have been made recently by people working Linux
>  on smaller systems, for improving realtime isolation and memory
>  pressure handling:
>
>  (1) cpu isolation for hard(er) realtime
> http://lkml.org/lkml/2008/2/21/517
> Max Krasnyanskiy <[EMAIL PROTECTED]>
> [PATCH sched-devel 0/7] CPU isolation extensions
>
>  (2) notify user space of tight memory
> http://lkml.org/lkml/2008/2/9/144
> KOSAKI Motohiro <[EMAIL PROTECTED]>
> [PATCH 0/8][for -mm] mem_notify v6
>
>  In both cases, some of us have responded "why not use cpusets", and the
>  original submitters have replied "cpusets are too fat"  (well, they
>  were more diplomatic than that, but I guess I can say that ;)

Having read those threads, it looks to me as though:

- the parts of Max's problem that would be solved by cpusets can be
mostly accomplished just via sched_setaffinity()

- Motohiro wants to add a new system-wide API that you would also like
to have available on a per-cpuset basis. (Why not just add two access
points for the same feature?)

I'm don't think that either of these would be enough to justify big
changes to cpusets or cgroups, although eliminating bloat is always a
good thing.

>  The primary semantic limit I'd suggest would be supporting exactly
>  one layer depth of cpusets, not a full hierarchy.  So one could still
>  successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
>  to do 'mkdir /dev/cpuset/foo/bar' would fail.  This reminds me of
>  very early FAT file systems, which had just a single, fixed size
>  root directory ;).  There might even be a configurable fixed upper
>  limit on how many /dev/cpuset/* directories were allowed, further
>  simplifying the locking and dynamic memory behavior of this apparatus.

I'm not sure that either of these would make much difference to the
overall footprint.

A single layer of cpusets would allow you to simplify
validate_change() but not much else.

I don't see how a fixed upper limit on the number of cpusets makes the
locking sufficiently simpler to save much code.

>
>  How this extends to cgroups I don't know; for now I suspect that most
>  cgroup module development is motivated by the needs of larger systems,
>  not smaller systems.  However, cpusets is now a module client of
>  cgroups, and it is cgroups that now provides cpusets with its interface
>  to the vfs infrastructure.  It would seem unfortunate if this relation
>  was not continued with tiny cpusets.  Perhaps someone can imagine a tiny
>  cgroups?  This might be the most difficult part of this proposal.

If we wanted to go this way, I can imagine a cgroups config option
that forces just a single hierarchy, which would allow a bunch of
simplifications that would save plenty of text.

>
>  Looking at some IA64 sn2 config builds I have laying about, I see the
>  following text sizes for a couple of versions, showing the growth of
>  the cpuset/cgroup apparatus over time:
>
> 25933   2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006)
>  vs.
> 37823   2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008)
> 19558   2.6.25-rc2-mm1/kernel/cpuset.o
>
>  So the total has grown from 25933 to 57381 text bytes (note that
>  this is IA64 arch; most arch's will have proportionately smaller
>  text sizes.)

On x86_64 they're:

cgroup.o: 17348
cpuset.o: 8533

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller

2008-02-23 Thread Paul Menage
On Thu, Feb 21, 2008 at 8:29 PM, Balbir Singh <[EMAIL PROTECTED]> wrote:
>
>  Looks good, except for the name uint(), can we make it u64(). Integers are 32
>  bit on both ILP32 and LP64, but we really read/write 64 bit values.

Yes, that's true. But read_uint() is more consistent with all the
other instances in cgroups and subsystems. So if we were to call it
res_counter_read_u64() I'd also want to rename all the other
*read_uint functions/fields to *read_u64 too. Can I do that in a
separate patch?

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller

2008-02-23 Thread Paul Menage
On Thu, Feb 21, 2008 at 8:29 PM, Balbir Singh [EMAIL PROTECTED] wrote:

  Looks good, except for the name uint(), can we make it u64(). Integers are 32
  bit on both ILP32 and LP64, but we really read/write 64 bit values.

Yes, that's true. But read_uint() is more consistent with all the
other instances in cgroups and subsystems. So if we were to call it
res_counter_read_u64() I'd also want to rename all the other
*read_uint functions/fields to *read_u64 too. Can I do that in a
separate patch?

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Tiny cpusets -- cpusets for small systems?

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson [EMAIL PROTECTED] wrote:
 A couple of proposals have been made recently by people working Linux
  on smaller systems, for improving realtime isolation and memory
  pressure handling:

  (1) cpu isolation for hard(er) realtime
 http://lkml.org/lkml/2008/2/21/517
 Max Krasnyanskiy [EMAIL PROTECTED]
 [PATCH sched-devel 0/7] CPU isolation extensions

  (2) notify user space of tight memory
 http://lkml.org/lkml/2008/2/9/144
 KOSAKI Motohiro [EMAIL PROTECTED]
 [PATCH 0/8][for -mm] mem_notify v6

  In both cases, some of us have responded why not use cpusets, and the
  original submitters have replied cpusets are too fat  (well, they
  were more diplomatic than that, but I guess I can say that ;)

Having read those threads, it looks to me as though:

- the parts of Max's problem that would be solved by cpusets can be
mostly accomplished just via sched_setaffinity()

- Motohiro wants to add a new system-wide API that you would also like
to have available on a per-cpuset basis. (Why not just add two access
points for the same feature?)

I'm don't think that either of these would be enough to justify big
changes to cpusets or cgroups, although eliminating bloat is always a
good thing.

  The primary semantic limit I'd suggest would be supporting exactly
  one layer depth of cpusets, not a full hierarchy.  So one could still
  successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
  to do 'mkdir /dev/cpuset/foo/bar' would fail.  This reminds me of
  very early FAT file systems, which had just a single, fixed size
  root directory ;).  There might even be a configurable fixed upper
  limit on how many /dev/cpuset/* directories were allowed, further
  simplifying the locking and dynamic memory behavior of this apparatus.

I'm not sure that either of these would make much difference to the
overall footprint.

A single layer of cpusets would allow you to simplify
validate_change() but not much else.

I don't see how a fixed upper limit on the number of cpusets makes the
locking sufficiently simpler to save much code.


  How this extends to cgroups I don't know; for now I suspect that most
  cgroup module development is motivated by the needs of larger systems,
  not smaller systems.  However, cpusets is now a module client of
  cgroups, and it is cgroups that now provides cpusets with its interface
  to the vfs infrastructure.  It would seem unfortunate if this relation
  was not continued with tiny cpusets.  Perhaps someone can imagine a tiny
  cgroups?  This might be the most difficult part of this proposal.

If we wanted to go this way, I can imagine a cgroups config option
that forces just a single hierarchy, which would allow a bunch of
simplifications that would save plenty of text.


  Looking at some IA64 sn2 config builds I have laying about, I see the
  following text sizes for a couple of versions, showing the growth of
  the cpuset/cgroup apparatus over time:

 25933   2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006)
  vs.
 37823   2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008)
 19558   2.6.25-rc2-mm1/kernel/cpuset.o

  So the total has grown from 25933 to 57381 text bytes (note that
  this is IA64 arch; most arch's will have proportionately smaller
  text sizes.)

On x86_64 they're:

cgroup.o: 17348
cpuset.o: 8533

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] cgroup map files: Add cgroup map data type

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:04 AM, Andrew Morton
[EMAIL PROTECTED] wrote:
   +static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 
 value)
   +{
   + struct seq_file *sf = cb-state;
   + return seq_printf(sf, %s %llu\n, key, value);
   +}

  We don't know what type the architecture uses to implement u64.  This will
  warn on powerpc, sparc64, maybe others.

OK, I'll add an (unsigned long long) cast.



   +static int cgroup_seqfile_show(struct seq_file *m, void *arg)
   +{
   + struct cgroup_seqfile_state *state = m-private;
   + struct cftype *cft = state-cft;
   + if (cft-read_map) {
   + struct cgroup_map_cb cb = {
   + .fill = cgroup_map_add,
   + .state = m,
   + };
   + return cft-read_map(state-cgroup, cft, cb);
   + } else {
   + BUG();

  That's not really needed.  Just call cft-read_map unconditionally.  if
  it's zero we'll get a null-pointer deref which will have just the same
  effect as a BUG.

OK. The long-term plan is to have other kinds of files also handled by
this function, so eventually it would look something like:

if (cft-read_map) {
...
} else if (cft-read_something_else) {
...
}
...
} else {
  BUG();
}

But I guess I can save that for the future.

static int cgroup_file_open(struct inode *inode, struct file *file)
{
 int err;
   @@ -1499,7 +1539,18 @@ static int cgroup_file_open(struct inode
 cft = __d_cft(file-f_dentry);
 if (!cft)
 return -ENODEV;
   - if (cft-open)
   + if (cft-read_map) {

  But above a NULL value is illegal.  Why are we testing it here?



The existence of cft-read_map causes us to open a seq_file. Otherwise
we do nothing special and carry on down the normal open path.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cgroup: fix sparse warning of shadow symbol in cgroup.c

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 4:33 AM, Paul Jackson [EMAIL PROTECTED] wrote:
 From: Paul Jackson [EMAIL PROTECTED]

  Fix a code warning: symbol 'p' shadows an earlier one

  This is a reincarnation of Harvey Harrison's patch:
 cpuset: sparse warnings in cpuset.c

  Independently, Cliff Wickman moved the affected code,
  from kernel/cpuset.c to kernel/cgroup.c, in his patch:
 cpusets: update_cpumask revision

  Signed-off-by: Paul Jackson [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

  Cc: Harvey Harrison [EMAIL PROTECTED]
  Cc: Cliff Wickman [EMAIL PROTECTED]

  ---
   kernel/cgroup.c |   10 +-
   1 file changed, 5 insertions(+), 5 deletions(-)

  --- 2.6.25-rc2-mm1.orig/kernel/cgroup.c 2008-02-16 01:04:48.0 -0800
  +++ 2.6.25-rc2-mm1/kernel/cgroup.c  2008-02-23 04:19:44.006614677 -0800
  @@ -1897,14 +1897,14 @@ int cgroup_scan_tasks(struct cgroup_scan

 if (heap-size) {
 for (i = 0; i  heap-size; i++) {
  -   struct task_struct *p = heap-ptrs[i];
  +   struct task_struct *q = heap-ptrs[i];
 if (i == 0) {
  -   latest_time = p-start_time;
  -   latest_task = p;
  +   latest_time = q-start_time;
  +   latest_task = q;
 }
 /* Process the task per the caller's callback */
  -   scan-process_task(p, scan);
  -   put_task_struct(p);
  +   scan-process_task(q, scan);
  +   put_task_struct(q);
 }
 /*
  * If we had to process any tasks at all, scan again

  --
   I won't rest till it's the best ...
   Programmer, Linux Scalability
   Paul Jackson [EMAIL PROTECTED] 1.650.933.1373

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:06 AM, Andrew Morton
[EMAIL PROTECTED] wrote:

  It is unclear to me what the relationship is between this and your other
  cgroup pseudo-fs changes, but as this is fiddling with a userspace
  interface we should get a wiggle on - we don't want to let things like this
  slip out to 2.6.26.

  So..  please resend everything?


Yes, will do.

But (given that no-one seems to like the proposed cgroup.api file)
these patches don't actually change the userspace API at all - it's
purely internal plumbing improvements.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra [EMAIL PROTECTED] wrote:
  +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft,
  +   struct file *file,
  +   const char __user *userbuf,
  +   size_t nbytes, loff_t *unused_ppos)
  +{
  +   char buffer[64];
  +   int retval = 0;
  +   s64 val;
  +   char *end;
  +
  +   if (!nbytes)
  +   return -EINVAL;
  +   if (nbytes = sizeof(buffer))
  +   return -E2BIG;
  +   if (copy_from_user(buffer, userbuf, nbytes))
  +   return -EFAULT;
  +
  +   buffer[nbytes] = 0; /* nul-terminate */
  +
  +   /* strip newline if necessary */
  +   if (nbytes  (buffer[nbytes-1] == '\n'))
  +   buffer[nbytes-1] = 0;
  +   val = simple_strtoll(buffer, end, 0);
  +   if (*end)
  +   return -EINVAL;
  +
  +   /* Pass to subsystem */
  +   retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val);
  +   if (!retval)
  +   retval = nbytes;
  +   return retval;
   }

  -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft)
  -{
  -   struct task_group *tg = cgroup_tg(cgrp);
  +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft,
  +  struct file *file,
  +  char __user *buf, size_t nbytes,
  +  loff_t *ppos)
  +{
  +   char tmp[64];
  +   long val = sched_group_rt_runtime(cgroup_tg(cgrp));
  +   int len = sprintf(tmp, %ld\n, val);

  -   return (u64) tg-rt_ratio;
  +   return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
   }

What's the reason that you can't use the cgroup read_uint/write_uint
methods for this? Is it just because you have -1 as your unlimited
value.

If so, could we avoid that problem by using 0 rather than -1 as the
unlimited value? It looks from what I've read in the Documentation
changes as though 0 isn't really a meaningful value.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra [EMAIL PROTECTED] wrote:
   If so, could we avoid that problem by using 0 rather than -1 as the
   unlimited value? It looks from what I've read in the Documentation
   changes as though 0 isn't really a meaningful value.

  0 means no time, quite useful and clearly distinct from inf. time.


So a real-time task in a cgroup with a 0 rt_runtime can be in the R
state but never actually get to run? OK, if people need to be able to
do that then fair enough.

In that case I guess I'll have to add signed versions of the
read_uint/write_uint methods.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:26 PM, Peter Zijlstra [EMAIL PROTECTED] wrote:

   In that case I guess I'll have to add signed versions of the
   read_uint/write_uint methods.

  Yes, I looked at that, I found the interface somewhat unfortunate, it
  would mean growing the struct with two more function pointers.

Is that really a big deal? We're talking about a structure that has a
small number (10 in the current tree) of instances per cgroup
subsystem.

 Perhaps a
  read and write function with abstract data would be better suited. That
  would allow for this and more. Sadly it looses type information.

If the size of the struct cftype really became a problem, I think the
cleanest way to fix it would be to have a union of the potential
function pointers, and add a field to specify which one is in use.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 6:47 PM, Balbir Singh [EMAIL PROTECTED] wrote:
   res_counter_read_u64() I'd also want to rename all the other
   *read_uint functions/fields to *read_u64 too. Can I do that in a
   separate patch?
  
  
   Sounds sensible to me.
  

  Sure, fair enough.


Actually, since multiple people were asking for this change I did the
search/replace and sent it out already (as a precursor of the other
patches in the series that I sent today).

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] cgroup: fix comments

2008-02-21 Thread Paul Menage
On Wed, Feb 20, 2008 at 6:14 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> Paul Menage wrote:
>  > I think that docbook-style function comments need /** at the start of
>  > the comment block.
>  >
>
>  Yes, I didn't notice it. I revised the patch to fix it.
>
>
>  ---
>
>
>  fix:
>  - comments about need_forkexit_callback
>  - comments about release agent
>  - typo and comment style, etc.
>
>  Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

>  ---
>   include/linux/cgroup.h |2 +-
>   kernel/cgroup.c|  142 
> +++-
>   2 files changed, 80 insertions(+), 64 deletions(-)
>
>
>  diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>  index ff9055f..2ebf7af 100644
>  --- a/include/linux/cgroup.h
>  +++ b/include/linux/cgroup.h
>  @@ -175,7 +175,7 @@ struct css_set {
>   *
>   *
>   * When reading/writing to a file:
>  - * - the cgroup to use in file->f_dentry->d_parent->d_fsdata
>  + * - the cgroup to use is file->f_dentry->d_parent->d_fsdata
>   * - the 'cftype' of the file is file->f_dentry->d_fsdata
>   */
>
>  diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>  index 4766bb6..36066d8 100644
>
> --- a/kernel/cgroup.c
>  +++ b/kernel/cgroup.c
>  @@ -113,9 +113,9 @@ static int root_count;
>   #define dummytop (_cgroup)
>
>   /* This flag indicates whether tasks in the fork and exit paths should
>  - * take callback_mutex and check for fork/exit handlers to call. This
>  - * avoids us having to do extra work in the fork/exit path if none of the
>  - * subsystems need to be called.
>  + * check for fork/exit handlers to call. This avoids us having to do
>  + * extra work in the fork/exit path if none of the subsystems need to
>  + * be called.
>   */
>   static int need_forkexit_callback;
>
>  @@ -307,7 +307,6 @@ static inline void put_css_set_taskexit(struct css_set 
> *cg)
>   * template: location in which to build the desired set of subsystem
>   * state objects for the new cgroup group
>   */
>  -
>   static struct css_set *find_existing_css_set(
> struct css_set *oldcg,
> struct cgroup *cgrp,
>  @@ -354,7 +353,6 @@ static struct css_set *find_existing_css_set(
>   * and chains them on tmp through their cgrp_link_list fields. Returns 0 on
>   * success or a negative error
>   */
>  -
>   static int allocate_cg_links(int count, struct list_head *tmp)
>   {
> struct cg_cgroup_link *link;
>  @@ -396,7 +394,6 @@ static void free_cg_links(struct list_head *tmp)
>   * substituted into the appropriate hierarchy. Must be called with
>   * cgroup_mutex held
>   */
>  -
>
>  static struct css_set *find_css_set(
> struct css_set *oldcg, struct cgroup *cgrp)
>   {
>  @@ -507,8 +504,8 @@ static struct css_set *find_css_set(
>
>   * critical pieces of code here.  The exception occurs on cgroup_exit(),
>   * when a task in a notify_on_release cgroup exits.  Then cgroup_mutex
>   * is taken, and if the cgroup count is zero, a usermode call made
>  - * to /sbin/cgroup_release_agent with the name of the cgroup (path
>  - * relative to the root of cgroup file system) as the argument.
>  + * to the release agent with the name of the cgroup (path relative to
>  + * the root of cgroup file system) as the argument.
>   *
>   * A cgroup can only be deleted if both its 'count' of using tasks
>   * is zero, and its list of 'children' cgroups is empty.  Since all
>  @@ -521,7 +518,7 @@ static struct css_set *find_css_set(
>
>   *
>   * The need for this exception arises from the action of
>   * cgroup_attach_task(), which overwrites one tasks cgroup pointer with
>  - * another.  It does so using cgroup_mutexe, however there are
>  + * another.  It does so using cgroup_mutex, however there are
>   * several performance critical places that need to reference
>   * task->cgroup without the expense of grabbing a system global
>   * mutex.  Therefore except as noted below, when dereferencing or, as
>  @@ -537,7 +534,6 @@ static struct css_set *find_css_set(
>   * cgroup_lock - lock out any changes to cgroup structures
>   *
>   */
>  -
>   void cgroup_lock(void)
>   {
> mutex_lock(_mutex);
>  @@ -548,7 +544,6 @@ void cgroup_lock(void)
>   *
>   * Undo the lock taken in a previous cgroup_lock() call.
>   */
>  -
>   void cgroup_unlock(void)
>   {
> mutex_unlock(_mutex);
>  @@ -590,7 +585,6 @@ static struct inode *cgroup_new_inode(mode_t mode, 
> struct super_block *sb)
>   * Call subsys's pre_destroy handler.
>   * This is called before css refcnt check.
>   *

Re: [PATCH 2/7] cgroup: fix comments

2008-02-21 Thread Paul Menage
On Wed, Feb 20, 2008 at 6:14 PM, Li Zefan [EMAIL PROTECTED] wrote:
 Paul Menage wrote:
   I think that docbook-style function comments need /** at the start of
   the comment block.
  

  Yes, I didn't notice it. I revised the patch to fix it.


  ---


  fix:
  - comments about need_forkexit_callback
  - comments about release agent
  - typo and comment style, etc.

  Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

  ---
   include/linux/cgroup.h |2 +-
   kernel/cgroup.c|  142 
 +++-
   2 files changed, 80 insertions(+), 64 deletions(-)


  diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
  index ff9055f..2ebf7af 100644
  --- a/include/linux/cgroup.h
  +++ b/include/linux/cgroup.h
  @@ -175,7 +175,7 @@ struct css_set {
   *
   *
   * When reading/writing to a file:
  - * - the cgroup to use in file-f_dentry-d_parent-d_fsdata
  + * - the cgroup to use is file-f_dentry-d_parent-d_fsdata
   * - the 'cftype' of the file is file-f_dentry-d_fsdata
   */

  diff --git a/kernel/cgroup.c b/kernel/cgroup.c
  index 4766bb6..36066d8 100644

 --- a/kernel/cgroup.c
  +++ b/kernel/cgroup.c
  @@ -113,9 +113,9 @@ static int root_count;
   #define dummytop (rootnode.top_cgroup)

   /* This flag indicates whether tasks in the fork and exit paths should
  - * take callback_mutex and check for fork/exit handlers to call. This
  - * avoids us having to do extra work in the fork/exit path if none of the
  - * subsystems need to be called.
  + * check for fork/exit handlers to call. This avoids us having to do
  + * extra work in the fork/exit path if none of the subsystems need to
  + * be called.
   */
   static int need_forkexit_callback;

  @@ -307,7 +307,6 @@ static inline void put_css_set_taskexit(struct css_set 
 *cg)
   * template: location in which to build the desired set of subsystem
   * state objects for the new cgroup group
   */
  -
   static struct css_set *find_existing_css_set(
 struct css_set *oldcg,
 struct cgroup *cgrp,
  @@ -354,7 +353,6 @@ static struct css_set *find_existing_css_set(
   * and chains them on tmp through their cgrp_link_list fields. Returns 0 on
   * success or a negative error
   */
  -
   static int allocate_cg_links(int count, struct list_head *tmp)
   {
 struct cg_cgroup_link *link;
  @@ -396,7 +394,6 @@ static void free_cg_links(struct list_head *tmp)
   * substituted into the appropriate hierarchy. Must be called with
   * cgroup_mutex held
   */
  -

  static struct css_set *find_css_set(
 struct css_set *oldcg, struct cgroup *cgrp)
   {
  @@ -507,8 +504,8 @@ static struct css_set *find_css_set(

   * critical pieces of code here.  The exception occurs on cgroup_exit(),
   * when a task in a notify_on_release cgroup exits.  Then cgroup_mutex
   * is taken, and if the cgroup count is zero, a usermode call made
  - * to /sbin/cgroup_release_agent with the name of the cgroup (path
  - * relative to the root of cgroup file system) as the argument.
  + * to the release agent with the name of the cgroup (path relative to
  + * the root of cgroup file system) as the argument.
   *
   * A cgroup can only be deleted if both its 'count' of using tasks
   * is zero, and its list of 'children' cgroups is empty.  Since all
  @@ -521,7 +518,7 @@ static struct css_set *find_css_set(

   *
   * The need for this exception arises from the action of
   * cgroup_attach_task(), which overwrites one tasks cgroup pointer with
  - * another.  It does so using cgroup_mutexe, however there are
  + * another.  It does so using cgroup_mutex, however there are
   * several performance critical places that need to reference
   * task-cgroup without the expense of grabbing a system global
   * mutex.  Therefore except as noted below, when dereferencing or, as
  @@ -537,7 +534,6 @@ static struct css_set *find_css_set(
   * cgroup_lock - lock out any changes to cgroup structures
   *
   */
  -
   void cgroup_lock(void)
   {
 mutex_lock(cgroup_mutex);
  @@ -548,7 +544,6 @@ void cgroup_lock(void)
   *
   * Undo the lock taken in a previous cgroup_lock() call.
   */
  -
   void cgroup_unlock(void)
   {
 mutex_unlock(cgroup_mutex);
  @@ -590,7 +585,6 @@ static struct inode *cgroup_new_inode(mode_t mode, 
 struct super_block *sb)
   * Call subsys's pre_destroy handler.
   * This is called before css refcnt check.
   */
  -
   static void cgroup_call_pre_destroy(struct cgroup *cgrp)
   {
 struct cgroup_subsys *ss;
  @@ -600,7 +594,6 @@ static void cgroup_call_pre_destroy(struct cgroup *cgrp)
 return;
   }

  -
   static void cgroup_diput(struct dentry *dentry, struct inode *inode)
   {
 /* is dentry a directory ? if so, kfree() associated cgroup */
  @@ -1129,8 +1122,13 @@ static inline struct cftype *__d_cft(struct dentry 
 *dentry)
 return dentry-d_fsdata;
   }

  -/*
  - * Called with cgroup_mutex held.  Writes path

Re: [PATCH 7/7] cgroup: remove dead code in cgroup_get_rootdir()

2008-02-20 Thread Paul Menage
2008/2/17 Li Zefan <[EMAIL PROTECTED]>:
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

>  ---
>   kernel/cgroup.c |1 -
>   1 files changed, 0 insertions(+), 1 deletions(-)
>
>  diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>  index 71cf961..879a056 100644
>  --- a/kernel/cgroup.c
>  +++ b/kernel/cgroup.c
>  @@ -926,7 +926,6 @@ static int cgroup_get_rootdir(struct super_block *sb)
> if (!inode)
> return -ENOMEM;
>
>  -   inode->i_op = _dir_inode_operations;
> inode->i_fop = _dir_operations;
> inode->i_op = _dir_inode_operations;
> /* directories start off with i_nlink == 2 (for "." entry) */
>  --
>  1.5.4.rc3
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/7] cgroup: remove dead code in cgroup_get_rootdir()

2008-02-20 Thread Paul Menage
2008/2/17 Li Zefan [EMAIL PROTECTED]:
 Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

  ---
   kernel/cgroup.c |1 -
   1 files changed, 0 insertions(+), 1 deletions(-)

  diff --git a/kernel/cgroup.c b/kernel/cgroup.c
  index 71cf961..879a056 100644
  --- a/kernel/cgroup.c
  +++ b/kernel/cgroup.c
  @@ -926,7 +926,6 @@ static int cgroup_get_rootdir(struct super_block *sb)
 if (!inode)
 return -ENOMEM;

  -   inode-i_op = simple_dir_inode_operations;
 inode-i_fop = simple_dir_operations;
 inode-i_op = cgroup_dir_inode_operations;
 /* directories start off with i_nlink == 2 (for . entry) */
  --
  1.5.4.rc3


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread Paul Menage
On Feb 19, 2008 10:14 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
> > On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
> > >
> > > it changes the format from "%s %lld" to "%s: %llu", right?
> > > why?
> > >
> >
> > The colon for consistency with maps in /proc. I think it also makes it
> > slightly more readable.
>
> can you be a little more specific?
>
> i object against the colon because i want to use the same parser for
> /proc/vmstat, which doesn't have colons.

Ah. This /proc behaviour of having multiple formats for reporting the
same kind of data (compare with /proc/meminfo, which does use colons)
is the kind of thing that I want to avoid with cgroups. i.e. if two
cgroup subsystems are both reporting the same kind of structured data,
then they should both use the same output format.

I guess since /proc has both styles, and memory.stat is the first file
reporting key/value pairs in cgroups, you get to call the format. OK,
I'll zap the colon.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread Paul Menage
On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
>
> it changes the format from "%s %lld" to "%s: %llu", right?
> why?
>

The colon for consistency with maps in /proc. I think it also makes it
slightly more readable.

For %lld versus %llu - I think that cgroup resource APIs are much more
likely to need to report unsigned rather than signed values. In the
case of the memory.stat file, that's certainly the case.

But I guess there's an argument to be made that nothing's likely to
need the final 64th bit of an unsigned value, whereas the ability to
report negative numbers could potentially be useful for some cgroups.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage
On Feb 19, 2008 9:17 PM, Paul Jackson <[EMAIL PROTECTED]> wrote:
>
> Perhaps my primary concern with these *.api files was that I did not
> understand who or what the critical use or user was; who found this
> essential, not just nice to have.
>

Right now, no-one would find it essential. If/when a binary API is
added, I guess I'll ressurrect this part of the patchset.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API

2008-02-19 Thread Paul Menage
Many of the cpusets control files are simple integer values, which
don't require the overhead of memory allocations for reads and writes.

Move the handlers for these control files into cpuset_read_uint() and
cpuset_write_uint(). This also has the advantage that the control
files show up as "u64" rather than "string" in the cgroup.api file.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 kernel/cpuset.c |  156 +---
 1 file changed, 82 insertions(+), 74 deletions(-)

Index: cpusets-2.6.25-rc2-mm1/kernel/cpuset.c
===
--- cpusets-2.6.25-rc2-mm1.orig/kernel/cpuset.c
+++ cpusets-2.6.25-rc2-mm1/kernel/cpuset.c
@@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void
 }
 
 /*
- * Call with cgroup_mutex held.
- */
-
-static int update_memory_pressure_enabled(struct cpuset *cs, char *buf)
-{
-   if (simple_strtoul(buf, NULL, 10) != 0)
-   cpuset_memory_pressure_enabled = 1;
-   else
-   cpuset_memory_pressure_enabled = 0;
-   return 0;
-}
-
-/*
  * update_flag - read a 0 or a 1 in a file and update associated flag
  * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE,
  * CS_SCHED_LOAD_BALANCE,
@@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable
  * Call with cgroup_mutex held.
  */
 
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf)
+static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
+  int turning_on)
 {
-   int turning_on;
struct cpuset trialcs;
int err;
int cpus_nonempty, balance_flag_changed;
 
-   turning_on = (simple_strtoul(buf, NULL, 10) != 0);
-
trialcs = *cs;
if (turning_on)
set_bit(bit, );
@@ -1247,43 +1232,65 @@ static ssize_t cpuset_common_file_write(
case FILE_MEMLIST:
retval = update_nodemask(cs, buffer);
break;
+   default:
+   retval = -EINVAL;
+   goto out2;
+   }
+
+   if (retval == 0)
+   retval = nbytes;
+out2:
+   cgroup_unlock();
+out1:
+   kfree(buffer);
+   return retval;
+}
+
+static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+   int retval = 0;
+   struct cpuset *cs = cgroup_cs(cgrp);
+   cpuset_filetype_t type = cft->private;
+
+   cgroup_lock();
+
+   if (cgroup_is_removed(cgrp)) {
+   cgroup_unlock();
+   return -ENODEV;
+   }
+
+   switch (type) {
case FILE_CPU_EXCLUSIVE:
-   retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
break;
case FILE_MEM_EXCLUSIVE:
-   retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
break;
case FILE_SCHED_LOAD_BALANCE:
-   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer);
+   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
break;
case FILE_MEMORY_MIGRATE:
-   retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer);
+   retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
break;
case FILE_MEMORY_PRESSURE_ENABLED:
-   retval = update_memory_pressure_enabled(cs, buffer);
+   cpuset_memory_pressure_enabled = !!val;
break;
case FILE_MEMORY_PRESSURE:
retval = -EACCES;
break;
case FILE_SPREAD_PAGE:
-   retval = update_flag(CS_SPREAD_PAGE, cs, buffer);
+   retval = update_flag(CS_SPREAD_PAGE, cs, val);
cs->mems_generation = cpuset_mems_generation++;
break;
case FILE_SPREAD_SLAB:
-   retval = update_flag(CS_SPREAD_SLAB, cs, buffer);
+   retval = update_flag(CS_SPREAD_SLAB, cs, val);
cs->mems_generation = cpuset_mems_generation++;
break;
default:
retval = -EINVAL;
-   goto out2;
+   break;
}
-
-   if (retval == 0)
-   retval = nbytes;
-out2:
cgroup_unlock();
-out1:
-   kfree(buffer);
return retval;
 }
 
@@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s
case FILE_MEMLIST:
s += cpuset_sprintf_memlist(s, cs);
break;
-   case FILE_CPU_EXCLUSIVE:
-   *s++ = is_cpu_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_MEM_EXCLUSIVE:
-   *s++ = is_mem_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_SCHED_LOAD_BALANCE:
-   *s++ = is_sched_load_balance(cs) ? '1' : '0';
-   brea

[PATCH 1/2] Cpusets API: From: Paul Jackson <[EMAIL PROTECTED]>

2008-02-19 Thread Paul Menage
Strip all trailing whitespace in cgroup_write_uint

This removes the need for people to remember to pass the -n flag to
echo when writing values to cgroup control files.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 kernel/cgroup.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

Index: cpusets-2.6.25-rc2-mm1/kernel/cgroup.c
===
--- cpusets-2.6.25-rc2-mm1.orig/kernel/cgroup.c
+++ cpusets-2.6.25-rc2-mm1/kernel/cgroup.c
@@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct 
return -EFAULT;
 
buffer[nbytes] = 0; /* nul-terminate */
-
-   /* strip newline if necessary */
-   if (nbytes && (buffer[nbytes-1] == '\n'))
-   buffer[nbytes-1] = 0;
+   strstrip(buffer);
val = simple_strtoull(buffer, , 0);
if (*end)
return -EINVAL;

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/2] Cpusets API: Update Cpusets control files

2008-02-19 Thread Paul Menage
This pair of patches simplifies the cpusets read/write path for the
control files that consist of simple integers.

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Improve init/Kconfig help descriptions [PATCH 6/9]

2008-02-19 Thread Paul Menage
On Feb 19, 2008 6:54 PM, Nick Andrew <[EMAIL PROTECTED]> wrote:
>
> config CGROUPS
> bool "Control Group support"
> help
>   Control Groups enables processes to be tracked and grouped
>   into "cgroups". This enables you, for example, to associate
>   cgroups with certain CPU sets using "cpusets".
>
>   When enabled, a new filesystem type "cgroup" is available
>   and can be mounted to control cpusets and other
>   resource/behaviour controllers.
>
>   See  for more information.
>
>   If unsure, say N.
>
>
> I don't think that description is as clear as it could be. From
> the non-kernel-developer point of view, that is.

Originally this wasn't a user-selectable config value, it was
auto-selected by any subsystem that needed it. I think that was nicer
from the user-experience, and it would eliminate the need for this
documentation but there were concerns that this triggered unspecified
brokenness in the Kbuild system.

>
> Re "other resource/behaviour controllers", what in particular?
> I take it that our current controllers are cpusets, scheduler,
> CPU accounting and Resource counters?

Resource counters aren't a resource controller, they're a helper
library. The others are good examples, as is the memory controller
that's just been added to 2.6.25.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/7] cgroup: fix and update documentation

2008-02-19 Thread Paul Menage
On Feb 18, 2008 12:39 AM, Li Zefan <[EMAIL PROTECTED]> wrote:
> Misc fixes and updates, make the doc consistent with current
> cgroup implementation.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

Thanks for these cleanups.

Paul

> ---
>  Documentation/cgroups.txt |   66 ++--
>  1 files changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/cgroups.txt b/Documentation/cgroups.txt
> index 42d7c4c..31d12e2 100644
>
> --- a/Documentation/cgroups.txt
> +++ b/Documentation/cgroups.txt
> @@ -28,7 +28,7 @@ CONTENTS:
>  4. Questions
>
>  1. Control Groups
> -==
> +=
>
>  1.1 What are cgroups ?
>  --
> @@ -143,10 +143,10 @@ proliferation of such cgroups.
>
>  Also lets say that the administrator would like to give enhanced network
>  access temporarily to a student's browser (since it is night and the user
> -wants to do online gaming :)  OR give one of the students simulation
> +wants to do online gaming :))  OR give one of the students simulation
>  apps enhanced CPU power,
>
> -With ability to write pids directly to resource classes, its just a
> +With ability to write pids directly to resource classes, it's just a
>  matter of :
>
> # echo pid > /mnt/network//tasks
> @@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup 
> file system
>  containing the following files describing that cgroup:
>
>   - tasks: list of tasks (by pid) attached to that cgroup
> - - notify_on_release flag: run /sbin/cgroup_release_agent on exit?
> + - releasable flag: cgroup currently removeable?
> + - notify_on_release flag: run the release agent on exit?
> + - release_agent: the path to use for release notifications (this file
> +   exists in the top cgroup only)
>
>  Other subsystems such as cpusets may add additional files in each
> -cgroup dir
> +cgroup dir.
>
>  New cgroups are created using the mkdir system call or shell
>  command.  The properties of a cgroup, such as its flags, are
> @@ -257,7 +260,7 @@ performance.
>  To allow access from a cgroup to the css_sets (and hence tasks)
>  that comprise it, a set of cg_cgroup_link objects form a lattice;
>  each cg_cgroup_link is linked into a list of cg_cgroup_links for
> -a single cgroup on its cont_link_list field, and a list of
> +a single cgroup on its cgrp_link_list field, and a list of
>  cg_cgroup_links for a single css_set on its cg_link_list.
>
>  Thus the set of tasks in a cgroup can be listed by iterating over
> @@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code.
>  1.4 What does notify_on_release do ?
>  
>
> -*** notify_on_release is disabled in the current patch set. It will be
> -*** reactivated in a future patch in a less-intrusive manner
> -
>  If the notify_on_release flag is enabled (1) in a cgroup, then
>  whenever the last task in the cgroup leaves (exits or attaches to
>  some other cgroup) and the last child cgroup of that cgroup
> @@ -360,8 +360,8 @@ Now you want to do something with this cgroup.
>
>  In this directory you can find several files:
>  # ls
> -notify_on_release release_agent tasks
> -(plus whatever files are added by the attached subsystems)
> +notify_on_release releasable tasks
> +(plus whatever files added by the attached subsystems)
>
>  Now attach your shell to this cgroup:
>  # /bin/echo $$ > tasks
> @@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the 
> cgroup system.
>  Other fields in the cgroup_subsys object include:
>
>  - subsys_id: a unique array index for the subsystem, indicating which
> -  entry in cgroup->subsys[] this subsystem should be
> -  managing. Initialized by cgroup_register_subsys(); prior to this
> -  it should be initialized to -1
> +  entry in cgroup->subsys[] this subsystem should be managing.
>
> -- hierarchy: an index indicating which hierarchy, if any, this
> -  subsystem is currently attached to. If this is -1, then the
> -  subsystem is not attached to any hierarchy, and all tasks should be
> -  considered to be members of the subsystem's top_cgroup. It should
> -  be initialized to -1.
> +- name: should be initialized to a unique subsystem name. Should be
> +  no longer than MAX_CGROUP_TYPE_NAMELEN.
>
> -- name: should be initialized to a unique subsystem name prior to
> -  calling cgroup_register_subsystem. Should be no longer than
> -  MAX_CGROUP_TYPE_NAMELEN
> +- early_init: indicate if the subsystem needs early initialization
> +  at system boot.
>
>  Each cgroup object created by the system has an arr

Re: [PATCH 4/7] cgroup: fix memory leak in cgroup_get_sb()

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> opts.release_agent is not kfree()ed in all necessary places.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

Good catch, although hopefully something that would be extremely rare
in practice.

Thanks,

Paul

> ---
>  kernel/cgroup.c |5 -
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 0c35022..aa76bbd 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -961,8 +961,11 @@ static int cgroup_get_sb(struct file_system_type 
> *fs_type,
> }
>
> root = kzalloc(sizeof(*root), GFP_KERNEL);
> -   if (!root)
> +   if (!root) {
> +   if (opts.release_agent)
> +   kfree(opts.release_agent);
> return -ENOMEM;
> +   }
>
> init_cgroup_root(root);
> root->subsys_bits = opts.subsys_bits;
> --
> 1.5.4.rc3
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] cgroup: remove duplicate code in find_css_set()

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> The list head res->tasks gets initialized twice in find_css_set().
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

> ---
>  kernel/cgroup.c |1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e8c8e58..71cf961 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -473,7 +473,6 @@ static struct css_set *find_css_set(
> /* Link this cgroup group into the list */
> list_add(>list, _css_set.list);
> css_set_count++;
> -   INIT_LIST_HEAD(>tasks);
> write_unlock(_set_lock);
>
> return res;
> --
> 1.5.4.rc3
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] cgroup: fix comments

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> fix:
> - comments about need_forkexit_callback
> - comments about release agent
> - typo and comment style, etc.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>
> ---
>  include/linux/cgroup.h |2 +-
>  kernel/cgroup.c|   44 +---
>  2 files changed, 22 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index ff9055f..2ebf7af 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -175,7 +175,7 @@ struct css_set {
>   *
>   *
>   * When reading/writing to a file:
> - * - the cgroup to use in file->f_dentry->d_parent->d_fsdata
> + * - the cgroup to use is file->f_dentry->d_parent->d_fsdata
>   * - the 'cftype' of the file is file->f_dentry->d_fsdata
>   */
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 4766bb6..0c35022 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -113,9 +113,9 @@ static int root_count;
>  #define dummytop (_cgroup)
>
>  /* This flag indicates whether tasks in the fork and exit paths should
> - * take callback_mutex and check for fork/exit handlers to call. This
> - * avoids us having to do extra work in the fork/exit path if none of the
> - * subsystems need to be called.
> + * check for fork/exit handlers to call. This avoids us having to do
> + * extra work in the fork/exit path if none of the subsystems need to
> + * be called.
>   */
>  static int need_forkexit_callback;
>
> @@ -507,8 +507,8 @@ static struct css_set *find_css_set(
>   * critical pieces of code here.  The exception occurs on cgroup_exit(),
>   * when a task in a notify_on_release cgroup exits.  Then cgroup_mutex
>   * is taken, and if the cgroup count is zero, a usermode call made
> - * to /sbin/cgroup_release_agent with the name of the cgroup (path
> - * relative to the root of cgroup file system) as the argument.
> + * to the release agent with the name of the cgroup (path relative to
> + * the root of cgroup file system) as the argument.
>   *
>   * A cgroup can only be deleted if both its 'count' of using tasks
>   * is zero, and its list of 'children' cgroups is empty.  Since all
> @@ -521,7 +521,7 @@ static struct css_set *find_css_set(
>   *
>   * The need for this exception arises from the action of
>   * cgroup_attach_task(), which overwrites one tasks cgroup pointer with
> - * another.  It does so using cgroup_mutexe, however there are
> + * another.  It does so using cgroup_mutex, however there are
>   * several performance critical places that need to reference
>   * task->cgroup without the expense of grabbing a system global
>   * mutex.  Therefore except as noted below, when dereferencing or, as
> @@ -1192,7 +1192,7 @@ static void get_first_subsys(const struct cgroup *cgrp,
>   * Attach task 'tsk' to cgroup 'cgrp'
>   *
>   * Call holding cgroup_mutex.  May take task_lock of
> - * the task 'pid' during call.
> + * the task 'tsk' during call.
>   */
>  int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>  {
> @@ -1584,12 +1584,11 @@ static int cgroup_create_file(struct dentry *dentry, 
> int mode,
>  }
>
>  /*

I think that docbook-style function comments need /** at the start of
the comment block.

Thanks,

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/7] cgroup: fix subsys bitops

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> Cgroup uses unsigned long for subsys bitops, not unsigned long long.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

> ---
>  kernel/cgroup.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index aa76bbd..e8c8e58 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -320,7 +320,7 @@ static struct css_set *find_existing_css_set(
> /* Built the set of subsystem state objects that we want to
>  * see in the new css_set */
> for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> -   if (root->subsys_bits & (1ull << i)) {
> +   if (root->subsys_bits & (1UL << i)) {
> /* Subsystem is in this hierarchy. So we want
>  * the subsystem state from the new
>  * cgroup */
> @@ -696,7 +696,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
> added_bits = final_bits & ~root->actual_subsys_bits;
> /* Check that any added subsystems are currently free */
> for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> -   unsigned long long bit = 1ull << i;
> +   unsigned long bit = 1UL << i;
> struct cgroup_subsys *ss = subsys[i];
> if (!(bit & added_bits))
> continue;
> --
> 1.5.4.rc3
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] cgroup: clean up cgroup.h

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> - replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for
>   cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510)
> - remove a duplicate declaration of cgroup_path()
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

> ---
>  include/linux/cgroup.h |   48 
> +++-
>  1 files changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 2ebf7af..028ba3b 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -186,15 +186,15 @@ struct cftype {
> char name[MAX_CFTYPE_NAME];
> int private;
> int (*open) (struct inode *inode, struct file *file);
> -   ssize_t (*read) (struct cgroup *cont, struct cftype *cft,
> +   ssize_t (*read) (struct cgroup *cgrp, struct cftype *cft,
>  struct file *file,
>  char __user *buf, size_t nbytes, loff_t *ppos);
> /*
>  * read_uint() is a shortcut for the common case of returning a
>  * single integer. Use it in place of read()
>  */
> -   u64 (*read_uint) (struct cgroup *cont, struct cftype *cft);
> -   ssize_t (*write) (struct cgroup *cont, struct cftype *cft,
> +   u64 (*read_uint) (struct cgroup *cgrp, struct cftype *cft);
> +   ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft,
>   struct file *file,
>   const char __user *buf, size_t nbytes, loff_t 
> *ppos);
>
> @@ -203,7 +203,7 @@ struct cftype {
>  * a single integer (as parsed by simple_strtoull) from
>  * userspace. Use in place of write(); return 0 or error.
>  */
> -   int (*write_uint) (struct cgroup *cont, struct cftype *cft, u64 val);
> +   int (*write_uint) (struct cgroup *cgrp, struct cftype *cft, u64 val);
>
> int (*release) (struct inode *inode, struct file *file);
>  };
> @@ -218,41 +218,41 @@ struct cgroup_scanner {
>
>  /* Add a new file to the given cgroup directory. Should only be
>   * called by subsystems from within a populate() method */
> -int cgroup_add_file(struct cgroup *cont, struct cgroup_subsys *subsys,
> +int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
>const struct cftype *cft);
>
>  /* Add a set of new files to the given cgroup directory. Should
>   * only be called by subsystems from within a populate() method */
> -int cgroup_add_files(struct cgroup *cont,
> +int cgroup_add_files(struct cgroup *cgrp,
> struct cgroup_subsys *subsys,
> const struct cftype cft[],
> int count);
>
> -int cgroup_is_removed(const struct cgroup *cont);
> +int cgroup_is_removed(const struct cgroup *cgrp);
>
> -int cgroup_path(const struct cgroup *cont, char *buf, int buflen);
> +int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
>
> -int cgroup_task_count(const struct cgroup *cont);
> +int cgroup_task_count(const struct cgroup *cgrp);
>
>  /* Return true if the cgroup is a descendant of the current cgroup */
> -int cgroup_is_descendant(const struct cgroup *cont);
> +int cgroup_is_descendant(const struct cgroup *cgrp);
>
>  /* Control Group subsystem type. See Documentation/cgroups.txt for details */
>
>  struct cgroup_subsys {
> struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
> - struct cgroup *cont);
> -   void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cont);
> -   void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cont);
> + struct cgroup *cgrp);
> +   void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> +   void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> int (*can_attach)(struct cgroup_subsys *ss,
> - struct cgroup *cont, struct task_struct *tsk);
> -   void (*attach)(struct cgroup_subsys *ss, struct cgroup *cont,
> -   struct cgroup *old_cont, struct task_struct *tsk);
> + struct cgroup *cgrp, struct task_struct *tsk);
> +   void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +   struct cgroup *old_cgrp, struct task_struct *tsk);
> void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
> void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
> int (*populate)(struct cgroup_subsys *ss,

Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage
On Feb 18, 2008 1:45 AM, Li Zefan <[EMAIL PROTECTED]> wrote:
> >
>
> But we don't have /proc/proc.api or /sys/sysfs.api ...

True. And /proc is a bit of a mess. Having a similar API file for
sysfs sounds like a good idea to me.

>
> And is it better to describe the debug subsystem too?
>

Yes, probably, but that would be a separate patch to the debug
subsystem itself, not the main cgroups code.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage
On Feb 19, 2008 1:57 PM, Paul Jackson <[EMAIL PROTECTED]> wrote:
>
> Finally, it goes against the one thingie per file (at most, one scalar
> vector) that has worked well for us when tried.

Right, I like the idea of keeping things simple. But if you're going
to accept that a vector is useful, then it seems reasonable that some
other *simple* structured datatypes can be useful. An N-element
key/value map (a la /proc/meminfo) is, I think, nicer than having to
read values from N separate files.

>
> As to the motivations Paul M gives:
>  1) Avoid "an arbitrary mess of ad-hoc APIs":
> We can still do that, whether or not we "self-document" these
> API's in this manner.

We can, but this file makes it more clear what control files have a
well-defined API and which are just returning some ad-hoc string.

I guess it's not essential, I just figured that if we had that
information, it made sense to make it available to userspace. I guess
I'm happy with dropping the actual exposed cgroup.api file for now as
long as we can work towards reducing the number of control files that
just return strings, and make use of the structured output such as
read_uint() miore.

>  2) binary APIs versus ASCII APIs:
> Well, I have an ASCII API bias, not surprising.  But I'd
> suggest not doing things "in anticipation" of some future
> fuzzy binary API support.  Wait until that day actually arrives.

I have a reasonably clear idea of how we can do the binary API.

That's mostly for a separate RFC. But for example, reading a map via
the binary API would be able to just return a list values since the
keys could be parsed once from the ascii map (provided that the
subsystem guaranteed that the map keys and their order wouldn't change
between reboots).


>  3) The memory controller currently has files with the "_in_bytes":
> The traditional way to handle this is Documentation and man
> pages; good enough for my granddad, good enough for me ;).

I've tried submitting patches to remove the in_bytes suffix and just
rely on the documentation, and people didn't seem to like it ...

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Improve init/Kconfig help descriptions [PATCH 6/9]

2008-02-19 Thread Paul Menage
On Feb 19, 2008 7:12 AM, Nick Andrew <[EMAIL PROTECTED]> wrote:
>  config CGROUPS
> bool "Control Group support"
> help
> - This option will let you use process cgroup subsystems
> - such as Cpusets
> + Control Groups enables processes to be tracked and grouped
> + into "cgroups". This enables you, for example, to associate
> + cgroups with certain CPU sets using "cpusets".
>
> - Say N if unsure.
> + When enabled, a new filesystem type "cgroup" is available
> + and can be mounted to control cpusets.

How about:

... cpusets and other resource/behaviour controllers.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Improve init/Kconfig help descriptions [PATCH 6/9]

2008-02-19 Thread Paul Menage
On Feb 19, 2008 7:12 AM, Nick Andrew [EMAIL PROTECTED] wrote:
  config CGROUPS
 bool Control Group support
 help
 - This option will let you use process cgroup subsystems
 - such as Cpusets
 + Control Groups enables processes to be tracked and grouped
 + into cgroups. This enables you, for example, to associate
 + cgroups with certain CPU sets using cpusets.

 - Say N if unsure.
 + When enabled, a new filesystem type cgroup is available
 + and can be mounted to control cpusets.

How about:

... cpusets and other resource/behaviour controllers.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage
On Feb 19, 2008 1:57 PM, Paul Jackson [EMAIL PROTECTED] wrote:

 Finally, it goes against the one thingie per file (at most, one scalar
 vector) that has worked well for us when tried.

Right, I like the idea of keeping things simple. But if you're going
to accept that a vector is useful, then it seems reasonable that some
other *simple* structured datatypes can be useful. An N-element
key/value map (a la /proc/meminfo) is, I think, nicer than having to
read values from N separate files.


 As to the motivations Paul M gives:
  1) Avoid an arbitrary mess of ad-hoc APIs:
 We can still do that, whether or not we self-document these
 API's in this manner.

We can, but this file makes it more clear what control files have a
well-defined API and which are just returning some ad-hoc string.

I guess it's not essential, I just figured that if we had that
information, it made sense to make it available to userspace. I guess
I'm happy with dropping the actual exposed cgroup.api file for now as
long as we can work towards reducing the number of control files that
just return strings, and make use of the structured output such as
read_uint() miore.

  2) binary APIs versus ASCII APIs:
 Well, I have an ASCII API bias, not surprising.  But I'd
 suggest not doing things in anticipation of some future
 fuzzy binary API support.  Wait until that day actually arrives.

I have a reasonably clear idea of how we can do the binary API.

That's mostly for a separate RFC. But for example, reading a map via
the binary API would be able to just return a list values since the
keys could be parsed once from the ascii map (provided that the
subsystem guaranteed that the map keys and their order wouldn't change
between reboots).


  3) The memory controller currently has files with the _in_bytes:
 The traditional way to handle this is Documentation and man
 pages; good enough for my granddad, good enough for me ;).

I've tried submitting patches to remove the in_bytes suffix and just
rely on the documentation, and people didn't seem to like it ...

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage
On Feb 18, 2008 1:45 AM, Li Zefan [EMAIL PROTECTED] wrote:
 

 But we don't have /proc/proc.api or /sys/sysfs.api ...

True. And /proc is a bit of a mess. Having a similar API file for
sysfs sounds like a good idea to me.


 And is it better to describe the debug subsystem too?


Yes, probably, but that would be a separate patch to the debug
subsystem itself, not the main cgroups code.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] cgroup: clean up cgroup.h

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote:
 - replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for
   cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510)
 - remove a duplicate declaration of cgroup_path()

 Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

 ---
  include/linux/cgroup.h |   48 
 +++-
  1 files changed, 23 insertions(+), 25 deletions(-)

 diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
 index 2ebf7af..028ba3b 100644
 --- a/include/linux/cgroup.h
 +++ b/include/linux/cgroup.h
 @@ -186,15 +186,15 @@ struct cftype {
 char name[MAX_CFTYPE_NAME];
 int private;
 int (*open) (struct inode *inode, struct file *file);
 -   ssize_t (*read) (struct cgroup *cont, struct cftype *cft,
 +   ssize_t (*read) (struct cgroup *cgrp, struct cftype *cft,
  struct file *file,
  char __user *buf, size_t nbytes, loff_t *ppos);
 /*
  * read_uint() is a shortcut for the common case of returning a
  * single integer. Use it in place of read()
  */
 -   u64 (*read_uint) (struct cgroup *cont, struct cftype *cft);
 -   ssize_t (*write) (struct cgroup *cont, struct cftype *cft,
 +   u64 (*read_uint) (struct cgroup *cgrp, struct cftype *cft);
 +   ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft,
   struct file *file,
   const char __user *buf, size_t nbytes, loff_t 
 *ppos);

 @@ -203,7 +203,7 @@ struct cftype {
  * a single integer (as parsed by simple_strtoull) from
  * userspace. Use in place of write(); return 0 or error.
  */
 -   int (*write_uint) (struct cgroup *cont, struct cftype *cft, u64 val);
 +   int (*write_uint) (struct cgroup *cgrp, struct cftype *cft, u64 val);

 int (*release) (struct inode *inode, struct file *file);
  };
 @@ -218,41 +218,41 @@ struct cgroup_scanner {

  /* Add a new file to the given cgroup directory. Should only be
   * called by subsystems from within a populate() method */
 -int cgroup_add_file(struct cgroup *cont, struct cgroup_subsys *subsys,
 +int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
const struct cftype *cft);

  /* Add a set of new files to the given cgroup directory. Should
   * only be called by subsystems from within a populate() method */
 -int cgroup_add_files(struct cgroup *cont,
 +int cgroup_add_files(struct cgroup *cgrp,
 struct cgroup_subsys *subsys,
 const struct cftype cft[],
 int count);

 -int cgroup_is_removed(const struct cgroup *cont);
 +int cgroup_is_removed(const struct cgroup *cgrp);

 -int cgroup_path(const struct cgroup *cont, char *buf, int buflen);
 +int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);

 -int cgroup_task_count(const struct cgroup *cont);
 +int cgroup_task_count(const struct cgroup *cgrp);

  /* Return true if the cgroup is a descendant of the current cgroup */
 -int cgroup_is_descendant(const struct cgroup *cont);
 +int cgroup_is_descendant(const struct cgroup *cgrp);

  /* Control Group subsystem type. See Documentation/cgroups.txt for details */

  struct cgroup_subsys {
 struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
 - struct cgroup *cont);
 -   void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cont);
 -   void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cont);
 + struct cgroup *cgrp);
 +   void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 +   void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 int (*can_attach)(struct cgroup_subsys *ss,
 - struct cgroup *cont, struct task_struct *tsk);
 -   void (*attach)(struct cgroup_subsys *ss, struct cgroup *cont,
 -   struct cgroup *old_cont, struct task_struct *tsk);
 + struct cgroup *cgrp, struct task_struct *tsk);
 +   void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 +   struct cgroup *old_cgrp, struct task_struct *tsk);
 void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
 void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
 int (*populate)(struct cgroup_subsys *ss,
 -   struct cgroup *cont);
 -   void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cont);
 +   struct cgroup *cgrp);
 +   void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
 int subsys_id;
 int active;
 @@ -273,9 +273,9 @@ struct

Re: [PATCH 5/7] cgroup: fix subsys bitops

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote:
 Cgroup uses unsigned long for subsys bitops, not unsigned long long.

 Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

 ---
  kernel/cgroup.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/kernel/cgroup.c b/kernel/cgroup.c
 index aa76bbd..e8c8e58 100644
 --- a/kernel/cgroup.c
 +++ b/kernel/cgroup.c
 @@ -320,7 +320,7 @@ static struct css_set *find_existing_css_set(
 /* Built the set of subsystem state objects that we want to
  * see in the new css_set */
 for (i = 0; i  CGROUP_SUBSYS_COUNT; i++) {
 -   if (root-subsys_bits  (1ull  i)) {
 +   if (root-subsys_bits  (1UL  i)) {
 /* Subsystem is in this hierarchy. So we want
  * the subsystem state from the new
  * cgroup */
 @@ -696,7 +696,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 added_bits = final_bits  ~root-actual_subsys_bits;
 /* Check that any added subsystems are currently free */
 for (i = 0; i  CGROUP_SUBSYS_COUNT; i++) {
 -   unsigned long long bit = 1ull  i;
 +   unsigned long bit = 1UL  i;
 struct cgroup_subsys *ss = subsys[i];
 if (!(bit  added_bits))
 continue;
 --
 1.5.4.rc3


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/7] cgroup: fix memory leak in cgroup_get_sb()

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote:
 opts.release_agent is not kfree()ed in all necessary places.

 Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

Good catch, although hopefully something that would be extremely rare
in practice.

Thanks,

Paul

 ---
  kernel/cgroup.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/kernel/cgroup.c b/kernel/cgroup.c
 index 0c35022..aa76bbd 100644
 --- a/kernel/cgroup.c
 +++ b/kernel/cgroup.c
 @@ -961,8 +961,11 @@ static int cgroup_get_sb(struct file_system_type 
 *fs_type,
 }

 root = kzalloc(sizeof(*root), GFP_KERNEL);
 -   if (!root)
 +   if (!root) {
 +   if (opts.release_agent)
 +   kfree(opts.release_agent);
 return -ENOMEM;
 +   }

 init_cgroup_root(root);
 root-subsys_bits = opts.subsys_bits;
 --
 1.5.4.rc3


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/7] cgroup: fix and update documentation

2008-02-19 Thread Paul Menage
On Feb 18, 2008 12:39 AM, Li Zefan [EMAIL PROTECTED] wrote:
 Misc fixes and updates, make the doc consistent with current
 cgroup implementation.

 Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

Thanks for these cleanups.

Paul

 ---
  Documentation/cgroups.txt |   66 ++--
  1 files changed, 33 insertions(+), 33 deletions(-)

 diff --git a/Documentation/cgroups.txt b/Documentation/cgroups.txt
 index 42d7c4c..31d12e2 100644

 --- a/Documentation/cgroups.txt
 +++ b/Documentation/cgroups.txt
 @@ -28,7 +28,7 @@ CONTENTS:
  4. Questions

  1. Control Groups
 -==
 +=

  1.1 What are cgroups ?
  --
 @@ -143,10 +143,10 @@ proliferation of such cgroups.

  Also lets say that the administrator would like to give enhanced network
  access temporarily to a student's browser (since it is night and the user
 -wants to do online gaming :)  OR give one of the students simulation
 +wants to do online gaming :))  OR give one of the students simulation
  apps enhanced CPU power,

 -With ability to write pids directly to resource classes, its just a
 +With ability to write pids directly to resource classes, it's just a
  matter of :

 # echo pid  /mnt/network/new_class/tasks
 @@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup 
 file system
  containing the following files describing that cgroup:

   - tasks: list of tasks (by pid) attached to that cgroup
 - - notify_on_release flag: run /sbin/cgroup_release_agent on exit?
 + - releasable flag: cgroup currently removeable?
 + - notify_on_release flag: run the release agent on exit?
 + - release_agent: the path to use for release notifications (this file
 +   exists in the top cgroup only)

  Other subsystems such as cpusets may add additional files in each
 -cgroup dir
 +cgroup dir.

  New cgroups are created using the mkdir system call or shell
  command.  The properties of a cgroup, such as its flags, are
 @@ -257,7 +260,7 @@ performance.
  To allow access from a cgroup to the css_sets (and hence tasks)
  that comprise it, a set of cg_cgroup_link objects form a lattice;
  each cg_cgroup_link is linked into a list of cg_cgroup_links for
 -a single cgroup on its cont_link_list field, and a list of
 +a single cgroup on its cgrp_link_list field, and a list of
  cg_cgroup_links for a single css_set on its cg_link_list.

  Thus the set of tasks in a cgroup can be listed by iterating over
 @@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code.
  1.4 What does notify_on_release do ?
  

 -*** notify_on_release is disabled in the current patch set. It will be
 -*** reactivated in a future patch in a less-intrusive manner
 -
  If the notify_on_release flag is enabled (1) in a cgroup, then
  whenever the last task in the cgroup leaves (exits or attaches to
  some other cgroup) and the last child cgroup of that cgroup
 @@ -360,8 +360,8 @@ Now you want to do something with this cgroup.

  In this directory you can find several files:
  # ls
 -notify_on_release release_agent tasks
 -(plus whatever files are added by the attached subsystems)
 +notify_on_release releasable tasks
 +(plus whatever files added by the attached subsystems)

  Now attach your shell to this cgroup:
  # /bin/echo $$  tasks
 @@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the 
 cgroup system.
  Other fields in the cgroup_subsys object include:

  - subsys_id: a unique array index for the subsystem, indicating which
 -  entry in cgroup-subsys[] this subsystem should be
 -  managing. Initialized by cgroup_register_subsys(); prior to this
 -  it should be initialized to -1
 +  entry in cgroup-subsys[] this subsystem should be managing.

 -- hierarchy: an index indicating which hierarchy, if any, this
 -  subsystem is currently attached to. If this is -1, then the
 -  subsystem is not attached to any hierarchy, and all tasks should be
 -  considered to be members of the subsystem's top_cgroup. It should
 -  be initialized to -1.
 +- name: should be initialized to a unique subsystem name. Should be
 +  no longer than MAX_CGROUP_TYPE_NAMELEN.

 -- name: should be initialized to a unique subsystem name prior to
 -  calling cgroup_register_subsystem. Should be no longer than
 -  MAX_CGROUP_TYPE_NAMELEN
 +- early_init: indicate if the subsystem needs early initialization
 +  at system boot.

  Each cgroup object created by the system has an array of pointers,
  indexed by subsystem id; this pointer is entirely managed by the
 @@ -434,8 +428,6 @@ situation.
  See kernel/cgroup.c for more details.

  Subsystems can take/release the cgroup_mutex via the functions
 -cgroup_lock()/cgroup_unlock(), and can
 -take/release the callback_mutex via the functions
  cgroup_lock()/cgroup_unlock().

  Accessing a task's cgroup pointer may be done in the following ways:
 @@ -444,7 +436,7 @@ Accessing a task's

Re: Improve init/Kconfig help descriptions [PATCH 6/9]

2008-02-19 Thread Paul Menage
On Feb 19, 2008 6:54 PM, Nick Andrew [EMAIL PROTECTED] wrote:

 config CGROUPS
 bool Control Group support
 help
   Control Groups enables processes to be tracked and grouped
   into cgroups. This enables you, for example, to associate
   cgroups with certain CPU sets using cpusets.

   When enabled, a new filesystem type cgroup is available
   and can be mounted to control cpusets and other
   resource/behaviour controllers.

   See file:Documentation/cgroups.txt for more information.

   If unsure, say N.


 I don't think that description is as clear as it could be. From
 the non-kernel-developer point of view, that is.

Originally this wasn't a user-selectable config value, it was
auto-selected by any subsystem that needed it. I think that was nicer
from the user-experience, and it would eliminate the need for this
documentation but there were concerns that this triggered unspecified
brokenness in the Kbuild system.


 Re other resource/behaviour controllers, what in particular?
 I take it that our current controllers are cpusets, scheduler,
 CPU accounting and Resource counters?

Resource counters aren't a resource controller, they're a helper
library. The others are good examples, as is the memory controller
that's just been added to 2.6.25.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] cgroup: remove duplicate code in find_css_set()

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote:
 The list head res-tasks gets initialized twice in find_css_set().

 Signed-off-by: Li Zefan [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]

 ---
  kernel/cgroup.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/kernel/cgroup.c b/kernel/cgroup.c
 index e8c8e58..71cf961 100644
 --- a/kernel/cgroup.c
 +++ b/kernel/cgroup.c
 @@ -473,7 +473,6 @@ static struct css_set *find_css_set(
 /* Link this cgroup group into the list */
 list_add(res-list, init_css_set.list);
 css_set_count++;
 -   INIT_LIST_HEAD(res-tasks);
 write_unlock(css_set_lock);

 return res;
 --
 1.5.4.rc3


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] cgroup: fix comments

2008-02-19 Thread Paul Menage
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote:
 fix:
 - comments about need_forkexit_callback
 - comments about release agent
 - typo and comment style, etc.

 Signed-off-by: Li Zefan [EMAIL PROTECTED]
 ---
  include/linux/cgroup.h |2 +-
  kernel/cgroup.c|   44 +---
  2 files changed, 22 insertions(+), 24 deletions(-)

 diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
 index ff9055f..2ebf7af 100644
 --- a/include/linux/cgroup.h
 +++ b/include/linux/cgroup.h
 @@ -175,7 +175,7 @@ struct css_set {
   *
   *
   * When reading/writing to a file:
 - * - the cgroup to use in file-f_dentry-d_parent-d_fsdata
 + * - the cgroup to use is file-f_dentry-d_parent-d_fsdata
   * - the 'cftype' of the file is file-f_dentry-d_fsdata
   */

 diff --git a/kernel/cgroup.c b/kernel/cgroup.c
 index 4766bb6..0c35022 100644
 --- a/kernel/cgroup.c
 +++ b/kernel/cgroup.c
 @@ -113,9 +113,9 @@ static int root_count;
  #define dummytop (rootnode.top_cgroup)

  /* This flag indicates whether tasks in the fork and exit paths should
 - * take callback_mutex and check for fork/exit handlers to call. This
 - * avoids us having to do extra work in the fork/exit path if none of the
 - * subsystems need to be called.
 + * check for fork/exit handlers to call. This avoids us having to do
 + * extra work in the fork/exit path if none of the subsystems need to
 + * be called.
   */
  static int need_forkexit_callback;

 @@ -507,8 +507,8 @@ static struct css_set *find_css_set(
   * critical pieces of code here.  The exception occurs on cgroup_exit(),
   * when a task in a notify_on_release cgroup exits.  Then cgroup_mutex
   * is taken, and if the cgroup count is zero, a usermode call made
 - * to /sbin/cgroup_release_agent with the name of the cgroup (path
 - * relative to the root of cgroup file system) as the argument.
 + * to the release agent with the name of the cgroup (path relative to
 + * the root of cgroup file system) as the argument.
   *
   * A cgroup can only be deleted if both its 'count' of using tasks
   * is zero, and its list of 'children' cgroups is empty.  Since all
 @@ -521,7 +521,7 @@ static struct css_set *find_css_set(
   *
   * The need for this exception arises from the action of
   * cgroup_attach_task(), which overwrites one tasks cgroup pointer with
 - * another.  It does so using cgroup_mutexe, however there are
 + * another.  It does so using cgroup_mutex, however there are
   * several performance critical places that need to reference
   * task-cgroup without the expense of grabbing a system global
   * mutex.  Therefore except as noted below, when dereferencing or, as
 @@ -1192,7 +1192,7 @@ static void get_first_subsys(const struct cgroup *cgrp,
   * Attach task 'tsk' to cgroup 'cgrp'
   *
   * Call holding cgroup_mutex.  May take task_lock of
 - * the task 'pid' during call.
 + * the task 'tsk' during call.
   */
  int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
  {
 @@ -1584,12 +1584,11 @@ static int cgroup_create_file(struct dentry *dentry, 
 int mode,
  }

  /*

I think that docbook-style function comments need /** at the start of
the comment block.

Thanks,

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API

2008-02-19 Thread Paul Menage
Many of the cpusets control files are simple integer values, which
don't require the overhead of memory allocations for reads and writes.

Move the handlers for these control files into cpuset_read_uint() and
cpuset_write_uint(). This also has the advantage that the control
files show up as u64 rather than string in the cgroup.api file.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 kernel/cpuset.c |  156 +---
 1 file changed, 82 insertions(+), 74 deletions(-)

Index: cpusets-2.6.25-rc2-mm1/kernel/cpuset.c
===
--- cpusets-2.6.25-rc2-mm1.orig/kernel/cpuset.c
+++ cpusets-2.6.25-rc2-mm1/kernel/cpuset.c
@@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void
 }
 
 /*
- * Call with cgroup_mutex held.
- */
-
-static int update_memory_pressure_enabled(struct cpuset *cs, char *buf)
-{
-   if (simple_strtoul(buf, NULL, 10) != 0)
-   cpuset_memory_pressure_enabled = 1;
-   else
-   cpuset_memory_pressure_enabled = 0;
-   return 0;
-}
-
-/*
  * update_flag - read a 0 or a 1 in a file and update associated flag
  * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE,
  * CS_SCHED_LOAD_BALANCE,
@@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable
  * Call with cgroup_mutex held.
  */
 
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf)
+static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
+  int turning_on)
 {
-   int turning_on;
struct cpuset trialcs;
int err;
int cpus_nonempty, balance_flag_changed;
 
-   turning_on = (simple_strtoul(buf, NULL, 10) != 0);
-
trialcs = *cs;
if (turning_on)
set_bit(bit, trialcs.flags);
@@ -1247,43 +1232,65 @@ static ssize_t cpuset_common_file_write(
case FILE_MEMLIST:
retval = update_nodemask(cs, buffer);
break;
+   default:
+   retval = -EINVAL;
+   goto out2;
+   }
+
+   if (retval == 0)
+   retval = nbytes;
+out2:
+   cgroup_unlock();
+out1:
+   kfree(buffer);
+   return retval;
+}
+
+static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+   int retval = 0;
+   struct cpuset *cs = cgroup_cs(cgrp);
+   cpuset_filetype_t type = cft-private;
+
+   cgroup_lock();
+
+   if (cgroup_is_removed(cgrp)) {
+   cgroup_unlock();
+   return -ENODEV;
+   }
+
+   switch (type) {
case FILE_CPU_EXCLUSIVE:
-   retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
break;
case FILE_MEM_EXCLUSIVE:
-   retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
break;
case FILE_SCHED_LOAD_BALANCE:
-   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer);
+   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
break;
case FILE_MEMORY_MIGRATE:
-   retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer);
+   retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
break;
case FILE_MEMORY_PRESSURE_ENABLED:
-   retval = update_memory_pressure_enabled(cs, buffer);
+   cpuset_memory_pressure_enabled = !!val;
break;
case FILE_MEMORY_PRESSURE:
retval = -EACCES;
break;
case FILE_SPREAD_PAGE:
-   retval = update_flag(CS_SPREAD_PAGE, cs, buffer);
+   retval = update_flag(CS_SPREAD_PAGE, cs, val);
cs-mems_generation = cpuset_mems_generation++;
break;
case FILE_SPREAD_SLAB:
-   retval = update_flag(CS_SPREAD_SLAB, cs, buffer);
+   retval = update_flag(CS_SPREAD_SLAB, cs, val);
cs-mems_generation = cpuset_mems_generation++;
break;
default:
retval = -EINVAL;
-   goto out2;
+   break;
}
-
-   if (retval == 0)
-   retval = nbytes;
-out2:
cgroup_unlock();
-out1:
-   kfree(buffer);
return retval;
 }
 
@@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s
case FILE_MEMLIST:
s += cpuset_sprintf_memlist(s, cs);
break;
-   case FILE_CPU_EXCLUSIVE:
-   *s++ = is_cpu_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_MEM_EXCLUSIVE:
-   *s++ = is_mem_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_SCHED_LOAD_BALANCE:
-   *s++ = is_sched_load_balance(cs) ? '1' : '0';
-   break;
-   case

[PATCH 1/2] Cpusets API: From: Paul Jackson [EMAIL PROTECTED]

2008-02-19 Thread Paul Menage
Strip all trailing whitespace in cgroup_write_uint

This removes the need for people to remember to pass the -n flag to
echo when writing values to cgroup control files.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 kernel/cgroup.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

Index: cpusets-2.6.25-rc2-mm1/kernel/cgroup.c
===
--- cpusets-2.6.25-rc2-mm1.orig/kernel/cgroup.c
+++ cpusets-2.6.25-rc2-mm1/kernel/cgroup.c
@@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct 
return -EFAULT;
 
buffer[nbytes] = 0; /* nul-terminate */
-
-   /* strip newline if necessary */
-   if (nbytes  (buffer[nbytes-1] == '\n'))
-   buffer[nbytes-1] = 0;
+   strstrip(buffer);
val = simple_strtoull(buffer, end, 0);
if (*end)
return -EINVAL;

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/2] Cpusets API: Update Cpusets control files

2008-02-19 Thread Paul Menage
This pair of patches simplifies the cpusets read/write path for the
control files that consist of simple integers.

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage
On Feb 19, 2008 9:17 PM, Paul Jackson [EMAIL PROTECTED] wrote:

 Perhaps my primary concern with these *.api files was that I did not
 understand who or what the critical use or user was; who found this
 essential, not just nice to have.


Right now, no-one would find it essential. If/when a binary API is
added, I guess I'll ressurrect this part of the patchset.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread Paul Menage
On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi [EMAIL PROTECTED] wrote:

 it changes the format from %s %lld to %s: %llu, right?
 why?


The colon for consistency with maps in /proc. I think it also makes it
slightly more readable.

For %lld versus %llu - I think that cgroup resource APIs are much more
likely to need to report unsigned rather than signed values. In the
case of the memory.stat file, that's certainly the case.

But I guess there's an argument to be made that nothing's likely to
need the final 64th bit of an unsigned value, whereas the ability to
report negative numbers could potentially be useful for some cgroups.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread Paul Menage
On Feb 19, 2008 10:14 PM, YAMAMOTO Takashi [EMAIL PROTECTED] wrote:
  On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi [EMAIL PROTECTED] wrote:
  
   it changes the format from %s %lld to %s: %llu, right?
   why?
  
 
  The colon for consistency with maps in /proc. I think it also makes it
  slightly more readable.

 can you be a little more specific?

 i object against the colon because i want to use the same parser for
 /proc/vmstat, which doesn't have colons.

Ah. This /proc behaviour of having multiple formats for reporting the
same kind of data (compare with /proc/meminfo, which does use colons)
is the kind of thing that I want to avoid with cgroups. i.e. if two
cgroup subsystems are both reporting the same kind of structured data,
then they should both use the same output format.

I guess since /proc has both styles, and memory.stat is the first file
reporting key/value pairs in cgroups, you get to call the format. OK,
I'll zap the colon.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API

2008-02-17 Thread Paul Menage
On Feb 17, 2008 9:28 AM, Paul Jackson <[EMAIL PROTECTED]> wrote:
>
> I'm figuring it would be easiest if you just threw this
> little change into your hopper for the bigger changes
> you're making

OK, will do.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API

2008-02-17 Thread Paul Menage
On Feb 16, 2008 7:29 PM, Paul Jackson <[EMAIL PROTECTED]> wrote:
>
> From: Paul Jackson <[EMAIL PROTECTED]>
>
> Strip all trailing whitespace (such as carriage returns)
> when parsing integer writes to cgroup files, not just
> one trailing newline if present.

Sounds like a good idea to me. Thanks for this.

>
> Signed-off-by: Paul Jackson <[EMAIL PROTECTED]>
> Cc: Paul Menage <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

>
> ---
>  kernel/cgroup.c |5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> --- 2.6.24-mm1.orig/kernel/cgroup.c 2008-02-16 04:20:33.0 -0800
> +++ 2.6.24-mm1/kernel/cgroup.c  2008-02-16 19:00:41.207478218 -0800
> @@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct
> return -EFAULT;
>
> buffer[nbytes] = 0; /* nul-terminate */
> -
> -   /* strip newline if necessary */
> -   if (nbytes && (buffer[nbytes-1] == '\n'))
> -   buffer[nbytes-1] = 0;
> +   strstrip(buffer);   /* strip -just- trailing whitespace */
> val = simple_strtoull(buffer, , 0);
> if (*end)
> return -EINVAL;
>
>
> --
>   I won't rest till it's the best ...
>   Programmer, Linux Scalability
>   Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API

2008-02-17 Thread Paul Menage
On Feb 16, 2008 7:29 PM, Paul Jackson [EMAIL PROTECTED] wrote:

 From: Paul Jackson [EMAIL PROTECTED]

 Strip all trailing whitespace (such as carriage returns)
 when parsing integer writes to cgroup files, not just
 one trailing newline if present.

Sounds like a good idea to me. Thanks for this.


 Signed-off-by: Paul Jackson [EMAIL PROTECTED]
 Cc: Paul Menage [EMAIL PROTECTED]

Acked-by: Paul Menage [EMAIL PROTECTED]


 ---
  kernel/cgroup.c |5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

 --- 2.6.24-mm1.orig/kernel/cgroup.c 2008-02-16 04:20:33.0 -0800
 +++ 2.6.24-mm1/kernel/cgroup.c  2008-02-16 19:00:41.207478218 -0800
 @@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct
 return -EFAULT;

 buffer[nbytes] = 0; /* nul-terminate */
 -
 -   /* strip newline if necessary */
 -   if (nbytes  (buffer[nbytes-1] == '\n'))
 -   buffer[nbytes-1] = 0;
 +   strstrip(buffer);   /* strip -just- trailing whitespace */
 val = simple_strtoull(buffer, end, 0);
 if (*end)
 return -EINVAL;


 --
   I won't rest till it's the best ...
   Programmer, Linux Scalability
   Paul Jackson [EMAIL PROTECTED] 1.940.382.4214

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API

2008-02-17 Thread Paul Menage
On Feb 17, 2008 9:28 AM, Paul Jackson [EMAIL PROTECTED] wrote:

 I'm figuring it would be easiest if you just threw this
 little change into your hopper for the bigger changes
 you're making

OK, will do.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-16 Thread Paul Menage
On Feb 16, 2008 2:07 AM, Balbir Singh <[EMAIL PROTECTED]> wrote:
> Paul Menage wrote:
>
> Hi, Paul,
>
> Do we need to use a cgroup.api file? Why not keep up to date documentation and
> get users to use that. I fear that, cgroup.api will not be kept up-to-date,
> leading to confusion.

The cgroup.api file isn't meant to give complete documentation for a
control file, simply a brief indication of its usage.

The aim is that most bits of the information reported in cgroup.api
are auto-generated, so there shouldn't be problems with it getting
out-of-date.

Is it just the space used by the documentation string that you're
objecting to? The other function of the file is to declare a type for
each variable.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files

2008-02-16 Thread Paul Menage
On Feb 16, 2008 1:31 AM, Li Zefan <[EMAIL PROTECTED]> wrote:
>
> I don't quite catch what you mean. Cgoup does support write-only/read-only
> files. For a write-only file, just set .write and .write_uint to be NULL,
> similar for a read-only file.
>
> Do I miss something?
>

I suppose we could infer from the lack of any write handlers that we
should give the file in the filesystem a mode of 444 rather 644.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-16 Thread Paul Menage
On Feb 16, 2008 2:07 AM, Balbir Singh [EMAIL PROTECTED] wrote:
 Paul Menage wrote:

 Hi, Paul,

 Do we need to use a cgroup.api file? Why not keep up to date documentation and
 get users to use that. I fear that, cgroup.api will not be kept up-to-date,
 leading to confusion.

The cgroup.api file isn't meant to give complete documentation for a
control file, simply a brief indication of its usage.

The aim is that most bits of the information reported in cgroup.api
are auto-generated, so there shouldn't be problems with it getting
out-of-date.

Is it just the space used by the documentation string that you're
objecting to? The other function of the file is to declare a type for
each variable.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files

2008-02-16 Thread Paul Menage
On Feb 16, 2008 1:31 AM, Li Zefan [EMAIL PROTECTED] wrote:

 I don't quite catch what you mean. Cgoup does support write-only/read-only
 files. For a write-only file, just set .write and .write_uint to be NULL,
 similar for a read-only file.

 Do I miss something?


I suppose we could infer from the lack of any write handlers that we
should give the file in the filesystem a mode of 444 rather 644.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 6/7] CGroup API: Use descriptions for memory controller API files

2008-02-15 Thread Paul Menage
This patch adds descriptions to the memory controller API files to
indicate that the usage/limit are in bytes; the names of the control
files can then be simplified to usage/limit.

Also removes the unnecessary mem_force_empty_read() function

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 mm/memcontrol.c |   21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.24-mm1/mm/memcontrol.c
@@ -950,19 +950,6 @@ static ssize_t mem_force_empty_write(str
return ret;
 }
 
-/*
- * Note: This should be removed if cgroup supports write-only file.
- */
-
-static ssize_t mem_force_empty_read(struct cgroup *cont,
-   struct cftype *cft,
-   struct file *file, char __user *userbuf,
-   size_t nbytes, loff_t *ppos)
-{
-   return -EINVAL;
-}
-
-
 static const struct mem_cgroup_stat_desc {
const char *msg;
u64 unit;
@@ -1001,15 +988,17 @@ static int mem_control_stat_show(struct 
 
 static struct cftype mem_cgroup_files[] = {
{
-   .name = "usage_in_bytes",
+   .name = "usage",
.private = RES_USAGE,
.read_uint = mem_cgroup_read,
+   .desc = "Memory usage in bytes",
},
{
-   .name = "limit_in_bytes",
+   .name = "limit",
.private = RES_LIMIT,
.write = mem_cgroup_write,
.read_uint = mem_cgroup_read,
+   .desc = "Memory limit in bytes",
},
{
.name = "failcnt",
@@ -1019,7 +1008,7 @@ static struct cftype mem_cgroup_files[] 
{
.name = "force_empty",
.write = mem_force_empty_write,
-   .read = mem_force_empty_read,
+   .desc = "Write to this file to forget all memory charges"
},
{
.name = "stat",

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 5/7] CGroup API: Use read_uint in memory controller

2008-02-15 Thread Paul Menage
Update the memory controller to use read_uint for its
limit/usage/failcnt control files, calling the new
res_counter_read_uint() function. This allows the files to show up as
u64 rather than string in the cgroup.api file.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 mm/memcontrol.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.24-mm1/mm/memcontrol.c
@@ -922,13 +922,10 @@ int mem_cgroup_write_strategy(char *buf,
return 0;
 }
 
-static ssize_t mem_cgroup_read(struct cgroup *cont,
-   struct cftype *cft, struct file *file,
-   char __user *userbuf, size_t nbytes, loff_t *ppos)
+static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 {
-   return res_counter_read(_cgroup_from_cont(cont)->res,
-   cft->private, userbuf, nbytes, ppos,
-   NULL);
+   return res_counter_read_uint(_cgroup_from_cont(cont)->res,
+cft->private);
 }
 
 static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
@@ -1006,18 +1003,18 @@ static struct cftype mem_cgroup_files[] 
{
.name = "usage_in_bytes",
.private = RES_USAGE,
-   .read = mem_cgroup_read,
+   .read_uint = mem_cgroup_read,
},
{
.name = "limit_in_bytes",
.private = RES_LIMIT,
.write = mem_cgroup_write,
-   .read = mem_cgroup_read,
+   .read_uint = mem_cgroup_read,
},
{
.name = "failcnt",
.private = RES_FAILCNT,
-   .read = mem_cgroup_read,
+   .read_uint = mem_cgroup_read,
},
{
.name = "force_empty",

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-15 Thread Paul Menage
Add a cgroup.api control file in every cgroup directory. This reports
for each control file the type of data represented by that control
file, and a user-friendly description of the contents.

A secondary effect of this patch is to add the "cgroup." prefix in
front of all cgroup-provided control files. This will reduce the
chance of future control files clashing with user-provided names.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 include/linux/cgroup.h |   21 +++
 kernel/cgroup.c|  133 ++---
 2 files changed, 148 insertions(+), 6 deletions(-)

Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h
===
--- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h
+++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h
@@ -179,12 +179,33 @@ struct css_set {
  * - the 'cftype' of the file is file->f_dentry->d_fsdata
  */
 
+/*
+ * The various types of control file that are reported in the
+ * cgroup.api file. "String" is a catch-all default, but should only
+ * be used for special cases. If you use the appropriate accessors
+ * (such as "read_uint") in your control file, then you can leave this
+ * as 0 (CGROUP_FILE_UNKNOWN) and let cgroup figure out the right type.
+ */
+enum cgroup_file_type {
+   CGROUP_FILE_UNKNOWN = 0,
+   CGROUP_FILE_VOID,
+   CGROUP_FILE_U64,
+   CGROUP_FILE_STRING,
+};
+
 #define MAX_CFTYPE_NAME 64
 struct cftype {
/* By convention, the name should begin with the name of the
 * subsystem, followed by a period */
char name[MAX_CFTYPE_NAME];
int private;
+
+   /* The type of a file - reported in the cgroup.api file */
+   enum cgroup_file_type type;
+
+   /* Human-readable description of the file */
+   const char *desc;
+
int (*open) (struct inode *inode, struct file *file);
ssize_t (*read) (struct cgroup *cont, struct cftype *cft,
 struct file *file,
Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c
+++ cgroupmap-2.6.24-mm1/kernel/cgroup.c
@@ -1301,6 +1301,7 @@ enum cgroup_filetype {
FILE_NOTIFY_ON_RELEASE,
FILE_RELEASABLE,
FILE_RELEASE_AGENT,
+   FILE_API,
 };
 
 static ssize_t cgroup_write_uint(struct cgroup *cgrp, struct cftype *cft,
@@ -1611,17 +1612,21 @@ static int cgroup_create_dir(struct cgro
 }
 
 int cgroup_add_file(struct cgroup *cgrp,
-  struct cgroup_subsys *subsys,
-  const struct cftype *cft)
+   struct cgroup_subsys *subsys,
+   const struct cftype *cft)
 {
struct dentry *dir = cgrp->dentry;
struct dentry *dentry;
int error;
 
char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
-   if (subsys && !test_bit(ROOT_NOPREFIX, >root->flags)) {
-   strcpy(name, subsys->name);
-   strcat(name, ".");
+   if (!test_bit(ROOT_NOPREFIX, >root->flags)) {
+   if (subsys) {
+   strcpy(name, subsys->name);
+   strcat(name, ".");
+   } else {
+   strcpy(name, "cgroup.");
+   }
}
strcat(name, cft->name);
BUG_ON(!mutex_is_locked(>d_inode->i_mutex));
@@ -2126,6 +2131,110 @@ static u64 cgroup_read_releasable(struct
return test_bit(CGRP_RELEASABLE, >flags);
 }
 
+static const struct file_operations cgroup_api_file_operations = {
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = seq_release,
+};
+
+/*
+ * cgroup.api is a file in each cgroup directory that gives the types
+ * and descriptions of the various control files in that directory.
+ */
+
+static struct dentry *cgroup_api_advance(struct dentry *d, int advance)
+{
+   struct dentry *parent = d->d_parent;
+   struct list_head *l = >d_u.d_child;
+   while (true) {
+   if (advance)
+   l = l->next;
+   advance = true;
+   /* Did we reach the end of the directory? */
+   if (l == >d_subdirs)
+   return NULL;
+   d = container_of(l, struct dentry, d_u.d_child);
+   /* Skip cgroup subdirectories */
+   if (d->d_inode && S_ISREG(d->d_inode->i_mode))
+   return d;
+   }
+}
+
+static void *cgroup_api_start(struct seq_file *sf, loff_t *pos)
+{
+   struct dentry *parent = sf->private;
+   struct dentry *d;
+   loff_t l = 0;
+   spin_lock(_lock);
+   if (list_empty(>d_subdirs))
+   return NULL;
+   d = container_of(parent->d_subdirs.next, struct dentry, d_

[RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files

2008-02-15 Thread Paul Menage

This set of patches makes the Control Groups API more structured and
self-describing.

1) Allows control files to be associated with data types such as
"u64", "string", "map", etc. These types show up in a new cgroup.api
file in each cgroup directory, along with a user-readable
string. Files that use cgroup-provided data accessors have these file
types inferred automatically.

2) Moves various files in cpusets and the memory controller from using
custom-written file handlers to cgroup-defined handlers

3) Adds the "cgroup." prefix for existing cgroup-provided control
files (tasks, release_agent, releasable, notify_on_release). Given
than we've already had 2.6.24 go out without this prefix, I guess this
could be a little contentious - but it seems like a good move to
prevent name clashes in the future. (Note that this doesn't affect
mounting the legacy cpuset filesystem, since the compatibility layer
disables all prefixes when mounted with filesystem type "cpuset"). If
people object too strongly, we could just make this the case for *new*
cgroup API files, but I think this is a case where consistency would
be better than compatibility - I'd be surprised if anyone has written
major legacy apps yet that rely on 2.6.24 cgroup control file names.


There are various motivations for this:

1) We said at Kernel Summit '07 that the cgroup API wouldn't be
allowed to spiral into an arbitrary mess of ad-hoc APIs. Having simple
ways to represent common data types makes this easier. (E.g. one
standard way to report a map of string,u64 pairs to userspace.)

2) People were divided on the issue of binary APIs versus ASCII APIs
for control groups. Compatibility with the existing cpusets system,
and ease of experimentation, were two important reasons for going with
the current. ASCII API. But by having structured control files, we can
open the path towards having more efficient binary APIs for simpler
and more efficient programmatic access too, without any additional
modifications required from the subsystems themselves.

My plans for this potential binary API are a little hazy at this
point, but they might go something like opening a cgroup.bin file in a
cgroup directory, and writing the names of the control files that you
were interested in; then a read on that file handle would return the
contents of the given control files in a single read in a simple
binary format. (Better suggestions are welcome). Regardless, getting a
good typing/structure on the control files is an important first step
if we want to go in that direction.

3) The memory controller currently has files with the "_in_bytes"
suffix, on the grounds that otherwise it's not obvious to a new user
what they represent. By moving the description to a auto-generated API
file, we can remove this (IMO) inelegant suffix.


--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 2/7] CGroup API: Add cgroup map data type

2008-02-15 Thread Paul Menage
Adds a new type of supported control file representation, a map from
strings to u64 values.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 include/linux/cgroup.h |   19 +++
 kernel/cgroup.c|   61 -
 2 files changed, 79 insertions(+), 1 deletion(-)

Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h
===
--- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h
+++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h
@@ -191,6 +191,17 @@ enum cgroup_file_type {
CGROUP_FILE_VOID,
CGROUP_FILE_U64,
CGROUP_FILE_STRING,
+   CGROUP_FILE_MAP,
+};
+
+/*
+ * cgroup_map_cb is an abstract callback API for reporting map-valued
+ * control files
+ */
+
+struct cgroup_map_cb {
+   int (*fill)(struct cgroup_map_cb *cb, const char *key, u64 value);
+   void *state;
 };
 
 #define MAX_CFTYPE_NAME 64
@@ -215,6 +226,14 @@ struct cftype {
 * single integer. Use it in place of read()
 */
u64 (*read_uint) (struct cgroup *cont, struct cftype *cft);
+   /*
+* read_map() is used for defining a map of key/value
+* pairs. It should call cb->fill(cb, key, value) for each
+* entry.
+*/
+   int (*read_map) (struct cgroup *cont, struct cftype *cft,
+struct cgroup_map_cb *cb);
+
ssize_t (*write) (struct cgroup *cont, struct cftype *cft,
  struct file *file,
  const char __user *buf, size_t nbytes, loff_t *ppos);
Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c
+++ cgroupmap-2.6.24-mm1/kernel/cgroup.c
@@ -1488,6 +1488,46 @@ static ssize_t cgroup_file_read(struct f
return -EINVAL;
 }
 
+/*
+ * seqfile ops/methods for returning structured data. Currently just
+ * supports string->u64 maps, but can be extended in future.
+ */
+
+struct cgroup_seqfile_state {
+   struct cftype *cft;
+   struct cgroup *cgroup;
+};
+
+static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
+{
+   struct seq_file *sf = cb->state;
+   return seq_printf(sf, "%s: %llu\n", key, value);
+}
+
+static int cgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+   struct cgroup_seqfile_state *state = m->private;
+   struct cftype *cft = state->cft;
+   struct cgroup_map_cb cb = {
+   .fill = cgroup_map_add,
+   .state = m,
+   };
+   if (cft->read_map) {
+   return cft->read_map(state->cgroup, cft, );
+   } else {
+   BUG();
+   }
+}
+
+int cgroup_seqfile_release(struct inode *inode, struct file *file)
+{
+   struct seq_file *seq = file->private_data;
+   kfree(seq->private);
+   return single_release(inode, file);
+}
+
+static struct file_operations cgroup_seqfile_operations;
+
 static int cgroup_file_open(struct inode *inode, struct file *file)
 {
int err;
@@ -1500,7 +1540,18 @@ static int cgroup_file_open(struct inode
cft = __d_cft(file->f_dentry);
if (!cft)
return -ENODEV;
-   if (cft->open)
+   if (cft->read_map) {
+   struct cgroup_seqfile_state *state =
+   kzalloc(sizeof(*state), GFP_USER);
+   if (!state)
+   return -ENOMEM;
+   state->cft = cft;
+   state->cgroup = __d_cgrp(file->f_dentry->d_parent);
+   file->f_op = _seqfile_operations;
+   err = single_open(file, cgroup_seqfile_show, state);
+   if (err < 0)
+   kfree(state);
+   } else if (cft->open)
err = cft->open(inode, file);
else
err = 0;
@@ -1539,6 +1590,12 @@ static struct file_operations cgroup_fil
.release = cgroup_file_release,
 };
 
+static struct file_operations cgroup_seqfile_operations = {
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = cgroup_seqfile_release,
+};
+
 static struct inode_operations cgroup_dir_inode_operations = {
.lookup = simple_lookup,
.mkdir = cgroup_mkdir,
@@ -2206,6 +2263,8 @@ static int cgroup_api_show(struct seq_fi
if (type == CGROUP_FILE_UNKNOWN) {
if (cft->read_uint)
type = CGROUP_FILE_U64;
+   else if (cft->read_map)
+   type = CGROUP_FILE_MAP;
else if (cft->read)
type = CGROUP_FILE_STRING;
else if (!cft->open)

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 3/7] CGroup API: Use cgroup map for memcontrol stats file

2008-02-15 Thread Paul Menage
Remove the seq_file boilerplate used to construct the memcontrol stats
map, and instead use the new map representation for cgroup control
files

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 mm/memcontrol.c |   30 ++
 1 file changed, 6 insertions(+), 24 deletions(-)

Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.24-mm1/mm/memcontrol.c
@@ -974,9 +974,9 @@ static const struct mem_cgroup_stat_desc
[MEM_CGROUP_STAT_RSS] = { "rss", PAGE_SIZE, },
 };
 
-static int mem_control_stat_show(struct seq_file *m, void *arg)
+static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
+struct cgroup_map_cb *cb)
 {
-   struct cgroup *cont = m->private;
struct mem_cgroup *mem_cont = mem_cgroup_from_cont(cont);
struct mem_cgroup_stat *stat = _cont->stat;
int i;
@@ -986,8 +986,7 @@ static int mem_control_stat_show(struct 
 
val = mem_cgroup_read_stat(stat, i);
val *= mem_cgroup_stat_desc[i].unit;
-   seq_printf(m, "%s %lld\n", mem_cgroup_stat_desc[i].msg,
-   (long long)val);
+   cb->fill(cb, mem_cgroup_stat_desc[i].msg, val);
}
/* showing # of active pages */
{
@@ -997,29 +996,12 @@ static int mem_control_stat_show(struct 
MEM_CGROUP_ZSTAT_INACTIVE);
active = mem_cgroup_get_all_zonestat(mem_cont,
MEM_CGROUP_ZSTAT_ACTIVE);
-   seq_printf(m, "active %ld\n", (active) * PAGE_SIZE);
-   seq_printf(m, "inactive %ld\n", (inactive) * PAGE_SIZE);
+   cb->fill(cb, "active", (active) * PAGE_SIZE);
+   cb->fill(cb, "inactive", (inactive) * PAGE_SIZE);
}
return 0;
 }
 
-static const struct file_operations mem_control_stat_file_operations = {
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = single_release,
-};
-
-static int mem_control_stat_open(struct inode *unused, struct file *file)
-{
-   /* XXX __d_cont */
-   struct cgroup *cont = file->f_dentry->d_parent->d_fsdata;
-
-   file->f_op = _control_stat_file_operations;
-   return single_open(file, mem_control_stat_show, cont);
-}
-
-
-
 static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -1044,7 +1026,7 @@ static struct cftype mem_cgroup_files[] 
},
{
.name = "stat",
-   .open = mem_control_stat_open,
+   .read_map = mem_control_stat_show,
},
 };
 

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 4/7] CGroup API: Add res_counter_read_uint()

2008-02-15 Thread Paul Menage
Adds a function for returning the value of a resource counter member,
in a form suitable for use in a cgroup read_uint control file method.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 include/linux/res_counter.h |1 +
 kernel/res_counter.c|5 +
 2 files changed, 6 insertions(+)

Index: cgroupmap-2.6.24-mm1/include/linux/res_counter.h
===
--- cgroupmap-2.6.24-mm1.orig/include/linux/res_counter.h
+++ cgroupmap-2.6.24-mm1/include/linux/res_counter.h
@@ -54,6 +54,7 @@ struct res_counter {
 ssize_t res_counter_read(struct res_counter *counter, int member,
const char __user *buf, size_t nbytes, loff_t *pos,
int (*read_strategy)(unsigned long long val, char *s));
+u64 res_counter_read_uint(struct res_counter *counter, int member);
 ssize_t res_counter_write(struct res_counter *counter, int member,
const char __user *buf, size_t nbytes, loff_t *pos,
int (*write_strategy)(char *buf, unsigned long long *val));
Index: cgroupmap-2.6.24-mm1/kernel/res_counter.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/res_counter.c
+++ cgroupmap-2.6.24-mm1/kernel/res_counter.c
@@ -92,6 +92,11 @@ ssize_t res_counter_read(struct res_coun
pos, buf, s - buf);
 }
 
+u64 res_counter_read_uint(struct res_counter *counter, int member)
+{
+   return *res_counter_member(counter, member);
+}
+
 ssize_t res_counter_write(struct res_counter *counter, int member,
const char __user *userbuf, size_t nbytes, loff_t *pos,
int (*write_strategy)(char *st_buf, unsigned long long *val))

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API

2008-02-15 Thread Paul Menage
Many of the cpusets control files are simple integer values, which
don't require the overhead of memory allocations for reads and writes.

Move the handlers for these control files into cpuset_read_uint() and
cpuset_write_uint(). This also has the advantage that the control
files show up as "u64" rather than "string" in the cgroup.api file.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 kernel/cpuset.c |  158 +---
 1 file changed, 83 insertions(+), 75 deletions(-)

Index: cgroupmap-2.6.24-mm1/kernel/cpuset.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/cpuset.c
+++ cgroupmap-2.6.24-mm1/kernel/cpuset.c
@@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void
 }
 
 /*
- * Call with cgroup_mutex held.
- */
-
-static int update_memory_pressure_enabled(struct cpuset *cs, char *buf)
-{
-   if (simple_strtoul(buf, NULL, 10) != 0)
-   cpuset_memory_pressure_enabled = 1;
-   else
-   cpuset_memory_pressure_enabled = 0;
-   return 0;
-}
-
-/*
  * update_flag - read a 0 or a 1 in a file and update associated flag
  * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE,
  * CS_SCHED_LOAD_BALANCE,
@@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable
  * Call with cgroup_mutex held.
  */
 
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf)
+static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
+  int turning_on)
 {
-   int turning_on;
struct cpuset trialcs;
int err;
int cpus_nonempty, balance_flag_changed;
 
-   turning_on = (simple_strtoul(buf, NULL, 10) != 0);
-
trialcs = *cs;
if (turning_on)
set_bit(bit, );
@@ -1247,44 +1232,66 @@ static ssize_t cpuset_common_file_write(
case FILE_MEMLIST:
retval = update_nodemask(cs, buffer);
break;
+   default:
+   retval = -EINVAL;
+   goto out2;
+   }
+
+   if (retval == 0)
+   retval = nbytes;
+out2:
+   cgroup_unlock();
+out1:
+   kfree(buffer);
+   return retval;
+}
+
+static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+   int retval = 0;
+   struct cpuset *cs = cgroup_cs(cgrp);
+   cpuset_filetype_t type = cft->private;
+
+   cgroup_lock();
+
+   if (cgroup_is_removed(cgrp)) {
+   cgroup_unlock();
+   return -ENODEV;
+   }
+
+   switch (type) {
case FILE_CPU_EXCLUSIVE:
-   retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
break;
case FILE_MEM_EXCLUSIVE:
-   retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
break;
case FILE_SCHED_LOAD_BALANCE:
-   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer);
+   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
break;
case FILE_MEMORY_MIGRATE:
-   retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer);
+   retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
break;
case FILE_MEMORY_PRESSURE_ENABLED:
-   retval = update_memory_pressure_enabled(cs, buffer);
+   cpuset_memory_pressure_enabled = val;
break;
case FILE_MEMORY_PRESSURE:
retval = -EACCES;
break;
case FILE_SPREAD_PAGE:
-   retval = update_flag(CS_SPREAD_PAGE, cs, buffer);
+   retval = update_flag(CS_SPREAD_PAGE, cs, val);
cs->mems_generation = cpuset_mems_generation++;
break;
case FILE_SPREAD_SLAB:
-   retval = update_flag(CS_SPREAD_SLAB, cs, buffer);
+   retval = update_flag(CS_SPREAD_SLAB, cs, val);
cs->mems_generation = cpuset_mems_generation++;
break;
default:
retval = -EINVAL;
-   goto out2;
+   break;
}
-
-   if (retval == 0)
-   retval = nbytes;
-out2:
cgroup_unlock();
-out1:
-   kfree(buffer);
-   return retval;
+   return -EINVAL;
 }
 
 /*
@@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s
case FILE_MEMLIST:
s += cpuset_sprintf_memlist(s, cs);
break;
-   case FILE_CPU_EXCLUSIVE:
-   *s++ = is_cpu_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_MEM_EXCLUSIVE:
-   *s++ = is_mem_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_SCHED_LOAD_BALANCE:
-   *s++ = is_sched_load_balance(cs) ? '1' : '0';

[RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-15 Thread Paul Menage
Add a cgroup.api control file in every cgroup directory. This reports
for each control file the type of data represented by that control
file, and a user-friendly description of the contents.

A secondary effect of this patch is to add the cgroup. prefix in
front of all cgroup-provided control files. This will reduce the
chance of future control files clashing with user-provided names.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 include/linux/cgroup.h |   21 +++
 kernel/cgroup.c|  133 ++---
 2 files changed, 148 insertions(+), 6 deletions(-)

Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h
===
--- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h
+++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h
@@ -179,12 +179,33 @@ struct css_set {
  * - the 'cftype' of the file is file-f_dentry-d_fsdata
  */
 
+/*
+ * The various types of control file that are reported in the
+ * cgroup.api file. String is a catch-all default, but should only
+ * be used for special cases. If you use the appropriate accessors
+ * (such as read_uint) in your control file, then you can leave this
+ * as 0 (CGROUP_FILE_UNKNOWN) and let cgroup figure out the right type.
+ */
+enum cgroup_file_type {
+   CGROUP_FILE_UNKNOWN = 0,
+   CGROUP_FILE_VOID,
+   CGROUP_FILE_U64,
+   CGROUP_FILE_STRING,
+};
+
 #define MAX_CFTYPE_NAME 64
 struct cftype {
/* By convention, the name should begin with the name of the
 * subsystem, followed by a period */
char name[MAX_CFTYPE_NAME];
int private;
+
+   /* The type of a file - reported in the cgroup.api file */
+   enum cgroup_file_type type;
+
+   /* Human-readable description of the file */
+   const char *desc;
+
int (*open) (struct inode *inode, struct file *file);
ssize_t (*read) (struct cgroup *cont, struct cftype *cft,
 struct file *file,
Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c
+++ cgroupmap-2.6.24-mm1/kernel/cgroup.c
@@ -1301,6 +1301,7 @@ enum cgroup_filetype {
FILE_NOTIFY_ON_RELEASE,
FILE_RELEASABLE,
FILE_RELEASE_AGENT,
+   FILE_API,
 };
 
 static ssize_t cgroup_write_uint(struct cgroup *cgrp, struct cftype *cft,
@@ -1611,17 +1612,21 @@ static int cgroup_create_dir(struct cgro
 }
 
 int cgroup_add_file(struct cgroup *cgrp,
-  struct cgroup_subsys *subsys,
-  const struct cftype *cft)
+   struct cgroup_subsys *subsys,
+   const struct cftype *cft)
 {
struct dentry *dir = cgrp-dentry;
struct dentry *dentry;
int error;
 
char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
-   if (subsys  !test_bit(ROOT_NOPREFIX, cgrp-root-flags)) {
-   strcpy(name, subsys-name);
-   strcat(name, .);
+   if (!test_bit(ROOT_NOPREFIX, cgrp-root-flags)) {
+   if (subsys) {
+   strcpy(name, subsys-name);
+   strcat(name, .);
+   } else {
+   strcpy(name, cgroup.);
+   }
}
strcat(name, cft-name);
BUG_ON(!mutex_is_locked(dir-d_inode-i_mutex));
@@ -2126,6 +2131,110 @@ static u64 cgroup_read_releasable(struct
return test_bit(CGRP_RELEASABLE, cgrp-flags);
 }
 
+static const struct file_operations cgroup_api_file_operations = {
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = seq_release,
+};
+
+/*
+ * cgroup.api is a file in each cgroup directory that gives the types
+ * and descriptions of the various control files in that directory.
+ */
+
+static struct dentry *cgroup_api_advance(struct dentry *d, int advance)
+{
+   struct dentry *parent = d-d_parent;
+   struct list_head *l = d-d_u.d_child;
+   while (true) {
+   if (advance)
+   l = l-next;
+   advance = true;
+   /* Did we reach the end of the directory? */
+   if (l == parent-d_subdirs)
+   return NULL;
+   d = container_of(l, struct dentry, d_u.d_child);
+   /* Skip cgroup subdirectories */
+   if (d-d_inode  S_ISREG(d-d_inode-i_mode))
+   return d;
+   }
+}
+
+static void *cgroup_api_start(struct seq_file *sf, loff_t *pos)
+{
+   struct dentry *parent = sf-private;
+   struct dentry *d;
+   loff_t l = 0;
+   spin_lock(dcache_lock);
+   if (list_empty(parent-d_subdirs))
+   return NULL;
+   d = container_of(parent-d_subdirs.next, struct dentry, d_u.d_child);
+   d = cgroup_api_advance(d, 0);
+   while (d  l  *pos) {
+   (*pos)++;
+   d = cgroup_api_advance(d, 1

[RFC][PATCH 5/7] CGroup API: Use read_uint in memory controller

2008-02-15 Thread Paul Menage
Update the memory controller to use read_uint for its
limit/usage/failcnt control files, calling the new
res_counter_read_uint() function. This allows the files to show up as
u64 rather than string in the cgroup.api file.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 mm/memcontrol.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.24-mm1/mm/memcontrol.c
@@ -922,13 +922,10 @@ int mem_cgroup_write_strategy(char *buf,
return 0;
 }
 
-static ssize_t mem_cgroup_read(struct cgroup *cont,
-   struct cftype *cft, struct file *file,
-   char __user *userbuf, size_t nbytes, loff_t *ppos)
+static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 {
-   return res_counter_read(mem_cgroup_from_cont(cont)-res,
-   cft-private, userbuf, nbytes, ppos,
-   NULL);
+   return res_counter_read_uint(mem_cgroup_from_cont(cont)-res,
+cft-private);
 }
 
 static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
@@ -1006,18 +1003,18 @@ static struct cftype mem_cgroup_files[] 
{
.name = usage_in_bytes,
.private = RES_USAGE,
-   .read = mem_cgroup_read,
+   .read_uint = mem_cgroup_read,
},
{
.name = limit_in_bytes,
.private = RES_LIMIT,
.write = mem_cgroup_write,
-   .read = mem_cgroup_read,
+   .read_uint = mem_cgroup_read,
},
{
.name = failcnt,
.private = RES_FAILCNT,
-   .read = mem_cgroup_read,
+   .read_uint = mem_cgroup_read,
},
{
.name = force_empty,

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 6/7] CGroup API: Use descriptions for memory controller API files

2008-02-15 Thread Paul Menage
This patch adds descriptions to the memory controller API files to
indicate that the usage/limit are in bytes; the names of the control
files can then be simplified to usage/limit.

Also removes the unnecessary mem_force_empty_read() function

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 mm/memcontrol.c |   21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.24-mm1/mm/memcontrol.c
@@ -950,19 +950,6 @@ static ssize_t mem_force_empty_write(str
return ret;
 }
 
-/*
- * Note: This should be removed if cgroup supports write-only file.
- */
-
-static ssize_t mem_force_empty_read(struct cgroup *cont,
-   struct cftype *cft,
-   struct file *file, char __user *userbuf,
-   size_t nbytes, loff_t *ppos)
-{
-   return -EINVAL;
-}
-
-
 static const struct mem_cgroup_stat_desc {
const char *msg;
u64 unit;
@@ -1001,15 +988,17 @@ static int mem_control_stat_show(struct 
 
 static struct cftype mem_cgroup_files[] = {
{
-   .name = usage_in_bytes,
+   .name = usage,
.private = RES_USAGE,
.read_uint = mem_cgroup_read,
+   .desc = Memory usage in bytes,
},
{
-   .name = limit_in_bytes,
+   .name = limit,
.private = RES_LIMIT,
.write = mem_cgroup_write,
.read_uint = mem_cgroup_read,
+   .desc = Memory limit in bytes,
},
{
.name = failcnt,
@@ -1019,7 +1008,7 @@ static struct cftype mem_cgroup_files[] 
{
.name = force_empty,
.write = mem_force_empty_write,
-   .read = mem_force_empty_read,
+   .desc = Write to this file to forget all memory charges
},
{
.name = stat,

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files

2008-02-15 Thread Paul Menage

This set of patches makes the Control Groups API more structured and
self-describing.

1) Allows control files to be associated with data types such as
u64, string, map, etc. These types show up in a new cgroup.api
file in each cgroup directory, along with a user-readable
string. Files that use cgroup-provided data accessors have these file
types inferred automatically.

2) Moves various files in cpusets and the memory controller from using
custom-written file handlers to cgroup-defined handlers

3) Adds the cgroup. prefix for existing cgroup-provided control
files (tasks, release_agent, releasable, notify_on_release). Given
than we've already had 2.6.24 go out without this prefix, I guess this
could be a little contentious - but it seems like a good move to
prevent name clashes in the future. (Note that this doesn't affect
mounting the legacy cpuset filesystem, since the compatibility layer
disables all prefixes when mounted with filesystem type cpuset). If
people object too strongly, we could just make this the case for *new*
cgroup API files, but I think this is a case where consistency would
be better than compatibility - I'd be surprised if anyone has written
major legacy apps yet that rely on 2.6.24 cgroup control file names.


There are various motivations for this:

1) We said at Kernel Summit '07 that the cgroup API wouldn't be
allowed to spiral into an arbitrary mess of ad-hoc APIs. Having simple
ways to represent common data types makes this easier. (E.g. one
standard way to report a map of string,u64 pairs to userspace.)

2) People were divided on the issue of binary APIs versus ASCII APIs
for control groups. Compatibility with the existing cpusets system,
and ease of experimentation, were two important reasons for going with
the current. ASCII API. But by having structured control files, we can
open the path towards having more efficient binary APIs for simpler
and more efficient programmatic access too, without any additional
modifications required from the subsystems themselves.

My plans for this potential binary API are a little hazy at this
point, but they might go something like opening a cgroup.bin file in a
cgroup directory, and writing the names of the control files that you
were interested in; then a read on that file handle would return the
contents of the given control files in a single read in a simple
binary format. (Better suggestions are welcome). Regardless, getting a
good typing/structure on the control files is an important first step
if we want to go in that direction.

3) The memory controller currently has files with the _in_bytes
suffix, on the grounds that otherwise it's not obvious to a new user
what they represent. By moving the description to a auto-generated API
file, we can remove this (IMO) inelegant suffix.


--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 3/7] CGroup API: Use cgroup map for memcontrol stats file

2008-02-15 Thread Paul Menage
Remove the seq_file boilerplate used to construct the memcontrol stats
map, and instead use the new map representation for cgroup control
files

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 mm/memcontrol.c |   30 ++
 1 file changed, 6 insertions(+), 24 deletions(-)

Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.24-mm1/mm/memcontrol.c
@@ -974,9 +974,9 @@ static const struct mem_cgroup_stat_desc
[MEM_CGROUP_STAT_RSS] = { rss, PAGE_SIZE, },
 };
 
-static int mem_control_stat_show(struct seq_file *m, void *arg)
+static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
+struct cgroup_map_cb *cb)
 {
-   struct cgroup *cont = m-private;
struct mem_cgroup *mem_cont = mem_cgroup_from_cont(cont);
struct mem_cgroup_stat *stat = mem_cont-stat;
int i;
@@ -986,8 +986,7 @@ static int mem_control_stat_show(struct 
 
val = mem_cgroup_read_stat(stat, i);
val *= mem_cgroup_stat_desc[i].unit;
-   seq_printf(m, %s %lld\n, mem_cgroup_stat_desc[i].msg,
-   (long long)val);
+   cb-fill(cb, mem_cgroup_stat_desc[i].msg, val);
}
/* showing # of active pages */
{
@@ -997,29 +996,12 @@ static int mem_control_stat_show(struct 
MEM_CGROUP_ZSTAT_INACTIVE);
active = mem_cgroup_get_all_zonestat(mem_cont,
MEM_CGROUP_ZSTAT_ACTIVE);
-   seq_printf(m, active %ld\n, (active) * PAGE_SIZE);
-   seq_printf(m, inactive %ld\n, (inactive) * PAGE_SIZE);
+   cb-fill(cb, active, (active) * PAGE_SIZE);
+   cb-fill(cb, inactive, (inactive) * PAGE_SIZE);
}
return 0;
 }
 
-static const struct file_operations mem_control_stat_file_operations = {
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = single_release,
-};
-
-static int mem_control_stat_open(struct inode *unused, struct file *file)
-{
-   /* XXX __d_cont */
-   struct cgroup *cont = file-f_dentry-d_parent-d_fsdata;
-
-   file-f_op = mem_control_stat_file_operations;
-   return single_open(file, mem_control_stat_show, cont);
-}
-
-
-
 static struct cftype mem_cgroup_files[] = {
{
.name = usage_in_bytes,
@@ -1044,7 +1026,7 @@ static struct cftype mem_cgroup_files[] 
},
{
.name = stat,
-   .open = mem_control_stat_open,
+   .read_map = mem_control_stat_show,
},
 };
 

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 2/7] CGroup API: Add cgroup map data type

2008-02-15 Thread Paul Menage
Adds a new type of supported control file representation, a map from
strings to u64 values.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 include/linux/cgroup.h |   19 +++
 kernel/cgroup.c|   61 -
 2 files changed, 79 insertions(+), 1 deletion(-)

Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h
===
--- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h
+++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h
@@ -191,6 +191,17 @@ enum cgroup_file_type {
CGROUP_FILE_VOID,
CGROUP_FILE_U64,
CGROUP_FILE_STRING,
+   CGROUP_FILE_MAP,
+};
+
+/*
+ * cgroup_map_cb is an abstract callback API for reporting map-valued
+ * control files
+ */
+
+struct cgroup_map_cb {
+   int (*fill)(struct cgroup_map_cb *cb, const char *key, u64 value);
+   void *state;
 };
 
 #define MAX_CFTYPE_NAME 64
@@ -215,6 +226,14 @@ struct cftype {
 * single integer. Use it in place of read()
 */
u64 (*read_uint) (struct cgroup *cont, struct cftype *cft);
+   /*
+* read_map() is used for defining a map of key/value
+* pairs. It should call cb-fill(cb, key, value) for each
+* entry.
+*/
+   int (*read_map) (struct cgroup *cont, struct cftype *cft,
+struct cgroup_map_cb *cb);
+
ssize_t (*write) (struct cgroup *cont, struct cftype *cft,
  struct file *file,
  const char __user *buf, size_t nbytes, loff_t *ppos);
Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c
+++ cgroupmap-2.6.24-mm1/kernel/cgroup.c
@@ -1488,6 +1488,46 @@ static ssize_t cgroup_file_read(struct f
return -EINVAL;
 }
 
+/*
+ * seqfile ops/methods for returning structured data. Currently just
+ * supports string-u64 maps, but can be extended in future.
+ */
+
+struct cgroup_seqfile_state {
+   struct cftype *cft;
+   struct cgroup *cgroup;
+};
+
+static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
+{
+   struct seq_file *sf = cb-state;
+   return seq_printf(sf, %s: %llu\n, key, value);
+}
+
+static int cgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+   struct cgroup_seqfile_state *state = m-private;
+   struct cftype *cft = state-cft;
+   struct cgroup_map_cb cb = {
+   .fill = cgroup_map_add,
+   .state = m,
+   };
+   if (cft-read_map) {
+   return cft-read_map(state-cgroup, cft, cb);
+   } else {
+   BUG();
+   }
+}
+
+int cgroup_seqfile_release(struct inode *inode, struct file *file)
+{
+   struct seq_file *seq = file-private_data;
+   kfree(seq-private);
+   return single_release(inode, file);
+}
+
+static struct file_operations cgroup_seqfile_operations;
+
 static int cgroup_file_open(struct inode *inode, struct file *file)
 {
int err;
@@ -1500,7 +1540,18 @@ static int cgroup_file_open(struct inode
cft = __d_cft(file-f_dentry);
if (!cft)
return -ENODEV;
-   if (cft-open)
+   if (cft-read_map) {
+   struct cgroup_seqfile_state *state =
+   kzalloc(sizeof(*state), GFP_USER);
+   if (!state)
+   return -ENOMEM;
+   state-cft = cft;
+   state-cgroup = __d_cgrp(file-f_dentry-d_parent);
+   file-f_op = cgroup_seqfile_operations;
+   err = single_open(file, cgroup_seqfile_show, state);
+   if (err  0)
+   kfree(state);
+   } else if (cft-open)
err = cft-open(inode, file);
else
err = 0;
@@ -1539,6 +1590,12 @@ static struct file_operations cgroup_fil
.release = cgroup_file_release,
 };
 
+static struct file_operations cgroup_seqfile_operations = {
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = cgroup_seqfile_release,
+};
+
 static struct inode_operations cgroup_dir_inode_operations = {
.lookup = simple_lookup,
.mkdir = cgroup_mkdir,
@@ -2206,6 +2263,8 @@ static int cgroup_api_show(struct seq_fi
if (type == CGROUP_FILE_UNKNOWN) {
if (cft-read_uint)
type = CGROUP_FILE_U64;
+   else if (cft-read_map)
+   type = CGROUP_FILE_MAP;
else if (cft-read)
type = CGROUP_FILE_STRING;
else if (!cft-open)

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API

2008-02-15 Thread Paul Menage
Many of the cpusets control files are simple integer values, which
don't require the overhead of memory allocations for reads and writes.

Move the handlers for these control files into cpuset_read_uint() and
cpuset_write_uint(). This also has the advantage that the control
files show up as u64 rather than string in the cgroup.api file.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 kernel/cpuset.c |  158 +---
 1 file changed, 83 insertions(+), 75 deletions(-)

Index: cgroupmap-2.6.24-mm1/kernel/cpuset.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/cpuset.c
+++ cgroupmap-2.6.24-mm1/kernel/cpuset.c
@@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void
 }
 
 /*
- * Call with cgroup_mutex held.
- */
-
-static int update_memory_pressure_enabled(struct cpuset *cs, char *buf)
-{
-   if (simple_strtoul(buf, NULL, 10) != 0)
-   cpuset_memory_pressure_enabled = 1;
-   else
-   cpuset_memory_pressure_enabled = 0;
-   return 0;
-}
-
-/*
  * update_flag - read a 0 or a 1 in a file and update associated flag
  * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE,
  * CS_SCHED_LOAD_BALANCE,
@@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable
  * Call with cgroup_mutex held.
  */
 
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf)
+static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
+  int turning_on)
 {
-   int turning_on;
struct cpuset trialcs;
int err;
int cpus_nonempty, balance_flag_changed;
 
-   turning_on = (simple_strtoul(buf, NULL, 10) != 0);
-
trialcs = *cs;
if (turning_on)
set_bit(bit, trialcs.flags);
@@ -1247,44 +1232,66 @@ static ssize_t cpuset_common_file_write(
case FILE_MEMLIST:
retval = update_nodemask(cs, buffer);
break;
+   default:
+   retval = -EINVAL;
+   goto out2;
+   }
+
+   if (retval == 0)
+   retval = nbytes;
+out2:
+   cgroup_unlock();
+out1:
+   kfree(buffer);
+   return retval;
+}
+
+static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+   int retval = 0;
+   struct cpuset *cs = cgroup_cs(cgrp);
+   cpuset_filetype_t type = cft-private;
+
+   cgroup_lock();
+
+   if (cgroup_is_removed(cgrp)) {
+   cgroup_unlock();
+   return -ENODEV;
+   }
+
+   switch (type) {
case FILE_CPU_EXCLUSIVE:
-   retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
break;
case FILE_MEM_EXCLUSIVE:
-   retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
break;
case FILE_SCHED_LOAD_BALANCE:
-   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer);
+   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
break;
case FILE_MEMORY_MIGRATE:
-   retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer);
+   retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
break;
case FILE_MEMORY_PRESSURE_ENABLED:
-   retval = update_memory_pressure_enabled(cs, buffer);
+   cpuset_memory_pressure_enabled = val;
break;
case FILE_MEMORY_PRESSURE:
retval = -EACCES;
break;
case FILE_SPREAD_PAGE:
-   retval = update_flag(CS_SPREAD_PAGE, cs, buffer);
+   retval = update_flag(CS_SPREAD_PAGE, cs, val);
cs-mems_generation = cpuset_mems_generation++;
break;
case FILE_SPREAD_SLAB:
-   retval = update_flag(CS_SPREAD_SLAB, cs, buffer);
+   retval = update_flag(CS_SPREAD_SLAB, cs, val);
cs-mems_generation = cpuset_mems_generation++;
break;
default:
retval = -EINVAL;
-   goto out2;
+   break;
}
-
-   if (retval == 0)
-   retval = nbytes;
-out2:
cgroup_unlock();
-out1:
-   kfree(buffer);
-   return retval;
+   return -EINVAL;
 }
 
 /*
@@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s
case FILE_MEMLIST:
s += cpuset_sprintf_memlist(s, cs);
break;
-   case FILE_CPU_EXCLUSIVE:
-   *s++ = is_cpu_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_MEM_EXCLUSIVE:
-   *s++ = is_mem_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_SCHED_LOAD_BALANCE:
-   *s++ = is_sched_load_balance(cs) ? '1' : '0';
-   break

[RFC][PATCH 4/7] CGroup API: Add res_counter_read_uint()

2008-02-15 Thread Paul Menage
Adds a function for returning the value of a resource counter member,
in a form suitable for use in a cgroup read_uint control file method.

Signed-off-by: Paul Menage [EMAIL PROTECTED]

---
 include/linux/res_counter.h |1 +
 kernel/res_counter.c|5 +
 2 files changed, 6 insertions(+)

Index: cgroupmap-2.6.24-mm1/include/linux/res_counter.h
===
--- cgroupmap-2.6.24-mm1.orig/include/linux/res_counter.h
+++ cgroupmap-2.6.24-mm1/include/linux/res_counter.h
@@ -54,6 +54,7 @@ struct res_counter {
 ssize_t res_counter_read(struct res_counter *counter, int member,
const char __user *buf, size_t nbytes, loff_t *pos,
int (*read_strategy)(unsigned long long val, char *s));
+u64 res_counter_read_uint(struct res_counter *counter, int member);
 ssize_t res_counter_write(struct res_counter *counter, int member,
const char __user *buf, size_t nbytes, loff_t *pos,
int (*write_strategy)(char *buf, unsigned long long *val));
Index: cgroupmap-2.6.24-mm1/kernel/res_counter.c
===
--- cgroupmap-2.6.24-mm1.orig/kernel/res_counter.c
+++ cgroupmap-2.6.24-mm1/kernel/res_counter.c
@@ -92,6 +92,11 @@ ssize_t res_counter_read(struct res_coun
pos, buf, s - buf);
 }
 
+u64 res_counter_read_uint(struct res_counter *counter, int member)
+{
+   return *res_counter_member(counter, member);
+}
+
 ssize_t res_counter_write(struct res_counter *counter, int member,
const char __user *userbuf, size_t nbytes, loff_t *pos,
int (*write_strategy)(char *st_buf, unsigned long long *val))

--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Add linux-fsdevel to VFS entry in MAINTAINERS

2008-02-14 Thread Paul Menage

Add linux-fsdevel to the VFS entry in MAINTAINERS

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
MAINTAINERS |1 +
1 file changed, 1 insertion(+)

Index: 2.6.24-mm1-bindflags/MAINTAINERS
===
--- 2.6.24-mm1-bindflags.orig/MAINTAINERS
+++ 2.6.24-mm1-bindflags/MAINTAINERS
@@ -1616,6 +1616,7 @@ S:Maintained
FILESYSTEMS (VFS and infrastructure)
P:  Alexander Viro
M:  [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
S:  Maintained

FIREWIRE SUBSYSTEM (drivers/firewire, )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
On Thu, Feb 14, 2008 at 9:31 AM, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
>  I deliberately not used the MS_* flags, which is currently a messy mix
>  of things with totally different meanings.
>
>  Does this solve all the issues?

We should add a size parameter either in the mount_params or as a
final argument, for future extensibility.

And we might as well include MNT_READONLY in the API on the assumption
that per-mount readonly will be available soon.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
On Thu, Feb 14, 2008 at 8:03 AM, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>  > The "flags" argument could be the same as for regular mount, and
>  > contain the mnt_flags - so the extra argument could maybe usefully be
>  > a "mnt_flags_mask", to indicate which flags we actually care about
>  > overriding.
>
>  The way I imagined it, is that mnt_flags is a mask, and the operation
>  (determined by flags) is either:
>
>   - set bits in mask
>   - clear bits in mask (or not in mask)
>   - set flags to mask
>
>  It doesn't allow setting some bits, clearing some others, and leaving
>  alone the rest.  But I think such flexibility isn't really needed.

I think I'd suggest something like:

new_mnt->mnt_flags = (old_mnt->mnt_flags & ~arg_mask) | (arg_flags & mask)

>  Maybe instead of messing with masks, it's better to introduce a
>  get_flags() or a more general mount_stat() operation, and let
>  userspace deal with setting and clearing flags, just as we do for
>  stat/chmod?
>
>  So we'd have
>
>   mount_stat(path, stat);
>   mount_bind(from, to, flags);
>   mount_set_flags(path, flags);
>   mount_move(from, to);
>
>  and perhaps
>
>   mount_remount(path, opt_string, flags);

Sounds reasonable to me. But it wouldn't directly solve the "do a
recursive bind mount setting the MS_READONLY flag on all children"
problem, so we'd need some of the earlier suggestions too.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
[ cc: linux-fsdevel ]

On Thu, Feb 14, 2008 at 7:22 AM, Paul Menage <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 13, 2008 at 10:02 PM, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>  >
>  >  I think this concept is reasonable, but I don't think MS_BIND_FLAGS
>  >  is a descriptive name for this flag.  MS_EXPLICIT_FLAGS might be better
>  >  but still isn't optimal.
>  >
>
>  MS_BIND_FLAGS_OVERRIDE ?
>
>  Paul
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
On Wed, Feb 13, 2008 at 10:02 PM, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
>  I think this concept is reasonable, but I don't think MS_BIND_FLAGS
>  is a descriptive name for this flag.  MS_EXPLICIT_FLAGS might be better
>  but still isn't optimal.
>

MS_BIND_FLAGS_OVERRIDE ?

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
On Thu, Feb 14, 2008 at 12:30 AM, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>  > For recursive bind mounts, only the root of the tree being bound
>  > inherits the per-mount flags from the mount() arguments; sub-mounts
>  > inherit their per-mount flags from the source tree as usual.
>
>  This is rather strange behavior.  I think it would be much better, if
>  setting mount flags would work for recursive operations as well.  Also
>  what we really need is not resetting all the mount flags to some
>  predetermined values, but to be able to set or clear each flag
>  individually.

This is certainly true, but as you observe below it's a fair bit more
fiddly to specify in the API. I wasn't sure how much people recursive
bind mounts, so I figured I'd throw out this simpler version first.

>
>  For example, with the per-mount-read-only thing the most useful
>  application would be to just set the read-only flag and leave the
>  others alone.
>
>  And this is where we usually conclude, that a new userspace mount API
>  is long overdue.  So for starters, how about a new syscall for bind
>  mounts:
>
>  int mount_bind(const char *src, const char *dst, unsigned flags,
>  unsigned mnt_flags);

The "flags" argument could be the same as for regular mount, and
contain the mnt_flags - so the extra argument could maybe usefully be
a "mnt_flags_mask", to indicate which flags we actually care about
overriding.

What would happen when an existing super-block flag changes to become
a per-mount flag (e.g. per-mount read-only)? I think that would just
fit in with the "mask" idea, as long as we complained if any bits in
mnt_flags_mask weren't actually per-mount settable.

Being able to mask/set mount flags might be useful on a remount too,
since there's no clean way to get the existing mount flags for a mount
other than by scanning /proc/mounts. So an alternative to a separate
system call would be a new mnt_flag_mask argument to mount() (whose
presence would be indicated by a flag bit being set in the main flags)
which would be used to control which bits were set cleared for
remount/bind calls. Seems a bit wasteful of bits though. If we turned
"flags" into an (optionally) 64-bit argument then we'd have plenty of
bits to be able to specify both a "set" bit and a "mask" bit for each,
without needing a new syscall.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
On Thu, Feb 14, 2008 at 12:30 AM, Miklos Szeredi [EMAIL PROTECTED] wrote:
   For recursive bind mounts, only the root of the tree being bound
   inherits the per-mount flags from the mount() arguments; sub-mounts
   inherit their per-mount flags from the source tree as usual.

  This is rather strange behavior.  I think it would be much better, if
  setting mount flags would work for recursive operations as well.  Also
  what we really need is not resetting all the mount flags to some
  predetermined values, but to be able to set or clear each flag
  individually.

This is certainly true, but as you observe below it's a fair bit more
fiddly to specify in the API. I wasn't sure how much people recursive
bind mounts, so I figured I'd throw out this simpler version first.


  For example, with the per-mount-read-only thing the most useful
  application would be to just set the read-only flag and leave the
  others alone.

  And this is where we usually conclude, that a new userspace mount API
  is long overdue.  So for starters, how about a new syscall for bind
  mounts:

  int mount_bind(const char *src, const char *dst, unsigned flags,
  unsigned mnt_flags);

The flags argument could be the same as for regular mount, and
contain the mnt_flags - so the extra argument could maybe usefully be
a mnt_flags_mask, to indicate which flags we actually care about
overriding.

What would happen when an existing super-block flag changes to become
a per-mount flag (e.g. per-mount read-only)? I think that would just
fit in with the mask idea, as long as we complained if any bits in
mnt_flags_mask weren't actually per-mount settable.

Being able to mask/set mount flags might be useful on a remount too,
since there's no clean way to get the existing mount flags for a mount
other than by scanning /proc/mounts. So an alternative to a separate
system call would be a new mnt_flag_mask argument to mount() (whose
presence would be indicated by a flag bit being set in the main flags)
which would be used to control which bits were set cleared for
remount/bind calls. Seems a bit wasteful of bits though. If we turned
flags into an (optionally) 64-bit argument then we'd have plenty of
bits to be able to specify both a set bit and a mask bit for each,
without needing a new syscall.

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-14 Thread Paul Menage
On Wed, Feb 13, 2008 at 10:02 PM, Christoph Hellwig [EMAIL PROTECTED] wrote:

  I think this concept is reasonable, but I don't think MS_BIND_FLAGS
  is a descriptive name for this flag.  MS_EXPLICIT_FLAGS might be better
  but still isn't optimal.


MS_BIND_FLAGS_OVERRIDE ?

Paul
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   >