Re: [PATCHSET] cpuset: decouple cpuset locking from cgroup core, take#2
On Mon, Jan 7, 2013 at 5:31 PM, Li Zefan wrote: > > I don't think Paul's still maintaining cpusets. Normally it's Andrew > that picks up cpuset patches. It's fine you route it through cgroup > tree. Yes, I'm sorry - I should have handed on cpusets at the time I had to hand on cgroups. I was only really ever the maintainer for cpusets because Paul Jackson asked me to take it over when he retired, as I understood the cgroups-related parts of it. I never really had a good grasp of how the some of the lower-level parts of it interacted with the rest of the system (e.g. offlining, CPUs, scheduler domains, etc) anyway ... Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHSET] cpuset: decouple cpuset locking from cgroup core, take#2
On Mon, Jan 7, 2013 at 5:31 PM, Li Zefan lize...@huawei.com wrote: I don't think Paul's still maintaining cpusets. Normally it's Andrew that picks up cpuset patches. It's fine you route it through cgroup tree. Yes, I'm sorry - I should have handed on cpusets at the time I had to hand on cgroups. I was only really ever the maintainer for cpusets because Paul Jackson asked me to take it over when he retired, as I understood the cgroups-related parts of it. I never really had a good grasp of how the some of the lower-level parts of it interacted with the rest of the system (e.g. offlining, CPUs, scheduler domains, etc) anyway ... Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
On Mon, Feb 25, 2008 at 7:01 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > > > > - foo doesn't show up in /proc/cgroups > > Or we can print out the disable flag, maybe this will be better? > Because we can distinguish from disabled and not compiled in from > > /proc/cgroups. Certainly possible, if people felt it was useful. > > > - foo isn't auto-mounted if you mount all cgroups in a single hierarchy > > - foo isn't visible as an individually mountable subsystem > > You mentioned in a previous mail if we mount a disabled subsystem we > will get an error. Here we just ignore the mount option. Which makes > more sense ? > No, we don't ignore the mount option - we give an error since it doesn't refer to a valid subsystem. (And in the first case there is no mount option). Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
On Mon, Feb 25, 2008 at 7:01 PM, Li Zefan [EMAIL PROTECTED] wrote: - foo doesn't show up in /proc/cgroups Or we can print out the disable flag, maybe this will be better? Because we can distinguish from disabled and not compiled in from /proc/cgroups. Certainly possible, if people felt it was useful. - foo isn't auto-mounted if you mount all cgroups in a single hierarchy - foo isn't visible as an individually mountable subsystem You mentioned in a previous mail if we mount a disabled subsystem we will get an error. Here we just ignore the mount option. Which makes more sense ? No, we don't ignore the mount option - we give an error since it doesn't refer to a valid subsystem. (And in the first case there is no mount option). Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 00/10] CGroup API files: Various cleanup to CGroup control files
On Mon, Feb 25, 2008 at 7:23 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > > Should those pathces be rebased againt 2.6.25-rc3 ? > No, because they're against 2.6.25-rc2-mm1, which is already has (I think) any of the new bits in 2.6.25-rc3 that would be affected by these patches. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
I'll send out a prototype for comment. Something like the patch below. The effects of cgroup_disable=foo are: - foo doesn't show up in /proc/cgroups - foo isn't auto-mounted if you mount all cgroups in a single hierarchy - foo isn't visible as an individually mountable subsystem As a result there will only ever be one call to foo->create(), at init time; all processes will stay in this group, and the group will never be mounted on a visible hierarchy. Any additional effects (e.g. not allocating metadata) are up to the foo subsystem. This doesn't handle early_init subsystems (their "disabled" bit isn't set be, but it could easily be extended to do so if any of the early_init systems wanted it - I think it would just involve some nastier parameter processing since it would occur before the command-line argument parser had been run. include/linux/cgroup.h |1 + kernel/cgroup.c| 29 +++-- 2 files changed, 28 insertions(+), 2 deletions(-) Index: cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h === --- cgroup_disable-2.6.25-rc2-mm1.orig/include/linux/cgroup.h +++ cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h @@ -256,6 +256,7 @@ struct cgroup_subsys { void (*bind)(struct cgroup_subsys *ss, struct cgroup *root); int subsys_id; int active; + int disabled; int early_init; #define MAX_CGROUP_TYPE_NAMELEN 32 const char *name; Index: cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c === --- cgroup_disable-2.6.25-rc2-mm1.orig/kernel/cgroup.c +++ cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c @@ -790,7 +790,14 @@ static int parse_cgroupfs_options(char * if (!*token) return -EINVAL; if (!strcmp(token, "all")) { - opts->subsys_bits = (1 << CGROUP_SUBSYS_COUNT) - 1; + /* Add all non-disabled subsystems */ + int i; + opts->subsys_bits = 0; + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { + struct cgroup_subsys *ss = subsys[i]; + if (!ss->disabled) + opts->subsys_bits |= 1ul << i; + } } else if (!strcmp(token, "noprefix")) { set_bit(ROOT_NOPREFIX, >flags); } else if (!strncmp(token, "release_agent=", 14)) { @@ -808,7 +815,8 @@ static int parse_cgroupfs_options(char * for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { ss = subsys[i]; if (!strcmp(token, ss->name)) { - set_bit(i, >subsys_bits); + if (!ss->disabled) + set_bit(i, >subsys_bits); break; } } @@ -2596,6 +2606,8 @@ static int proc_cgroupstats_show(struct mutex_lock(_mutex); for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; + if (ss->disabled) + continue; seq_printf(m, "%s\t%lu\t%d\n", ss->name, ss->root->subsys_bits, ss->root->number_of_cgroups); @@ -2991,3 +3003,16 @@ static void cgroup_release_agent(struct spin_unlock(_list_lock); mutex_unlock(_mutex); } + +static int __init cgroup_disable(char *str) +{ + int i; + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { + struct cgroup_subsys *ss = subsys[i]; + if (!strcmp(str, ss->name)) { + ss->disabled = 1; + break; + } + } +} +__setup("cgroup_disable=", cgroup_disable); Sure thing, if css has the flag, then it would nice. Could you wrap it up to say something like css_disabled(_cgroup_subsys) It's the subsys object rather than the css (cgroup_subsys_state). We could have something like: #define cgroup_subsys_disabled(_ss) ((ss_)->disabled) but I don't see that cgroup_subsys_disabled(_cgroup_subsys) is better than just putting mem_cgroup_subsys.disabled Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
On Mon, Feb 25, 2008 at 9:18 AM, Balbir Singh <[EMAIL PROTECTED]> wrote: > > I thought about it, but it did not work out all that well. The reason being, > that the memory controller is called in from places besides cgroup. > mem_cgroup_charge_common() for example is called from several places in mm. > Calling into cgroups to check, enabled/disabled did not seem right. You wouldn't need to call into cgroups - if it's a flag in the subsys object (which is defined in memcontrol.c) you'd just say if (mem_cgroup_subsys.disabled) { ... } I'll send out a prototype for comment. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
On Mon, Feb 25, 2008 at 3:55 AM, Balbir Singh <[EMAIL PROTECTED]> wrote: > > > A boot option for the memory controller was discussed on lkml. It is a good > idea to add it, since it saves memory for people who want to turn off the > memory controller. > > By default the option is on for the following two reasons > > 1. It provides compatibility with the current scheme where the memory >controller turns on if the config option is enabled > 2. It allows for wider testing of the memory controller, once the config >option is enabled > > We still allow the create, destroy callbacks to succeed, since they are > not aware of boot options. We do not populate the directory will > memory resource controller specific files. Would it make more sense to have a generic cgroups boot option for this? Something like cgroup_disable=xxx, which would be parsed by cgroups and would cause: - a "disabled" flag to be set to true in the subsys object (you could use this in place of the mem_cgroup_on flag) - prevent the disabled cgroup from being bound to any mounted hierarchy (so it would be ignored in a mount with no subsystem options, and a mount with options that specifically pick that subsystem would give an error) Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
On Mon, Feb 25, 2008 at 3:55 AM, Balbir Singh [EMAIL PROTECTED] wrote: A boot option for the memory controller was discussed on lkml. It is a good idea to add it, since it saves memory for people who want to turn off the memory controller. By default the option is on for the following two reasons 1. It provides compatibility with the current scheme where the memory controller turns on if the config option is enabled 2. It allows for wider testing of the memory controller, once the config option is enabled We still allow the create, destroy callbacks to succeed, since they are not aware of boot options. We do not populate the directory will memory resource controller specific files. Would it make more sense to have a generic cgroups boot option for this? Something like cgroup_disable=xxx, which would be parsed by cgroups and would cause: - a disabled flag to be set to true in the subsys object (you could use this in place of the mem_cgroup_on flag) - prevent the disabled cgroup from being bound to any mounted hierarchy (so it would be ignored in a mount with no subsystem options, and a mount with options that specifically pick that subsystem would give an error) Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
On Mon, Feb 25, 2008 at 9:18 AM, Balbir Singh [EMAIL PROTECTED] wrote: I thought about it, but it did not work out all that well. The reason being, that the memory controller is called in from places besides cgroup. mem_cgroup_charge_common() for example is called from several places in mm. Calling into cgroups to check, enabled/disabled did not seem right. You wouldn't need to call into cgroups - if it's a flag in the subsys object (which is defined in memcontrol.c) you'd just say if (mem_cgroup_subsys.disabled) { ... } I'll send out a prototype for comment. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Memory Resource Controller Add Boot Option
I'll send out a prototype for comment. Something like the patch below. The effects of cgroup_disable=foo are: - foo doesn't show up in /proc/cgroups - foo isn't auto-mounted if you mount all cgroups in a single hierarchy - foo isn't visible as an individually mountable subsystem As a result there will only ever be one call to foo-create(), at init time; all processes will stay in this group, and the group will never be mounted on a visible hierarchy. Any additional effects (e.g. not allocating metadata) are up to the foo subsystem. This doesn't handle early_init subsystems (their disabled bit isn't set be, but it could easily be extended to do so if any of the early_init systems wanted it - I think it would just involve some nastier parameter processing since it would occur before the command-line argument parser had been run. include/linux/cgroup.h |1 + kernel/cgroup.c| 29 +++-- 2 files changed, 28 insertions(+), 2 deletions(-) Index: cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h === --- cgroup_disable-2.6.25-rc2-mm1.orig/include/linux/cgroup.h +++ cgroup_disable-2.6.25-rc2-mm1/include/linux/cgroup.h @@ -256,6 +256,7 @@ struct cgroup_subsys { void (*bind)(struct cgroup_subsys *ss, struct cgroup *root); int subsys_id; int active; + int disabled; int early_init; #define MAX_CGROUP_TYPE_NAMELEN 32 const char *name; Index: cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c === --- cgroup_disable-2.6.25-rc2-mm1.orig/kernel/cgroup.c +++ cgroup_disable-2.6.25-rc2-mm1/kernel/cgroup.c @@ -790,7 +790,14 @@ static int parse_cgroupfs_options(char * if (!*token) return -EINVAL; if (!strcmp(token, all)) { - opts-subsys_bits = (1 CGROUP_SUBSYS_COUNT) - 1; + /* Add all non-disabled subsystems */ + int i; + opts-subsys_bits = 0; + for (i = 0; i CGROUP_SUBSYS_COUNT; i++) { + struct cgroup_subsys *ss = subsys[i]; + if (!ss-disabled) + opts-subsys_bits |= 1ul i; + } } else if (!strcmp(token, noprefix)) { set_bit(ROOT_NOPREFIX, opts-flags); } else if (!strncmp(token, release_agent=, 14)) { @@ -808,7 +815,8 @@ static int parse_cgroupfs_options(char * for (i = 0; i CGROUP_SUBSYS_COUNT; i++) { ss = subsys[i]; if (!strcmp(token, ss-name)) { - set_bit(i, opts-subsys_bits); + if (!ss-disabled) + set_bit(i, opts-subsys_bits); break; } } @@ -2596,6 +2606,8 @@ static int proc_cgroupstats_show(struct mutex_lock(cgroup_mutex); for (i = 0; i CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; + if (ss-disabled) + continue; seq_printf(m, %s\t%lu\t%d\n, ss-name, ss-root-subsys_bits, ss-root-number_of_cgroups); @@ -2991,3 +3003,16 @@ static void cgroup_release_agent(struct spin_unlock(release_list_lock); mutex_unlock(cgroup_mutex); } + +static int __init cgroup_disable(char *str) +{ + int i; + for (i = 0; i CGROUP_SUBSYS_COUNT; i++) { + struct cgroup_subsys *ss = subsys[i]; + if (!strcmp(str, ss-name)) { + ss-disabled = 1; + break; + } + } +} +__setup(cgroup_disable=, cgroup_disable); Sure thing, if css has the flag, then it would nice. Could you wrap it up to say something like css_disabled(mem_cgroup_subsys) It's the subsys object rather than the css (cgroup_subsys_state). We could have something like: #define cgroup_subsys_disabled(_ss) ((ss_)-disabled) but I don't see that cgroup_subsys_disabled(mem_cgroup_subsys) is better than just putting mem_cgroup_subsys.disabled Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 00/10] CGroup API files: Various cleanup to CGroup control files
On Mon, Feb 25, 2008 at 7:23 PM, Li Zefan [EMAIL PROTECTED] wrote: Should those pathces be rebased againt 2.6.25-rc3 ? No, because they're against 2.6.25-rc2-mm1, which is already has (I think) any of the new bits in 2.6.25-rc3 that would be affected by these patches. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cgroup: fix default notify_on_release setting
On Sun, Feb 24, 2008 at 9:53 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > The documentation says the default value of notify_on_release of > a child cgroup is inherited from its parent, which is reasonable, > but the implementation just sets the flag disabled. > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> Yes, I guess it makes sense to follow the original cpusets behaviour. I think that got lost when the notify-on-release functionality was temporarily removed during cgroups development. > --- > kernel/cgroup.c |4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index d8abe99..e9c2fb0 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -2232,7 +2232,6 @@ static long cgroup_create(struct cgroup *parent, > struct dentry *dentry, > > mutex_lock(_mutex); > > - cgrp->flags = 0; > INIT_LIST_HEAD(>sibling); > INIT_LIST_HEAD(>children); > INIT_LIST_HEAD(>css_sets); > @@ -2242,6 +2241,9 @@ static long cgroup_create(struct cgroup *parent, > struct dentry *dentry, > cgrp->root = parent->root; > cgrp->top_cgroup = parent->top_cgroup; > > + if (notify_on_release(parent)) > + set_bit(CGRP_NOTIFY_ON_RELEASE, >flags); > + > for_each_subsys(root, ss) { > struct cgroup_subsys_state *css = ss->create(ss, cgrp); > if (IS_ERR(css)) { > -- > 1.5.4.rc3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cgroup: fix default notify_on_release setting
On Sun, Feb 24, 2008 at 9:53 PM, Li Zefan [EMAIL PROTECTED] wrote: The documentation says the default value of notify_on_release of a child cgroup is inherited from its parent, which is reasonable, but the implementation just sets the flag disabled. Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] Yes, I guess it makes sense to follow the original cpusets behaviour. I think that got lost when the notify-on-release functionality was temporarily removed during cgroups development. --- kernel/cgroup.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index d8abe99..e9c2fb0 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2232,7 +2232,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, mutex_lock(cgroup_mutex); - cgrp-flags = 0; INIT_LIST_HEAD(cgrp-sibling); INIT_LIST_HEAD(cgrp-children); INIT_LIST_HEAD(cgrp-css_sets); @@ -2242,6 +2241,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, cgrp-root = parent-root; cgrp-top_cgroup = parent-top_cgroup; + if (notify_on_release(parent)) + set_bit(CGRP_NOTIFY_ON_RELEASE, cgrp-flags); + for_each_subsys(root, ss) { struct cgroup_subsys_state *css = ss-create(ss, cgrp); if (IS_ERR(css)) { -- 1.5.4.rc3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller
On Sat, Feb 23, 2008 at 6:47 PM, Balbir Singh <[EMAIL PROTECTED]> wrote: > >> res_counter_read_u64() I'd also want to rename all the other > >> *read_uint functions/fields to *read_u64 too. Can I do that in a > >> separate patch? > >> > > > > Sounds sensible to me. > > > > Sure, fair enough. > Actually, since multiple people were asking for this change I did the search/replace and sent it out already (as a precursor of the other patches in the series that I sent today). Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, Feb 23, 2008 at 12:26 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > > In that case I guess I'll have to add signed versions of the > > read_uint/write_uint methods. > > Yes, I looked at that, I found the interface somewhat unfortunate, it > would mean growing the struct with two more function pointers. Is that really a big deal? We're talking about a structure that has a small number (<10 in the current tree) of instances per cgroup subsystem. > Perhaps a > read and write function with abstract data would be better suited. That > would allow for this and more. Sadly it looses type information. If the size of the struct cftype really became a problem, I think the cleanest way to fix it would be to have a union of the potential function pointers, and add a field to specify which one is in use. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > If so, could we avoid that problem by using 0 rather than -1 as the > > "unlimited" value? It looks from what I've read in the Documentation > > changes as though 0 isn't really a meaningful value. > > 0 means no time, quite useful and clearly distinct from inf. time. > So a real-time task in a cgroup with a 0 rt_runtime can be in the R state but never actually get to run? OK, if people need to be able to do that then fair enough. In that case I guess I'll have to add signed versions of the read_uint/write_uint methods. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] sched: rt-group: interface
On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft, > + struct file *file, > + const char __user *userbuf, > + size_t nbytes, loff_t *unused_ppos) > +{ > + char buffer[64]; > + int retval = 0; > + s64 val; > + char *end; > + > + if (!nbytes) > + return -EINVAL; > + if (nbytes >= sizeof(buffer)) > + return -E2BIG; > + if (copy_from_user(buffer, userbuf, nbytes)) > + return -EFAULT; > + > + buffer[nbytes] = 0; /* nul-terminate */ > + > + /* strip newline if necessary */ > + if (nbytes && (buffer[nbytes-1] == '\n')) > + buffer[nbytes-1] = 0; > + val = simple_strtoll(buffer, , 0); > + if (*end) > + return -EINVAL; > + > + /* Pass to subsystem */ > + retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val); > + if (!retval) > + retval = nbytes; > + return retval; > } > > -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft) > -{ > - struct task_group *tg = cgroup_tg(cgrp); > +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft, > + struct file *file, > + char __user *buf, size_t nbytes, > + loff_t *ppos) > +{ > + char tmp[64]; > + long val = sched_group_rt_runtime(cgroup_tg(cgrp)); > + int len = sprintf(tmp, "%ld\n", val); > > - return (u64) tg->rt_ratio; > + return simple_read_from_buffer(buf, nbytes, ppos, tmp, len); > } What's the reason that you can't use the cgroup read_uint/write_uint methods for this? Is it just because you have -1 as your "unlimited" value. If so, could we avoid that problem by using 0 rather than -1 as the "unlimited" value? It looks from what I've read in the Documentation changes as though 0 isn't really a meaningful value. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API
On Sat, Feb 23, 2008 at 12:06 AM, Andrew Morton <[EMAIL PROTECTED]> wrote: > > It is unclear to me what the relationship is between this and your other > cgroup pseudo-fs changes, but as this is fiddling with a userspace > interface we should get a wiggle on - we don't want to let things like this > slip out to 2.6.26. > > So.. please resend everything? > Yes, will do. But (given that no-one seems to like the proposed cgroup.api file) these patches don't actually change the userspace API at all - it's purely internal plumbing improvements. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cgroup: fix sparse warning of shadow symbol in cgroup.c
On Sat, Feb 23, 2008 at 4:33 AM, Paul Jackson <[EMAIL PROTECTED]> wrote: > From: Paul Jackson <[EMAIL PROTECTED]> > > Fix a code warning: symbol 'p' shadows an earlier one > > This is a reincarnation of Harvey Harrison's patch: > cpuset: sparse warnings in cpuset.c > > Independently, Cliff Wickman moved the affected code, > from kernel/cpuset.c to kernel/cgroup.c, in his patch: > cpusets: update_cpumask revision > > Signed-off-by: Paul Jackson <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > Cc: Harvey Harrison <[EMAIL PROTECTED]> > Cc: Cliff Wickman <[EMAIL PROTECTED]> > > --- > kernel/cgroup.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > --- 2.6.25-rc2-mm1.orig/kernel/cgroup.c 2008-02-16 01:04:48.0 -0800 > +++ 2.6.25-rc2-mm1/kernel/cgroup.c 2008-02-23 04:19:44.006614677 -0800 > @@ -1897,14 +1897,14 @@ int cgroup_scan_tasks(struct cgroup_scan > > if (heap->size) { > for (i = 0; i < heap->size; i++) { > - struct task_struct *p = heap->ptrs[i]; > + struct task_struct *q = heap->ptrs[i]; > if (i == 0) { > - latest_time = p->start_time; > - latest_task = p; > + latest_time = q->start_time; > + latest_task = q; > } > /* Process the task per the caller's callback */ > - scan->process_task(p, scan); > - put_task_struct(p); > + scan->process_task(q, scan); > + put_task_struct(q); > } > /* > * If we had to process any tasks at all, scan again > > -- > I won't rest till it's the best ... > Programmer, Linux Scalability > Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] cgroup map files: Add cgroup map data type
On Sat, Feb 23, 2008 at 12:04 AM, Andrew Morton <[EMAIL PROTECTED]> wrote: > > +static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 > value) > > +{ > > + struct seq_file *sf = cb->state; > > + return seq_printf(sf, "%s %llu\n", key, value); > > +} > > We don't know what type the architecture uses to implement u64. This will > warn on powerpc, sparc64, maybe others. OK, I'll add an (unsigned long long) cast. > > > > +static int cgroup_seqfile_show(struct seq_file *m, void *arg) > > +{ > > + struct cgroup_seqfile_state *state = m->private; > > + struct cftype *cft = state->cft; > > + if (cft->read_map) { > > + struct cgroup_map_cb cb = { > > + .fill = cgroup_map_add, > > + .state = m, > > + }; > > + return cft->read_map(state->cgroup, cft, ); > > + } else { > > + BUG(); > > That's not really needed. Just call cft->read_map unconditionally. if > it's zero we'll get a null-pointer deref which will have just the same > effect as a BUG. OK. The long-term plan is to have other kinds of files also handled by this function, so eventually it would look something like: if (cft->read_map) { ... } else if (cft->read_something_else) { ... } ... } else { BUG(); } But I guess I can save that for the future. > > static int cgroup_file_open(struct inode *inode, struct file *file) > > { > > int err; > > @@ -1499,7 +1539,18 @@ static int cgroup_file_open(struct inode > > cft = __d_cft(file->f_dentry); > > if (!cft) > > return -ENODEV; > > - if (cft->open) > > + if (cft->read_map) { > > But above a NULL value is illegal. Why are we testing it here? > > The existence of cft->read_map causes us to open a seq_file. Otherwise we do nothing special and carry on down the normal open path. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Tiny cpusets -- cpusets for small systems?
On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson <[EMAIL PROTECTED]> wrote: > A couple of proposals have been made recently by people working Linux > on smaller systems, for improving realtime isolation and memory > pressure handling: > > (1) cpu isolation for hard(er) realtime > http://lkml.org/lkml/2008/2/21/517 > Max Krasnyanskiy <[EMAIL PROTECTED]> > [PATCH sched-devel 0/7] CPU isolation extensions > > (2) notify user space of tight memory > http://lkml.org/lkml/2008/2/9/144 > KOSAKI Motohiro <[EMAIL PROTECTED]> > [PATCH 0/8][for -mm] mem_notify v6 > > In both cases, some of us have responded "why not use cpusets", and the > original submitters have replied "cpusets are too fat" (well, they > were more diplomatic than that, but I guess I can say that ;) Having read those threads, it looks to me as though: - the parts of Max's problem that would be solved by cpusets can be mostly accomplished just via sched_setaffinity() - Motohiro wants to add a new system-wide API that you would also like to have available on a per-cpuset basis. (Why not just add two access points for the same feature?) I'm don't think that either of these would be enough to justify big changes to cpusets or cgroups, although eliminating bloat is always a good thing. > The primary semantic limit I'd suggest would be supporting exactly > one layer depth of cpusets, not a full hierarchy. So one could still > successfully issue from user space 'mkdir /dev/cpuset/foo', but trying > to do 'mkdir /dev/cpuset/foo/bar' would fail. This reminds me of > very early FAT file systems, which had just a single, fixed size > root directory ;). There might even be a configurable fixed upper > limit on how many /dev/cpuset/* directories were allowed, further > simplifying the locking and dynamic memory behavior of this apparatus. I'm not sure that either of these would make much difference to the overall footprint. A single layer of cpusets would allow you to simplify validate_change() but not much else. I don't see how a fixed upper limit on the number of cpusets makes the locking sufficiently simpler to save much code. > > How this extends to cgroups I don't know; for now I suspect that most > cgroup module development is motivated by the needs of larger systems, > not smaller systems. However, cpusets is now a module client of > cgroups, and it is cgroups that now provides cpusets with its interface > to the vfs infrastructure. It would seem unfortunate if this relation > was not continued with tiny cpusets. Perhaps someone can imagine a tiny > cgroups? This might be the most difficult part of this proposal. If we wanted to go this way, I can imagine a cgroups config option that forces just a single hierarchy, which would allow a bunch of simplifications that would save plenty of text. > > Looking at some IA64 sn2 config builds I have laying about, I see the > following text sizes for a couple of versions, showing the growth of > the cpuset/cgroup apparatus over time: > > 25933 2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006) > vs. > 37823 2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008) > 19558 2.6.25-rc2-mm1/kernel/cpuset.o > > So the total has grown from 25933 to 57381 text bytes (note that > this is IA64 arch; most arch's will have proportionately smaller > text sizes.) On x86_64 they're: cgroup.o: 17348 cpuset.o: 8533 Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller
On Thu, Feb 21, 2008 at 8:29 PM, Balbir Singh <[EMAIL PROTECTED]> wrote: > > Looks good, except for the name uint(), can we make it u64(). Integers are 32 > bit on both ILP32 and LP64, but we really read/write 64 bit values. Yes, that's true. But read_uint() is more consistent with all the other instances in cgroups and subsystems. So if we were to call it res_counter_read_u64() I'd also want to rename all the other *read_uint functions/fields to *read_u64 too. Can I do that in a separate patch? Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller
On Thu, Feb 21, 2008 at 8:29 PM, Balbir Singh [EMAIL PROTECTED] wrote: Looks good, except for the name uint(), can we make it u64(). Integers are 32 bit on both ILP32 and LP64, but we really read/write 64 bit values. Yes, that's true. But read_uint() is more consistent with all the other instances in cgroups and subsystems. So if we were to call it res_counter_read_u64() I'd also want to rename all the other *read_uint functions/fields to *read_u64 too. Can I do that in a separate patch? Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Tiny cpusets -- cpusets for small systems?
On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson [EMAIL PROTECTED] wrote: A couple of proposals have been made recently by people working Linux on smaller systems, for improving realtime isolation and memory pressure handling: (1) cpu isolation for hard(er) realtime http://lkml.org/lkml/2008/2/21/517 Max Krasnyanskiy [EMAIL PROTECTED] [PATCH sched-devel 0/7] CPU isolation extensions (2) notify user space of tight memory http://lkml.org/lkml/2008/2/9/144 KOSAKI Motohiro [EMAIL PROTECTED] [PATCH 0/8][for -mm] mem_notify v6 In both cases, some of us have responded why not use cpusets, and the original submitters have replied cpusets are too fat (well, they were more diplomatic than that, but I guess I can say that ;) Having read those threads, it looks to me as though: - the parts of Max's problem that would be solved by cpusets can be mostly accomplished just via sched_setaffinity() - Motohiro wants to add a new system-wide API that you would also like to have available on a per-cpuset basis. (Why not just add two access points for the same feature?) I'm don't think that either of these would be enough to justify big changes to cpusets or cgroups, although eliminating bloat is always a good thing. The primary semantic limit I'd suggest would be supporting exactly one layer depth of cpusets, not a full hierarchy. So one could still successfully issue from user space 'mkdir /dev/cpuset/foo', but trying to do 'mkdir /dev/cpuset/foo/bar' would fail. This reminds me of very early FAT file systems, which had just a single, fixed size root directory ;). There might even be a configurable fixed upper limit on how many /dev/cpuset/* directories were allowed, further simplifying the locking and dynamic memory behavior of this apparatus. I'm not sure that either of these would make much difference to the overall footprint. A single layer of cpusets would allow you to simplify validate_change() but not much else. I don't see how a fixed upper limit on the number of cpusets makes the locking sufficiently simpler to save much code. How this extends to cgroups I don't know; for now I suspect that most cgroup module development is motivated by the needs of larger systems, not smaller systems. However, cpusets is now a module client of cgroups, and it is cgroups that now provides cpusets with its interface to the vfs infrastructure. It would seem unfortunate if this relation was not continued with tiny cpusets. Perhaps someone can imagine a tiny cgroups? This might be the most difficult part of this proposal. If we wanted to go this way, I can imagine a cgroups config option that forces just a single hierarchy, which would allow a bunch of simplifications that would save plenty of text. Looking at some IA64 sn2 config builds I have laying about, I see the following text sizes for a couple of versions, showing the growth of the cpuset/cgroup apparatus over time: 25933 2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006) vs. 37823 2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008) 19558 2.6.25-rc2-mm1/kernel/cpuset.o So the total has grown from 25933 to 57381 text bytes (note that this is IA64 arch; most arch's will have proportionately smaller text sizes.) On x86_64 they're: cgroup.o: 17348 cpuset.o: 8533 Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] cgroup map files: Add cgroup map data type
On Sat, Feb 23, 2008 at 12:04 AM, Andrew Morton [EMAIL PROTECTED] wrote: +static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value) +{ + struct seq_file *sf = cb-state; + return seq_printf(sf, %s %llu\n, key, value); +} We don't know what type the architecture uses to implement u64. This will warn on powerpc, sparc64, maybe others. OK, I'll add an (unsigned long long) cast. +static int cgroup_seqfile_show(struct seq_file *m, void *arg) +{ + struct cgroup_seqfile_state *state = m-private; + struct cftype *cft = state-cft; + if (cft-read_map) { + struct cgroup_map_cb cb = { + .fill = cgroup_map_add, + .state = m, + }; + return cft-read_map(state-cgroup, cft, cb); + } else { + BUG(); That's not really needed. Just call cft-read_map unconditionally. if it's zero we'll get a null-pointer deref which will have just the same effect as a BUG. OK. The long-term plan is to have other kinds of files also handled by this function, so eventually it would look something like: if (cft-read_map) { ... } else if (cft-read_something_else) { ... } ... } else { BUG(); } But I guess I can save that for the future. static int cgroup_file_open(struct inode *inode, struct file *file) { int err; @@ -1499,7 +1539,18 @@ static int cgroup_file_open(struct inode cft = __d_cft(file-f_dentry); if (!cft) return -ENODEV; - if (cft-open) + if (cft-read_map) { But above a NULL value is illegal. Why are we testing it here? The existence of cft-read_map causes us to open a seq_file. Otherwise we do nothing special and carry on down the normal open path. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cgroup: fix sparse warning of shadow symbol in cgroup.c
On Sat, Feb 23, 2008 at 4:33 AM, Paul Jackson [EMAIL PROTECTED] wrote: From: Paul Jackson [EMAIL PROTECTED] Fix a code warning: symbol 'p' shadows an earlier one This is a reincarnation of Harvey Harrison's patch: cpuset: sparse warnings in cpuset.c Independently, Cliff Wickman moved the affected code, from kernel/cpuset.c to kernel/cgroup.c, in his patch: cpusets: update_cpumask revision Signed-off-by: Paul Jackson [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] Cc: Harvey Harrison [EMAIL PROTECTED] Cc: Cliff Wickman [EMAIL PROTECTED] --- kernel/cgroup.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) --- 2.6.25-rc2-mm1.orig/kernel/cgroup.c 2008-02-16 01:04:48.0 -0800 +++ 2.6.25-rc2-mm1/kernel/cgroup.c 2008-02-23 04:19:44.006614677 -0800 @@ -1897,14 +1897,14 @@ int cgroup_scan_tasks(struct cgroup_scan if (heap-size) { for (i = 0; i heap-size; i++) { - struct task_struct *p = heap-ptrs[i]; + struct task_struct *q = heap-ptrs[i]; if (i == 0) { - latest_time = p-start_time; - latest_task = p; + latest_time = q-start_time; + latest_task = q; } /* Process the task per the caller's callback */ - scan-process_task(p, scan); - put_task_struct(p); + scan-process_task(q, scan); + put_task_struct(q); } /* * If we had to process any tasks at all, scan again -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API
On Sat, Feb 23, 2008 at 12:06 AM, Andrew Morton [EMAIL PROTECTED] wrote: It is unclear to me what the relationship is between this and your other cgroup pseudo-fs changes, but as this is fiddling with a userspace interface we should get a wiggle on - we don't want to let things like this slip out to 2.6.26. So.. please resend everything? Yes, will do. But (given that no-one seems to like the proposed cgroup.api file) these patches don't actually change the userspace API at all - it's purely internal plumbing improvements. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] sched: rt-group: interface
On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra [EMAIL PROTECTED] wrote: +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft, + struct file *file, + const char __user *userbuf, + size_t nbytes, loff_t *unused_ppos) +{ + char buffer[64]; + int retval = 0; + s64 val; + char *end; + + if (!nbytes) + return -EINVAL; + if (nbytes = sizeof(buffer)) + return -E2BIG; + if (copy_from_user(buffer, userbuf, nbytes)) + return -EFAULT; + + buffer[nbytes] = 0; /* nul-terminate */ + + /* strip newline if necessary */ + if (nbytes (buffer[nbytes-1] == '\n')) + buffer[nbytes-1] = 0; + val = simple_strtoll(buffer, end, 0); + if (*end) + return -EINVAL; + + /* Pass to subsystem */ + retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val); + if (!retval) + retval = nbytes; + return retval; } -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft) -{ - struct task_group *tg = cgroup_tg(cgrp); +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft, + struct file *file, + char __user *buf, size_t nbytes, + loff_t *ppos) +{ + char tmp[64]; + long val = sched_group_rt_runtime(cgroup_tg(cgrp)); + int len = sprintf(tmp, %ld\n, val); - return (u64) tg-rt_ratio; + return simple_read_from_buffer(buf, nbytes, ppos, tmp, len); } What's the reason that you can't use the cgroup read_uint/write_uint methods for this? Is it just because you have -1 as your unlimited value. If so, could we avoid that problem by using 0 rather than -1 as the unlimited value? It looks from what I've read in the Documentation changes as though 0 isn't really a meaningful value. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra [EMAIL PROTECTED] wrote: If so, could we avoid that problem by using 0 rather than -1 as the unlimited value? It looks from what I've read in the Documentation changes as though 0 isn't really a meaningful value. 0 means no time, quite useful and clearly distinct from inf. time. So a real-time task in a cgroup with a 0 rt_runtime can be in the R state but never actually get to run? OK, if people need to be able to do that then fair enough. In that case I guess I'll have to add signed versions of the read_uint/write_uint methods. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, Feb 23, 2008 at 12:26 PM, Peter Zijlstra [EMAIL PROTECTED] wrote: In that case I guess I'll have to add signed versions of the read_uint/write_uint methods. Yes, I looked at that, I found the interface somewhat unfortunate, it would mean growing the struct with two more function pointers. Is that really a big deal? We're talking about a structure that has a small number (10 in the current tree) of instances per cgroup subsystem. Perhaps a read and write function with abstract data would be better suited. That would allow for this and more. Sadly it looses type information. If the size of the struct cftype really became a problem, I think the cleanest way to fix it would be to have a union of the potential function pointers, and add a field to specify which one is in use. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] ResCounter: Use read_uint in memory controller
On Sat, Feb 23, 2008 at 6:47 PM, Balbir Singh [EMAIL PROTECTED] wrote: res_counter_read_u64() I'd also want to rename all the other *read_uint functions/fields to *read_u64 too. Can I do that in a separate patch? Sounds sensible to me. Sure, fair enough. Actually, since multiple people were asking for this change I did the search/replace and sent it out already (as a precursor of the other patches in the series that I sent today). Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/7] cgroup: fix comments
On Wed, Feb 20, 2008 at 6:14 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > Paul Menage wrote: > > I think that docbook-style function comments need /** at the start of > > the comment block. > > > > Yes, I didn't notice it. I revised the patch to fix it. > > > --- > > > fix: > - comments about need_forkexit_callback > - comments about release agent > - typo and comment style, etc. > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > --- > include/linux/cgroup.h |2 +- > kernel/cgroup.c| 142 > +++- > 2 files changed, 80 insertions(+), 64 deletions(-) > > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index ff9055f..2ebf7af 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -175,7 +175,7 @@ struct css_set { > * > * > * When reading/writing to a file: > - * - the cgroup to use in file->f_dentry->d_parent->d_fsdata > + * - the cgroup to use is file->f_dentry->d_parent->d_fsdata > * - the 'cftype' of the file is file->f_dentry->d_fsdata > */ > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 4766bb6..36066d8 100644 > > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -113,9 +113,9 @@ static int root_count; > #define dummytop (_cgroup) > > /* This flag indicates whether tasks in the fork and exit paths should > - * take callback_mutex and check for fork/exit handlers to call. This > - * avoids us having to do extra work in the fork/exit path if none of the > - * subsystems need to be called. > + * check for fork/exit handlers to call. This avoids us having to do > + * extra work in the fork/exit path if none of the subsystems need to > + * be called. > */ > static int need_forkexit_callback; > > @@ -307,7 +307,6 @@ static inline void put_css_set_taskexit(struct css_set > *cg) > * template: location in which to build the desired set of subsystem > * state objects for the new cgroup group > */ > - > static struct css_set *find_existing_css_set( > struct css_set *oldcg, > struct cgroup *cgrp, > @@ -354,7 +353,6 @@ static struct css_set *find_existing_css_set( > * and chains them on tmp through their cgrp_link_list fields. Returns 0 on > * success or a negative error > */ > - > static int allocate_cg_links(int count, struct list_head *tmp) > { > struct cg_cgroup_link *link; > @@ -396,7 +394,6 @@ static void free_cg_links(struct list_head *tmp) > * substituted into the appropriate hierarchy. Must be called with > * cgroup_mutex held > */ > - > > static struct css_set *find_css_set( > struct css_set *oldcg, struct cgroup *cgrp) > { > @@ -507,8 +504,8 @@ static struct css_set *find_css_set( > > * critical pieces of code here. The exception occurs on cgroup_exit(), > * when a task in a notify_on_release cgroup exits. Then cgroup_mutex > * is taken, and if the cgroup count is zero, a usermode call made > - * to /sbin/cgroup_release_agent with the name of the cgroup (path > - * relative to the root of cgroup file system) as the argument. > + * to the release agent with the name of the cgroup (path relative to > + * the root of cgroup file system) as the argument. > * > * A cgroup can only be deleted if both its 'count' of using tasks > * is zero, and its list of 'children' cgroups is empty. Since all > @@ -521,7 +518,7 @@ static struct css_set *find_css_set( > > * > * The need for this exception arises from the action of > * cgroup_attach_task(), which overwrites one tasks cgroup pointer with > - * another. It does so using cgroup_mutexe, however there are > + * another. It does so using cgroup_mutex, however there are > * several performance critical places that need to reference > * task->cgroup without the expense of grabbing a system global > * mutex. Therefore except as noted below, when dereferencing or, as > @@ -537,7 +534,6 @@ static struct css_set *find_css_set( > * cgroup_lock - lock out any changes to cgroup structures > * > */ > - > void cgroup_lock(void) > { > mutex_lock(_mutex); > @@ -548,7 +544,6 @@ void cgroup_lock(void) > * > * Undo the lock taken in a previous cgroup_lock() call. > */ > - > void cgroup_unlock(void) > { > mutex_unlock(_mutex); > @@ -590,7 +585,6 @@ static struct inode *cgroup_new_inode(mode_t mode, > struct super_block *sb) > * Call subsys's pre_destroy handler. > * This is called before css refcnt check. > *
Re: [PATCH 2/7] cgroup: fix comments
On Wed, Feb 20, 2008 at 6:14 PM, Li Zefan [EMAIL PROTECTED] wrote: Paul Menage wrote: I think that docbook-style function comments need /** at the start of the comment block. Yes, I didn't notice it. I revised the patch to fix it. --- fix: - comments about need_forkexit_callback - comments about release agent - typo and comment style, etc. Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] --- include/linux/cgroup.h |2 +- kernel/cgroup.c| 142 +++- 2 files changed, 80 insertions(+), 64 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ff9055f..2ebf7af 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -175,7 +175,7 @@ struct css_set { * * * When reading/writing to a file: - * - the cgroup to use in file-f_dentry-d_parent-d_fsdata + * - the cgroup to use is file-f_dentry-d_parent-d_fsdata * - the 'cftype' of the file is file-f_dentry-d_fsdata */ diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 4766bb6..36066d8 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -113,9 +113,9 @@ static int root_count; #define dummytop (rootnode.top_cgroup) /* This flag indicates whether tasks in the fork and exit paths should - * take callback_mutex and check for fork/exit handlers to call. This - * avoids us having to do extra work in the fork/exit path if none of the - * subsystems need to be called. + * check for fork/exit handlers to call. This avoids us having to do + * extra work in the fork/exit path if none of the subsystems need to + * be called. */ static int need_forkexit_callback; @@ -307,7 +307,6 @@ static inline void put_css_set_taskexit(struct css_set *cg) * template: location in which to build the desired set of subsystem * state objects for the new cgroup group */ - static struct css_set *find_existing_css_set( struct css_set *oldcg, struct cgroup *cgrp, @@ -354,7 +353,6 @@ static struct css_set *find_existing_css_set( * and chains them on tmp through their cgrp_link_list fields. Returns 0 on * success or a negative error */ - static int allocate_cg_links(int count, struct list_head *tmp) { struct cg_cgroup_link *link; @@ -396,7 +394,6 @@ static void free_cg_links(struct list_head *tmp) * substituted into the appropriate hierarchy. Must be called with * cgroup_mutex held */ - static struct css_set *find_css_set( struct css_set *oldcg, struct cgroup *cgrp) { @@ -507,8 +504,8 @@ static struct css_set *find_css_set( * critical pieces of code here. The exception occurs on cgroup_exit(), * when a task in a notify_on_release cgroup exits. Then cgroup_mutex * is taken, and if the cgroup count is zero, a usermode call made - * to /sbin/cgroup_release_agent with the name of the cgroup (path - * relative to the root of cgroup file system) as the argument. + * to the release agent with the name of the cgroup (path relative to + * the root of cgroup file system) as the argument. * * A cgroup can only be deleted if both its 'count' of using tasks * is zero, and its list of 'children' cgroups is empty. Since all @@ -521,7 +518,7 @@ static struct css_set *find_css_set( * * The need for this exception arises from the action of * cgroup_attach_task(), which overwrites one tasks cgroup pointer with - * another. It does so using cgroup_mutexe, however there are + * another. It does so using cgroup_mutex, however there are * several performance critical places that need to reference * task-cgroup without the expense of grabbing a system global * mutex. Therefore except as noted below, when dereferencing or, as @@ -537,7 +534,6 @@ static struct css_set *find_css_set( * cgroup_lock - lock out any changes to cgroup structures * */ - void cgroup_lock(void) { mutex_lock(cgroup_mutex); @@ -548,7 +544,6 @@ void cgroup_lock(void) * * Undo the lock taken in a previous cgroup_lock() call. */ - void cgroup_unlock(void) { mutex_unlock(cgroup_mutex); @@ -590,7 +585,6 @@ static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb) * Call subsys's pre_destroy handler. * This is called before css refcnt check. */ - static void cgroup_call_pre_destroy(struct cgroup *cgrp) { struct cgroup_subsys *ss; @@ -600,7 +594,6 @@ static void cgroup_call_pre_destroy(struct cgroup *cgrp) return; } - static void cgroup_diput(struct dentry *dentry, struct inode *inode) { /* is dentry a directory ? if so, kfree() associated cgroup */ @@ -1129,8 +1122,13 @@ static inline struct cftype *__d_cft(struct dentry *dentry) return dentry-d_fsdata; } -/* - * Called with cgroup_mutex held. Writes path
Re: [PATCH 7/7] cgroup: remove dead code in cgroup_get_rootdir()
2008/2/17 Li Zefan <[EMAIL PROTECTED]>: > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > --- > kernel/cgroup.c |1 - > 1 files changed, 0 insertions(+), 1 deletions(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 71cf961..879a056 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -926,7 +926,6 @@ static int cgroup_get_rootdir(struct super_block *sb) > if (!inode) > return -ENOMEM; > > - inode->i_op = _dir_inode_operations; > inode->i_fop = _dir_operations; > inode->i_op = _dir_inode_operations; > /* directories start off with i_nlink == 2 (for "." entry) */ > -- > 1.5.4.rc3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/7] cgroup: remove dead code in cgroup_get_rootdir()
2008/2/17 Li Zefan [EMAIL PROTECTED]: Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] --- kernel/cgroup.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 71cf961..879a056 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -926,7 +926,6 @@ static int cgroup_get_rootdir(struct super_block *sb) if (!inode) return -ENOMEM; - inode-i_op = simple_dir_inode_operations; inode-i_fop = simple_dir_operations; inode-i_op = cgroup_dir_inode_operations; /* directories start off with i_nlink == 2 (for . entry) */ -- 1.5.4.rc3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups
On Feb 19, 2008 10:14 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote: > > On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote: > > > > > > it changes the format from "%s %lld" to "%s: %llu", right? > > > why? > > > > > > > The colon for consistency with maps in /proc. I think it also makes it > > slightly more readable. > > can you be a little more specific? > > i object against the colon because i want to use the same parser for > /proc/vmstat, which doesn't have colons. Ah. This /proc behaviour of having multiple formats for reporting the same kind of data (compare with /proc/meminfo, which does use colons) is the kind of thing that I want to avoid with cgroups. i.e. if two cgroup subsystems are both reporting the same kind of structured data, then they should both use the same output format. I guess since /proc has both styles, and memory.stat is the first file reporting key/value pairs in cgroups, you get to call the format. OK, I'll zap the colon. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups
On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote: > > it changes the format from "%s %lld" to "%s: %llu", right? > why? > The colon for consistency with maps in /proc. I think it also makes it slightly more readable. For %lld versus %llu - I think that cgroup resource APIs are much more likely to need to report unsigned rather than signed values. In the case of the memory.stat file, that's certainly the case. But I guess there's an argument to be made that nothing's likely to need the final 64th bit of an unsigned value, whereas the ability to report negative numbers could potentially be useful for some cgroups. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 19, 2008 9:17 PM, Paul Jackson <[EMAIL PROTECTED]> wrote: > > Perhaps my primary concern with these *.api files was that I did not > understand who or what the critical use or user was; who found this > essential, not just nice to have. > Right now, no-one would find it essential. If/when a binary API is added, I guess I'll ressurrect this part of the patchset. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API
Many of the cpusets control files are simple integer values, which don't require the overhead of memory allocations for reads and writes. Move the handlers for these control files into cpuset_read_uint() and cpuset_write_uint(). This also has the advantage that the control files show up as "u64" rather than "string" in the cgroup.api file. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- kernel/cpuset.c | 156 +--- 1 file changed, 82 insertions(+), 74 deletions(-) Index: cpusets-2.6.25-rc2-mm1/kernel/cpuset.c === --- cpusets-2.6.25-rc2-mm1.orig/kernel/cpuset.c +++ cpusets-2.6.25-rc2-mm1/kernel/cpuset.c @@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void } /* - * Call with cgroup_mutex held. - */ - -static int update_memory_pressure_enabled(struct cpuset *cs, char *buf) -{ - if (simple_strtoul(buf, NULL, 10) != 0) - cpuset_memory_pressure_enabled = 1; - else - cpuset_memory_pressure_enabled = 0; - return 0; -} - -/* * update_flag - read a 0 or a 1 in a file and update associated flag * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE, * CS_SCHED_LOAD_BALANCE, @@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable * Call with cgroup_mutex held. */ -static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf) +static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, + int turning_on) { - int turning_on; struct cpuset trialcs; int err; int cpus_nonempty, balance_flag_changed; - turning_on = (simple_strtoul(buf, NULL, 10) != 0); - trialcs = *cs; if (turning_on) set_bit(bit, ); @@ -1247,43 +1232,65 @@ static ssize_t cpuset_common_file_write( case FILE_MEMLIST: retval = update_nodemask(cs, buffer); break; + default: + retval = -EINVAL; + goto out2; + } + + if (retval == 0) + retval = nbytes; +out2: + cgroup_unlock(); +out1: + kfree(buffer); + return retval; +} + +static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val) +{ + int retval = 0; + struct cpuset *cs = cgroup_cs(cgrp); + cpuset_filetype_t type = cft->private; + + cgroup_lock(); + + if (cgroup_is_removed(cgrp)) { + cgroup_unlock(); + return -ENODEV; + } + + switch (type) { case FILE_CPU_EXCLUSIVE: - retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_CPU_EXCLUSIVE, cs, val); break; case FILE_MEM_EXCLUSIVE: - retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_MEM_EXCLUSIVE, cs, val); break; case FILE_SCHED_LOAD_BALANCE: - retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer); + retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val); break; case FILE_MEMORY_MIGRATE: - retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer); + retval = update_flag(CS_MEMORY_MIGRATE, cs, val); break; case FILE_MEMORY_PRESSURE_ENABLED: - retval = update_memory_pressure_enabled(cs, buffer); + cpuset_memory_pressure_enabled = !!val; break; case FILE_MEMORY_PRESSURE: retval = -EACCES; break; case FILE_SPREAD_PAGE: - retval = update_flag(CS_SPREAD_PAGE, cs, buffer); + retval = update_flag(CS_SPREAD_PAGE, cs, val); cs->mems_generation = cpuset_mems_generation++; break; case FILE_SPREAD_SLAB: - retval = update_flag(CS_SPREAD_SLAB, cs, buffer); + retval = update_flag(CS_SPREAD_SLAB, cs, val); cs->mems_generation = cpuset_mems_generation++; break; default: retval = -EINVAL; - goto out2; + break; } - - if (retval == 0) - retval = nbytes; -out2: cgroup_unlock(); -out1: - kfree(buffer); return retval; } @@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s case FILE_MEMLIST: s += cpuset_sprintf_memlist(s, cs); break; - case FILE_CPU_EXCLUSIVE: - *s++ = is_cpu_exclusive(cs) ? '1' : '0'; - break; - case FILE_MEM_EXCLUSIVE: - *s++ = is_mem_exclusive(cs) ? '1' : '0'; - break; - case FILE_SCHED_LOAD_BALANCE: - *s++ = is_sched_load_balance(cs) ? '1' : '0'; - brea
[PATCH 1/2] Cpusets API: From: Paul Jackson <[EMAIL PROTECTED]>
Strip all trailing whitespace in cgroup_write_uint This removes the need for people to remember to pass the -n flag to echo when writing values to cgroup control files. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- kernel/cgroup.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) Index: cpusets-2.6.25-rc2-mm1/kernel/cgroup.c === --- cpusets-2.6.25-rc2-mm1.orig/kernel/cgroup.c +++ cpusets-2.6.25-rc2-mm1/kernel/cgroup.c @@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct return -EFAULT; buffer[nbytes] = 0; /* nul-terminate */ - - /* strip newline if necessary */ - if (nbytes && (buffer[nbytes-1] == '\n')) - buffer[nbytes-1] = 0; + strstrip(buffer); val = simple_strtoull(buffer, , 0); if (*end) return -EINVAL; -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] Cpusets API: Update Cpusets control files
This pair of patches simplifies the cpusets read/write path for the control files that consist of simple integers. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Improve init/Kconfig help descriptions [PATCH 6/9]
On Feb 19, 2008 6:54 PM, Nick Andrew <[EMAIL PROTECTED]> wrote: > > config CGROUPS > bool "Control Group support" > help > Control Groups enables processes to be tracked and grouped > into "cgroups". This enables you, for example, to associate > cgroups with certain CPU sets using "cpusets". > > When enabled, a new filesystem type "cgroup" is available > and can be mounted to control cpusets and other > resource/behaviour controllers. > > See for more information. > > If unsure, say N. > > > I don't think that description is as clear as it could be. From > the non-kernel-developer point of view, that is. Originally this wasn't a user-selectable config value, it was auto-selected by any subsystem that needed it. I think that was nicer from the user-experience, and it would eliminate the need for this documentation but there were concerns that this triggered unspecified brokenness in the Kbuild system. > > Re "other resource/behaviour controllers", what in particular? > I take it that our current controllers are cpusets, scheduler, > CPU accounting and Resource counters? Resource counters aren't a resource controller, they're a helper library. The others are good examples, as is the memory controller that's just been added to 2.6.25. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/7] cgroup: fix and update documentation
On Feb 18, 2008 12:39 AM, Li Zefan <[EMAIL PROTECTED]> wrote: > Misc fixes and updates, make the doc consistent with current > cgroup implementation. > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> Thanks for these cleanups. Paul > --- > Documentation/cgroups.txt | 66 ++-- > 1 files changed, 33 insertions(+), 33 deletions(-) > > diff --git a/Documentation/cgroups.txt b/Documentation/cgroups.txt > index 42d7c4c..31d12e2 100644 > > --- a/Documentation/cgroups.txt > +++ b/Documentation/cgroups.txt > @@ -28,7 +28,7 @@ CONTENTS: > 4. Questions > > 1. Control Groups > -== > += > > 1.1 What are cgroups ? > -- > @@ -143,10 +143,10 @@ proliferation of such cgroups. > > Also lets say that the administrator would like to give enhanced network > access temporarily to a student's browser (since it is night and the user > -wants to do online gaming :) OR give one of the students simulation > +wants to do online gaming :)) OR give one of the students simulation > apps enhanced CPU power, > > -With ability to write pids directly to resource classes, its just a > +With ability to write pids directly to resource classes, it's just a > matter of : > > # echo pid > /mnt/network//tasks > @@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup > file system > containing the following files describing that cgroup: > > - tasks: list of tasks (by pid) attached to that cgroup > - - notify_on_release flag: run /sbin/cgroup_release_agent on exit? > + - releasable flag: cgroup currently removeable? > + - notify_on_release flag: run the release agent on exit? > + - release_agent: the path to use for release notifications (this file > + exists in the top cgroup only) > > Other subsystems such as cpusets may add additional files in each > -cgroup dir > +cgroup dir. > > New cgroups are created using the mkdir system call or shell > command. The properties of a cgroup, such as its flags, are > @@ -257,7 +260,7 @@ performance. > To allow access from a cgroup to the css_sets (and hence tasks) > that comprise it, a set of cg_cgroup_link objects form a lattice; > each cg_cgroup_link is linked into a list of cg_cgroup_links for > -a single cgroup on its cont_link_list field, and a list of > +a single cgroup on its cgrp_link_list field, and a list of > cg_cgroup_links for a single css_set on its cg_link_list. > > Thus the set of tasks in a cgroup can be listed by iterating over > @@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code. > 1.4 What does notify_on_release do ? > > > -*** notify_on_release is disabled in the current patch set. It will be > -*** reactivated in a future patch in a less-intrusive manner > - > If the notify_on_release flag is enabled (1) in a cgroup, then > whenever the last task in the cgroup leaves (exits or attaches to > some other cgroup) and the last child cgroup of that cgroup > @@ -360,8 +360,8 @@ Now you want to do something with this cgroup. > > In this directory you can find several files: > # ls > -notify_on_release release_agent tasks > -(plus whatever files are added by the attached subsystems) > +notify_on_release releasable tasks > +(plus whatever files added by the attached subsystems) > > Now attach your shell to this cgroup: > # /bin/echo $$ > tasks > @@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the > cgroup system. > Other fields in the cgroup_subsys object include: > > - subsys_id: a unique array index for the subsystem, indicating which > - entry in cgroup->subsys[] this subsystem should be > - managing. Initialized by cgroup_register_subsys(); prior to this > - it should be initialized to -1 > + entry in cgroup->subsys[] this subsystem should be managing. > > -- hierarchy: an index indicating which hierarchy, if any, this > - subsystem is currently attached to. If this is -1, then the > - subsystem is not attached to any hierarchy, and all tasks should be > - considered to be members of the subsystem's top_cgroup. It should > - be initialized to -1. > +- name: should be initialized to a unique subsystem name. Should be > + no longer than MAX_CGROUP_TYPE_NAMELEN. > > -- name: should be initialized to a unique subsystem name prior to > - calling cgroup_register_subsystem. Should be no longer than > - MAX_CGROUP_TYPE_NAMELEN > +- early_init: indicate if the subsystem needs early initialization > + at system boot. > > Each cgroup object created by the system has an arr
Re: [PATCH 4/7] cgroup: fix memory leak in cgroup_get_sb()
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > opts.release_agent is not kfree()ed in all necessary places. > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> Good catch, although hopefully something that would be extremely rare in practice. Thanks, Paul > --- > kernel/cgroup.c |5 - > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 0c35022..aa76bbd 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -961,8 +961,11 @@ static int cgroup_get_sb(struct file_system_type > *fs_type, > } > > root = kzalloc(sizeof(*root), GFP_KERNEL); > - if (!root) > + if (!root) { > + if (opts.release_agent) > + kfree(opts.release_agent); > return -ENOMEM; > + } > > init_cgroup_root(root); > root->subsys_bits = opts.subsys_bits; > -- > 1.5.4.rc3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/7] cgroup: remove duplicate code in find_css_set()
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > The list head res->tasks gets initialized twice in find_css_set(). > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > --- > kernel/cgroup.c |1 - > 1 files changed, 0 insertions(+), 1 deletions(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index e8c8e58..71cf961 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -473,7 +473,6 @@ static struct css_set *find_css_set( > /* Link this cgroup group into the list */ > list_add(>list, _css_set.list); > css_set_count++; > - INIT_LIST_HEAD(>tasks); > write_unlock(_set_lock); > > return res; > -- > 1.5.4.rc3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/7] cgroup: fix comments
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > fix: > - comments about need_forkexit_callback > - comments about release agent > - typo and comment style, etc. > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> > --- > include/linux/cgroup.h |2 +- > kernel/cgroup.c| 44 +--- > 2 files changed, 22 insertions(+), 24 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index ff9055f..2ebf7af 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -175,7 +175,7 @@ struct css_set { > * > * > * When reading/writing to a file: > - * - the cgroup to use in file->f_dentry->d_parent->d_fsdata > + * - the cgroup to use is file->f_dentry->d_parent->d_fsdata > * - the 'cftype' of the file is file->f_dentry->d_fsdata > */ > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 4766bb6..0c35022 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -113,9 +113,9 @@ static int root_count; > #define dummytop (_cgroup) > > /* This flag indicates whether tasks in the fork and exit paths should > - * take callback_mutex and check for fork/exit handlers to call. This > - * avoids us having to do extra work in the fork/exit path if none of the > - * subsystems need to be called. > + * check for fork/exit handlers to call. This avoids us having to do > + * extra work in the fork/exit path if none of the subsystems need to > + * be called. > */ > static int need_forkexit_callback; > > @@ -507,8 +507,8 @@ static struct css_set *find_css_set( > * critical pieces of code here. The exception occurs on cgroup_exit(), > * when a task in a notify_on_release cgroup exits. Then cgroup_mutex > * is taken, and if the cgroup count is zero, a usermode call made > - * to /sbin/cgroup_release_agent with the name of the cgroup (path > - * relative to the root of cgroup file system) as the argument. > + * to the release agent with the name of the cgroup (path relative to > + * the root of cgroup file system) as the argument. > * > * A cgroup can only be deleted if both its 'count' of using tasks > * is zero, and its list of 'children' cgroups is empty. Since all > @@ -521,7 +521,7 @@ static struct css_set *find_css_set( > * > * The need for this exception arises from the action of > * cgroup_attach_task(), which overwrites one tasks cgroup pointer with > - * another. It does so using cgroup_mutexe, however there are > + * another. It does so using cgroup_mutex, however there are > * several performance critical places that need to reference > * task->cgroup without the expense of grabbing a system global > * mutex. Therefore except as noted below, when dereferencing or, as > @@ -1192,7 +1192,7 @@ static void get_first_subsys(const struct cgroup *cgrp, > * Attach task 'tsk' to cgroup 'cgrp' > * > * Call holding cgroup_mutex. May take task_lock of > - * the task 'pid' during call. > + * the task 'tsk' during call. > */ > int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk) > { > @@ -1584,12 +1584,11 @@ static int cgroup_create_file(struct dentry *dentry, > int mode, > } > > /* I think that docbook-style function comments need /** at the start of the comment block. Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/7] cgroup: fix subsys bitops
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > Cgroup uses unsigned long for subsys bitops, not unsigned long long. > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > --- > kernel/cgroup.c |4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index aa76bbd..e8c8e58 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -320,7 +320,7 @@ static struct css_set *find_existing_css_set( > /* Built the set of subsystem state objects that we want to > * see in the new css_set */ > for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { > - if (root->subsys_bits & (1ull << i)) { > + if (root->subsys_bits & (1UL << i)) { > /* Subsystem is in this hierarchy. So we want > * the subsystem state from the new > * cgroup */ > @@ -696,7 +696,7 @@ static int rebind_subsystems(struct cgroupfs_root *root, > added_bits = final_bits & ~root->actual_subsys_bits; > /* Check that any added subsystems are currently free */ > for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { > - unsigned long long bit = 1ull << i; > + unsigned long bit = 1UL << i; > struct cgroup_subsys *ss = subsys[i]; > if (!(bit & added_bits)) > continue; > -- > 1.5.4.rc3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] cgroup: clean up cgroup.h
On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote: > - replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for > cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510) > - remove a duplicate declaration of cgroup_path() > > Signed-off-by: Li Zefan <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > --- > include/linux/cgroup.h | 48 > +++- > 1 files changed, 23 insertions(+), 25 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 2ebf7af..028ba3b 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -186,15 +186,15 @@ struct cftype { > char name[MAX_CFTYPE_NAME]; > int private; > int (*open) (struct inode *inode, struct file *file); > - ssize_t (*read) (struct cgroup *cont, struct cftype *cft, > + ssize_t (*read) (struct cgroup *cgrp, struct cftype *cft, > struct file *file, > char __user *buf, size_t nbytes, loff_t *ppos); > /* > * read_uint() is a shortcut for the common case of returning a > * single integer. Use it in place of read() > */ > - u64 (*read_uint) (struct cgroup *cont, struct cftype *cft); > - ssize_t (*write) (struct cgroup *cont, struct cftype *cft, > + u64 (*read_uint) (struct cgroup *cgrp, struct cftype *cft); > + ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft, > struct file *file, > const char __user *buf, size_t nbytes, loff_t > *ppos); > > @@ -203,7 +203,7 @@ struct cftype { > * a single integer (as parsed by simple_strtoull) from > * userspace. Use in place of write(); return 0 or error. > */ > - int (*write_uint) (struct cgroup *cont, struct cftype *cft, u64 val); > + int (*write_uint) (struct cgroup *cgrp, struct cftype *cft, u64 val); > > int (*release) (struct inode *inode, struct file *file); > }; > @@ -218,41 +218,41 @@ struct cgroup_scanner { > > /* Add a new file to the given cgroup directory. Should only be > * called by subsystems from within a populate() method */ > -int cgroup_add_file(struct cgroup *cont, struct cgroup_subsys *subsys, > +int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys, >const struct cftype *cft); > > /* Add a set of new files to the given cgroup directory. Should > * only be called by subsystems from within a populate() method */ > -int cgroup_add_files(struct cgroup *cont, > +int cgroup_add_files(struct cgroup *cgrp, > struct cgroup_subsys *subsys, > const struct cftype cft[], > int count); > > -int cgroup_is_removed(const struct cgroup *cont); > +int cgroup_is_removed(const struct cgroup *cgrp); > > -int cgroup_path(const struct cgroup *cont, char *buf, int buflen); > +int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen); > > -int cgroup_task_count(const struct cgroup *cont); > +int cgroup_task_count(const struct cgroup *cgrp); > > /* Return true if the cgroup is a descendant of the current cgroup */ > -int cgroup_is_descendant(const struct cgroup *cont); > +int cgroup_is_descendant(const struct cgroup *cgrp); > > /* Control Group subsystem type. See Documentation/cgroups.txt for details */ > > struct cgroup_subsys { > struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss, > - struct cgroup *cont); > - void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cont); > - void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cont); > + struct cgroup *cgrp); > + void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); > + void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); > int (*can_attach)(struct cgroup_subsys *ss, > - struct cgroup *cont, struct task_struct *tsk); > - void (*attach)(struct cgroup_subsys *ss, struct cgroup *cont, > - struct cgroup *old_cont, struct task_struct *tsk); > + struct cgroup *cgrp, struct task_struct *tsk); > + void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, > + struct cgroup *old_cgrp, struct task_struct *tsk); > void (*fork)(struct cgroup_subsys *ss, struct task_struct *task); > void (*exit)(struct cgroup_subsys *ss, struct task_struct *task); > int (*populate)(struct cgroup_subsys *ss,
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 18, 2008 1:45 AM, Li Zefan <[EMAIL PROTECTED]> wrote: > > > > But we don't have /proc/proc.api or /sys/sysfs.api ... True. And /proc is a bit of a mess. Having a similar API file for sysfs sounds like a good idea to me. > > And is it better to describe the debug subsystem too? > Yes, probably, but that would be a separate patch to the debug subsystem itself, not the main cgroups code. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 19, 2008 1:57 PM, Paul Jackson <[EMAIL PROTECTED]> wrote: > > Finally, it goes against the one thingie per file (at most, one scalar > vector) that has worked well for us when tried. Right, I like the idea of keeping things simple. But if you're going to accept that a vector is useful, then it seems reasonable that some other *simple* structured datatypes can be useful. An N-element key/value map (a la /proc/meminfo) is, I think, nicer than having to read values from N separate files. > > As to the motivations Paul M gives: > 1) Avoid "an arbitrary mess of ad-hoc APIs": > We can still do that, whether or not we "self-document" these > API's in this manner. We can, but this file makes it more clear what control files have a well-defined API and which are just returning some ad-hoc string. I guess it's not essential, I just figured that if we had that information, it made sense to make it available to userspace. I guess I'm happy with dropping the actual exposed cgroup.api file for now as long as we can work towards reducing the number of control files that just return strings, and make use of the structured output such as read_uint() miore. > 2) binary APIs versus ASCII APIs: > Well, I have an ASCII API bias, not surprising. But I'd > suggest not doing things "in anticipation" of some future > fuzzy binary API support. Wait until that day actually arrives. I have a reasonably clear idea of how we can do the binary API. That's mostly for a separate RFC. But for example, reading a map via the binary API would be able to just return a list values since the keys could be parsed once from the ascii map (provided that the subsystem guaranteed that the map keys and their order wouldn't change between reboots). > 3) The memory controller currently has files with the "_in_bytes": > The traditional way to handle this is Documentation and man > pages; good enough for my granddad, good enough for me ;). I've tried submitting patches to remove the in_bytes suffix and just rely on the documentation, and people didn't seem to like it ... Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Improve init/Kconfig help descriptions [PATCH 6/9]
On Feb 19, 2008 7:12 AM, Nick Andrew <[EMAIL PROTECTED]> wrote: > config CGROUPS > bool "Control Group support" > help > - This option will let you use process cgroup subsystems > - such as Cpusets > + Control Groups enables processes to be tracked and grouped > + into "cgroups". This enables you, for example, to associate > + cgroups with certain CPU sets using "cpusets". > > - Say N if unsure. > + When enabled, a new filesystem type "cgroup" is available > + and can be mounted to control cpusets. How about: ... cpusets and other resource/behaviour controllers. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Improve init/Kconfig help descriptions [PATCH 6/9]
On Feb 19, 2008 7:12 AM, Nick Andrew [EMAIL PROTECTED] wrote: config CGROUPS bool Control Group support help - This option will let you use process cgroup subsystems - such as Cpusets + Control Groups enables processes to be tracked and grouped + into cgroups. This enables you, for example, to associate + cgroups with certain CPU sets using cpusets. - Say N if unsure. + When enabled, a new filesystem type cgroup is available + and can be mounted to control cpusets. How about: ... cpusets and other resource/behaviour controllers. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 19, 2008 1:57 PM, Paul Jackson [EMAIL PROTECTED] wrote: Finally, it goes against the one thingie per file (at most, one scalar vector) that has worked well for us when tried. Right, I like the idea of keeping things simple. But if you're going to accept that a vector is useful, then it seems reasonable that some other *simple* structured datatypes can be useful. An N-element key/value map (a la /proc/meminfo) is, I think, nicer than having to read values from N separate files. As to the motivations Paul M gives: 1) Avoid an arbitrary mess of ad-hoc APIs: We can still do that, whether or not we self-document these API's in this manner. We can, but this file makes it more clear what control files have a well-defined API and which are just returning some ad-hoc string. I guess it's not essential, I just figured that if we had that information, it made sense to make it available to userspace. I guess I'm happy with dropping the actual exposed cgroup.api file for now as long as we can work towards reducing the number of control files that just return strings, and make use of the structured output such as read_uint() miore. 2) binary APIs versus ASCII APIs: Well, I have an ASCII API bias, not surprising. But I'd suggest not doing things in anticipation of some future fuzzy binary API support. Wait until that day actually arrives. I have a reasonably clear idea of how we can do the binary API. That's mostly for a separate RFC. But for example, reading a map via the binary API would be able to just return a list values since the keys could be parsed once from the ascii map (provided that the subsystem guaranteed that the map keys and their order wouldn't change between reboots). 3) The memory controller currently has files with the _in_bytes: The traditional way to handle this is Documentation and man pages; good enough for my granddad, good enough for me ;). I've tried submitting patches to remove the in_bytes suffix and just rely on the documentation, and people didn't seem to like it ... Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 18, 2008 1:45 AM, Li Zefan [EMAIL PROTECTED] wrote: But we don't have /proc/proc.api or /sys/sysfs.api ... True. And /proc is a bit of a mess. Having a similar API file for sysfs sounds like a good idea to me. And is it better to describe the debug subsystem too? Yes, probably, but that would be a separate patch to the debug subsystem itself, not the main cgroups code. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] cgroup: clean up cgroup.h
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote: - replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510) - remove a duplicate declaration of cgroup_path() Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] --- include/linux/cgroup.h | 48 +++- 1 files changed, 23 insertions(+), 25 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 2ebf7af..028ba3b 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -186,15 +186,15 @@ struct cftype { char name[MAX_CFTYPE_NAME]; int private; int (*open) (struct inode *inode, struct file *file); - ssize_t (*read) (struct cgroup *cont, struct cftype *cft, + ssize_t (*read) (struct cgroup *cgrp, struct cftype *cft, struct file *file, char __user *buf, size_t nbytes, loff_t *ppos); /* * read_uint() is a shortcut for the common case of returning a * single integer. Use it in place of read() */ - u64 (*read_uint) (struct cgroup *cont, struct cftype *cft); - ssize_t (*write) (struct cgroup *cont, struct cftype *cft, + u64 (*read_uint) (struct cgroup *cgrp, struct cftype *cft); + ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft, struct file *file, const char __user *buf, size_t nbytes, loff_t *ppos); @@ -203,7 +203,7 @@ struct cftype { * a single integer (as parsed by simple_strtoull) from * userspace. Use in place of write(); return 0 or error. */ - int (*write_uint) (struct cgroup *cont, struct cftype *cft, u64 val); + int (*write_uint) (struct cgroup *cgrp, struct cftype *cft, u64 val); int (*release) (struct inode *inode, struct file *file); }; @@ -218,41 +218,41 @@ struct cgroup_scanner { /* Add a new file to the given cgroup directory. Should only be * called by subsystems from within a populate() method */ -int cgroup_add_file(struct cgroup *cont, struct cgroup_subsys *subsys, +int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys, const struct cftype *cft); /* Add a set of new files to the given cgroup directory. Should * only be called by subsystems from within a populate() method */ -int cgroup_add_files(struct cgroup *cont, +int cgroup_add_files(struct cgroup *cgrp, struct cgroup_subsys *subsys, const struct cftype cft[], int count); -int cgroup_is_removed(const struct cgroup *cont); +int cgroup_is_removed(const struct cgroup *cgrp); -int cgroup_path(const struct cgroup *cont, char *buf, int buflen); +int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen); -int cgroup_task_count(const struct cgroup *cont); +int cgroup_task_count(const struct cgroup *cgrp); /* Return true if the cgroup is a descendant of the current cgroup */ -int cgroup_is_descendant(const struct cgroup *cont); +int cgroup_is_descendant(const struct cgroup *cgrp); /* Control Group subsystem type. See Documentation/cgroups.txt for details */ struct cgroup_subsys { struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss, - struct cgroup *cont); - void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cont); - void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cont); + struct cgroup *cgrp); + void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); + void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); int (*can_attach)(struct cgroup_subsys *ss, - struct cgroup *cont, struct task_struct *tsk); - void (*attach)(struct cgroup_subsys *ss, struct cgroup *cont, - struct cgroup *old_cont, struct task_struct *tsk); + struct cgroup *cgrp, struct task_struct *tsk); + void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, + struct cgroup *old_cgrp, struct task_struct *tsk); void (*fork)(struct cgroup_subsys *ss, struct task_struct *task); void (*exit)(struct cgroup_subsys *ss, struct task_struct *task); int (*populate)(struct cgroup_subsys *ss, - struct cgroup *cont); - void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cont); + struct cgroup *cgrp); + void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp); void (*bind)(struct cgroup_subsys *ss, struct cgroup *root); int subsys_id; int active; @@ -273,9 +273,9 @@ struct
Re: [PATCH 5/7] cgroup: fix subsys bitops
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote: Cgroup uses unsigned long for subsys bitops, not unsigned long long. Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] --- kernel/cgroup.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index aa76bbd..e8c8e58 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -320,7 +320,7 @@ static struct css_set *find_existing_css_set( /* Built the set of subsystem state objects that we want to * see in the new css_set */ for (i = 0; i CGROUP_SUBSYS_COUNT; i++) { - if (root-subsys_bits (1ull i)) { + if (root-subsys_bits (1UL i)) { /* Subsystem is in this hierarchy. So we want * the subsystem state from the new * cgroup */ @@ -696,7 +696,7 @@ static int rebind_subsystems(struct cgroupfs_root *root, added_bits = final_bits ~root-actual_subsys_bits; /* Check that any added subsystems are currently free */ for (i = 0; i CGROUP_SUBSYS_COUNT; i++) { - unsigned long long bit = 1ull i; + unsigned long bit = 1UL i; struct cgroup_subsys *ss = subsys[i]; if (!(bit added_bits)) continue; -- 1.5.4.rc3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/7] cgroup: fix memory leak in cgroup_get_sb()
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote: opts.release_agent is not kfree()ed in all necessary places. Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] Good catch, although hopefully something that would be extremely rare in practice. Thanks, Paul --- kernel/cgroup.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 0c35022..aa76bbd 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -961,8 +961,11 @@ static int cgroup_get_sb(struct file_system_type *fs_type, } root = kzalloc(sizeof(*root), GFP_KERNEL); - if (!root) + if (!root) { + if (opts.release_agent) + kfree(opts.release_agent); return -ENOMEM; + } init_cgroup_root(root); root-subsys_bits = opts.subsys_bits; -- 1.5.4.rc3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/7] cgroup: fix and update documentation
On Feb 18, 2008 12:39 AM, Li Zefan [EMAIL PROTECTED] wrote: Misc fixes and updates, make the doc consistent with current cgroup implementation. Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] Thanks for these cleanups. Paul --- Documentation/cgroups.txt | 66 ++-- 1 files changed, 33 insertions(+), 33 deletions(-) diff --git a/Documentation/cgroups.txt b/Documentation/cgroups.txt index 42d7c4c..31d12e2 100644 --- a/Documentation/cgroups.txt +++ b/Documentation/cgroups.txt @@ -28,7 +28,7 @@ CONTENTS: 4. Questions 1. Control Groups -== += 1.1 What are cgroups ? -- @@ -143,10 +143,10 @@ proliferation of such cgroups. Also lets say that the administrator would like to give enhanced network access temporarily to a student's browser (since it is night and the user -wants to do online gaming :) OR give one of the students simulation +wants to do online gaming :)) OR give one of the students simulation apps enhanced CPU power, -With ability to write pids directly to resource classes, its just a +With ability to write pids directly to resource classes, it's just a matter of : # echo pid /mnt/network/new_class/tasks @@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup file system containing the following files describing that cgroup: - tasks: list of tasks (by pid) attached to that cgroup - - notify_on_release flag: run /sbin/cgroup_release_agent on exit? + - releasable flag: cgroup currently removeable? + - notify_on_release flag: run the release agent on exit? + - release_agent: the path to use for release notifications (this file + exists in the top cgroup only) Other subsystems such as cpusets may add additional files in each -cgroup dir +cgroup dir. New cgroups are created using the mkdir system call or shell command. The properties of a cgroup, such as its flags, are @@ -257,7 +260,7 @@ performance. To allow access from a cgroup to the css_sets (and hence tasks) that comprise it, a set of cg_cgroup_link objects form a lattice; each cg_cgroup_link is linked into a list of cg_cgroup_links for -a single cgroup on its cont_link_list field, and a list of +a single cgroup on its cgrp_link_list field, and a list of cg_cgroup_links for a single css_set on its cg_link_list. Thus the set of tasks in a cgroup can be listed by iterating over @@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code. 1.4 What does notify_on_release do ? -*** notify_on_release is disabled in the current patch set. It will be -*** reactivated in a future patch in a less-intrusive manner - If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup @@ -360,8 +360,8 @@ Now you want to do something with this cgroup. In this directory you can find several files: # ls -notify_on_release release_agent tasks -(plus whatever files are added by the attached subsystems) +notify_on_release releasable tasks +(plus whatever files added by the attached subsystems) Now attach your shell to this cgroup: # /bin/echo $$ tasks @@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the cgroup system. Other fields in the cgroup_subsys object include: - subsys_id: a unique array index for the subsystem, indicating which - entry in cgroup-subsys[] this subsystem should be - managing. Initialized by cgroup_register_subsys(); prior to this - it should be initialized to -1 + entry in cgroup-subsys[] this subsystem should be managing. -- hierarchy: an index indicating which hierarchy, if any, this - subsystem is currently attached to. If this is -1, then the - subsystem is not attached to any hierarchy, and all tasks should be - considered to be members of the subsystem's top_cgroup. It should - be initialized to -1. +- name: should be initialized to a unique subsystem name. Should be + no longer than MAX_CGROUP_TYPE_NAMELEN. -- name: should be initialized to a unique subsystem name prior to - calling cgroup_register_subsystem. Should be no longer than - MAX_CGROUP_TYPE_NAMELEN +- early_init: indicate if the subsystem needs early initialization + at system boot. Each cgroup object created by the system has an array of pointers, indexed by subsystem id; this pointer is entirely managed by the @@ -434,8 +428,6 @@ situation. See kernel/cgroup.c for more details. Subsystems can take/release the cgroup_mutex via the functions -cgroup_lock()/cgroup_unlock(), and can -take/release the callback_mutex via the functions cgroup_lock()/cgroup_unlock(). Accessing a task's cgroup pointer may be done in the following ways: @@ -444,7 +436,7 @@ Accessing a task's
Re: Improve init/Kconfig help descriptions [PATCH 6/9]
On Feb 19, 2008 6:54 PM, Nick Andrew [EMAIL PROTECTED] wrote: config CGROUPS bool Control Group support help Control Groups enables processes to be tracked and grouped into cgroups. This enables you, for example, to associate cgroups with certain CPU sets using cpusets. When enabled, a new filesystem type cgroup is available and can be mounted to control cpusets and other resource/behaviour controllers. See file:Documentation/cgroups.txt for more information. If unsure, say N. I don't think that description is as clear as it could be. From the non-kernel-developer point of view, that is. Originally this wasn't a user-selectable config value, it was auto-selected by any subsystem that needed it. I think that was nicer from the user-experience, and it would eliminate the need for this documentation but there were concerns that this triggered unspecified brokenness in the Kbuild system. Re other resource/behaviour controllers, what in particular? I take it that our current controllers are cpusets, scheduler, CPU accounting and Resource counters? Resource counters aren't a resource controller, they're a helper library. The others are good examples, as is the memory controller that's just been added to 2.6.25. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/7] cgroup: remove duplicate code in find_css_set()
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote: The list head res-tasks gets initialized twice in find_css_set(). Signed-off-by: Li Zefan [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] --- kernel/cgroup.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e8c8e58..71cf961 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -473,7 +473,6 @@ static struct css_set *find_css_set( /* Link this cgroup group into the list */ list_add(res-list, init_css_set.list); css_set_count++; - INIT_LIST_HEAD(res-tasks); write_unlock(css_set_lock); return res; -- 1.5.4.rc3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/7] cgroup: fix comments
On Feb 17, 2008 9:49 PM, Li Zefan [EMAIL PROTECTED] wrote: fix: - comments about need_forkexit_callback - comments about release agent - typo and comment style, etc. Signed-off-by: Li Zefan [EMAIL PROTECTED] --- include/linux/cgroup.h |2 +- kernel/cgroup.c| 44 +--- 2 files changed, 22 insertions(+), 24 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ff9055f..2ebf7af 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -175,7 +175,7 @@ struct css_set { * * * When reading/writing to a file: - * - the cgroup to use in file-f_dentry-d_parent-d_fsdata + * - the cgroup to use is file-f_dentry-d_parent-d_fsdata * - the 'cftype' of the file is file-f_dentry-d_fsdata */ diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 4766bb6..0c35022 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -113,9 +113,9 @@ static int root_count; #define dummytop (rootnode.top_cgroup) /* This flag indicates whether tasks in the fork and exit paths should - * take callback_mutex and check for fork/exit handlers to call. This - * avoids us having to do extra work in the fork/exit path if none of the - * subsystems need to be called. + * check for fork/exit handlers to call. This avoids us having to do + * extra work in the fork/exit path if none of the subsystems need to + * be called. */ static int need_forkexit_callback; @@ -507,8 +507,8 @@ static struct css_set *find_css_set( * critical pieces of code here. The exception occurs on cgroup_exit(), * when a task in a notify_on_release cgroup exits. Then cgroup_mutex * is taken, and if the cgroup count is zero, a usermode call made - * to /sbin/cgroup_release_agent with the name of the cgroup (path - * relative to the root of cgroup file system) as the argument. + * to the release agent with the name of the cgroup (path relative to + * the root of cgroup file system) as the argument. * * A cgroup can only be deleted if both its 'count' of using tasks * is zero, and its list of 'children' cgroups is empty. Since all @@ -521,7 +521,7 @@ static struct css_set *find_css_set( * * The need for this exception arises from the action of * cgroup_attach_task(), which overwrites one tasks cgroup pointer with - * another. It does so using cgroup_mutexe, however there are + * another. It does so using cgroup_mutex, however there are * several performance critical places that need to reference * task-cgroup without the expense of grabbing a system global * mutex. Therefore except as noted below, when dereferencing or, as @@ -1192,7 +1192,7 @@ static void get_first_subsys(const struct cgroup *cgrp, * Attach task 'tsk' to cgroup 'cgrp' * * Call holding cgroup_mutex. May take task_lock of - * the task 'pid' during call. + * the task 'tsk' during call. */ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk) { @@ -1584,12 +1584,11 @@ static int cgroup_create_file(struct dentry *dentry, int mode, } /* I think that docbook-style function comments need /** at the start of the comment block. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API
Many of the cpusets control files are simple integer values, which don't require the overhead of memory allocations for reads and writes. Move the handlers for these control files into cpuset_read_uint() and cpuset_write_uint(). This also has the advantage that the control files show up as u64 rather than string in the cgroup.api file. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- kernel/cpuset.c | 156 +--- 1 file changed, 82 insertions(+), 74 deletions(-) Index: cpusets-2.6.25-rc2-mm1/kernel/cpuset.c === --- cpusets-2.6.25-rc2-mm1.orig/kernel/cpuset.c +++ cpusets-2.6.25-rc2-mm1/kernel/cpuset.c @@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void } /* - * Call with cgroup_mutex held. - */ - -static int update_memory_pressure_enabled(struct cpuset *cs, char *buf) -{ - if (simple_strtoul(buf, NULL, 10) != 0) - cpuset_memory_pressure_enabled = 1; - else - cpuset_memory_pressure_enabled = 0; - return 0; -} - -/* * update_flag - read a 0 or a 1 in a file and update associated flag * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE, * CS_SCHED_LOAD_BALANCE, @@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable * Call with cgroup_mutex held. */ -static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf) +static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, + int turning_on) { - int turning_on; struct cpuset trialcs; int err; int cpus_nonempty, balance_flag_changed; - turning_on = (simple_strtoul(buf, NULL, 10) != 0); - trialcs = *cs; if (turning_on) set_bit(bit, trialcs.flags); @@ -1247,43 +1232,65 @@ static ssize_t cpuset_common_file_write( case FILE_MEMLIST: retval = update_nodemask(cs, buffer); break; + default: + retval = -EINVAL; + goto out2; + } + + if (retval == 0) + retval = nbytes; +out2: + cgroup_unlock(); +out1: + kfree(buffer); + return retval; +} + +static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val) +{ + int retval = 0; + struct cpuset *cs = cgroup_cs(cgrp); + cpuset_filetype_t type = cft-private; + + cgroup_lock(); + + if (cgroup_is_removed(cgrp)) { + cgroup_unlock(); + return -ENODEV; + } + + switch (type) { case FILE_CPU_EXCLUSIVE: - retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_CPU_EXCLUSIVE, cs, val); break; case FILE_MEM_EXCLUSIVE: - retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_MEM_EXCLUSIVE, cs, val); break; case FILE_SCHED_LOAD_BALANCE: - retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer); + retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val); break; case FILE_MEMORY_MIGRATE: - retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer); + retval = update_flag(CS_MEMORY_MIGRATE, cs, val); break; case FILE_MEMORY_PRESSURE_ENABLED: - retval = update_memory_pressure_enabled(cs, buffer); + cpuset_memory_pressure_enabled = !!val; break; case FILE_MEMORY_PRESSURE: retval = -EACCES; break; case FILE_SPREAD_PAGE: - retval = update_flag(CS_SPREAD_PAGE, cs, buffer); + retval = update_flag(CS_SPREAD_PAGE, cs, val); cs-mems_generation = cpuset_mems_generation++; break; case FILE_SPREAD_SLAB: - retval = update_flag(CS_SPREAD_SLAB, cs, buffer); + retval = update_flag(CS_SPREAD_SLAB, cs, val); cs-mems_generation = cpuset_mems_generation++; break; default: retval = -EINVAL; - goto out2; + break; } - - if (retval == 0) - retval = nbytes; -out2: cgroup_unlock(); -out1: - kfree(buffer); return retval; } @@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s case FILE_MEMLIST: s += cpuset_sprintf_memlist(s, cs); break; - case FILE_CPU_EXCLUSIVE: - *s++ = is_cpu_exclusive(cs) ? '1' : '0'; - break; - case FILE_MEM_EXCLUSIVE: - *s++ = is_mem_exclusive(cs) ? '1' : '0'; - break; - case FILE_SCHED_LOAD_BALANCE: - *s++ = is_sched_load_balance(cs) ? '1' : '0'; - break; - case
[PATCH 1/2] Cpusets API: From: Paul Jackson [EMAIL PROTECTED]
Strip all trailing whitespace in cgroup_write_uint This removes the need for people to remember to pass the -n flag to echo when writing values to cgroup control files. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- kernel/cgroup.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) Index: cpusets-2.6.25-rc2-mm1/kernel/cgroup.c === --- cpusets-2.6.25-rc2-mm1.orig/kernel/cgroup.c +++ cpusets-2.6.25-rc2-mm1/kernel/cgroup.c @@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct return -EFAULT; buffer[nbytes] = 0; /* nul-terminate */ - - /* strip newline if necessary */ - if (nbytes (buffer[nbytes-1] == '\n')) - buffer[nbytes-1] = 0; + strstrip(buffer); val = simple_strtoull(buffer, end, 0); if (*end) return -EINVAL; -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] Cpusets API: Update Cpusets control files
This pair of patches simplifies the cpusets read/write path for the control files that consist of simple integers. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 19, 2008 9:17 PM, Paul Jackson [EMAIL PROTECTED] wrote: Perhaps my primary concern with these *.api files was that I did not understand who or what the critical use or user was; who found this essential, not just nice to have. Right now, no-one would find it essential. If/when a binary API is added, I guess I'll ressurrect this part of the patchset. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups
On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi [EMAIL PROTECTED] wrote: it changes the format from %s %lld to %s: %llu, right? why? The colon for consistency with maps in /proc. I think it also makes it slightly more readable. For %lld versus %llu - I think that cgroup resource APIs are much more likely to need to report unsigned rather than signed values. In the case of the memory.stat file, that's certainly the case. But I guess there's an argument to be made that nothing's likely to need the final 64th bit of an unsigned value, whereas the ability to report negative numbers could potentially be useful for some cgroups. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups
On Feb 19, 2008 10:14 PM, YAMAMOTO Takashi [EMAIL PROTECTED] wrote: On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi [EMAIL PROTECTED] wrote: it changes the format from %s %lld to %s: %llu, right? why? The colon for consistency with maps in /proc. I think it also makes it slightly more readable. can you be a little more specific? i object against the colon because i want to use the same parser for /proc/vmstat, which doesn't have colons. Ah. This /proc behaviour of having multiple formats for reporting the same kind of data (compare with /proc/meminfo, which does use colons) is the kind of thing that I want to avoid with cgroups. i.e. if two cgroup subsystems are both reporting the same kind of structured data, then they should both use the same output format. I guess since /proc has both styles, and memory.stat is the first file reporting key/value pairs in cgroups, you get to call the format. OK, I'll zap the colon. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API
On Feb 17, 2008 9:28 AM, Paul Jackson <[EMAIL PROTECTED]> wrote: > > I'm figuring it would be easiest if you just threw this > little change into your hopper for the bigger changes > you're making OK, will do. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API
On Feb 16, 2008 7:29 PM, Paul Jackson <[EMAIL PROTECTED]> wrote: > > From: Paul Jackson <[EMAIL PROTECTED]> > > Strip all trailing whitespace (such as carriage returns) > when parsing integer writes to cgroup files, not just > one trailing newline if present. Sounds like a good idea to me. Thanks for this. > > Signed-off-by: Paul Jackson <[EMAIL PROTECTED]> > Cc: Paul Menage <[EMAIL PROTECTED]> Acked-by: Paul Menage <[EMAIL PROTECTED]> > > --- > kernel/cgroup.c |5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > --- 2.6.24-mm1.orig/kernel/cgroup.c 2008-02-16 04:20:33.0 -0800 > +++ 2.6.24-mm1/kernel/cgroup.c 2008-02-16 19:00:41.207478218 -0800 > @@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct > return -EFAULT; > > buffer[nbytes] = 0; /* nul-terminate */ > - > - /* strip newline if necessary */ > - if (nbytes && (buffer[nbytes-1] == '\n')) > - buffer[nbytes-1] = 0; > + strstrip(buffer); /* strip -just- trailing whitespace */ > val = simple_strtoull(buffer, , 0); > if (*end) > return -EINVAL; > > > -- > I won't rest till it's the best ... > Programmer, Linux Scalability > Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API
On Feb 16, 2008 7:29 PM, Paul Jackson [EMAIL PROTECTED] wrote: From: Paul Jackson [EMAIL PROTECTED] Strip all trailing whitespace (such as carriage returns) when parsing integer writes to cgroup files, not just one trailing newline if present. Sounds like a good idea to me. Thanks for this. Signed-off-by: Paul Jackson [EMAIL PROTECTED] Cc: Paul Menage [EMAIL PROTECTED] Acked-by: Paul Menage [EMAIL PROTECTED] --- kernel/cgroup.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) --- 2.6.24-mm1.orig/kernel/cgroup.c 2008-02-16 04:20:33.0 -0800 +++ 2.6.24-mm1/kernel/cgroup.c 2008-02-16 19:00:41.207478218 -0800 @@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct return -EFAULT; buffer[nbytes] = 0; /* nul-terminate */ - - /* strip newline if necessary */ - if (nbytes (buffer[nbytes-1] == '\n')) - buffer[nbytes-1] = 0; + strstrip(buffer); /* strip -just- trailing whitespace */ val = simple_strtoull(buffer, end, 0); if (*end) return -EINVAL; -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.940.382.4214 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API
On Feb 17, 2008 9:28 AM, Paul Jackson [EMAIL PROTECTED] wrote: I'm figuring it would be easiest if you just threw this little change into your hopper for the bigger changes you're making OK, will do. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 16, 2008 2:07 AM, Balbir Singh <[EMAIL PROTECTED]> wrote: > Paul Menage wrote: > > Hi, Paul, > > Do we need to use a cgroup.api file? Why not keep up to date documentation and > get users to use that. I fear that, cgroup.api will not be kept up-to-date, > leading to confusion. The cgroup.api file isn't meant to give complete documentation for a control file, simply a brief indication of its usage. The aim is that most bits of the information reported in cgroup.api are auto-generated, so there shouldn't be problems with it getting out-of-date. Is it just the space used by the documentation string that you're objecting to? The other function of the file is to declare a type for each variable. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files
On Feb 16, 2008 1:31 AM, Li Zefan <[EMAIL PROTECTED]> wrote: > > I don't quite catch what you mean. Cgoup does support write-only/read-only > files. For a write-only file, just set .write and .write_uint to be NULL, > similar for a read-only file. > > Do I miss something? > I suppose we could infer from the lack of any write handlers that we should give the file in the filesystem a mode of 444 rather 644. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
On Feb 16, 2008 2:07 AM, Balbir Singh [EMAIL PROTECTED] wrote: Paul Menage wrote: Hi, Paul, Do we need to use a cgroup.api file? Why not keep up to date documentation and get users to use that. I fear that, cgroup.api will not be kept up-to-date, leading to confusion. The cgroup.api file isn't meant to give complete documentation for a control file, simply a brief indication of its usage. The aim is that most bits of the information reported in cgroup.api are auto-generated, so there shouldn't be problems with it getting out-of-date. Is it just the space used by the documentation string that you're objecting to? The other function of the file is to declare a type for each variable. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files
On Feb 16, 2008 1:31 AM, Li Zefan [EMAIL PROTECTED] wrote: I don't quite catch what you mean. Cgoup does support write-only/read-only files. For a write-only file, just set .write and .write_uint to be NULL, similar for a read-only file. Do I miss something? I suppose we could infer from the lack of any write handlers that we should give the file in the filesystem a mode of 444 rather 644. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 6/7] CGroup API: Use descriptions for memory controller API files
This patch adds descriptions to the memory controller API files to indicate that the usage/limit are in bytes; the names of the control files can then be simplified to usage/limit. Also removes the unnecessary mem_force_empty_read() function Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- mm/memcontrol.c | 21 + 1 file changed, 5 insertions(+), 16 deletions(-) Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c === --- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c +++ cgroupmap-2.6.24-mm1/mm/memcontrol.c @@ -950,19 +950,6 @@ static ssize_t mem_force_empty_write(str return ret; } -/* - * Note: This should be removed if cgroup supports write-only file. - */ - -static ssize_t mem_force_empty_read(struct cgroup *cont, - struct cftype *cft, - struct file *file, char __user *userbuf, - size_t nbytes, loff_t *ppos) -{ - return -EINVAL; -} - - static const struct mem_cgroup_stat_desc { const char *msg; u64 unit; @@ -1001,15 +988,17 @@ static int mem_control_stat_show(struct static struct cftype mem_cgroup_files[] = { { - .name = "usage_in_bytes", + .name = "usage", .private = RES_USAGE, .read_uint = mem_cgroup_read, + .desc = "Memory usage in bytes", }, { - .name = "limit_in_bytes", + .name = "limit", .private = RES_LIMIT, .write = mem_cgroup_write, .read_uint = mem_cgroup_read, + .desc = "Memory limit in bytes", }, { .name = "failcnt", @@ -1019,7 +1008,7 @@ static struct cftype mem_cgroup_files[] { .name = "force_empty", .write = mem_force_empty_write, - .read = mem_force_empty_read, + .desc = "Write to this file to forget all memory charges" }, { .name = "stat", -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 5/7] CGroup API: Use read_uint in memory controller
Update the memory controller to use read_uint for its limit/usage/failcnt control files, calling the new res_counter_read_uint() function. This allows the files to show up as u64 rather than string in the cgroup.api file. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- mm/memcontrol.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c === --- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c +++ cgroupmap-2.6.24-mm1/mm/memcontrol.c @@ -922,13 +922,10 @@ int mem_cgroup_write_strategy(char *buf, return 0; } -static ssize_t mem_cgroup_read(struct cgroup *cont, - struct cftype *cft, struct file *file, - char __user *userbuf, size_t nbytes, loff_t *ppos) +static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) { - return res_counter_read(_cgroup_from_cont(cont)->res, - cft->private, userbuf, nbytes, ppos, - NULL); + return res_counter_read_uint(_cgroup_from_cont(cont)->res, +cft->private); } static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft, @@ -1006,18 +1003,18 @@ static struct cftype mem_cgroup_files[] { .name = "usage_in_bytes", .private = RES_USAGE, - .read = mem_cgroup_read, + .read_uint = mem_cgroup_read, }, { .name = "limit_in_bytes", .private = RES_LIMIT, .write = mem_cgroup_write, - .read = mem_cgroup_read, + .read_uint = mem_cgroup_read, }, { .name = "failcnt", .private = RES_FAILCNT, - .read = mem_cgroup_read, + .read_uint = mem_cgroup_read, }, { .name = "force_empty", -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
Add a cgroup.api control file in every cgroup directory. This reports for each control file the type of data represented by that control file, and a user-friendly description of the contents. A secondary effect of this patch is to add the "cgroup." prefix in front of all cgroup-provided control files. This will reduce the chance of future control files clashing with user-provided names. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- include/linux/cgroup.h | 21 +++ kernel/cgroup.c| 133 ++--- 2 files changed, 148 insertions(+), 6 deletions(-) Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h === --- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h +++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h @@ -179,12 +179,33 @@ struct css_set { * - the 'cftype' of the file is file->f_dentry->d_fsdata */ +/* + * The various types of control file that are reported in the + * cgroup.api file. "String" is a catch-all default, but should only + * be used for special cases. If you use the appropriate accessors + * (such as "read_uint") in your control file, then you can leave this + * as 0 (CGROUP_FILE_UNKNOWN) and let cgroup figure out the right type. + */ +enum cgroup_file_type { + CGROUP_FILE_UNKNOWN = 0, + CGROUP_FILE_VOID, + CGROUP_FILE_U64, + CGROUP_FILE_STRING, +}; + #define MAX_CFTYPE_NAME 64 struct cftype { /* By convention, the name should begin with the name of the * subsystem, followed by a period */ char name[MAX_CFTYPE_NAME]; int private; + + /* The type of a file - reported in the cgroup.api file */ + enum cgroup_file_type type; + + /* Human-readable description of the file */ + const char *desc; + int (*open) (struct inode *inode, struct file *file); ssize_t (*read) (struct cgroup *cont, struct cftype *cft, struct file *file, Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c === --- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c +++ cgroupmap-2.6.24-mm1/kernel/cgroup.c @@ -1301,6 +1301,7 @@ enum cgroup_filetype { FILE_NOTIFY_ON_RELEASE, FILE_RELEASABLE, FILE_RELEASE_AGENT, + FILE_API, }; static ssize_t cgroup_write_uint(struct cgroup *cgrp, struct cftype *cft, @@ -1611,17 +1612,21 @@ static int cgroup_create_dir(struct cgro } int cgroup_add_file(struct cgroup *cgrp, - struct cgroup_subsys *subsys, - const struct cftype *cft) + struct cgroup_subsys *subsys, + const struct cftype *cft) { struct dentry *dir = cgrp->dentry; struct dentry *dentry; int error; char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 }; - if (subsys && !test_bit(ROOT_NOPREFIX, >root->flags)) { - strcpy(name, subsys->name); - strcat(name, "."); + if (!test_bit(ROOT_NOPREFIX, >root->flags)) { + if (subsys) { + strcpy(name, subsys->name); + strcat(name, "."); + } else { + strcpy(name, "cgroup."); + } } strcat(name, cft->name); BUG_ON(!mutex_is_locked(>d_inode->i_mutex)); @@ -2126,6 +2131,110 @@ static u64 cgroup_read_releasable(struct return test_bit(CGRP_RELEASABLE, >flags); } +static const struct file_operations cgroup_api_file_operations = { + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +/* + * cgroup.api is a file in each cgroup directory that gives the types + * and descriptions of the various control files in that directory. + */ + +static struct dentry *cgroup_api_advance(struct dentry *d, int advance) +{ + struct dentry *parent = d->d_parent; + struct list_head *l = >d_u.d_child; + while (true) { + if (advance) + l = l->next; + advance = true; + /* Did we reach the end of the directory? */ + if (l == >d_subdirs) + return NULL; + d = container_of(l, struct dentry, d_u.d_child); + /* Skip cgroup subdirectories */ + if (d->d_inode && S_ISREG(d->d_inode->i_mode)) + return d; + } +} + +static void *cgroup_api_start(struct seq_file *sf, loff_t *pos) +{ + struct dentry *parent = sf->private; + struct dentry *d; + loff_t l = 0; + spin_lock(_lock); + if (list_empty(>d_subdirs)) + return NULL; + d = container_of(parent->d_subdirs.next, struct dentry, d_
[RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files
This set of patches makes the Control Groups API more structured and self-describing. 1) Allows control files to be associated with data types such as "u64", "string", "map", etc. These types show up in a new cgroup.api file in each cgroup directory, along with a user-readable string. Files that use cgroup-provided data accessors have these file types inferred automatically. 2) Moves various files in cpusets and the memory controller from using custom-written file handlers to cgroup-defined handlers 3) Adds the "cgroup." prefix for existing cgroup-provided control files (tasks, release_agent, releasable, notify_on_release). Given than we've already had 2.6.24 go out without this prefix, I guess this could be a little contentious - but it seems like a good move to prevent name clashes in the future. (Note that this doesn't affect mounting the legacy cpuset filesystem, since the compatibility layer disables all prefixes when mounted with filesystem type "cpuset"). If people object too strongly, we could just make this the case for *new* cgroup API files, but I think this is a case where consistency would be better than compatibility - I'd be surprised if anyone has written major legacy apps yet that rely on 2.6.24 cgroup control file names. There are various motivations for this: 1) We said at Kernel Summit '07 that the cgroup API wouldn't be allowed to spiral into an arbitrary mess of ad-hoc APIs. Having simple ways to represent common data types makes this easier. (E.g. one standard way to report a map of string,u64 pairs to userspace.) 2) People were divided on the issue of binary APIs versus ASCII APIs for control groups. Compatibility with the existing cpusets system, and ease of experimentation, were two important reasons for going with the current. ASCII API. But by having structured control files, we can open the path towards having more efficient binary APIs for simpler and more efficient programmatic access too, without any additional modifications required from the subsystems themselves. My plans for this potential binary API are a little hazy at this point, but they might go something like opening a cgroup.bin file in a cgroup directory, and writing the names of the control files that you were interested in; then a read on that file handle would return the contents of the given control files in a single read in a simple binary format. (Better suggestions are welcome). Regardless, getting a good typing/structure on the control files is an important first step if we want to go in that direction. 3) The memory controller currently has files with the "_in_bytes" suffix, on the grounds that otherwise it's not obvious to a new user what they represent. By moving the description to a auto-generated API file, we can remove this (IMO) inelegant suffix. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 2/7] CGroup API: Add cgroup map data type
Adds a new type of supported control file representation, a map from strings to u64 values. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- include/linux/cgroup.h | 19 +++ kernel/cgroup.c| 61 - 2 files changed, 79 insertions(+), 1 deletion(-) Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h === --- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h +++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h @@ -191,6 +191,17 @@ enum cgroup_file_type { CGROUP_FILE_VOID, CGROUP_FILE_U64, CGROUP_FILE_STRING, + CGROUP_FILE_MAP, +}; + +/* + * cgroup_map_cb is an abstract callback API for reporting map-valued + * control files + */ + +struct cgroup_map_cb { + int (*fill)(struct cgroup_map_cb *cb, const char *key, u64 value); + void *state; }; #define MAX_CFTYPE_NAME 64 @@ -215,6 +226,14 @@ struct cftype { * single integer. Use it in place of read() */ u64 (*read_uint) (struct cgroup *cont, struct cftype *cft); + /* +* read_map() is used for defining a map of key/value +* pairs. It should call cb->fill(cb, key, value) for each +* entry. +*/ + int (*read_map) (struct cgroup *cont, struct cftype *cft, +struct cgroup_map_cb *cb); + ssize_t (*write) (struct cgroup *cont, struct cftype *cft, struct file *file, const char __user *buf, size_t nbytes, loff_t *ppos); Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c === --- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c +++ cgroupmap-2.6.24-mm1/kernel/cgroup.c @@ -1488,6 +1488,46 @@ static ssize_t cgroup_file_read(struct f return -EINVAL; } +/* + * seqfile ops/methods for returning structured data. Currently just + * supports string->u64 maps, but can be extended in future. + */ + +struct cgroup_seqfile_state { + struct cftype *cft; + struct cgroup *cgroup; +}; + +static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value) +{ + struct seq_file *sf = cb->state; + return seq_printf(sf, "%s: %llu\n", key, value); +} + +static int cgroup_seqfile_show(struct seq_file *m, void *arg) +{ + struct cgroup_seqfile_state *state = m->private; + struct cftype *cft = state->cft; + struct cgroup_map_cb cb = { + .fill = cgroup_map_add, + .state = m, + }; + if (cft->read_map) { + return cft->read_map(state->cgroup, cft, ); + } else { + BUG(); + } +} + +int cgroup_seqfile_release(struct inode *inode, struct file *file) +{ + struct seq_file *seq = file->private_data; + kfree(seq->private); + return single_release(inode, file); +} + +static struct file_operations cgroup_seqfile_operations; + static int cgroup_file_open(struct inode *inode, struct file *file) { int err; @@ -1500,7 +1540,18 @@ static int cgroup_file_open(struct inode cft = __d_cft(file->f_dentry); if (!cft) return -ENODEV; - if (cft->open) + if (cft->read_map) { + struct cgroup_seqfile_state *state = + kzalloc(sizeof(*state), GFP_USER); + if (!state) + return -ENOMEM; + state->cft = cft; + state->cgroup = __d_cgrp(file->f_dentry->d_parent); + file->f_op = _seqfile_operations; + err = single_open(file, cgroup_seqfile_show, state); + if (err < 0) + kfree(state); + } else if (cft->open) err = cft->open(inode, file); else err = 0; @@ -1539,6 +1590,12 @@ static struct file_operations cgroup_fil .release = cgroup_file_release, }; +static struct file_operations cgroup_seqfile_operations = { + .read = seq_read, + .llseek = seq_lseek, + .release = cgroup_seqfile_release, +}; + static struct inode_operations cgroup_dir_inode_operations = { .lookup = simple_lookup, .mkdir = cgroup_mkdir, @@ -2206,6 +2263,8 @@ static int cgroup_api_show(struct seq_fi if (type == CGROUP_FILE_UNKNOWN) { if (cft->read_uint) type = CGROUP_FILE_U64; + else if (cft->read_map) + type = CGROUP_FILE_MAP; else if (cft->read) type = CGROUP_FILE_STRING; else if (!cft->open) -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 3/7] CGroup API: Use cgroup map for memcontrol stats file
Remove the seq_file boilerplate used to construct the memcontrol stats map, and instead use the new map representation for cgroup control files Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- mm/memcontrol.c | 30 ++ 1 file changed, 6 insertions(+), 24 deletions(-) Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c === --- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c +++ cgroupmap-2.6.24-mm1/mm/memcontrol.c @@ -974,9 +974,9 @@ static const struct mem_cgroup_stat_desc [MEM_CGROUP_STAT_RSS] = { "rss", PAGE_SIZE, }, }; -static int mem_control_stat_show(struct seq_file *m, void *arg) +static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft, +struct cgroup_map_cb *cb) { - struct cgroup *cont = m->private; struct mem_cgroup *mem_cont = mem_cgroup_from_cont(cont); struct mem_cgroup_stat *stat = _cont->stat; int i; @@ -986,8 +986,7 @@ static int mem_control_stat_show(struct val = mem_cgroup_read_stat(stat, i); val *= mem_cgroup_stat_desc[i].unit; - seq_printf(m, "%s %lld\n", mem_cgroup_stat_desc[i].msg, - (long long)val); + cb->fill(cb, mem_cgroup_stat_desc[i].msg, val); } /* showing # of active pages */ { @@ -997,29 +996,12 @@ static int mem_control_stat_show(struct MEM_CGROUP_ZSTAT_INACTIVE); active = mem_cgroup_get_all_zonestat(mem_cont, MEM_CGROUP_ZSTAT_ACTIVE); - seq_printf(m, "active %ld\n", (active) * PAGE_SIZE); - seq_printf(m, "inactive %ld\n", (inactive) * PAGE_SIZE); + cb->fill(cb, "active", (active) * PAGE_SIZE); + cb->fill(cb, "inactive", (inactive) * PAGE_SIZE); } return 0; } -static const struct file_operations mem_control_stat_file_operations = { - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int mem_control_stat_open(struct inode *unused, struct file *file) -{ - /* XXX __d_cont */ - struct cgroup *cont = file->f_dentry->d_parent->d_fsdata; - - file->f_op = _control_stat_file_operations; - return single_open(file, mem_control_stat_show, cont); -} - - - static struct cftype mem_cgroup_files[] = { { .name = "usage_in_bytes", @@ -1044,7 +1026,7 @@ static struct cftype mem_cgroup_files[] }, { .name = "stat", - .open = mem_control_stat_open, + .read_map = mem_control_stat_show, }, }; -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 4/7] CGroup API: Add res_counter_read_uint()
Adds a function for returning the value of a resource counter member, in a form suitable for use in a cgroup read_uint control file method. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- include/linux/res_counter.h |1 + kernel/res_counter.c|5 + 2 files changed, 6 insertions(+) Index: cgroupmap-2.6.24-mm1/include/linux/res_counter.h === --- cgroupmap-2.6.24-mm1.orig/include/linux/res_counter.h +++ cgroupmap-2.6.24-mm1/include/linux/res_counter.h @@ -54,6 +54,7 @@ struct res_counter { ssize_t res_counter_read(struct res_counter *counter, int member, const char __user *buf, size_t nbytes, loff_t *pos, int (*read_strategy)(unsigned long long val, char *s)); +u64 res_counter_read_uint(struct res_counter *counter, int member); ssize_t res_counter_write(struct res_counter *counter, int member, const char __user *buf, size_t nbytes, loff_t *pos, int (*write_strategy)(char *buf, unsigned long long *val)); Index: cgroupmap-2.6.24-mm1/kernel/res_counter.c === --- cgroupmap-2.6.24-mm1.orig/kernel/res_counter.c +++ cgroupmap-2.6.24-mm1/kernel/res_counter.c @@ -92,6 +92,11 @@ ssize_t res_counter_read(struct res_coun pos, buf, s - buf); } +u64 res_counter_read_uint(struct res_counter *counter, int member) +{ + return *res_counter_member(counter, member); +} + ssize_t res_counter_write(struct res_counter *counter, int member, const char __user *userbuf, size_t nbytes, loff_t *pos, int (*write_strategy)(char *st_buf, unsigned long long *val)) -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API
Many of the cpusets control files are simple integer values, which don't require the overhead of memory allocations for reads and writes. Move the handlers for these control files into cpuset_read_uint() and cpuset_write_uint(). This also has the advantage that the control files show up as "u64" rather than "string" in the cgroup.api file. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- kernel/cpuset.c | 158 +--- 1 file changed, 83 insertions(+), 75 deletions(-) Index: cgroupmap-2.6.24-mm1/kernel/cpuset.c === --- cgroupmap-2.6.24-mm1.orig/kernel/cpuset.c +++ cgroupmap-2.6.24-mm1/kernel/cpuset.c @@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void } /* - * Call with cgroup_mutex held. - */ - -static int update_memory_pressure_enabled(struct cpuset *cs, char *buf) -{ - if (simple_strtoul(buf, NULL, 10) != 0) - cpuset_memory_pressure_enabled = 1; - else - cpuset_memory_pressure_enabled = 0; - return 0; -} - -/* * update_flag - read a 0 or a 1 in a file and update associated flag * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE, * CS_SCHED_LOAD_BALANCE, @@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable * Call with cgroup_mutex held. */ -static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf) +static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, + int turning_on) { - int turning_on; struct cpuset trialcs; int err; int cpus_nonempty, balance_flag_changed; - turning_on = (simple_strtoul(buf, NULL, 10) != 0); - trialcs = *cs; if (turning_on) set_bit(bit, ); @@ -1247,44 +1232,66 @@ static ssize_t cpuset_common_file_write( case FILE_MEMLIST: retval = update_nodemask(cs, buffer); break; + default: + retval = -EINVAL; + goto out2; + } + + if (retval == 0) + retval = nbytes; +out2: + cgroup_unlock(); +out1: + kfree(buffer); + return retval; +} + +static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val) +{ + int retval = 0; + struct cpuset *cs = cgroup_cs(cgrp); + cpuset_filetype_t type = cft->private; + + cgroup_lock(); + + if (cgroup_is_removed(cgrp)) { + cgroup_unlock(); + return -ENODEV; + } + + switch (type) { case FILE_CPU_EXCLUSIVE: - retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_CPU_EXCLUSIVE, cs, val); break; case FILE_MEM_EXCLUSIVE: - retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_MEM_EXCLUSIVE, cs, val); break; case FILE_SCHED_LOAD_BALANCE: - retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer); + retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val); break; case FILE_MEMORY_MIGRATE: - retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer); + retval = update_flag(CS_MEMORY_MIGRATE, cs, val); break; case FILE_MEMORY_PRESSURE_ENABLED: - retval = update_memory_pressure_enabled(cs, buffer); + cpuset_memory_pressure_enabled = val; break; case FILE_MEMORY_PRESSURE: retval = -EACCES; break; case FILE_SPREAD_PAGE: - retval = update_flag(CS_SPREAD_PAGE, cs, buffer); + retval = update_flag(CS_SPREAD_PAGE, cs, val); cs->mems_generation = cpuset_mems_generation++; break; case FILE_SPREAD_SLAB: - retval = update_flag(CS_SPREAD_SLAB, cs, buffer); + retval = update_flag(CS_SPREAD_SLAB, cs, val); cs->mems_generation = cpuset_mems_generation++; break; default: retval = -EINVAL; - goto out2; + break; } - - if (retval == 0) - retval = nbytes; -out2: cgroup_unlock(); -out1: - kfree(buffer); - return retval; + return -EINVAL; } /* @@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s case FILE_MEMLIST: s += cpuset_sprintf_memlist(s, cs); break; - case FILE_CPU_EXCLUSIVE: - *s++ = is_cpu_exclusive(cs) ? '1' : '0'; - break; - case FILE_MEM_EXCLUSIVE: - *s++ = is_mem_exclusive(cs) ? '1' : '0'; - break; - case FILE_SCHED_LOAD_BALANCE: - *s++ = is_sched_load_balance(cs) ? '1' : '0';
[RFC][PATCH 1/7] CGroup API: Add cgroup.api control file
Add a cgroup.api control file in every cgroup directory. This reports for each control file the type of data represented by that control file, and a user-friendly description of the contents. A secondary effect of this patch is to add the cgroup. prefix in front of all cgroup-provided control files. This will reduce the chance of future control files clashing with user-provided names. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- include/linux/cgroup.h | 21 +++ kernel/cgroup.c| 133 ++--- 2 files changed, 148 insertions(+), 6 deletions(-) Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h === --- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h +++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h @@ -179,12 +179,33 @@ struct css_set { * - the 'cftype' of the file is file-f_dentry-d_fsdata */ +/* + * The various types of control file that are reported in the + * cgroup.api file. String is a catch-all default, but should only + * be used for special cases. If you use the appropriate accessors + * (such as read_uint) in your control file, then you can leave this + * as 0 (CGROUP_FILE_UNKNOWN) and let cgroup figure out the right type. + */ +enum cgroup_file_type { + CGROUP_FILE_UNKNOWN = 0, + CGROUP_FILE_VOID, + CGROUP_FILE_U64, + CGROUP_FILE_STRING, +}; + #define MAX_CFTYPE_NAME 64 struct cftype { /* By convention, the name should begin with the name of the * subsystem, followed by a period */ char name[MAX_CFTYPE_NAME]; int private; + + /* The type of a file - reported in the cgroup.api file */ + enum cgroup_file_type type; + + /* Human-readable description of the file */ + const char *desc; + int (*open) (struct inode *inode, struct file *file); ssize_t (*read) (struct cgroup *cont, struct cftype *cft, struct file *file, Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c === --- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c +++ cgroupmap-2.6.24-mm1/kernel/cgroup.c @@ -1301,6 +1301,7 @@ enum cgroup_filetype { FILE_NOTIFY_ON_RELEASE, FILE_RELEASABLE, FILE_RELEASE_AGENT, + FILE_API, }; static ssize_t cgroup_write_uint(struct cgroup *cgrp, struct cftype *cft, @@ -1611,17 +1612,21 @@ static int cgroup_create_dir(struct cgro } int cgroup_add_file(struct cgroup *cgrp, - struct cgroup_subsys *subsys, - const struct cftype *cft) + struct cgroup_subsys *subsys, + const struct cftype *cft) { struct dentry *dir = cgrp-dentry; struct dentry *dentry; int error; char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 }; - if (subsys !test_bit(ROOT_NOPREFIX, cgrp-root-flags)) { - strcpy(name, subsys-name); - strcat(name, .); + if (!test_bit(ROOT_NOPREFIX, cgrp-root-flags)) { + if (subsys) { + strcpy(name, subsys-name); + strcat(name, .); + } else { + strcpy(name, cgroup.); + } } strcat(name, cft-name); BUG_ON(!mutex_is_locked(dir-d_inode-i_mutex)); @@ -2126,6 +2131,110 @@ static u64 cgroup_read_releasable(struct return test_bit(CGRP_RELEASABLE, cgrp-flags); } +static const struct file_operations cgroup_api_file_operations = { + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +/* + * cgroup.api is a file in each cgroup directory that gives the types + * and descriptions of the various control files in that directory. + */ + +static struct dentry *cgroup_api_advance(struct dentry *d, int advance) +{ + struct dentry *parent = d-d_parent; + struct list_head *l = d-d_u.d_child; + while (true) { + if (advance) + l = l-next; + advance = true; + /* Did we reach the end of the directory? */ + if (l == parent-d_subdirs) + return NULL; + d = container_of(l, struct dentry, d_u.d_child); + /* Skip cgroup subdirectories */ + if (d-d_inode S_ISREG(d-d_inode-i_mode)) + return d; + } +} + +static void *cgroup_api_start(struct seq_file *sf, loff_t *pos) +{ + struct dentry *parent = sf-private; + struct dentry *d; + loff_t l = 0; + spin_lock(dcache_lock); + if (list_empty(parent-d_subdirs)) + return NULL; + d = container_of(parent-d_subdirs.next, struct dentry, d_u.d_child); + d = cgroup_api_advance(d, 0); + while (d l *pos) { + (*pos)++; + d = cgroup_api_advance(d, 1
[RFC][PATCH 5/7] CGroup API: Use read_uint in memory controller
Update the memory controller to use read_uint for its limit/usage/failcnt control files, calling the new res_counter_read_uint() function. This allows the files to show up as u64 rather than string in the cgroup.api file. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- mm/memcontrol.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c === --- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c +++ cgroupmap-2.6.24-mm1/mm/memcontrol.c @@ -922,13 +922,10 @@ int mem_cgroup_write_strategy(char *buf, return 0; } -static ssize_t mem_cgroup_read(struct cgroup *cont, - struct cftype *cft, struct file *file, - char __user *userbuf, size_t nbytes, loff_t *ppos) +static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) { - return res_counter_read(mem_cgroup_from_cont(cont)-res, - cft-private, userbuf, nbytes, ppos, - NULL); + return res_counter_read_uint(mem_cgroup_from_cont(cont)-res, +cft-private); } static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft, @@ -1006,18 +1003,18 @@ static struct cftype mem_cgroup_files[] { .name = usage_in_bytes, .private = RES_USAGE, - .read = mem_cgroup_read, + .read_uint = mem_cgroup_read, }, { .name = limit_in_bytes, .private = RES_LIMIT, .write = mem_cgroup_write, - .read = mem_cgroup_read, + .read_uint = mem_cgroup_read, }, { .name = failcnt, .private = RES_FAILCNT, - .read = mem_cgroup_read, + .read_uint = mem_cgroup_read, }, { .name = force_empty, -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 6/7] CGroup API: Use descriptions for memory controller API files
This patch adds descriptions to the memory controller API files to indicate that the usage/limit are in bytes; the names of the control files can then be simplified to usage/limit. Also removes the unnecessary mem_force_empty_read() function Signed-off-by: Paul Menage [EMAIL PROTECTED] --- mm/memcontrol.c | 21 + 1 file changed, 5 insertions(+), 16 deletions(-) Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c === --- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c +++ cgroupmap-2.6.24-mm1/mm/memcontrol.c @@ -950,19 +950,6 @@ static ssize_t mem_force_empty_write(str return ret; } -/* - * Note: This should be removed if cgroup supports write-only file. - */ - -static ssize_t mem_force_empty_read(struct cgroup *cont, - struct cftype *cft, - struct file *file, char __user *userbuf, - size_t nbytes, loff_t *ppos) -{ - return -EINVAL; -} - - static const struct mem_cgroup_stat_desc { const char *msg; u64 unit; @@ -1001,15 +988,17 @@ static int mem_control_stat_show(struct static struct cftype mem_cgroup_files[] = { { - .name = usage_in_bytes, + .name = usage, .private = RES_USAGE, .read_uint = mem_cgroup_read, + .desc = Memory usage in bytes, }, { - .name = limit_in_bytes, + .name = limit, .private = RES_LIMIT, .write = mem_cgroup_write, .read_uint = mem_cgroup_read, + .desc = Memory limit in bytes, }, { .name = failcnt, @@ -1019,7 +1008,7 @@ static struct cftype mem_cgroup_files[] { .name = force_empty, .write = mem_force_empty_write, - .read = mem_force_empty_read, + .desc = Write to this file to forget all memory charges }, { .name = stat, -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 0/7] CGroup API: More structured API for CGroups control files
This set of patches makes the Control Groups API more structured and self-describing. 1) Allows control files to be associated with data types such as u64, string, map, etc. These types show up in a new cgroup.api file in each cgroup directory, along with a user-readable string. Files that use cgroup-provided data accessors have these file types inferred automatically. 2) Moves various files in cpusets and the memory controller from using custom-written file handlers to cgroup-defined handlers 3) Adds the cgroup. prefix for existing cgroup-provided control files (tasks, release_agent, releasable, notify_on_release). Given than we've already had 2.6.24 go out without this prefix, I guess this could be a little contentious - but it seems like a good move to prevent name clashes in the future. (Note that this doesn't affect mounting the legacy cpuset filesystem, since the compatibility layer disables all prefixes when mounted with filesystem type cpuset). If people object too strongly, we could just make this the case for *new* cgroup API files, but I think this is a case where consistency would be better than compatibility - I'd be surprised if anyone has written major legacy apps yet that rely on 2.6.24 cgroup control file names. There are various motivations for this: 1) We said at Kernel Summit '07 that the cgroup API wouldn't be allowed to spiral into an arbitrary mess of ad-hoc APIs. Having simple ways to represent common data types makes this easier. (E.g. one standard way to report a map of string,u64 pairs to userspace.) 2) People were divided on the issue of binary APIs versus ASCII APIs for control groups. Compatibility with the existing cpusets system, and ease of experimentation, were two important reasons for going with the current. ASCII API. But by having structured control files, we can open the path towards having more efficient binary APIs for simpler and more efficient programmatic access too, without any additional modifications required from the subsystems themselves. My plans for this potential binary API are a little hazy at this point, but they might go something like opening a cgroup.bin file in a cgroup directory, and writing the names of the control files that you were interested in; then a read on that file handle would return the contents of the given control files in a single read in a simple binary format. (Better suggestions are welcome). Regardless, getting a good typing/structure on the control files is an important first step if we want to go in that direction. 3) The memory controller currently has files with the _in_bytes suffix, on the grounds that otherwise it's not obvious to a new user what they represent. By moving the description to a auto-generated API file, we can remove this (IMO) inelegant suffix. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 3/7] CGroup API: Use cgroup map for memcontrol stats file
Remove the seq_file boilerplate used to construct the memcontrol stats map, and instead use the new map representation for cgroup control files Signed-off-by: Paul Menage [EMAIL PROTECTED] --- mm/memcontrol.c | 30 ++ 1 file changed, 6 insertions(+), 24 deletions(-) Index: cgroupmap-2.6.24-mm1/mm/memcontrol.c === --- cgroupmap-2.6.24-mm1.orig/mm/memcontrol.c +++ cgroupmap-2.6.24-mm1/mm/memcontrol.c @@ -974,9 +974,9 @@ static const struct mem_cgroup_stat_desc [MEM_CGROUP_STAT_RSS] = { rss, PAGE_SIZE, }, }; -static int mem_control_stat_show(struct seq_file *m, void *arg) +static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft, +struct cgroup_map_cb *cb) { - struct cgroup *cont = m-private; struct mem_cgroup *mem_cont = mem_cgroup_from_cont(cont); struct mem_cgroup_stat *stat = mem_cont-stat; int i; @@ -986,8 +986,7 @@ static int mem_control_stat_show(struct val = mem_cgroup_read_stat(stat, i); val *= mem_cgroup_stat_desc[i].unit; - seq_printf(m, %s %lld\n, mem_cgroup_stat_desc[i].msg, - (long long)val); + cb-fill(cb, mem_cgroup_stat_desc[i].msg, val); } /* showing # of active pages */ { @@ -997,29 +996,12 @@ static int mem_control_stat_show(struct MEM_CGROUP_ZSTAT_INACTIVE); active = mem_cgroup_get_all_zonestat(mem_cont, MEM_CGROUP_ZSTAT_ACTIVE); - seq_printf(m, active %ld\n, (active) * PAGE_SIZE); - seq_printf(m, inactive %ld\n, (inactive) * PAGE_SIZE); + cb-fill(cb, active, (active) * PAGE_SIZE); + cb-fill(cb, inactive, (inactive) * PAGE_SIZE); } return 0; } -static const struct file_operations mem_control_stat_file_operations = { - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int mem_control_stat_open(struct inode *unused, struct file *file) -{ - /* XXX __d_cont */ - struct cgroup *cont = file-f_dentry-d_parent-d_fsdata; - - file-f_op = mem_control_stat_file_operations; - return single_open(file, mem_control_stat_show, cont); -} - - - static struct cftype mem_cgroup_files[] = { { .name = usage_in_bytes, @@ -1044,7 +1026,7 @@ static struct cftype mem_cgroup_files[] }, { .name = stat, - .open = mem_control_stat_open, + .read_map = mem_control_stat_show, }, }; -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 2/7] CGroup API: Add cgroup map data type
Adds a new type of supported control file representation, a map from strings to u64 values. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- include/linux/cgroup.h | 19 +++ kernel/cgroup.c| 61 - 2 files changed, 79 insertions(+), 1 deletion(-) Index: cgroupmap-2.6.24-mm1/include/linux/cgroup.h === --- cgroupmap-2.6.24-mm1.orig/include/linux/cgroup.h +++ cgroupmap-2.6.24-mm1/include/linux/cgroup.h @@ -191,6 +191,17 @@ enum cgroup_file_type { CGROUP_FILE_VOID, CGROUP_FILE_U64, CGROUP_FILE_STRING, + CGROUP_FILE_MAP, +}; + +/* + * cgroup_map_cb is an abstract callback API for reporting map-valued + * control files + */ + +struct cgroup_map_cb { + int (*fill)(struct cgroup_map_cb *cb, const char *key, u64 value); + void *state; }; #define MAX_CFTYPE_NAME 64 @@ -215,6 +226,14 @@ struct cftype { * single integer. Use it in place of read() */ u64 (*read_uint) (struct cgroup *cont, struct cftype *cft); + /* +* read_map() is used for defining a map of key/value +* pairs. It should call cb-fill(cb, key, value) for each +* entry. +*/ + int (*read_map) (struct cgroup *cont, struct cftype *cft, +struct cgroup_map_cb *cb); + ssize_t (*write) (struct cgroup *cont, struct cftype *cft, struct file *file, const char __user *buf, size_t nbytes, loff_t *ppos); Index: cgroupmap-2.6.24-mm1/kernel/cgroup.c === --- cgroupmap-2.6.24-mm1.orig/kernel/cgroup.c +++ cgroupmap-2.6.24-mm1/kernel/cgroup.c @@ -1488,6 +1488,46 @@ static ssize_t cgroup_file_read(struct f return -EINVAL; } +/* + * seqfile ops/methods for returning structured data. Currently just + * supports string-u64 maps, but can be extended in future. + */ + +struct cgroup_seqfile_state { + struct cftype *cft; + struct cgroup *cgroup; +}; + +static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value) +{ + struct seq_file *sf = cb-state; + return seq_printf(sf, %s: %llu\n, key, value); +} + +static int cgroup_seqfile_show(struct seq_file *m, void *arg) +{ + struct cgroup_seqfile_state *state = m-private; + struct cftype *cft = state-cft; + struct cgroup_map_cb cb = { + .fill = cgroup_map_add, + .state = m, + }; + if (cft-read_map) { + return cft-read_map(state-cgroup, cft, cb); + } else { + BUG(); + } +} + +int cgroup_seqfile_release(struct inode *inode, struct file *file) +{ + struct seq_file *seq = file-private_data; + kfree(seq-private); + return single_release(inode, file); +} + +static struct file_operations cgroup_seqfile_operations; + static int cgroup_file_open(struct inode *inode, struct file *file) { int err; @@ -1500,7 +1540,18 @@ static int cgroup_file_open(struct inode cft = __d_cft(file-f_dentry); if (!cft) return -ENODEV; - if (cft-open) + if (cft-read_map) { + struct cgroup_seqfile_state *state = + kzalloc(sizeof(*state), GFP_USER); + if (!state) + return -ENOMEM; + state-cft = cft; + state-cgroup = __d_cgrp(file-f_dentry-d_parent); + file-f_op = cgroup_seqfile_operations; + err = single_open(file, cgroup_seqfile_show, state); + if (err 0) + kfree(state); + } else if (cft-open) err = cft-open(inode, file); else err = 0; @@ -1539,6 +1590,12 @@ static struct file_operations cgroup_fil .release = cgroup_file_release, }; +static struct file_operations cgroup_seqfile_operations = { + .read = seq_read, + .llseek = seq_lseek, + .release = cgroup_seqfile_release, +}; + static struct inode_operations cgroup_dir_inode_operations = { .lookup = simple_lookup, .mkdir = cgroup_mkdir, @@ -2206,6 +2263,8 @@ static int cgroup_api_show(struct seq_fi if (type == CGROUP_FILE_UNKNOWN) { if (cft-read_uint) type = CGROUP_FILE_U64; + else if (cft-read_map) + type = CGROUP_FILE_MAP; else if (cft-read) type = CGROUP_FILE_STRING; else if (!cft-open) -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH 7/7] CGroup API: Update cpusets to use cgroup structured file API
Many of the cpusets control files are simple integer values, which don't require the overhead of memory allocations for reads and writes. Move the handlers for these control files into cpuset_read_uint() and cpuset_write_uint(). This also has the advantage that the control files show up as u64 rather than string in the cgroup.api file. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- kernel/cpuset.c | 158 +--- 1 file changed, 83 insertions(+), 75 deletions(-) Index: cgroupmap-2.6.24-mm1/kernel/cpuset.c === --- cgroupmap-2.6.24-mm1.orig/kernel/cpuset.c +++ cgroupmap-2.6.24-mm1/kernel/cpuset.c @@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void } /* - * Call with cgroup_mutex held. - */ - -static int update_memory_pressure_enabled(struct cpuset *cs, char *buf) -{ - if (simple_strtoul(buf, NULL, 10) != 0) - cpuset_memory_pressure_enabled = 1; - else - cpuset_memory_pressure_enabled = 0; - return 0; -} - -/* * update_flag - read a 0 or a 1 in a file and update associated flag * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE, * CS_SCHED_LOAD_BALANCE, @@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable * Call with cgroup_mutex held. */ -static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf) +static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, + int turning_on) { - int turning_on; struct cpuset trialcs; int err; int cpus_nonempty, balance_flag_changed; - turning_on = (simple_strtoul(buf, NULL, 10) != 0); - trialcs = *cs; if (turning_on) set_bit(bit, trialcs.flags); @@ -1247,44 +1232,66 @@ static ssize_t cpuset_common_file_write( case FILE_MEMLIST: retval = update_nodemask(cs, buffer); break; + default: + retval = -EINVAL; + goto out2; + } + + if (retval == 0) + retval = nbytes; +out2: + cgroup_unlock(); +out1: + kfree(buffer); + return retval; +} + +static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val) +{ + int retval = 0; + struct cpuset *cs = cgroup_cs(cgrp); + cpuset_filetype_t type = cft-private; + + cgroup_lock(); + + if (cgroup_is_removed(cgrp)) { + cgroup_unlock(); + return -ENODEV; + } + + switch (type) { case FILE_CPU_EXCLUSIVE: - retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_CPU_EXCLUSIVE, cs, val); break; case FILE_MEM_EXCLUSIVE: - retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer); + retval = update_flag(CS_MEM_EXCLUSIVE, cs, val); break; case FILE_SCHED_LOAD_BALANCE: - retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer); + retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val); break; case FILE_MEMORY_MIGRATE: - retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer); + retval = update_flag(CS_MEMORY_MIGRATE, cs, val); break; case FILE_MEMORY_PRESSURE_ENABLED: - retval = update_memory_pressure_enabled(cs, buffer); + cpuset_memory_pressure_enabled = val; break; case FILE_MEMORY_PRESSURE: retval = -EACCES; break; case FILE_SPREAD_PAGE: - retval = update_flag(CS_SPREAD_PAGE, cs, buffer); + retval = update_flag(CS_SPREAD_PAGE, cs, val); cs-mems_generation = cpuset_mems_generation++; break; case FILE_SPREAD_SLAB: - retval = update_flag(CS_SPREAD_SLAB, cs, buffer); + retval = update_flag(CS_SPREAD_SLAB, cs, val); cs-mems_generation = cpuset_mems_generation++; break; default: retval = -EINVAL; - goto out2; + break; } - - if (retval == 0) - retval = nbytes; -out2: cgroup_unlock(); -out1: - kfree(buffer); - return retval; + return -EINVAL; } /* @@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s case FILE_MEMLIST: s += cpuset_sprintf_memlist(s, cs); break; - case FILE_CPU_EXCLUSIVE: - *s++ = is_cpu_exclusive(cs) ? '1' : '0'; - break; - case FILE_MEM_EXCLUSIVE: - *s++ = is_mem_exclusive(cs) ? '1' : '0'; - break; - case FILE_SCHED_LOAD_BALANCE: - *s++ = is_sched_load_balance(cs) ? '1' : '0'; - break
[RFC][PATCH 4/7] CGroup API: Add res_counter_read_uint()
Adds a function for returning the value of a resource counter member, in a form suitable for use in a cgroup read_uint control file method. Signed-off-by: Paul Menage [EMAIL PROTECTED] --- include/linux/res_counter.h |1 + kernel/res_counter.c|5 + 2 files changed, 6 insertions(+) Index: cgroupmap-2.6.24-mm1/include/linux/res_counter.h === --- cgroupmap-2.6.24-mm1.orig/include/linux/res_counter.h +++ cgroupmap-2.6.24-mm1/include/linux/res_counter.h @@ -54,6 +54,7 @@ struct res_counter { ssize_t res_counter_read(struct res_counter *counter, int member, const char __user *buf, size_t nbytes, loff_t *pos, int (*read_strategy)(unsigned long long val, char *s)); +u64 res_counter_read_uint(struct res_counter *counter, int member); ssize_t res_counter_write(struct res_counter *counter, int member, const char __user *buf, size_t nbytes, loff_t *pos, int (*write_strategy)(char *buf, unsigned long long *val)); Index: cgroupmap-2.6.24-mm1/kernel/res_counter.c === --- cgroupmap-2.6.24-mm1.orig/kernel/res_counter.c +++ cgroupmap-2.6.24-mm1/kernel/res_counter.c @@ -92,6 +92,11 @@ ssize_t res_counter_read(struct res_coun pos, buf, s - buf); } +u64 res_counter_read_uint(struct res_counter *counter, int member) +{ + return *res_counter_member(counter, member); +} + ssize_t res_counter_write(struct res_counter *counter, int member, const char __user *userbuf, size_t nbytes, loff_t *pos, int (*write_strategy)(char *st_buf, unsigned long long *val)) -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Add linux-fsdevel to VFS entry in MAINTAINERS
Add linux-fsdevel to the VFS entry in MAINTAINERS Signed-off-by: Paul Menage <[EMAIL PROTECTED]> --- MAINTAINERS |1 + 1 file changed, 1 insertion(+) Index: 2.6.24-mm1-bindflags/MAINTAINERS === --- 2.6.24-mm1-bindflags.orig/MAINTAINERS +++ 2.6.24-mm1-bindflags/MAINTAINERS @@ -1616,6 +1616,7 @@ S:Maintained FILESYSTEMS (VFS and infrastructure) P: Alexander Viro M: [EMAIL PROTECTED] +L: [EMAIL PROTECTED] S: Maintained FIREWIRE SUBSYSTEM (drivers/firewire, ) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
On Thu, Feb 14, 2008 at 9:31 AM, Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > I deliberately not used the MS_* flags, which is currently a messy mix > of things with totally different meanings. > > Does this solve all the issues? We should add a size parameter either in the mount_params or as a final argument, for future extensibility. And we might as well include MNT_READONLY in the API on the assumption that per-mount readonly will be available soon. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
On Thu, Feb 14, 2008 at 8:03 AM, Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > The "flags" argument could be the same as for regular mount, and > > contain the mnt_flags - so the extra argument could maybe usefully be > > a "mnt_flags_mask", to indicate which flags we actually care about > > overriding. > > The way I imagined it, is that mnt_flags is a mask, and the operation > (determined by flags) is either: > > - set bits in mask > - clear bits in mask (or not in mask) > - set flags to mask > > It doesn't allow setting some bits, clearing some others, and leaving > alone the rest. But I think such flexibility isn't really needed. I think I'd suggest something like: new_mnt->mnt_flags = (old_mnt->mnt_flags & ~arg_mask) | (arg_flags & mask) > Maybe instead of messing with masks, it's better to introduce a > get_flags() or a more general mount_stat() operation, and let > userspace deal with setting and clearing flags, just as we do for > stat/chmod? > > So we'd have > > mount_stat(path, stat); > mount_bind(from, to, flags); > mount_set_flags(path, flags); > mount_move(from, to); > > and perhaps > > mount_remount(path, opt_string, flags); Sounds reasonable to me. But it wouldn't directly solve the "do a recursive bind mount setting the MS_READONLY flag on all children" problem, so we'd need some of the earlier suggestions too. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
[ cc: linux-fsdevel ] On Thu, Feb 14, 2008 at 7:22 AM, Paul Menage <[EMAIL PROTECTED]> wrote: > On Wed, Feb 13, 2008 at 10:02 PM, Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > > > I think this concept is reasonable, but I don't think MS_BIND_FLAGS > > is a descriptive name for this flag. MS_EXPLICIT_FLAGS might be better > > but still isn't optimal. > > > > MS_BIND_FLAGS_OVERRIDE ? > > Paul > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
On Wed, Feb 13, 2008 at 10:02 PM, Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > I think this concept is reasonable, but I don't think MS_BIND_FLAGS > is a descriptive name for this flag. MS_EXPLICIT_FLAGS might be better > but still isn't optimal. > MS_BIND_FLAGS_OVERRIDE ? Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
On Thu, Feb 14, 2008 at 12:30 AM, Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > For recursive bind mounts, only the root of the tree being bound > > inherits the per-mount flags from the mount() arguments; sub-mounts > > inherit their per-mount flags from the source tree as usual. > > This is rather strange behavior. I think it would be much better, if > setting mount flags would work for recursive operations as well. Also > what we really need is not resetting all the mount flags to some > predetermined values, but to be able to set or clear each flag > individually. This is certainly true, but as you observe below it's a fair bit more fiddly to specify in the API. I wasn't sure how much people recursive bind mounts, so I figured I'd throw out this simpler version first. > > For example, with the per-mount-read-only thing the most useful > application would be to just set the read-only flag and leave the > others alone. > > And this is where we usually conclude, that a new userspace mount API > is long overdue. So for starters, how about a new syscall for bind > mounts: > > int mount_bind(const char *src, const char *dst, unsigned flags, > unsigned mnt_flags); The "flags" argument could be the same as for regular mount, and contain the mnt_flags - so the extra argument could maybe usefully be a "mnt_flags_mask", to indicate which flags we actually care about overriding. What would happen when an existing super-block flag changes to become a per-mount flag (e.g. per-mount read-only)? I think that would just fit in with the "mask" idea, as long as we complained if any bits in mnt_flags_mask weren't actually per-mount settable. Being able to mask/set mount flags might be useful on a remount too, since there's no clean way to get the existing mount flags for a mount other than by scanning /proc/mounts. So an alternative to a separate system call would be a new mnt_flag_mask argument to mount() (whose presence would be indicated by a flag bit being set in the main flags) which would be used to control which bits were set cleared for remount/bind calls. Seems a bit wasteful of bits though. If we turned "flags" into an (optionally) 64-bit argument then we'd have plenty of bits to be able to specify both a "set" bit and a "mask" bit for each, without needing a new syscall. Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
On Thu, Feb 14, 2008 at 12:30 AM, Miklos Szeredi [EMAIL PROTECTED] wrote: For recursive bind mounts, only the root of the tree being bound inherits the per-mount flags from the mount() arguments; sub-mounts inherit their per-mount flags from the source tree as usual. This is rather strange behavior. I think it would be much better, if setting mount flags would work for recursive operations as well. Also what we really need is not resetting all the mount flags to some predetermined values, but to be able to set or clear each flag individually. This is certainly true, but as you observe below it's a fair bit more fiddly to specify in the API. I wasn't sure how much people recursive bind mounts, so I figured I'd throw out this simpler version first. For example, with the per-mount-read-only thing the most useful application would be to just set the read-only flag and leave the others alone. And this is where we usually conclude, that a new userspace mount API is long overdue. So for starters, how about a new syscall for bind mounts: int mount_bind(const char *src, const char *dst, unsigned flags, unsigned mnt_flags); The flags argument could be the same as for regular mount, and contain the mnt_flags - so the extra argument could maybe usefully be a mnt_flags_mask, to indicate which flags we actually care about overriding. What would happen when an existing super-block flag changes to become a per-mount flag (e.g. per-mount read-only)? I think that would just fit in with the mask idea, as long as we complained if any bits in mnt_flags_mask weren't actually per-mount settable. Being able to mask/set mount flags might be useful on a remount too, since there's no clean way to get the existing mount flags for a mount other than by scanning /proc/mounts. So an alternative to a separate system call would be a new mnt_flag_mask argument to mount() (whose presence would be indicated by a flag bit being set in the main flags) which would be used to control which bits were set cleared for remount/bind calls. Seems a bit wasteful of bits though. If we turned flags into an (optionally) 64-bit argument then we'd have plenty of bits to be able to specify both a set bit and a mask bit for each, without needing a new syscall. Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add MS_BIND_FLAGS mount flag
On Wed, Feb 13, 2008 at 10:02 PM, Christoph Hellwig [EMAIL PROTECTED] wrote: I think this concept is reasonable, but I don't think MS_BIND_FLAGS is a descriptive name for this flag. MS_EXPLICIT_FLAGS might be better but still isn't optimal. MS_BIND_FLAGS_OVERRIDE ? Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/