Re: [ckrm-tech] [PATCH 0/7] Containers (V8): Generic Process Containers

On 4/23/07, Vaidyanathan Srinivasan <[EMAIL PROTECTED]> wrote:
> Hi Paul,
>
> In [patch 3/7] Containers (V8): Add generic multi-subsystem API to
> containers, you have forcefully enabled interrupt in
> container_init_subsys() with spin_unlock_irq() which breaks on PPC64.
>
> > +static void container_init_subsys(struct container_subsys *ss) {
> > +	int retval;
> > +	struct list_head *l;
> > +	printk(KERN_ERR "Initializing container subsys %s\n", ss->name);
> > +
> > +	/* Create the top container state for this subsystem */
> > +	ss->root = &rootnode;
> > +	retval = ss->create(ss, dummytop);
> > +	BUG_ON(retval);
> > +	init_container_css(ss, dummytop);
> > +
> > +	/* Update all container groups to contain a subsys
> > +	 * pointer to this state - since the subsystem is
> > +	 * newly registered, all tasks and hence all container
> > +	 * groups are in the subsystem's top container. */
> > +	spin_lock_irq(&container_group_lock);
> > +	l = &init_container_group.list;
> > +	do {
> > +		struct container_group *cg =
> > +			list_entry(l, struct container_group, list);
> > +		cg->subsys[ss->subsys_id] = dummytop->subsys[ss->subsys_id];
> > +		l = l->next;
> > +	} while (l != &init_container_group.list);
> > +	spin_unlock_irq(&container_group_lock);
>
> Interrupt gets enabled here and on PPC64, the kernel takes a pending
> decrementer and crashes because it is too early to handle them.
>
> Use of irqsave and restore routines would fix the problem.

OK, thanks. I'll add that change.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] Containers (V8): Generic Process Containers

Hi Paul,

In [patch 3/7] Containers (V8): Add generic multi-subsystem API to
containers, you have forcefully enabled interrupts in
container_init_subsys() with spin_unlock_irq(), which breaks on PPC64.

> +static void container_init_subsys(struct container_subsys *ss) {
> +	int retval;
> +	struct list_head *l;
> +	printk(KERN_ERR "Initializing container subsys %s\n", ss->name);
> +
> +	/* Create the top container state for this subsystem */
> +	ss->root = &rootnode;
> +	retval = ss->create(ss, dummytop);
> +	BUG_ON(retval);
> +	init_container_css(ss, dummytop);
> +
> +	/* Update all container groups to contain a subsys
> +	 * pointer to this state - since the subsystem is
> +	 * newly registered, all tasks and hence all container
> +	 * groups are in the subsystem's top container. */
> +	spin_lock_irq(&container_group_lock);
> +	l = &init_container_group.list;
> +	do {
> +		struct container_group *cg =
> +			list_entry(l, struct container_group, list);
> +		cg->subsys[ss->subsys_id] = dummytop->subsys[ss->subsys_id];
> +		l = l->next;
> +	} while (l != &init_container_group.list);
> +	spin_unlock_irq(&container_group_lock);

Interrupts get enabled here, and on PPC64 the kernel takes a pending
decrementer interrupt and crashes because it is too early to handle it.

Use of the irqsave and irqrestore routines would fix the problem.

I have included the fix along with a minor Kconfig correction. Also, your
3/7 patch did not show up on LKML.

--Vaidy

> +
> +	need_forkexit_callback |= ss->fork || ss->exit;

Index: linux-2.6.20/kernel/container.c
===================================================================
--- linux-2.6.20.orig/kernel/container.c
+++ linux-2.6.20/kernel/container.c
@@ -1638,6 +1638,7 @@ int container_is_descendant(const struct
 static void container_init_subsys(struct container_subsys *ss) {
 	int retval;
+	unsigned long flags;
 	struct list_head *l;
 	printk(KERN_ERR "Initializing container subsys %s\n", ss->name);
 
@@ -1651,7 +1652,7 @@ static void container_init_subsys(struct
 	 * pointer to this state - since the subsystem is
 	 * newly registered, all tasks and hence all container
 	 * groups are in the subsystem's top container. */
-	spin_lock_irq(&container_group_lock);
+	spin_lock_irqsave(&container_group_lock, flags);
 	l = &init_container_group.list;
 	do {
 		struct container_group *cg =
@@ -1659,7 +1660,7 @@ static void container_init_subsys(struct
 		cg->subsys[ss->subsys_id] = dummytop->subsys[ss->subsys_id];
 		l = l->next;
 	} while (l != &init_container_group.list);
-	spin_unlock_irq(&container_group_lock);
+	spin_unlock_irqrestore(&container_group_lock, flags);
 
 	need_forkexit_callback |= ss->fork || ss->exit;
Index: linux-2.6.20/init/Kconfig
===================================================================
--- linux-2.6.20.orig/init/Kconfig
+++ linux-2.6.20/init/Kconfig
@@ -239,8 +239,13 @@ config IKCONFIG_PROC
 	  through /proc/config.gz.
 
 config CONTAINERS
-	bool
-
+	bool "Container support"
+	help
+	  This option will let you create and manage process containers,
+	  which can be used to aggregate multiple processes, e.g. for
+	  the purposes of resource tracking.
+
+	  Say N if unsure
 
 config CPUSETS
 	bool "Cpuset support"
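To illustrate why the irqsave/irqrestore pair matters when the function is
entered with interrupts already disabled (as happens during early boot),
here is a minimal userspace sketch. It is not kernel code: a single boolean
stands in for the CPU's interrupt-enable state, no real lock is taken, and
every model_* name and both after_*() helpers are hypothetical names made up
for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag stands in for "interrupts enabled". */
static bool irqs_enabled;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally,
 * regardless of what the state was before the lock was taken. */
static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remembers the previous state, then disables. */
static unsigned long model_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): puts back whatever state was saved. */
static void model_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Interrupt state after a critical section using the _irq pair. */
static bool after_irq_pair(bool initially_enabled)
{
	irqs_enabled = initially_enabled;
	model_lock_irq();
	/* ... critical section ... */
	model_unlock_irq();
	return irqs_enabled;
}

/* Interrupt state after a critical section using the irqsave pair. */
static bool after_irqsave_pair(bool initially_enabled)
{
	unsigned long flags;

	irqs_enabled = initially_enabled;
	flags = model_lock_irqsave();
	/* ... critical section ... */
	model_unlock_irqrestore(flags);
	return irqs_enabled;
}
```

In this model, after_irq_pair(false) leaves interrupts enabled even though
the caller had them disabled, which corresponds to the situation reported on
PPC64, while after_irqsave_pair(false) leaves them disabled.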
[PATCH 0/7] Containers (V8): Generic Process Containers

--

This is an update to my multi-hierarchy multi-subsystem generic process
containers patch. Changes since V7 (12th Feb) include:

- Removed the config-time choice of the number of supported
  hierarchies - this is now completely dynamic; new hierarchies are
  allocated on demand, and freed when no longer in use.

- Subsystems are now registered at compile-time in
  linux/container_subsys.h. This allows for faster access to subsystem
  state since the id is a compile-time constant, so there's only a
  single extra pointer dereference compared to having a pointer
  directly in the task_struct. It also avoids wasting space with
  unused subsystem pointers.

- Removed the container pointers from container_group - this results
  in a structure very similar to Srivatsa Vaddagiri's rcfs approach.
  (RCFS uses the nsproxy object rather than the container_group
  object; merging container_group and nsproxy would be pretty
  straightforward if desired).

- Removed callback_mutex from the container system; it is now purely
  back in the cpuset subsystem. Renamed manage_mutex to
  container_mutex.

- Condensed post_attach_task() into attach_task() now that
  callback_mutex is purely within cpuset.c

- Simplified the container_subsys_state reference counting - stricter
  rules on liveness make adding reference counts cheaper.

Still TODO:

- decide whether "Containers" is an acceptable name for the system
  given its usage by some other development groups, or whether
  something else (ProcessSets? ResourceGroups?) would be better

- decide whether merging container_group and nsproxy is desirable

- add a hash-table based lookup for container_group objects.

- use seq_file properly in container tasks files (and also in
  cpuset_attach_task) to avoid having to allocate a big array for all
  the container's task pointers.

- add back support for the "release agent" functionality

- lots more testing

- define standards for container file names

Generic Process Containers
--------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
containers, and others. These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

Already existing in the kernel is the cpuset subsystem; this has a
process grouping mechanism that is mature, tested, and well documented
(particularly with regard to synchronization rules).

This patchset extracts the process grouping code from cpusets into a
generic container system, and makes the cpusets code a client of the
container system. It also provides several example clients of the
container system, including ResGroups, BeanCounters and namespace
proxy.

The change is implemented in three implementation patches, plus four
example subsystems that aren't necessarily intended to be merged as
part of this patch set, but demonstrate the applicability of the
framework.

1) extract the process grouping code from cpusets into a standalone
   system

2) remove the process grouping code from cpusets and hook into the
   container system

3) convert the container system to present a generic multi-hierarchy
   API, and make cpusets a client of that API

4) example of a simple CPU accounting container subsystem. Useful as a
   boilerplate for people implementing their own subsystems.

5) example of implementing ResGroups and its numtasks controller over
   generic containers

6) example of implementing BeanCounters and its numfiles counter over
   generic containers

7) example of integrating the namespace isolation code (sys_unshare()
   or various clone flags) with generic containers, allowing virtual
   servers to take advantage of other resource control efforts.

The intention is that the various resource management and
virtualization efforts can also become container clients, with the
result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test out e.g. the ResGroups CPU controller in
  conjunction with the BeanCounters memory controller, or use either of
  them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
  management systems is substantially reduced, since it doesn't need
  to provide process grouping/containment, hence improving their
  chances of getting into the kernel

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>
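The note above about compile-time subsystem registration can be sketched in
plain C as follows. This is only an illustration of the general idea (an
enum generated from a single list, used as a constant array index), not the
contents of linux/container_subsys.h, which are not shown in this thread.
All example_* names, the EXAMPLE_SUBSYS_LIST and AS_ENUM macros, and the
choice of "cpuset" and "cpuacct" as list entries are assumptions made for
the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem list; stands in for a shared header that every
 * subsystem is added to at compile time. */
#define EXAMPLE_SUBSYS_LIST(X) \
	X(cpuset) \
	X(cpuacct)

/* Expand the list into one enum constant per subsystem. */
#define AS_ENUM(name) name##_subsys_id,
enum example_subsys_id {
	EXAMPLE_SUBSYS_LIST(AS_ENUM)
	EXAMPLE_SUBSYS_COUNT	/* number of subsystems compiled in */
};

/* Per-subsystem state; a real subsystem would embed this in its own type. */
struct example_subsys_state {
	int value;
};

/* One slot per compiled-in subsystem, so no space goes to unused ones. */
struct example_group {
	struct example_subsys_state *subsys[EXAMPLE_SUBSYS_COUNT];
};

/* The task holds a single pointer to its group. */
struct example_task {
	struct example_group *group;
};

/* With a constant id, the lookup is one dereference of the group pointer
 * followed by a fixed-offset load from the array. */
static struct example_subsys_state *
example_task_state(const struct example_task *t, enum example_subsys_id id)
{
	return t->group->subsys[id];
}

/* Small demonstration: attach state for one subsystem and read it back. */
static int example_lookup(void)
{
	struct example_subsys_state acct = { 42 };
	struct example_group g = { { NULL } };
	struct example_task t = { &g };

	g.subsys[cpuacct_subsys_id] = &acct;
	return example_task_state(&t, cpuacct_subsys_id)->value;
}
```

Because the id is an enum constant rather than a value assigned at
registration time, the compiler can fold the array offset into the load,
which is the "single extra pointer dereference" trade-off described above.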