On Mon, 2011-02-07 at 13:21 -0800, Paul Menage wrote:
> On Mon, Feb 7, 2011 at 12:02 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> > On Mon, 2011-02-07 at 11:28 -0800, Paul Menage wrote:
> >> On Mon, Feb 7, 2011 at 8:10 AM, Peter Zijlstra <pet...@infradead.org>
> >> wrote:
> >> >
> >> > Make the ::exit method act like ::attach, it is after all very nearly
> >> > the same thing.
> >>
> >> The major difference between attach and exit is that the former is
> >> only triggered in response to user cgroup-management action, whereas
> >> the latter is triggered whenever a task exits, even if cgroups aren't
> >> set up.
> >
> > And the major likeness is that they both migrate a task from one cgroup
> > to another. You cannot simply ignore that.
>
> True, and the exit path for cgroups has always been a bit fuzzy - that
> was kind of inherited from cpusets, where this worked out OK, but the
> CPU subsystem has more interesting requirements.
Yeah, from that pov we're actually still a running task that can and
will still schedule.

> An important semantic difference between attach and exit is that in
> the exit case the destination is always the root, and the task in
> question is going to be exiting before doing anything else
> interesting. So it should be possible to optimize that path a lot
> compared to the regular attach (many things related to resource usage
> can be ignored, since the task won't have time to actually use any
> non-trivial amount of resources).

One could also argue that the accumulated time (/other resources) of
all tasks exiting might end up being a significant amount, but yeah,
whatever :-) I'm mostly concerned with not wrecking state (and
crashing/leaking etc.).

> > If maybe you're like respond more often than about once every two months
> > I might actually care what you think.
>
> Yes, sadly since switching groups at Google, cgroups has become pretty
> much just a spare-time activity for me

And here I thought Google was starting to understand what community
participation meant.. is there anybody you know who can play a more
active role in the whole cgroup thing?

> - but that in itself doesn't
> automatically invalidate my opinion when I do have time to respond.
> It's still the case that cgroup_mutex is an incredibly heavyweight
> mutex that has no business being in the task exit path. Or do you just
> believe that ad hominem is a valid style of argument?

No, it was mostly frustration talking.

> How about if the exit callback was moved before the preceding
> task_unlock()? Since I think the scheduler is still the only user of
> the exit callback, redefining the locking semantics should be fine.

Like the below? Both the perf and sched exit callbacks are fine with
being called under task_lock afaict, but I haven't actually run with
lockdep enabled to see if I missed something.

I also pondered doing the cgroup exit from __put_task_struct(), but
that had another problem iirc.
---
Index: linux-2.6/include/linux/cgroup.h
===================================================================
--- linux-2.6.orig/include/linux/cgroup.h
+++ linux-2.6/include/linux/cgroup.h
@@ -474,7 +474,8 @@ struct cgroup_subsys {
 			struct cgroup *old_cgrp, struct task_struct *tsk,
 			bool threadgroup);
 	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
-	void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
+	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			struct cgroup *old_cgrp, struct task_struct *task);
 	int (*populate)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c
+++ linux-2.6/kernel/cgroup.c
@@ -4230,20 +4230,8 @@ void cgroup_post_fork(struct task_struct
  */
 void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 {
-	int i;
 	struct css_set *cg;
-
-	if (run_callbacks && need_forkexit_callback) {
-		/*
-		 * modular subsystems can't use callbacks, so no need to lock
-		 * the subsys array
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
-			struct cgroup_subsys *ss = subsys[i];
-			if (ss->exit)
-				ss->exit(ss, tsk);
-		}
-	}
+	int i;

 	/*
 	 * Unlink from the css_set task list if necessary.
@@ -4261,7 +4249,24 @@ void cgroup_exit(struct task_struct *tsk
 	task_lock(tsk);
 	cg = tsk->cgroups;
 	tsk->cgroups = &init_css_set;
+
+	if (run_callbacks && need_forkexit_callback) {
+		/*
+		 * modular subsystems can't use callbacks, so no need to lock
+		 * the subsys array
+		 */
+		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+			if (ss->exit) {
+				struct cgroup *old_cgrp =
+					rcu_dereference_raw(cg->subsys[i])->cgroup;
+				struct cgroup *cgrp = task_cgroup(tsk, i);
+				ss->exit(ss, cgrp, old_cgrp, tsk);
+			}
+		}
+	}
 	task_unlock(tsk);
+
 	if (cg)
 		put_css_set_taskexit(cg);
 }
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -606,9 +606,6 @@ static inline struct task_group *task_gr
 	struct task_group *tg;
 	struct cgroup_subsys_state *css;

-	if (p->flags & PF_EXITING)
-		return &root_task_group;
-
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
@@ -9081,7 +9078,8 @@ cpu_cgroup_attach(struct cgroup_subsys *
 }

 static void
-cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
+cpu_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
+		struct cgroup *old_cgrp, struct task_struct *task)
 {
 	/*
 	 * cgroup_exit() is called in the copy_process() failure path.

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel