[Devel] Re: [PATCH 1/14][NETNS]: Introduce the net-subsys id generator.
+int register_pernet_gen_device(int *id, struct pernet_operations *ops)
+{
+	int error;
+	mutex_lock(&net_mutex);
+again:
+	error = ida_get_new_above(&net_generic_ids, 1, id);
+	if (error) {
+		if (error == -EAGAIN) {
+			ida_pre_get(&net_generic_ids, GFP_KERNEL);
+			goto again;
+		}
+		goto out;
+	}
+	error = register_pernet_operations(first_device, ops);
+	if (error)
+		ida_remove(&net_generic_ids, *id);
+	else if (first_device == &pernet_list)
+		first_device = &ops->list;
out:

Oops! Thanks, will fix.

___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
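The retry loop above depends on the contract of ida_get_new_above(): hand out the lowest free id at or above the starting value (1 here), so id 0 is never issued. That is what lets net_generic() treat id == 0 as a bug. Below is a minimal userspace model of that contract, assuming a hypothetical fixed-capacity bitmap. The real IDA grows on demand, which is where the -EAGAIN / ida_pre_get() retry comes from, and this model has no such step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical fixed capacity; the kernel's IDA grows on demand instead. */
#define MODEL_MAX_IDS 64

struct model_ida {
	bool used[MODEL_MAX_IDS];
};

/* Store the lowest unused id >= start in *id. Returns 0, or -ENOSPC
 * when the model's fixed table is full. */
static int model_ida_get_above(struct model_ida *ida, int start, int *id)
{
	for (int i = start; i < MODEL_MAX_IDS; i++) {
		if (!ida->used[i]) {
			ida->used[i] = true;
			*id = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Give an id back, as ida_remove() does on the registration error path. */
static void model_ida_remove(struct model_ida *ida, int id)
{
	ida->used[id] = false;
}
```

Because the lowest free id is always chosen, an id released on an error path is the first one reused by the next registration.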
[Devel] Re: [RFC PATCH 0/4] Container Freezer: Reuse Suspend Freezer
Hi!

NOTE: Due to problems with my MTA configuration, two earlier attempts reached linux-pm but not linux-kernel. Please cc [EMAIL PROTECTED] on replies.

This patchset is a prototype using the container infrastructure and the swsusp freezer to freeze a group of tasks. I've merely taken Cedric's patches, forward-ported them to 2.6.25-rc8-mm1 and done a small amount of testing.

Okay, the freezer probably does what you want, but be warned that Linus is not exactly in love with the freezer. You can probably get away with using it for user processes, but maybe you should drop him a line saying you want to expand freezer usage and see what happens.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[Devel] Re: [RFC PATCH 1/4] Container Freezer: Add TIF_FREEZE flag to all architectures
On Thu 2008-04-03 14:03:17, [EMAIL PROTECTED] wrote: This patch is the first step in making the refrigerator() available to all architectures, even for those without power management. The purpose of such a change is to be able to use the refrigerator() in a new control group subsystem which will implement a control group freezer. Signed-off-by: Cedric Le Goater [EMAIL PROTECTED] Signed-off-by: Matt Helsley [EMAIL PROTECTED] Tested-by: Matt Helsley [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] ACK. Pavel
[Devel] Re: [PATCH 6/14][RTNL]: Introduce the rtnl_kill_links call.
Pavel Emelyanov wrote:

 	for_each_net(net) {
-restart:
-		for_each_netdev_safe(net, dev, n) {
-			if (dev->rtnl_link_ops == ops) {
-				ops->dellink(dev);
-				goto restart;
-			}
-		}
+		__rtnl_kill_links(net, ops);

This was _safe, and now it's not. Is that intentional?

Yup - we goto restart whenever we delete a link, so there's no need for _safe iteration. This goto was added by Patrick (commit 68365458 [NET]: rtnl_link: fix use-after-free) and I suspect he simply forgot to remove the _safe iterator (I put him in Cc to correct me if I'm wrong).

No, that was an oversight, it should be safe to remove.
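Why the restart makes the _safe variant unnecessary can be seen in a small userspace model. The list, struct names and helpers below are hypothetical stand-ins for net devices and rtnl_link_ops. The point is only that after a deletion the loop jumps back to the head and never reads the freed node's next pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rtnl_link_ops and net_device. */
struct link_ops {
	const char *kind;
};

struct dev {
	struct dev *next;
	const struct link_ops *ops;
};

/* Unlink dev from the list and free it, like ops->dellink(). */
static void model_dellink(struct dev **head, struct dev *dev)
{
	for (struct dev **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			free(dev);
			return;
		}
	}
}

/* Delete every device created with ops. After a deletion we restart
 * from the head rather than advancing via dev->next, so the freed node
 * is never read again and a plain (non-_safe) walk is enough. */
static void model_kill_links(struct dev **head, const struct link_ops *ops)
{
restart:
	for (struct dev *dev = *head; dev != NULL; dev = dev->next) {
		if (dev->ops == ops) {
			model_dellink(head, dev);
			goto restart;
		}
	}
}

/* Test helpers: prepend a device, count the list. */
static void model_push(struct dev **head, const struct link_ops *ops)
{
	struct dev *d = malloc(sizeof(*d));

	if (d == NULL)
		abort();
	d->ops = ops;
	d->next = *head;
	*head = d;
}

static int model_count(const struct dev *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

The trade-off is that each deletion rescans from the head, which is quadratic in the worst case; the _safe iterator avoids the rescan but, as agreed in the thread, is redundant once the restart is there.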
[Devel] Re: [PATCH 2/14][NETNS]: Generic per-net pointers.
Pavel Emelyanov wrote:

Add an elastic array of void * pointers to struct net. The access rules are simple:

1. register the ops with register_pernet_gen_device to get the id of your private pointer
2. call net_assign_generic() to put the private data on the struct net (most preferably this should be done in the ->init callback of the ops registered)
3. do not change this pointer while the net is alive;
4. use net_generic() to get the pointer.

When adding a new pointer, I copy the old array, replace it with a new one and schedule the old one for kfree after an RCU grace period. Since net_generic() reads the net->gen array inside an RCU read-side section, and a net->gen->ptr[x] pointer never changes once set, this guarantees safe access to generic pointers.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 include/net/net_namespace.h |    2 +
 include/net/netns/generic.h |   49 ++
 net/core/net_namespace.c    |   62 +++
 3 files changed, 113 insertions(+), 0 deletions(-)
 create mode 100644 include/net/netns/generic.h

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 6971fdb..e3d4eb4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -19,6 +19,7 @@ struct proc_dir_entry;
 struct net_device;
 struct sock;
 struct ctl_table_header;
+struct net_generic;
 
 struct net {
 	atomic_t		count;		/* To decided when the network
@@ -57,6 +58,7 @@ struct net {
 #ifdef CONFIG_NETFILTER
 	struct netns_xt		xt;
 #endif
+	struct net_generic	*gen;
 };
 
diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h
new file mode 100644
index 000..e8a6d27
--- /dev/null
+++ b/include/net/netns/generic.h
@@ -0,0 +1,49 @@
+/*
+ * generic net pointers
+ */
+
+#ifndef __NET_GENERIC_H__
+#define __NET_GENERIC_H__
+
+#include <linux/rcupdate.h>
+
+/*
+ * Generic net pointers are to be used by modules
+ * to put some private stuff on the struct net without
+ * explicit struct net modification
+ *
+ * The rules are simple:
+ * 1. register the ops with register_pernet_gen_device to get
+ *    the id of your private pointer
+ * 2. call net_assign_generic() to put the private data on the
+ *    struct net (most preferably this should be done in the
+ *    ->init callback of the ops registered)
+ * 3. do not change this pointer while the net is alive.
+ *
+ * After accomplishing all of the above, the private pointer
+ * can be accessed with the net_generic() call.
+ */
+
+struct net_generic {
+	unsigned int len;
+	struct rcu_head rcu;
+
+	void *ptr[0];
+};
+
+static inline void *net_generic(struct net *net, int id)
+{
+	struct net_generic *ng;
+	void *ptr;
+
+	rcu_read_lock();
+	ng = rcu_dereference(net->gen);
+	BUG_ON(id == 0 || id > ng->len);
+	ptr = ng->ptr[id - 1];
+	rcu_read_unlock();
+
+	return ptr;
+}
+
+extern int net_assign_generic(struct net *net, int id, void *data);
+#endif
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7ef3bac..b384840 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/idr.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
 
 /*
  *	Our network namespace constructor/destructor lists
@@ -21,6 +22,8 @@ LIST_HEAD(net_namespace_list);
 struct net init_net;
 EXPORT_SYMBOL(init_net);
 
+#define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -29,10 +32,21 @@ static __net_init int setup_net(struct net *net)
 	/* Must be called with net_mutex held */
 	struct pernet_operations *ops;
 	int error;
+	struct net_generic *ng;
 
 	atomic_set(&net->count, 1);
 	atomic_set(&net->use_count, 0);
 
+	error = -ENOMEM;
+	ng = kzalloc(sizeof(struct net_generic) +
+			INITIAL_NET_GEN_PTRS * sizeof(void *), GFP_KERNEL);

Why do you need to allocate more than sizeof(struct net_generic)?

+	if (ng == NULL)
+		goto out;
+
+	ng->len = INITIAL_NET_GEN_PTRS;
+	INIT_RCU_HEAD(&ng->rcu);
+	rcu_assign_pointer(net->gen, ng);
+
 	error = 0;
 	list_for_each_entry(ops, &pernet_list, list) {
 		if (ops->init) {
@@ -54,6 +68,7 @@ out_undo:
 	}
 
 	rcu_barrier();
+	kfree(ng);
 	goto out;
 }
 
@@ -384,3 +399,50 @@ void unregister_pernet_gen_device(int id, struct pernet_operations *ops)
 	mutex_unlock(&net_mutex);
 }
 EXPORT_SYMBOL_GPL(unregister_pernet_gen_device);
+
+static void net_generic_release(struct rcu_head *rcu)
+{
+	struct net_generic *ng;
+
+	ng = container_of(rcu, struct net_generic, rcu);
+	kfree(ng);
+}
+
+int
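The cover text describes net_assign_generic() as copy, replace, then free the old array after an RCU grace period; its body is cut off in this excerpt, so the following is not the patch's code. It is a single-threaded userspace sketch under stated assumptions: hypothetical model_* names, the new array sized to exactly id slots, and an immediate free() where the kernel would publish the new array with rcu_assign_pointer() and defer the free with call_rcu().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of struct net_generic (no rcu_head in this model). */
struct model_generic {
	unsigned int len;
	void *ptr[];            /* C99 flexible array; the patch uses ptr[0] */
};

struct model_net {
	struct model_generic *gen;
};

/* Ids are 1-based, as in the patch: id N lives in ptr[N - 1]. */
static void *model_net_generic(const struct model_net *net, int id)
{
	assert(id > 0 && (unsigned int)id <= net->gen->len);
	return net->gen->ptr[id - 1];
}

/* Grow-and-copy assignment. A kernel version would publish the new
 * array with rcu_assign_pointer() and free the old one via call_rcu();
 * with no concurrent readers here, the old array is freed at once. */
static int model_assign_generic(struct model_net *net, int id, void *data)
{
	struct model_generic *ng = net->gen;

	if ((unsigned int)id > ng->len) {
		struct model_generic *bigger;

		bigger = calloc(1, sizeof(*bigger) + id * sizeof(void *));
		if (bigger == NULL)
			return -ENOMEM;
		bigger->len = id;
		memcpy(bigger->ptr, ng->ptr, ng->len * sizeof(void *));
		net->gen = bigger;   /* publish the new array */
		free(ng);            /* retire the old one */
		ng = bigger;
	}
	ng->ptr[id - 1] = data;
	return 0;
}
```

Existing slots are copied unchanged into the larger array, which is what allows the rule that a pointer never changes once set.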
[Devel] Re: [PATCH 2/14][NETNS]: Generic per-net pointers.
[snip]

@@ -29,10 +32,21 @@ static __net_init int setup_net(struct net *net)
 	/* Must be called with net_mutex held */
 	struct pernet_operations *ops;
 	int error;
+	struct net_generic *ng;
 
 	atomic_set(&net->count, 1);
 	atomic_set(&net->use_count, 0);
 
+	error = -ENOMEM;
+	ng = kzalloc(sizeof(struct net_generic) +
+			INITIAL_NET_GEN_PTRS * sizeof(void *), GFP_KERNEL);

Why do you need to allocate more than sizeof(struct net_generic)?

That's just an optimization to avoid many reallocations in the near future. I planned to do something similar in net_assign_generic (allocate a bit more than required), but decided to do it later to keep net_assign_generic simpler. Currently I have only 5 users of generic pointers (tun and vlan you see, and I have patches for ipip, ipgre and sit tunnels), so that's enough for now.

[snip]

+int net_assign_generic(struct net *net, int id, void *data)
+{
+	struct net_generic *ng, *old_ng;
+
+	BUG_ON(!mutex_is_locked(&net_mutex));
+	BUG_ON(id == 0);
+
+	ng = old_ng = net->gen;

shouldn't it be rcu_dereferenced?

Nope - nobody can race with us and change this pointer, so it's safe to get it without rcu_dereference.

Thanks,
Pavel
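The comment next to INITIAL_NET_GEN_PTRS (`/* +1 for len +2 for rcu_head */`) suggests the 13 extra slots were chosen so that the whole kzalloc() in setup_net() comes to 16 pointer-sized words. That reading is an inference, not something stated in the thread. The check below assumes rcu_head is two pointer-sized fields (next and func), as in kernels of that era, and a common ABI where the unsigned int len is padded to one word because rcu_head follows it.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the 2.6.25-era rcu_head: two pointer-sized fields. */
struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

struct model_net_generic {
	unsigned int len;              /* padded to one word on common ABIs */
	struct model_rcu_head rcu;     /* two words */
	void *ptr[];                   /* flexible array (the patch uses ptr[0]) */
};

#define INITIAL_NET_GEN_PTRS 13

/* Bytes requested by the kzalloc() call in setup_net(). */
static size_t initial_alloc_size(void)
{
	return sizeof(struct model_net_generic) +
	       INITIAL_NET_GEN_PTRS * sizeof(void *);
}
```

On an LP64 build this comes to 24 + 13 * 8 = 128 bytes, and on a 32-bit build to 12 + 13 * 4 = 64 bytes; whether that power-of-two size was the deliberate goal is not stated in the thread.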
[Devel] Re: [PATCH 3/14][TUN]: Introduce the tun_net structure.
On Fri, Apr 11, 2008 at 11:55:59AM +0400, Pavel Emelyanov wrote:
Paul E. McKenney wrote:
On Thu, Apr 10, 2008 at 07:06:24PM +0400, Pavel Emelyanov wrote:

This is the first step in making tuntap devices work in net namespaces. The structure mentioned is pointed to by the generic net pointer with the tun_net_id id, and the tun driver fills one in on its load. It will contain only the tun devices list. So declare this structure and introduce net init and exit hooks.

OK, I have to ask... What prevents someone else from invoking net_generic() concurrently with a call to tun_exit_net(), potentially obtaining a pointer to the structure that tun_exit_net() is about to kfree()?

It's the same as if the tun_net were directly pointed to by the struct net. Nobody can guarantee that the pointer you got from the struct net is not going to be freed, unless you provide this guarantee yourself.

So tun_net acquires some lock before calling net_generic(), and that same lock is held when calling tun_exit_net()? Or is there but a single tun_net task, so that it will never call tun_net_exit() at the same time that it calls net_generic() for the tun_net pointer?

But if you call net_generic to get some pointer other than tun_net, then you're fine (due to RCU), provided you play by the same rules with the pointer you're getting.

Agreed, RCU protects the net_generic structure, but not the structures pointed to by that structure.

Maybe I'm missing something in your question, can you provide some testcase that you suspect may cause an OOPS?

Just trying to understand what prevents one task from calling net_generic() to pick up the tun_net pointer at the same time some other task calls tun_net_exit().

Thanx, Paul

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 drivers/net/tun.c |   53 ++++-
 1 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7b816a0..9bfba02 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -63,6 +63,7 @@
 #include <linux/if_tun.h>
 #include <linux/crc32.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -73,6 +74,11 @@ static int debug;
 
 /* Network device part of the driver */
 
+static unsigned int tun_net_id;
+struct tun_net {
+	struct list_head dev_list;
+};
+
 static LIST_HEAD(tun_dev_list);
 static const struct ethtool_ops tun_ethtool_ops;
 
@@ -873,6 +879,37 @@ static const struct ethtool_ops tun_ethtool_ops = {
 	.set_rx_csum	= tun_set_rx_csum
 };
 
+static int tun_init_net(struct net *net)
+{
+	struct tun_net *tn;
+
+	tn = kmalloc(sizeof(*tn), GFP_KERNEL);
+	if (tn == NULL)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&tn->dev_list);
+
+	if (net_assign_generic(net, tun_net_id, tn)) {
+		kfree(tn);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void tun_exit_net(struct net *net)
+{
+	struct tun_net *tn;
+
+	tn = net_generic(net, tun_net_id);
+	kfree(tn);
+}
+
+static struct pernet_operations tun_net_ops = {
+	.init = tun_init_net,
+	.exit = tun_exit_net,
+};
+
 static int __init tun_init(void)
 {
 	int ret = 0;
@@ -880,9 +917,22 @@ static int __init tun_init(void)
 
 	printk(KERN_INFO "tun: %s, %s\n", DRV_DESCRIPTION, DRV_VERSION);
 	printk(KERN_INFO "tun: %s\n", DRV_COPYRIGHT);
 
+	ret = register_pernet_gen_device(&tun_net_id, &tun_net_ops);
+	if (ret) {
+		printk(KERN_ERR "tun: Can't register pernet ops\n");
+		goto err_pernet;
+	}
+
 	ret = misc_register(&tun_miscdev);
-	if (ret)
+	if (ret) {
 		printk(KERN_ERR "tun: Can't register misc device %d\n", TUN_MINOR);
+		goto err_misc;
+	}
+	return 0;
+
+err_misc:
+	unregister_pernet_gen_device(tun_net_id, &tun_net_ops);
+err_pernet:
 	return ret;
 }
 
@@ -899,6 +949,7 @@ static void tun_cleanup(void)
 	}
 	rtnl_unlock();
 
+	unregister_pernet_gen_device(tun_net_id, &tun_net_ops);
 }
 
 module_init(tun_init);
-- 
1.5.3.4

-- 
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
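The error path added to tun_init() follows the usual kernel goto-unwind pattern: each label undoes only what succeeded before the failing step, in reverse order. A userspace sketch with hypothetical model_* registration functions and a test knob that forces the second step to fail:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical registration state standing in for the pernet ops and
 * the misc device that the patched tun_init() registers. */
static bool pernet_registered;
static bool misc_registered;
static bool fail_misc; /* test knob: make the second step fail */

static int model_register_pernet(void)
{
	pernet_registered = true;
	return 0;
}

static void model_unregister_pernet(void)
{
	pernet_registered = false;
}

static int model_register_misc(void)
{
	if (fail_misc)
		return -EBUSY;
	misc_registered = true;
	return 0;
}

/* Same shape as the patched tun_init(): a failure jumps to the label
 * that undoes only the steps already completed, in reverse order. */
static int model_tun_init(void)
{
	int ret;

	ret = model_register_pernet();
	if (ret)
		goto err_pernet;

	ret = model_register_misc();
	if (ret)
		goto err_misc;
	return 0;

err_misc:
	model_unregister_pernet();
err_pernet:
	return ret;
}
```

Before the patch, a misc_register() failure only printed an error; with the pernet registration now happening first, it must also be rolled back, which is what the err_misc label does.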
[Devel] Re: [PATCH 3/14][TUN]: Introduce the tun_net structure.
Paul E. McKenney wrote:
On Fri, Apr 11, 2008 at 11:55:59AM +0400, Pavel Emelyanov wrote:
Paul E. McKenney wrote:
On Thu, Apr 10, 2008 at 07:06:24PM +0400, Pavel Emelyanov wrote:

This is the first step in making tuntap devices work in net namespaces. The structure mentioned is pointed to by the generic net pointer with the tun_net_id id, and the tun driver fills one in on its load. It will contain only the tun devices list. So declare this structure and introduce net init and exit hooks.

OK, I have to ask... What prevents someone else from invoking net_generic() concurrently with a call to tun_exit_net(), potentially obtaining a pointer to the structure that tun_exit_net() is about to kfree()?

It's the same as if the tun_net were directly pointed to by the struct net. Nobody can guarantee that the pointer you got from the struct net is not going to be freed, unless you provide this guarantee yourself.

So tun_net acquires some lock before calling net_generic(), and that same lock is held when calling tun_exit_net()?

No.

Or is there but a single tun_net task, so that it will never call tun_net_exit() at the same time that it calls net_generic() for the tun_net pointer?

tun_net_exit is called only when a struct net is no longer referenced and is about to be kfree-ed itself, so it's impossible (or a bug in its own right) that someone still has a pointer to this net. Provided the struct net is alive (!), the net->gen array is alive (or is scheduled for kfree after an RCU grace period). Thus, if your code holds the net and uses the net_generic() call, it will get a live net->gen array and a live tun_net pointer. Next, what happens after net_generic() completes and leaves the RCU read-side section? Simple - the struct net is (should be) still referenced, so tun_net_exit cannot be called yet, and thus the tun_net pointer obtained earlier is alive. Unlike the (possible) former instance of the net_generic array, but nobody references that one in my code (and nobody should, hm... I think I'll add this rule to the comments).

But if you call net_generic to get some pointer other than tun_net, then you're fine (due to RCU), provided you play by the same rules with the pointer you're getting.

Agreed, RCU protects the net_generic structure, but not the structures pointed to by that structure.

They are protected by struct net reference counting.

Maybe I'm missing something in your question, can you provide some testcase that you suspect may cause an OOPS?

Just trying to understand what prevents one task from calling net_generic() to pick up the tun_net pointer at the same time some other task calls tun_net_exit().

If this task dereferences a held struct net, then it should be OK. If it does not, this will OOPS in any case. Consider the struct net to look like

	struct net {
		...
		void *ptrs[N];
	};

and net_generic to be just

	static inline void *net_generic(struct net *net, int id)
	{
		BUG_ON(id == 0 || id > N);
		return net->ptrs[id - 1];
	}

That's the same as what I propose, except that the ptrs array lives in RCU-protected memory.

Thanx, Paul

Thanks,
Pavel
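Pavel's argument is that the lifetime of the tun_net pointer is tied to the struct net reference count, not to RCU: the exit hook runs only after the last reference is dropped. A minimal single-threaded sketch of that rule, assuming a hypothetical model_net with a plain int counter (the kernel uses atomic_t and defers namespace cleanup, which this model does not attempt):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-namespace private data, like struct tun_net. */
struct model_tun_net {
	int ndevs;
};

struct model_net {
	int count;                  /* references held on the namespace */
	struct model_tun_net *tun;  /* stands in for the generic pointer */
};

static int exit_calls; /* how many times the exit hook has run */

/* Plays the role of the ->exit hook: frees the private data. */
static void model_tun_exit_net(struct model_net *net)
{
	free(net->tun);
	net->tun = NULL;
	exit_calls++;
}

static void model_get_net(struct model_net *net)
{
	net->count++;
}

/* Dropping the last reference is the only path that runs the exit
 * hook, so anyone holding a reference sees a live tun pointer. */
static void model_put_net(struct model_net *net)
{
	if (--net->count == 0)
		model_tun_exit_net(net);
}
```

In this model a user that takes a reference before looking up the pointer can keep using it after the creator drops its own reference; the free happens only on the final put.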
[Devel] Re: nets: status of sysfs with netns
Eric W. Biederman wrote:
Pavel Emelyanov [EMAIL PROTECTED] writes:
Benjamin Thery wrote:

Eric, Pavel,

I haven't followed everything about the sysfs/netns issue recently and I think I have missed some of its developments over the past weeks. So I'm wondering: what is the current status? Is anyone still working on it?

I'm thinking about it, but have no graceful solution :( Sysfs is a big... no - HUUGE problem.

As several netns patches for IPv4 and IPv6 have already been merged in net-2.6.26, it would be great to have the sysfs part too, to help people who want to test netns. Right now, it is a pain to boot and test a system with CONFIG_SYSFS=n. (I know the problem is not trivial and won't be easy to solve. I just want to be sure it is not forgotten.)

It is (unfortunately) not.

Pavel, I've heard you sent (or have) a patch that prevents sysfs access from a child netns. A patch that can be a good workaround until we have the full thing ready. I can't find it in the mailing list archives; can you give some hints about where to find it? :)

Hm. I haven't had plans to do such things actually.

Thanks a lot.

Sorry guys. I think my initial reply got lost. Currently my approach was approved in code review. The only hold-up seems to be time, and gregkh getting overloaded and dropping the patches, for the Nth time.

Hi Eric,

I tried to look at your patchset but I was unable to find it. Do you have it available somewhere? I was guessing I could play with it and, by the way, test it a little ;)

Thanks
-- 
Daniel
[Devel] Re: [PATCH 0/14 (3 subsets)] Make tuns and vlans devices work per-net.
Pavel Emelyanov wrote:

Hi, guys.

I've recently sent the TUN devices virtualization, but it was rejected by Dave, since the struct net is becoming a dumping ground. I agree with him - we really need some way to register on-net data dynamically. That's my view of such a thing and two examples of how to use it (TUN and VLAN devices virtualization). If this is found good, I'll send these sets to David, hoping he will accept them :)

Pavel, seems to be a smart solution :)

I am just worried about performance when network resources have to be accessed in the fast path, like a routing table (that seems not to be the case for tun and vlan). Shall we assume the fast path should always go to struct net and the non-critical path can go to net_generic?

-- 
Daniel
[Devel] Re: [PATCH 0/14 (3 subsets)] Make tuns and vlans devices work per-net.
Daniel Lezcano wrote:
Pavel Emelyanov wrote:

Hi, guys.

I've recently sent the TUN devices virtualization, but it was rejected by Dave, since the struct net is becoming a dumping ground. I agree with him - we really need some way to register on-net data dynamically. That's my view of such a thing and two examples of how to use it (TUN and VLAN devices virtualization). If this is found good, I'll send these sets to David, hoping he will accept them :)

Pavel, seems to be a smart solution :)

Thanks :) However, I've already found a bug in the 1st patch (already fixed).

I am just worried about performance when network resources have to be accessed in the fast path, like a routing table (that seems not to be the case for tun and vlan). Shall we assume the fast path should always go to struct net and the non-critical path can go to net_generic?

Hm... I put a call to net_generic() into the tunnels' rcv path and measured the performance with netperf: no performance penalty. I tried to make net_generic() work w/o any locks and it looks like I've managed to make it fast enough :)

I think that core kernel code and protocols should/may use the struct net, while modules are better off working via generic pointers. However, if the generic pointers cause noticeable performance degradation, then we may ask Dave to bear with on-net members :)

-- 
Daniel
[Devel] Re: [PATCH 3/14][TUN]: Introduce the tun_net structure.
On Fri, Apr 11, 2008 at 07:45:06PM +0400, Pavel Emelyanov wrote:
Paul E. McKenney wrote:
On Fri, Apr 11, 2008 at 11:55:59AM +0400, Pavel Emelyanov wrote:
Paul E. McKenney wrote:
On Thu, Apr 10, 2008 at 07:06:24PM +0400, Pavel Emelyanov wrote:

This is the first step in making tuntap devices work in net namespaces. The structure mentioned is pointed to by the generic net pointer with the tun_net_id id, and the tun driver fills one in on its load. It will contain only the tun devices list. So declare this structure and introduce net init and exit hooks.

OK, I have to ask... What prevents someone else from invoking net_generic() concurrently with a call to tun_exit_net(), potentially obtaining a pointer to the structure that tun_exit_net() is about to kfree()?

It's the same as if the tun_net were directly pointed to by the struct net. Nobody can guarantee that the pointer you got from the struct net is not going to be freed, unless you provide this guarantee yourself.

So tun_net acquires some lock before calling net_generic(), and that same lock is held when calling tun_exit_net()?

No.

Or is there but a single tun_net task, so that it will never call tun_net_exit() at the same time that it calls net_generic() for the tun_net pointer?

tun_net_exit is called only when a struct net is no longer referenced and is about to be kfree-ed itself, so it's impossible (or a bug in its own right) that someone still has a pointer to this net. Provided the struct net is alive (!), the net->gen array is alive (or is scheduled for kfree after an RCU grace period). Thus, if your code holds the net and uses the net_generic() call, it will get a live net->gen array and a live tun_net pointer. Next, what happens after net_generic() completes and leaves the RCU read-side section? Simple - the struct net is (should be) still referenced, so tun_net_exit cannot be called yet, and thus the tun_net pointer obtained earlier is alive. Unlike the (possible) former instance of the net_generic array, but nobody references that one in my code (and nobody should, hm... I think I'll add this rule to the comments).

But if you call net_generic to get some pointer other than tun_net, then you're fine (due to RCU), provided you play by the same rules with the pointer you're getting.

Agreed, RCU protects the net_generic structure, but not the structures pointed to by that structure.

They are protected by struct net reference counting.

Ah, OK, got it! Thank you for the tutorial!

Maybe I'm missing something in your question, can you provide some testcase that you suspect may cause an OOPS?

Just trying to understand what prevents one task from calling net_generic() to pick up the tun_net pointer at the same time some other task calls tun_net_exit().

If this task dereferences a held struct net, then it should be OK. If it does not, this will OOPS in any case. Consider the struct net to look like

	struct net {
		...
		void *ptrs[N];
	};

and net_generic to be just

	static inline void *net_generic(struct net *net, int id)
	{
		BUG_ON(id == 0 || id > N);
		return net->ptrs[id - 1];
	}

That's the same as what I propose, except that the ptrs array lives in RCU-protected memory.

So RCU is protecting -only- the net_generic structure that net_generic() is traversing, and the structure returned by net_generic() is protected by a reference counter in the upper-level struct net. If this is the approach, I am happy. ;-)

Thanx, Paul
[Devel] Re: [PATCH 0/14 (3 subsets)] Make tuns and vlans devices work per-net.
From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Fri, 11 Apr 2008 19:57:25 +0400

I think that core kernel code and protocols should/may use the struct net, while modules are better off working via generic pointers. However, if the generic pointers cause noticeable performance degradation, then we may ask Dave to bear with on-net members :)

This sounds fine.
[Devel] Re: [RFC] Control Groups Roadmap ideas
On Fri, Apr 11, 2008 at 8:18 PM, Serge E. Hallyn [EMAIL PROTECTED] wrote:
Quoting Paul Menage ([EMAIL PROTECTED]):

This is a list of some of the sub-projects that I'm planning for Control Groups, or that I know others are planning on or working on. Any comments or suggestions are welcome.

1) Stateless subsystems

This was motivated by the recent freezer subsystem proposal, which included a facility for sending signals to all members of a cgroup. This wasn't specifically freezer-related, and wasn't even something that needed particular per-cgroup state - its only state is that set of processes, which is already tracked by cgroups. So it could theoretically be mounted on multiple hierarchies at once, and wouldn't need an entry in the css_set array.

This would require a few internal plumbing changes in cgroups, in particular:

- hashing css_set objects based on their cgroups rather than their css pointers
- allowing stateless subsystems to be in multiple hierarchies
- changing the way hierarchy ids are calculated - simply ORing together the subsystem bits would no longer work since that could result in duplicates

2) More flexible binding/unbinding/rebinding

Currently you can only add/remove subsystems to a hierarchy when it has just a single (root) cgroup. This is a bit inflexible, so I'm planning to support:

- adding a subsystem to an existing hierarchy by automatically creating a subsys state object for the new subsystem for each existing cgroup in the hierarchy and doing the appropriate can_attach()/attach_tasks() callbacks for all tasks in the system
- removing a subsystem from an existing hierarchy by moving all tasks to that subsystem's root cgroup and destroying the child subsystem state objects
- merging two existing hierarchies that have identical cgroup trees
- (maybe) splitting one hierarchy into two separate hierarchies

Whether all these operations should be forced through the mount() system call, or whether they should be done via operations on cgroup control files, is something I've not figured out yet.

I'm tempted to ask what the use case is for this (I assume you have one, you don't generally introduce features for no good reason), but it doesn't sound like this would have any performance effect on the general case, so it sounds good.

I'd stick with mount semantics. Just

	mount -t cgroup -o remount,devices,cpu none /devwh

should handle all cases, no?

3) Subsystem dependencies

This would be a fairly simple change, essentially allowing one subsystem to require that it only be mounted on a hierarchy when some other subsystem is also present. The implementation would probably be a callback that allows a subsystem to confirm whether it's prepared to be included in a proposed hierarchy containing a specified subsystem bitmask; it would be able to prevent the hierarchy from being created by returning an error.

An example of a use for this would be a swap subsystem that is mostly independent of the memory controller, but uses the page-ownership tracking of the memory controller to determine which cgroup to charge swap pages to. Hence it would require that it only be mounted on a hierarchy that also included a memory controller. The memory controller would make no such requirement by itself, so could be used on its own without the swap controller.

4) Subsystem Inheritance

This is an idea that I've been kicking around for a while, trying to figure out whether its usefulness is worth the in-kernel complexity, versus doing it in userspace. It comes from the idea that although cgroups supports multiple hierarchies so that different subsystems can see different task groupings, one of the more common uses of this is (I believe) to support a setup where, say, we have separate groups A, B and C for one resource X, but for resource Y we want a group consisting of A+B+C. E.g. we want individual CPU limits for A, B and C, but for disk I/O we want them all to share a common limit. This can be done from userspace by mounting two hierarchies, one for CPU and one for disk I/O, and creating appropriate groupings, but it could also be done in the kernel as follows:

- each subsystem foo would have a foo.inherit file provided by (and handled by) cgroups in each group directory
- setting the foo.inherit flag (i.e. writing 1 to it) would cause tasks in that cgroup to share the foo subsystem state with the parent cgroup
- from the subsystem's point of view, it would only need to worry about its own foo_cgroup objects and which task was associated with each object; the subsystem wouldn't need to care about which tasks were part of each cgroup, and which cgroups were sharing state; that would all be