[Devel] Re: [PATCH 1/14][NETNS]: Introduce the net-subsys id generator.

2008-04-11 Thread Pavel Emelyanov
 +int register_pernet_gen_device(int *id, struct pernet_operations *ops)
 +{
 +	int error;
 +	mutex_lock(&net_mutex);
 +again:
 +	error = ida_get_new_above(&net_generic_ids, 1, id);
 +	if (error) {
 +		if (error == -EAGAIN) {
 +			ida_pre_get(&net_generic_ids, GFP_KERNEL);
 +			goto again;
 +		}
 
   goto out;
 
 +	}
 +	error = register_pernet_operations(first_device, ops);
 +	if (error)
 +		ida_remove(&net_generic_ids, *id);
 +	else if (first_device == &pernet_list)
 +		first_device = &ops->list;
 
 out:

Oops! Thanks, will fix.
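The control flow under discussion can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: id_alloc(), id_free(), register_gen_device() and the two callbacks are hypothetical stand-ins (a fixed bitmap instead of the kernel IDA, no locking, no -EAGAIN retry), not the code from the patch. It shows why the error path needs the jump to out: without it, registration would run with an id that was never obtained.

```c
#include <assert.h>
#include <errno.h>

#define MAX_IDS 8

/* Slot 0 is never handed out: ids start at 1, as in the patch. */
static unsigned char id_used[MAX_IDS + 1];

/* Hypothetical stand-in for ida_get_new_above(..., 1, id). */
static int id_alloc(int *id)
{
	for (int i = 1; i <= MAX_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = 1;
			*id = i;
			return 0;
		}
	}
	return -ENOSPC;
}

static void id_free(int id)
{
	assert(id >= 1 && id <= MAX_IDS);
	id_used[id] = 0;
}

/*
 * Allocate an id, then run the caller's registration step.
 * register_ops is a hypothetical callback standing in for
 * register_pernet_operations().
 */
static int register_gen_device(int *id, int (*register_ops)(void))
{
	int error;

	error = id_alloc(id);
	if (error)
		goto out;	/* skip registration when no id was obtained */

	error = register_ops();
	if (error)
		id_free(*id);	/* roll back so the id can be reused */
out:
	return error;
}

/* Two trivial callbacks for trying the function out. */
static int ops_ok(void)   { return 0; }
static int ops_fail(void) { return -ENOMEM; }
```

Calling register_gen_device() with ops_fail returns the error and releases the id, so the next successful caller receives the same id again.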
___
Containers mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC PATCH 0/4] Container Freezer: Reuse Suspend Freezer

2008-04-11 Thread Pavel Machek
Hi!

 NOTE: Due to problems with my MTA configuration two earlier attempts reached 
 linux-pm
 but not linux-kernel. Please cc [EMAIL PROTECTED] on replies.
 
 This patchset is a prototype using the container infrastructure and
 the swsusp freezer to freeze a group of tasks. I've merely taken Cedric's
 patches, forward-ported them to 2.6.25-rc8-mm1 and done a small amount of
 testing.

Okay, the freezer probably does what you want, but be warned that Linus is
not exactly in love with the freezer. You can probably get away with using
it for user processes, but maybe you should drop him a line saying
you want to expand freezer usage and see what happens.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[Devel] Re: [RFC PATCH 1/4] Container Freezer: Add TIF_FREEZE flag to all architectures

2008-04-11 Thread Pavel Machek
On Thu 2008-04-03 14:03:17, [EMAIL PROTECTED] wrote:
 This patch is the first step in making the refrigerator() available 
 to all architectures, even for those without power management. 
 
 The purpose of such a change is to be able to use the refrigerator() 
 in a new control group subsystem which will implement a control group
 freezer.
 
 Signed-off-by: Cedric Le Goater [EMAIL PROTECTED]
 Signed-off-by: Matt Helsley [EMAIL PROTECTED]
 Tested-by: Matt Helsley [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]

ACK.
Pavel


[Devel] Re: [PATCH 6/14][RTNL]: Introduce the rtnl_kill_links call.

2008-04-11 Thread Patrick McHardy
Pavel Emelyanov wrote:
 for_each_net(net) {
 -restart:
 -   for_each_netdev_safe(net, dev, n) {
 -   if (dev->rtnl_link_ops == ops) {
 -   ops->dellink(dev);
 -   goto restart;
 -   }
 -   }
 +   __rtnl_kill_links(net, ops);
 This was _safe, and now it's not. Is that intentional?
 
 Yup - we goto restart in case we del some link, so there's no need
 in _safe iteration. 
 
 This goto was added by Patrick (commit 68365458 [NET]: rtnl_link: 
 fix use-after-free) and I suspect he simply forgot to remove the 
 _safe iterator (I put him in Cc to correct me if I'm wrong).


No, that was an oversight, it should be safe to remove.
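The restart pattern discussed above can be sketched outside the kernel. This is only an illustration under stated assumptions: struct link, add_link(), del_link(), kill_links() and count_links() are hypothetical names on a plain singly linked list, not the rtnetlink code. The point it shows is that once the scan restarts from the head after every deletion, the loop never reads the next pointer of a freed entry, so a "safe" iterator adds nothing.

```c
#include <assert.h>
#include <stdlib.h>

struct link {
	int ops_id;		/* hypothetical stand-in for dev->rtnl_link_ops */
	struct link *next;
};

/* Push a new link on the front of the list; returns 0 on success. */
static int add_link(struct link **head, int ops_id)
{
	struct link *l = malloc(sizeof(*l));

	if (!l)
		return -1;
	l->ops_id = ops_id;
	l->next = *head;
	*head = l;
	return 0;
}

/* Unlink and free one entry, as ops->dellink() would for a device. */
static void del_link(struct link **head, struct link *victim)
{
	struct link **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* Remove every link owned by ops_id; returns how many were removed. */
static int kill_links(struct link **head, int ops_id)
{
	struct link *l;
	int killed = 0;

	assert(head != NULL);
restart:
	for (l = *head; l; l = l->next) {
		if (l->ops_id == ops_id) {
			del_link(head, l);
			killed++;
			/* l is freed: never touch l->next, rescan instead */
			goto restart;
		}
	}
	return killed;
}

/* Count the entries left on the list. */
static int count_links(const struct link *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

The rescan makes the loop quadratic in the worst case, which is presumably acceptable for a teardown path that runs rarely.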


[Devel] Re: [PATCH 2/14][NETNS]: Generic per-net pointers.

2008-04-11 Thread Daniel Lezcano
Pavel Emelyanov wrote:
 Add the elastic array of void * pointer to the struct net.
 The access rules are simple:
 
  1. register the ops with register_pernet_gen_device to get
 the id of your private pointer
  2. call net_assign_generic() to put the private data on the
 struct net (most preferably this should be done in the
 ->init callback of the ops registered)
  3. do not change this pointer while the net is alive;
  4. use the net_generic() to get the pointer.
 
 When adding a new pointer, I copy the old array, replace it
 with a new one and schedule the old for kfree after an RCU
 grace period.
 
 Since the net_generic explores the net->gen array inside rcu
 read section and once set the net->gen->ptr[x] pointer never 
 changes, this grants us a safe access to generic pointers.
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
 
 ---
  include/net/net_namespace.h |2 +
  include/net/netns/generic.h |   49 ++
  net/core/net_namespace.c|   62 
 +++
  3 files changed, 113 insertions(+), 0 deletions(-)
  create mode 100644 include/net/netns/generic.h
 
 diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
 index 6971fdb..e3d4eb4 100644
 --- a/include/net/net_namespace.h
 +++ b/include/net/net_namespace.h
 @@ -19,6 +19,7 @@ struct proc_dir_entry;
  struct net_device;
  struct sock;
  struct ctl_table_header;
 +struct net_generic;
 
  struct net {
   atomic_tcount;  /* To decided when the network
 @@ -57,6 +58,7 @@ struct net {
  #ifdef CONFIG_NETFILTER
   struct netns_xt xt;
  #endif
 + struct net_generic  *gen;
  };
 
 
 diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h
 new file mode 100644
 index 000..e8a6d27
 --- /dev/null
 +++ b/include/net/netns/generic.h
 @@ -0,0 +1,49 @@
 +/*
 + * generic net pointers
 + */
 +
 +#ifndef __NET_GENERIC_H__
 +#define __NET_GENERIC_H__
 +
 +#include <linux/rcupdate.h>
 +
 +/*
 + * Generic net pointers are to be used by modules
 + * to put some private stuff on the struct net without
 + * explicit struct net modification
 + *
 + * The rules are simple:
 + * 1. register the ops with register_pernet_gen_device to get
 + *    the id of your private pointer
 + * 2. call net_assign_generic() to put the private data on the
 + *    struct net (most preferably this should be done in the
 + *    ->init callback of the ops registered)
 + * 3. do not change this pointer while the net is alive.
 + *
 + * After accomplishing all of the above, the private pointer
 + * can be accessed with the net_generic() call.
 + */
 +
 +struct net_generic {
 + unsigned int len;
 + struct rcu_head rcu;
 +
 + void *ptr[0];
 +};
 +
 +static inline void *net_generic(struct net *net, int id)
 +{
 + struct net_generic *ng;
 + void *ptr;
 +
 + rcu_read_lock();
 + ng = rcu_dereference(net->gen);
 + BUG_ON(id == 0 || id > ng->len);
 + ptr = ng->ptr[id - 1];
 + rcu_read_unlock();
 +
 + return ptr;
 +}
 +
 +extern int net_assign_generic(struct net *net, int id, void *data);
 +#endif
 diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
 index 7ef3bac..b384840 100644
 --- a/net/core/net_namespace.c
 +++ b/net/core/net_namespace.c
 @@ -7,6 +7,7 @@
  #include <linux/sched.h>
  #include <linux/idr.h>
  #include <net/net_namespace.h>
 +#include <net/netns/generic.h>
 
  /*
   *   Our network namespace constructor/destructor lists
 @@ -21,6 +22,8 @@ LIST_HEAD(net_namespace_list);
  struct net init_net;
  EXPORT_SYMBOL(init_net);
 
 +#define INITIAL_NET_GEN_PTRS 13 /* +1 for len +2 for rcu_head */
 +
  /*
   * setup_net runs the initializers for the network namespace object.
   */
 @@ -29,10 +32,21 @@ static __net_init int setup_net(struct net *net)
   /* Must be called with net_mutex held */
   struct pernet_operations *ops;
   int error;
 + struct net_generic *ng;
 
   atomic_set(&net->count, 1);
   atomic_set(&net->use_count, 0);
 
 + error = -ENOMEM;
 + ng = kzalloc(sizeof(struct net_generic) +
 + INITIAL_NET_GEN_PTRS * sizeof(void *), GFP_KERNEL);

Why do you need to allocate more than sizeof(struct net_generic) ?

 + if (ng == NULL)
 + goto out;
 +
 + ng->len = INITIAL_NET_GEN_PTRS;
 + INIT_RCU_HEAD(&ng->rcu);
 + rcu_assign_pointer(net->gen, ng);
 +
   error = 0;
   list_for_each_entry(ops, &pernet_list, list) {
   if (ops->init) {
 @@ -54,6 +68,7 @@ out_undo:
   }
 
   rcu_barrier();
 + kfree(ng);
   goto out;
  }
 
 @@ -384,3 +399,50 @@ void unregister_pernet_gen_device(int id, struct pernet_operations *ops)
   mutex_unlock(&net_mutex);
  }
  EXPORT_SYMBOL_GPL(unregister_pernet_gen_device);
 +
 +static void net_generic_release(struct rcu_head *rcu)
 +{
 + struct net_generic *ng;
 +
 + ng = container_of(rcu, struct net_generic, rcu);
 + kfree(ng);
 +}
 +
 +int 

[Devel] Re: [PATCH 2/14][NETNS]: Generic per-net pointers.

2008-04-11 Thread Pavel Emelyanov
[snip]

 @@ -29,10 +32,21 @@ static __net_init int setup_net(struct net *net)
  /* Must be called with net_mutex held */
  struct pernet_operations *ops;
  int error;
 +struct net_generic *ng;

  atomic_set(&net->count, 1);
  atomic_set(&net->use_count, 0);

 +error = -ENOMEM;
 +ng = kzalloc(sizeof(struct net_generic) +
 +INITIAL_NET_GEN_PTRS * sizeof(void *), GFP_KERNEL);
 
 Why do you need to allocate more than sizeof(struct net_generic) ?

That's just an optimization to avoid many reallocations in the
near future. I planned to do something similar in net_assign_generic
(allocate a bit more than required), but decided to do it later
to keep net_assign_generic simpler.

Currently I have only 5 users of generic pointers (tun and vlan
you see and I have patches for ipip, ipgre and sit tunnels), so
that's enough for the first time.

[snip]

 +int net_assign_generic(struct net *net, int id, void *data)
 +{
 +struct net_generic *ng, *old_ng;
 +
 +BUG_ON(!mutex_is_locked(&net_mutex));
 +BUG_ON(id == 0);
 +
 +ng = old_ng = net->gen;
 
 shouldn't it be rcu_dereferenced ?

Nope - nobody can race with us and change this pointer, so it's
safe to get one without rcu_dereference.

Thanks,
Pavel
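The "elastic array" behaviour described in this thread (ids starting at 1, slot id - 1, copy to a larger array when an id does not fit) can be sketched in userspace as follows. This is a simplified illustration, not the patch itself: struct gen_array, struct fake_net, gen_alloc(), fake_net_init(), gen_get() and gen_assign() are hypothetical names, there is no net_mutex, and the old array is freed immediately, whereas the patch defers the kfree until after an RCU grace period so that concurrent readers stay safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gen_array {
	unsigned int len;	/* number of slots in ptr[] */
	void *ptr[];		/* slot for id N lives at ptr[N - 1] */
};

struct fake_net {
	struct gen_array *gen;
};

/* Allocate a zeroed array with room for len pointers. */
static struct gen_array *gen_alloc(unsigned int len)
{
	struct gen_array *ng;

	ng = calloc(1, sizeof(*ng) + len * sizeof(void *));
	if (ng)
		ng->len = len;
	return ng;
}

/* Pre-size the array so the first few users need no reallocation. */
static int fake_net_init(struct fake_net *net, unsigned int initial_len)
{
	net->gen = gen_alloc(initial_len);
	return net->gen ? 0 : -1;
}

/* Look up the private pointer registered under id. */
static void *gen_get(const struct fake_net *net, int id)
{
	assert(id >= 1 && (unsigned int)id <= net->gen->len);
	return net->gen->ptr[id - 1];
}

/* Store data under id, growing the array by copying when needed. */
static int gen_assign(struct fake_net *net, int id, void *data)
{
	struct gen_array *old = net->gen, *ng;

	assert(id >= 1);
	if ((unsigned int)id <= old->len) {
		old->ptr[id - 1] = data;
		return 0;
	}

	ng = gen_alloc((unsigned int)id);
	if (!ng)
		return -1;
	memcpy(ng->ptr, old->ptr, old->len * sizeof(void *));
	ng->ptr[id - 1] = data;
	net->gen = ng;	/* the kernel version publishes this with rcu_assign_pointer() */
	free(old);	/* simplification: the patch defers this past an RCU grace period */
	return 0;
}
```

Pointers stored before a grow remain reachable under the same id afterwards, which is the property that callers of net_generic() rely on.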


[Devel] Re: [PATCH 3/14][TUN]: Introduce the tun_net structure.

2008-04-11 Thread Paul E. McKenney
On Fri, Apr 11, 2008 at 11:55:59AM +0400, Pavel Emelyanov wrote:
 Paul E. McKenney wrote:
  On Thu, Apr 10, 2008 at 07:06:24PM +0400, Pavel Emelyanov wrote:
  This is the first step in making tuntap devices work in net 
  namespaces. The structure mentioned is pointed by generic
  net pointer with tun_net_id id, and tun driver fills one on 
  its load. It will contain only the tun devices list.
 
  So declare this structure and introduce net init and exit hooks.
  
  OK, I have to ask...  What prevents someone else from invoking
  net_generic() concurrently with a call to tun_exit_net(), potentially
  obtaining a pointer to the structure that tun_exit_net() is about
  to kfree()?
 
 It's the same as if the tun_net was directly pointed by the struct 
 net. Nobody can grant, that the pointer got by you from the struct
 net is not going to become free, unless you provide this security
 by yourself.

So tun_net acquires some lock before calling net_generic(), and that
same lock is held when calling tun_exit_net()?  Or is there but a
single tun_net task, so that it will never call tun_net_exit()
at the same time that it calls net_generic() for the tun_net pointer?

 But if you call net_generic to get some pointer other than tun_net,
 then you're fine (due to RCU), providing you play the same rules with
 the pointer you're getting.

Agreed, RCU protects the net_generic structure, but not the structures
pointed to by that structure.

 Maybe I'm missing something in your question, can you provide some
 testcase, that you suspect may cause an OOPS?

Just trying to understand what prevents one task from calling
net_generic() to pick up the tun_net pointer at the same time some other
task calls tun_net_exit().

Thanx, Paul

  Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
 
  ---
   drivers/net/tun.c |   53 
  -
   1 files changed, 52 insertions(+), 1 deletions(-)
 
  diff --git a/drivers/net/tun.c b/drivers/net/tun.c
  index 7b816a0..9bfba02 100644
  --- a/drivers/net/tun.c
  +++ b/drivers/net/tun.c
  @@ -63,6 +63,7 @@
   #include <linux/if_tun.h>
   #include <linux/crc32.h>
   #include <net/net_namespace.h>
  +#include <net/netns/generic.h>
 
   #include <asm/system.h>
   #include <asm/uaccess.h>
  @@ -73,6 +74,11 @@ static int debug;
 
   /* Network device part of the driver */
 
  +static unsigned int tun_net_id;
  +struct tun_net {
  +  struct list_head dev_list;
  +};
  +
   static LIST_HEAD(tun_dev_list);
   static const struct ethtool_ops tun_ethtool_ops;
 
  @@ -873,6 +879,37 @@ static const struct ethtool_ops tun_ethtool_ops = {
 .set_rx_csum= tun_set_rx_csum
   };
 
  +static int tun_init_net(struct net *net)
  +{
  +  struct tun_net *tn;
  +
  +  tn = kmalloc(sizeof(*tn), GFP_KERNEL);
  +  if (tn == NULL)
  +  return -ENOMEM;
  +
  +  INIT_LIST_HEAD(&tn->dev_list);
  +
  +  if (net_assign_generic(net, tun_net_id, tn)) {
  +  kfree(tn);
  +  return -ENOMEM;
  +  }
  +
  +  return 0;
  +}
  +
  +static void tun_exit_net(struct net *net)
  +{
  +  struct tun_net *tn;
  +
  +  tn = net_generic(net, tun_net_id);
  +  kfree(tn);
  +}
  +
  +static struct pernet_operations tun_net_ops = {
  +  .init = tun_init_net,
  +  .exit = tun_exit_net,
  +};
  +
   static int __init tun_init(void)
   {
 int ret = 0;
  @@ -880,9 +917,22 @@ static int __init tun_init(void)
   printk(KERN_INFO "tun: %s, %s\n", DRV_DESCRIPTION, DRV_VERSION);
   printk(KERN_INFO "tun: %s\n", DRV_COPYRIGHT);
 
  +  ret = register_pernet_gen_device(&tun_net_id, &tun_net_ops);
  +  if (ret) {
  +  printk(KERN_ERR "tun: Can't register pernet ops\n");
  +  goto err_pernet;
  +  }
  +
   ret = misc_register(&tun_miscdev);
  -  if (ret)
  +  if (ret) {
   printk(KERN_ERR "tun: Can't register misc device %d\n", TUN_MINOR);
  +  goto err_misc;
  +  }
  +  return 0;
  +
  +err_misc:
  +  unregister_pernet_gen_device(tun_net_id, &tun_net_ops);
  +err_pernet:
 return ret;
   }
 
  @@ -899,6 +949,7 @@ static void tun_cleanup(void)
 }
 rtnl_unlock();
 
  +  unregister_pernet_gen_device(tun_net_id, &tun_net_ops);
   }
 
   module_init(tun_init);
  -- 
  1.5.3.4
 


[Devel] Re: [PATCH 3/14][TUN]: Introduce the tun_net structure.

2008-04-11 Thread Pavel Emelyanov
Paul E. McKenney wrote:
 On Fri, Apr 11, 2008 at 11:55:59AM +0400, Pavel Emelyanov wrote:
 Paul E. McKenney wrote:
 On Thu, Apr 10, 2008 at 07:06:24PM +0400, Pavel Emelyanov wrote:
 This is the first step in making tuntap devices work in net 
 namespaces. The structure mentioned is pointed by generic
 net pointer with tun_net_id id, and tun driver fills one on 
 its load. It will contain only the tun devices list.

 So declare this structure and introduce net init and exit hooks.
 OK, I have to ask...  What prevents someone else from invoking
 net_generic() concurrently with a call to tun_exit_net(), potentially
 obtaining a pointer to the structure that tun_exit_net() is about
 to kfree()?
 It's the same as if the tun_net was directly pointed by the struct 
 net. Nobody can grant, that the pointer got by you from the struct
 net is not going to become free, unless you provide this security
 by yourself.
 
 So tun_net acquires some lock before calling net_generic(), and that
 same lock is held when calling tun_exit_net()?  Or is there but a

No.

 single tun_net task, so that it will never call tun_net_exit()
 at the same time that it calls net_generic() for the tun_net pointer?

tun_net_exit is called only when a struct net is no longer referenced
and is going to be kfree-ed itself, so it's impossible (or BUGy by its
own) that someone still has a pointer on this net.

Providing the struct net is alive (!), the net->gen array is alive (or
is scheduled for kfree after RCU grace period). Thus, if your code 
holds the net and uses the net_generic() call, then it will get alive 
net->gen array and alive tun_net pointer.

Next, what happens after net_generic() completes and leaves the RCU-read 
section? Simple - the struct net is (should be) still referenced, so the
tun_net_exit cannot yet be called and thus the tun_net pointer obtained
earlier is alive. Unlike the (possibly) former instance of the net_generic
array, but nobody references this one in my code (and should not do so,
hm... I think I'll add this rule to the comments).

 But if you call net_generic to get some pointer other than tun_net,
 then you're fine (due to RCU), providing you play the same rules with
 the pointer you're getting.
 
 Agreed, RCU protects the net_generic structure, but not the structures
 pointed to by that structure.

They are protected by struct net reference counting.

 Maybe I'm missing something in your question, can you provide some
 testcase, that you suspect may cause an OOPS?
 
 Just trying to understand what prevents one task from calling
 net_generic() to pick up the tun_net pointer at the same time some other
 task calls tun_net_exit().

If this task dereferences a held struct net, then should be OK. If 
this task does not, this will OOPs in any case.

Consider the struct net to look like

struct net {
...
void *ptrs[N];
}

and the net_generic to be just

static inline void *net_generic(struct net *net, int id)
{
BUG_ON(id >= N);
return net->ptrs[id - 1];
}

That's the same to what I propose, except for the ptrs array is on the
RCU protected memory.

   Thanx, Paul

Thanks,
Pavel
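The lifetime rule argued above (the exit hook frees the private data only after the last reference to the net is dropped) can be sketched like this. It is a single-threaded illustration under stated assumptions: struct ref_net and the ref_net_*() helpers are hypothetical, the counter is a plain int where the kernel uses an atomic_t, and priv stands in for the tun_net pointer that net_generic() would return.

```c
#include <assert.h>
#include <stdlib.h>

struct ref_net {
	int count;	/* plain int here; the kernel uses an atomic counter */
	void *priv;	/* stand-in for the pointer returned by net_generic() */
};

/* Create the net with one reference and its private data allocated. */
static int ref_net_init(struct ref_net *net, size_t priv_size)
{
	net->priv = malloc(priv_size);
	if (!net->priv)
		return -1;
	net->count = 1;
	return 0;
}

/* Take a reference; only legal while somebody already holds one. */
static struct ref_net *ref_net_get(struct ref_net *net)
{
	assert(net->count > 0);
	net->count++;
	return net;
}

/* Exit hook: runs once, when nobody can reach the net any more. */
static void ref_net_exit(struct ref_net *net)
{
	free(net->priv);
	net->priv = NULL;
}

/* Drop a reference; the last put triggers the exit hook. */
static void ref_net_put(struct ref_net *net)
{
	assert(net->count > 0);
	if (--net->count == 0)
		ref_net_exit(net);
}
```

As long as a caller holds its own reference, the private pointer it fetched stays valid even after other holders have dropped theirs; a caller that uses the net without holding a reference is already buggy, with or without generic pointers.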


[Devel] Re: nets: status of sysfs with netns

2008-04-11 Thread Daniel Lezcano
Eric W. Biederman wrote:
 Pavel Emelyanov [EMAIL PROTECTED] writes:
 
 Benjamin Thery wrote:
 Eric, Pavel,

 I haven't followed everything about the sysfs/netns issue recently and I 
 think I have missed some of its recent developments in the past weeks. 
 So I'm wondering what is the current status? Is anyone still working on it?
 I'm thinking on it, but have no graceful solution :( Sysfs is a big...
 no - HUUGE problem.

 As several netns patches for IPv4 and IPv6 have already been merged in 
 net-2.6.26, it would be great to have the sysfs part also to help people 
 who want test netns.
 Right now, it is a pain to boot and test a system with CONFIG_SYSFS=n.

 (I know the problem is not trivial and won't be easy to solve. I just 
 want to be sure it is not forgotten)
 It is (unfortunately) not.

 Pavel, I've heard you sent (or have) a patch that prevents sysfs access 
 from a child netns. A patch that can be a good workaround until we 
 have the full thing ready. I can't find it in mailing lists archives, 
 can you give some hints about where to find it? :)
 Hm. I haven't had plans to do such things actually.

 Thanks a lot.
 
 Sorry guys.  I think my initial reply got lost.
 
 Currently my approach was approved in code review.  The only hold up
 seems to be time, and gregkh getting overloaded and dropping the
 patches, for the Nth time.

Hi Eric,

I tried to look at your patchset but I was unable to find it. Do you 
have it available somewhere? I was hoping to play with it and, along 
the way, test it a little ;)

Thanks
  -- Daniel




[Devel] Re: [PATCH 0/14 (3 subsets)] Make tuns and vlans devices work per-net.

2008-04-11 Thread Daniel Lezcano
Pavel Emelyanov wrote:
 Hi, guys.
 
 I've recently sent a TUN devices virtualization, but it was rejected
 by Dave, since the struct net is becoming a dumping ground.
 
 I agree with him - we really need some way to register on-net data
 dynamically. That's my view of such a thing and two examples of how
 to use it (TUN and VLAN devices virtualization).
 
 If this will be found good, I'll send these sets to David, hoping he
 will accept them :)

Pavel,

seems to be a smart solution :)

I am just afraid with the performances when the network resources are to 
be accessed in the fast path like a routing table (that seems not to be 
the case for tun and vlan). Shall we assume the fast path should always 
go to struct net and non critical path can go to net_generic ?

   -- Daniel


[Devel] Re: [PATCH 0/14 (3 subsets)] Make tuns and vlans devices work per-net.

2008-04-11 Thread Pavel Emelyanov
Daniel Lezcano wrote:
 Pavel Emelyanov wrote:
 Hi, guys.

 I've recently sent a TUN devices virtualization, but it was rejected
 by Dave, since the struct net is becoming a dumping ground.

 I agree with him - we really need some way to register on-net data
 dynamically. That's my view of such a thing and two examples of how
 to use it (TUN and VLAN devices virtualization).

 If this will be found good, I'll send these sets to David, hoping he
 will accept them :)
 
 Pavel,
 
 seems to be a smart solution :)

Thanks :) However, I've already found a bug in the 1st patch (already fixed).

 I am just afraid with the performances when the network resources are to 
 be accessed in the fast path like a routing table (that seems not to be 
 the case for tun and vlan). Shall we assume the fast path should always 
 go to struct net and non critical path can go to net_generic ?

Hm... I put a call to net_generic() into the tunnels' rcv call and measured 
the performance with netperf - no performance penalty. I tried to make 
net_generic() work w/o any locks, and it looks like I've managed to make 
it fast enough :)

I think, that core kernel code and protocols should/may use the struct 
net, while modules are better to work via generic pointers. However, if
the generic pointers cause noticeable performance degradation, then we
may ask Dave to bear with on-net members :)

-- Daniel
 



[Devel] Re: [PATCH 3/14][TUN]: Introduce the tun_net structure.

2008-04-11 Thread Paul E. McKenney
On Fri, Apr 11, 2008 at 07:45:06PM +0400, Pavel Emelyanov wrote:
 Paul E. McKenney wrote:
  On Fri, Apr 11, 2008 at 11:55:59AM +0400, Pavel Emelyanov wrote:
  Paul E. McKenney wrote:
  On Thu, Apr 10, 2008 at 07:06:24PM +0400, Pavel Emelyanov wrote:
  This is the first step in making tuntap devices work in net 
  namespaces. The structure mentioned is pointed by generic
  net pointer with tun_net_id id, and tun driver fills one on 
  its load. It will contain only the tun devices list.
 
  So declare this structure and introduce net init and exit hooks.
  OK, I have to ask...  What prevents someone else from invoking
  net_generic() concurrently with a call to tun_exit_net(), potentially
  obtaining a pointer to the structure that tun_exit_net() is about
  to kfree()?
  It's the same as if the tun_net was directly pointed by the struct 
  net. Nobody can grant, that the pointer got by you from the struct
  net is not going to become free, unless you provide this security
  by yourself.
  
  So tun_net acquires some lock before calling net_generic(), and that
  same lock is held when calling tun_exit_net()?  Or is there but a
 
 No.
 
  single tun_net task, so that it will never call tun_net_exit()
  at the same time that it calls net_generic() for the tun_net pointer?
 
 tun_net_exit is called only when a struct net is no longer referenced
 and is going to be kfree-ed itself, so it's impossible (or BUGy by its
 own) that someone still has a pointer on this net.
 
 Providing the struct net is alive (!), the net->gen array is alive (or
 is scheduled for kfree after RCU grace period). Thus, if your code 
 holds the net and uses the net_generic() call, then it will get alive 
 net->gen array and alive tun_net pointer.
 
 Next, what happens after net_generic() completes and leaves the RCU-read 
 section? Simple - the struct net is (should be) still referenced, so the
 tun_net_exit cannot yet be called and thus the tun_net pointer obtained
 earlier is alive. Unlike the (possibly) former instance of the net_generic
 array, but nobody references this one in my code (and should not do so,
 hm... I think I'll add this rule to the comments).
 
  But if you call net_generic to get some pointer other than tun_net,
  then you're fine (due to RCU), providing you play the same rules with
  the pointer you're getting.
  
  Agreed, RCU protects the net_generic structure, but not the structures
  pointed to by that structure.
 
 They are protected by struct net reference counting.

Ah, OK, got it!  Thank you for the tutorial!

  Maybe I'm missing something in your question, can you provide some
  testcase, that you suspect may cause an OOPS?
  
  Just trying to understand what prevents one task from calling
  net_generic() to pick up the tun_net pointer at the same time some other
  task calls tun_net_exit().
 
 If this task dereferences a held struct net, then should be OK. If 
 this task does not, this will OOPs in any case.
 
 Consider the struct net to look like
 
 struct net {
   ...
   void *ptrs[N];
 }
 
 and the net_generic to be just
 
 static inline void *net_generic(struct net *net, int id)
 {
   BUG_ON(id >= N);
   return net->ptrs[id - 1];
 }
 
 That's the same to what I propose, except for the ptrs array is on the
 RCU protected memory.

So RCU is protecting -only- the net_generic structure that net_generic()
is traversing, and the structure returned by net_generic() is protected
by a reference counter in the upper-level struct net.

If this is the approach, I am happy.  ;-)

Thanx, Paul


[Devel] Re: [PATCH 0/14 (3 subsets)] Make tuns and vlans devices work per-net.

2008-04-11 Thread David Miller
From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Fri, 11 Apr 2008 19:57:25 +0400

 I think, that core kernel code and protocols should/may use the struct 
 net, while modules are better to work via generic pointers. However, if
 the generic pointers cause noticeable performance degradation, then we
 may ask Dave to bear with on-net members :)

This sounds fine.


[Devel] Re: [RFC] Control Groups Roadmap ideas

2008-04-11 Thread Balbir Singh
On Fri, Apr 11, 2008 at 8:18 PM, Serge E. Hallyn [EMAIL PROTECTED] wrote:

 Quoting Paul Menage ([EMAIL PROTECTED]):
   This is a list of some of the sub-projects that I'm planning for
   Control Groups, or that I know others are planning on or working on.
   Any comments or suggestions are welcome.
  
  
   1) Stateless subsystems
   -
  
   This was motivated by the recent freezer subsystem proposal, which
   included a facility for sending signals to all members of a cgroup.
   This wasn't specifically freezer-related, and wasn't even something
   that needed particular per-cgroup state - its only state is that set
   of processes, which is already tracked by cgroups. So it could
   theoretically be mounted on multiple hierarchies at once, and wouldn't
   need an entry in the css_set array.
  
   This would require a few internal plumbing changes in cgroups, in particular:
  
   - hashing css_set objects based on their cgroups rather than their css pointers
   - allowing stateless subsystems to be in multiple hierarchies
   - changing the way hierarchy ids are calculated - simply ORing
   together the subsystem would no longer work since that could result in
   duplicates
  
   2) More flexible binding/unbinding/rebinding
   -
  
   Currently you can only add/remove subsystems to a hierarchy when it
   has just a single (root) cgroup. This is a bit inflexible, so I'm
   planning to support:
  
   - adding a subsystem to an existing hierarchy by automatically
   creating a subsys state object for the new subsystem for each existing
   cgroup in the hierarchy and doing the appropriate
   can_attach()/attach_tasks() callbacks for all tasks in the system
  
   - removing a subsystem from an existing hierarchy by moving all tasks
   to that subsystem's root cgroup and destroying the child subsystem
   state objects
  
   - merging two existing hierarchies that have identical cgroup trees
  
   - (maybe) splitting one hierarchy into two separate hierarchies
  
   Whether all these operations should be forced through the mount()
   system call, or whether they should be done via operations on cgroup
   control files, is something I've not figured out yet.

  I'm tempted to ask what the use case is for this (I assume you have one,
  you don't generally introduce features for no good reason), but it
  doesn't sound like this would have any performance effect on the general
  case, so it sounds good.

  I'd stick with mount semantics.  Just
 mount -t cgroup -o remount,devices,cpu none /devwh
  should handle all cases, no?



   3) Subsystem dependencies
   -
  
   This would be a fairly simple change, essentially allowing one
   subsystem to require that it only be mounted on a hierarchy when some
   other subsystem was also present. The implementation would probably be
   a callback that allows a subsystem to confirm whether it's prepared to
   be included in a proposed hierarchy containing a specified subsystem
   bitmask; it would be able to prevent the hierarchy from being created
   by giving an error return. An example of a use for this would be a
   swap subsystem that is mostly independent of the memory controller,
   but uses the page-ownership tracking of the memory controller to
   determine which cgroup to charge swap pages to. Hence it would require
   that it only be mounted on a hierarchy that also included a memory
   controller. The memory controller would make no such requirement by
   itself, so could be used on its own without the swap controller.
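The dependency callback described in item 3 might look roughly like the sketch below. Everything in it is an assumption made for illustration: the SUBSYS_* ids, the can_bind_fn type, swap_can_bind() and check_hierarchy() are hypothetical names, not part of any cgroups API. Each subsystem may supply a callback that inspects the proposed subsystem bitmask and vetoes the hierarchy with an error return.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subsystem ids, one bit each in the hierarchy mask. */
enum { SUBSYS_CPU, SUBSYS_MEMORY, SUBSYS_SWAP, SUBSYS_COUNT };

/* Returns 0 if the subsystem accepts the proposed mask, -errno otherwise. */
typedef int (*can_bind_fn)(unsigned long mask);

/* The swap subsystem insists that the memory controller is present. */
static int swap_can_bind(unsigned long mask)
{
	return (mask & (1UL << SUBSYS_MEMORY)) ? 0 : -EINVAL;
}

/* Subsystems without a callback place no requirement on the hierarchy. */
static const can_bind_fn can_bind[SUBSYS_COUNT] = {
	[SUBSYS_SWAP] = swap_can_bind,
};

/* Ask every subsystem named in the mask whether it accepts the mask. */
static int check_hierarchy(unsigned long mask)
{
	assert((mask >> SUBSYS_COUNT) == 0);	/* no unknown subsystem bits */

	for (int i = 0; i < SUBSYS_COUNT; i++) {
		int err;

		if (!(mask & (1UL << i)) || !can_bind[i])
			continue;
		err = can_bind[i](mask);
		if (err)
			return err;	/* one veto blocks the whole hierarchy */
	}
	return 0;
}
```

With this shape the memory controller needs no callback at all, which matches the asymmetry described above: swap requires memory, but memory can be mounted on its own.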
  
  
   4) Subsystem Inheritance
   --
  
   This is an idea that I've been kicking around for a while trying to
   figure out whether its usefulness is worth the in-kernel complexity,
   versus doing it in userspace. It comes from the idea that although
   cgroups supports multiple hierarchies so that different subsystems can
   see different task groupings, one of the more common uses of this is
   (I believe) to support a setup where say we have separate groups A, B
   and C for one resource X, but for resource Y we want a group
   consisting of A+B+C. E.g. we want individual CPU limits for A, B and
   C, but for disk I/O we want them all to share a common limit. This can
   be done from userspace by mounting two hierarchies, one for CPU and
   one for disk I/O, and creating appropriate groupings, but it could
   also be done in the kernel as follows:
  
   - each subsystem foo would have a foo.inherit file provided by
   (and handled by) cgroups in each group directory
  
   - setting the foo.inherit flag (i.e. writing 1 to it) would cause
   tasks in that cgroup to share the foo subsystem state with the
   parent cgroup
  
   - from the subsystem's point of view, it would only need to worry
   about its own foo_cgroup objects  and which task was associated with
   each object; the subsystem wouldn't need to care about which tasks
   were part of each cgroup, and which cgroups were sharing state; that
   would all be