Re: [PATCH 0/12] L2 network namespace (v3)

2007-01-19 Thread Dmitry Mishin
On Friday 19 January 2007 10:27, Eric W. Biederman wrote:
 YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] writes:
 
  In article [EMAIL PROTECTED] (at Wed, 17 Jan 2007 18:51:14
  +0300), Dmitry Mishin [EMAIL PROTECTED] says:
 
  ===
  L2 network namespaces
  
  The most straightforward concept of network virtualization is complete
  separation of namespaces, covering device list, routing tables, netfilter
  tables, socket hashes, and everything else.
  
  On the input path, each packet is tagged with its namespace as soon as
  it arrives from a device, and is processed by each layer in the context
  of that namespace.
  Non-root namespaces communicate with the outside world in two ways: by
  owning hardware devices, or by receiving packets forwarded to them by
  their parent namespace via a pass-through device.
 
  Can you handle multicast / broadcast and IPv6, which are very important?
 
 The basic idea here is very simple.
 
 Each network namespace appears to user space as a separate network stack,
 with its own set of routing tables etc.
 
 All sockets and all network devices (the sources of packets) belong
 to exactly one network namespace.  
 
 From the socket or the network device through which a packet enters the
 network stack, you can infer the network namespace it will be processed in.
 Each network namespace should get its own complement of the data structures
 necessary to process packets, and everything should work.
 
 Talking between namespaces is accomplished either through an external network,
 or through a special pseudo network device.  The simplest to implement
 is a pair of network devices where all packets transmitted on one are
 received on the other.  Then by placing one network device in one namespace
 and the other in another namespace it looks like two machines connected by
 a crossover cable.
 
 Once you have that, in one namespace you can connect other namespaces
 with the existing ethernet bridging or by configuring one of the
 namespaces as a router and routing traffic between them.
 
 
 Supporting IPv6 is roughly as difficult as supporting IPv4.  
 
 To convert code, every variable either needs a per network namespace
 instance, or the data structures need to be modified to carry a network
 namespace tag.  For hash tables, which are hard to allocate dynamically,
 tagging is the preferred conversion method; for anything that is small
 enough, duplication is preferred as it allows the existing logic to be kept.
 
 In the fast path the impact of all of the conversions should be very light
 to non-existent.  In network stack initialization and cleanup there
 is work to do, because you are initializing and cleaning up variables more
 often than at module insertion and removal.
 
 So my expectation is that once we get a framework established and merged
 to allow network namespaces eventually the entire network stack will be
 converted.  Not just ipv4 and ipv6 but decnet, ipx, iptables, fair scheduling,
 ethernet bridging and all of the other weird and twisty bits of the
 linux network stack.
Thanks, Eric, for such a descriptive comment. I can only sign off on it :)
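
The "tagging" conversion Eric describes for shared hash tables can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not code from any patchset: struct net_ns, struct entry,
table_insert() and table_lookup() are hypothetical names, and the table is
a toy chained hash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace; not a kernel type. */
struct net_ns {
	int id;
};

/* Entry of a table shared by all namespaces, tagged with its owner. */
struct entry {
	unsigned int key;
	const struct net_ns *ns;	/* the namespace tag */
	struct entry *next;
};

#define TABLE_SIZE 16
static struct entry *table[TABLE_SIZE];

/* Insert at the head of the bucket chain selected by the key. */
void table_insert(struct entry *e)
{
	unsigned int b = e->key % TABLE_SIZE;

	e->next = table[b];
	table[b] = e;
}

/* A lookup matches on both the key and the namespace tag. */
struct entry *table_lookup(unsigned int key, const struct net_ns *ns)
{
	struct entry *e;

	assert(ns != NULL);
	for (e = table[key % TABLE_SIZE]; e; e = e->next)
		if (e->key == key && e->ns == ns)
			return e;
	return NULL;
}
```

Two namespaces can then hold entries with the same key in the same table
without seeing each other's entries, which is why the existing table does
not have to be allocated once per namespace.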

 
 The primary practical hurdle is there is a lot of networking code in
 the kernel.
 
 I think I know a path by which we can incrementally merge support for
 network namespaces without breaking anything.  More to come on this
 when I finish up my demonstration patchset in a week or so that
 is complete enough to show what I am talking about.
 
 I hope this helps put the concept into perspective.
I'll be waiting for it.

 
 As for Dmitry's patchset in particular, it currently does not support
 IPv6 and I don't know where it is with respect to broadcast and
 multicast, but I don't see any immediate problems that would preclude
 those from working.  Any incompleteness is exactly that: incompleteness,
 an implementation problem and not a fundamental design issue.
Broadcasts/multicasts are supported.

-- 
Thanks,
Dmitry.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/12] L2 network namespace (v3): current network namespace operations

2007-01-18 Thread Dmitry Mishin
On Wednesday 17 January 2007 23:16, Eric W. Biederman wrote:
 Dmitry Mishin [EMAIL PROTECTED] writes:
 
  Added functions and macros required to operate with network namespaces.
  They are required in order to switch network namespace for incoming
  packets and to avoid extending the current network interfaces with an
  additional network namespace argument.
 
 Is exec_net only used in interrupt context?
I tried to do so.

 Or how do you ensure a sleeping function does not get called and the
 kernel process comes back on another cpu?
It seems that I forgot to remove its usage in at least one place: in
clone_net_ns(). If you caught more, please let me know.

-- 
Thanks,
Dmitry.


[PATCH 0/12] L2 network namespace (v3)

2007-01-17 Thread Dmitry Mishin
This is an update of L2 network namespaces patches. They are applicable
to Cedric's 2.6.20-rc4-mm1-lxc2 tree. 

Changes:
- updated to 2.6.20-rc4-mm1-lxc2
- current network context is per-CPU now
- fixed compilation without CONFIG_NET_NS

The changed definition of the current context should fix all of the issues
mentioned by Cedric:
- the nsproxy backpointer is no longer necessary and has been removed;
- push_net_ns() and pop_net_ns() now use a per-CPU variable;
- there is no race on ->nsproxy between push_net_ns() and
  exit_task_namespaces() because they deal with different pointers.
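
The push/pop discipline on a per-CPU "current context" can be sketched in
userspace C. This is a minimal illustration, not the patch's
implementation: a C11 thread-local variable stands in for the per-CPU
variable, and struct net_ns, push_ns(), pop_ns() and current_ns() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct net_ns { int id; };	/* hypothetical stand-in */

/*
 * Userspace stand-in for a per-CPU variable: one "current namespace"
 * slot per thread.
 */
static _Thread_local struct net_ns *exec_ns;

/* Switch to ns and hand back the previous context for a later pop. */
struct net_ns *push_ns(struct net_ns *ns)
{
	struct net_ns *old = exec_ns;

	assert(ns != NULL);
	exec_ns = ns;
	return old;
}

/* Restore the context saved by the matching push_ns(). */
void pop_ns(struct net_ns *old)
{
	exec_ns = old;
}

struct net_ns *current_ns(void)
{
	return exec_ns;
}
```

The saved value returned by the push must be handed to the matching pop,
so nested switches unwind in order. A slot of this kind is only meaningful
while the code between push and pop stays on the same CPU (here: the same
thread).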

===
L2 network namespaces

The most straightforward concept of network virtualization is complete
separation of namespaces, covering device list, routing tables, netfilter
tables, socket hashes, and everything else.

On the input path, each packet is tagged with its namespace as soon as
it arrives from a device, and is processed by each layer in the context
of that namespace.
Non-root namespaces communicate with the outside world in two ways: by
owning hardware devices, or by receiving packets forwarded to them by
their parent namespace via a pass-through device.

This complete separation of namespaces is very useful for at least two
purposes:
  - allowing users to create and manage their own tunnels and VPNs, and
  - enabling easier and more straightforward live migration of groups of
    processes with their environment.


-- 
Thanks,
Dmitry.


[PATCH 7/12] allow proc_dir_entries to have destructor

2007-01-17 Thread Dmitry Mishin
Add a destructor field to proc_dir_entry and introduce a standard
destructor that kfree()s the entry's data.
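
The idea of an optional per-entry destructor can be sketched in userspace
C. This is an illustration under stated assumptions only: struct entry,
entry_new(), entry_free() and entry_data_destructor() are hypothetical
names, and malloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a proc entry. */
struct entry {
	void *data;
	void (*destructor)(struct entry *);
};

int destroyed;	/* counts destructor calls, for illustration only */

/* Standard destructor: the entry owns data and frees it. */
void entry_data_destructor(struct entry *e)
{
	free(e->data);
	e->data = NULL;
	destroyed++;
}

/* Freeing an entry runs its destructor, if one was installed. */
void entry_free(struct entry *e)
{
	assert(e != NULL);
	if (e->destructor)
		e->destructor(e);
	free(e);
}

/* Create an entry; with text, it owns a private copy and a destructor. */
struct entry *entry_new(const char *text)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	if (text) {
		size_t len = strlen(text) + 1;

		e->data = malloc(len);
		if (!e->data) {
			free(e);
			return NULL;
		}
		memcpy(e->data, text, len);
		e->destructor = entry_data_destructor;
	}
	return e;
}
```

The free path no longer needs to know which kinds of entries own their
data; it only checks whether a destructor was installed.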

Signed-off-by: Andrey Savochkin [EMAIL PROTECTED]

---
 fs/proc/generic.c       |   10 ++++++++--
 fs/proc/root.c          |    1 +
 include/linux/proc_fs.h |    4 ++++
 3 files changed, 13 insertions(+), 2 deletions(-)

Index: 2.6.20-rc4-mm1/fs/proc/generic.c
===
--- 2.6.20-rc4-mm1.orig/fs/proc/generic.c
+++ 2.6.20-rc4-mm1/fs/proc/generic.c
@@ -611,6 +611,11 @@ static struct proc_dir_entry *proc_creat
return ent;
 }
 
+void proc_data_destructor(struct proc_dir_entry *ent)
+{
+   kfree(ent->data);
+}
+
 struct proc_dir_entry *proc_symlink(const char *name,
struct proc_dir_entry *parent, const char *dest)
 {
@@ -623,6 +628,7 @@ struct proc_dir_entry *proc_symlink(cons
ent->data = kmalloc((ent->size=strlen(dest))+1, GFP_KERNEL);
if (ent->data) {
strcpy((char*)ent->data,dest);
+   ent->destructor = proc_data_destructor;
if (proc_register(parent, ent) < 0) {
kfree(ent->data);
kfree(ent);
@@ -701,8 +707,8 @@ void free_proc_entry(struct proc_dir_ent
 
release_inode_number(ino);
 
-   if (S_ISLNK(de->mode) && de->data)
-   kfree(de->data);
+   if (de->destructor)
+   de->destructor(de);
kfree(de);
 }
 
Index: 2.6.20-rc4-mm1/fs/proc/root.c
===
--- 2.6.20-rc4-mm1.orig/fs/proc/root.c
+++ 2.6.20-rc4-mm1/fs/proc/root.c
@@ -167,6 +167,7 @@ EXPORT_SYMBOL(proc_symlink);
 EXPORT_SYMBOL(proc_mkdir);
 EXPORT_SYMBOL(create_proc_entry);
 EXPORT_SYMBOL(remove_proc_entry);
+EXPORT_SYMBOL(proc_data_destructor);
 EXPORT_SYMBOL(proc_root);
 EXPORT_SYMBOL(proc_root_fs);
 EXPORT_SYMBOL(proc_net);
Index: 2.6.20-rc4-mm1/include/linux/proc_fs.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/proc_fs.h
+++ 2.6.20-rc4-mm1/include/linux/proc_fs.h
@@ -45,6 +45,8 @@ typedef   int (read_proc_t)(char *page, ch
 typedef int (write_proc_t)(struct file *file, const char __user *buffer,
   unsigned long count, void *data);
 typedef int (get_info_t)(char *, char **, off_t, int);
+struct proc_dir_entry;
+typedef void (destroy_proc_t)(struct proc_dir_entry *);
 
 struct proc_dir_entry {
unsigned int low_ino;
@@ -64,6 +66,7 @@ struct proc_dir_entry {
read_proc_t *read_proc;
write_proc_t *write_proc;
atomic_t count; /* use count */
+   destroy_proc_t *destructor;
int deleted;/* delete flag */
void *set;
 };
@@ -108,6 +111,7 @@ char *task_mem(struct mm_struct *, char 
 extern struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode,
struct proc_dir_entry *parent);
 extern void remove_proc_entry(const char *name, struct proc_dir_entry *parent);
+extern void proc_data_destructor(struct proc_dir_entry *);
 
 extern struct vfsmount *proc_mnt;
 extern int proc_fill_super(struct super_block *,void *,int);


[PATCH 8/12] net_device seq_file

2007-01-17 Thread Dmitry Mishin
Library function to create a seq_file in the proc filesystem,
showing some information for each netdevice.
This code is present in the kernel in about 10 instances, and
all of them can be converted to use the introduced library function.
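
The shape of such a shared iterator (a start/next pair that walks the
device list, plus a caller-supplied show callback) can be sketched in
userspace C. This is not the seq_file API: struct dev, dev_seq_start(),
dev_seq_next(), dev_seq_walk() and show_name_len() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dev {			/* hypothetical stand-in for a net device */
	const char *name;
	struct dev *next;
};

/* Plays the role of SEQ_START_TOKEN: marks the header position. */
static struct dev start_token;
#define START_TOKEN (&start_token)

/* Return the item at position pos: the header token at 0, then devices. */
struct dev *dev_seq_start(struct dev *head, long pos)
{
	struct dev *d;
	long off;

	assert(pos >= 0);
	if (pos == 0)
		return START_TOKEN;
	for (d = head, off = 1; d; d = d->next, off++)
		if (off == pos)
			return d;
	return NULL;
}

/* Advance the position and return the item that follows v. */
struct dev *dev_seq_next(struct dev *head, struct dev *v, long *pos)
{
	++*pos;
	return (v == START_TOKEN) ? head : v->next;
}

/* Drive the iterator, calling show() once for every real device. */
int dev_seq_walk(struct dev *head, int (*show)(struct dev *, void *),
		 void *data)
{
	long pos = 0;
	struct dev *v;
	int n = 0;

	for (v = dev_seq_start(head, pos); v; v = dev_seq_next(head, v, &pos)) {
		if (v == START_TOKEN)
			continue;
		if (show(v, data))
			return -1;
		n++;
	}
	return n;
}

/* Example callback: add the length of each device name to *data. */
int show_name_len(struct dev *d, void *data)
{
	*(size_t *)data += strlen(d->name);
	return 0;
}
```

Each user then supplies only a show callback and its private data, instead
of repeating the list walk.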

Signed-off-by: Andrey Savochkin [EMAIL PROTECTED]

---
 include/linux/netdevice.h |7 +++
 net/core/dev.c|   96 ++
 2 files changed, 103 insertions(+)

--- linux-2.6.20-rc4-mm1.net_ns.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc4-mm1.net_ns/include/linux/netdevice.h
@@ -604,6 +604,13 @@ extern int register_netdevice(struct ne
 extern int unregister_netdevice(struct net_device *dev);
 extern void free_netdev(struct net_device *dev);
 extern void synchronize_net(void);
+#ifdef CONFIG_PROC_FS
+extern int netdev_proc_create(char *name,
+   int (*show)(struct seq_file *,
+   struct net_device *, void *),
+   void *data, struct module *mod);
+void   netdev_proc_remove(char *name);
+#endif
 extern int register_netdevice_notifier(struct notifier_block *nb);
 extern int unregister_netdevice_notifier(struct notifier_block *nb);
 extern int call_netdevice_notifiers(unsigned long val, void *v);
--- linux-2.6.20-rc4-mm1.net_ns.orig/net/core/dev.c
+++ linux-2.6.20-rc4-mm1.net_ns/net/core/dev.c
@@ -2099,6 +2099,102 @@ static int dev_ifconf(char __user *arg)
 }
 
 #ifdef CONFIG_PROC_FS
+
+struct netdev_proc_data {
+   struct file_operations fops;
+   int (*show)(struct seq_file *, struct net_device *, void *);
+   void *data;
+};
+
+static void *netdev_proc_seq_start(struct seq_file *seq, loff_t *pos)
+{
+   struct net_device *dev;
+   loff_t off;
+
+   read_lock(&dev_base_lock);
+   if (*pos == 0)
+   return SEQ_START_TOKEN;
+   for (dev = dev_base, off = 1; dev; dev = dev->next, off++) {
+   if (*pos == off)
+   return dev;
+   }
+   return NULL;
+}
+
+static void *netdev_proc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+   ++*pos;
+   return (v == SEQ_START_TOKEN) ? dev_base
+   : ((struct net_device *)v)->next;
+}
+
+static void netdev_proc_seq_stop(struct seq_file *seq, void *v)
+{
+   read_unlock(&dev_base_lock);
+}
+
+static int netdev_proc_seq_show(struct seq_file *seq, void *v)
+{
+   struct netdev_proc_data *p;
+
+   p = seq->private;
+   return (*p->show)(seq, v, p->data);
+}
+
+static struct seq_operations netdev_proc_seq_ops = {
+   .start = netdev_proc_seq_start,
+   .next  = netdev_proc_seq_next,
+   .stop  = netdev_proc_seq_stop,
+   .show  = netdev_proc_seq_show,
+};
+
+static int netdev_proc_open(struct inode *inode, struct file *file)
+{
+   int err;
+   struct seq_file *p;
+
+   err = seq_open(file, &netdev_proc_seq_ops);
+   if (!err) {
+   p = file->private_data;
+   p->private = (struct netdev_proc_data *)PDE(inode)->data;
+   }
+   return err;
+}
+
+int netdev_proc_create(char *name,
+   int (*show)(struct seq_file *, struct net_device *, void *),
+   void *data, struct module *mod)
+{
+   struct netdev_proc_data *p;
+   struct proc_dir_entry *ent;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   p->fops.owner = mod;
+   p->fops.open = netdev_proc_open;
+   p->fops.read = seq_read;
+   p->fops.llseek = seq_lseek;
+   p->fops.release = seq_release;
+   p->show = show;
+   p->data = data;
+   ent = create_proc_entry(name, S_IRUGO, proc_net);
+   if (ent == NULL) {
+   kfree(p);
+   return -EINVAL;
+   }
+   ent->data = p;
+   ent->destructor = proc_data_destructor;
+   smp_wmb();
+   ent->proc_fops = &p->fops;
+   return 0;
+}
+EXPORT_SYMBOL(netdev_proc_create);
+
+void netdev_proc_remove(char *name)
+{
+   proc_net_remove(name);
+}
+EXPORT_SYMBOL(netdev_proc_remove);
+
 /*
  * This is invoked by the /proc filesystem handler to display a device
  * in detail.


[PATCH 9/12] L2 network namespace (v3): device to pass packets between namespaces

2007-01-17 Thread Dmitry Mishin
A simple device to pass packets between a namespace and its child.
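
The behaviour of such a device pair (whatever is transmitted on one end is
received on the other, unless the receiving end is down) can be sketched
in userspace C. This is an illustration only, with packets reduced to a
byte count; struct pair_dev, pair_link() and pair_xmit() are hypothetical
names and not the structures used in the patch below.

```c
#include <assert.h>
#include <stddef.h>

struct pair_dev {		/* hypothetical, not the veth structure */
	struct pair_dev *pair;	/* the other end */
	int up;
	unsigned long tx_packets, tx_bytes, tx_dropped;
	unsigned long rx_packets, rx_bytes;
};

/* Connect two devices so that each one is the other's peer. */
void pair_link(struct pair_dev *a, struct pair_dev *b)
{
	a->pair = b;
	b->pair = a;
}

/*
 * "Transmit" len bytes on dev: the packet is counted as received on
 * the peer, or dropped if the peer is down. Returns 1 if delivered.
 */
int pair_xmit(struct pair_dev *dev, size_t len)
{
	struct pair_dev *rcv = dev->pair;

	assert(rcv != NULL);
	if (!rcv->up) {
		dev->tx_dropped++;
		return 0;
	}
	dev->tx_packets++;
	dev->tx_bytes += len;
	rcv->rx_packets++;
	rcv->rx_bytes += len;
	return 1;
}
```

With one end placed in each namespace, the pair behaves like the
crossover cable between two machines described earlier in the thread.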

Signed-off-by: Dmitry Mishin [EMAIL PROTECTED]

---
 drivers/net/Makefile |3 
 drivers/net/veth.c   |  321 +++
 net/core/net_namespace.c |1 
 3 files changed, 325 insertions(+)

--- linux-2.6.20-rc4-mm1.net_ns.orig/drivers/net/Makefile
+++ linux-2.6.20-rc4-mm1.net_ns/drivers/net/Makefile
@@ -125,6 +125,9 @@ obj-$(CONFIG_SLIP) += slip.o
 obj-$(CONFIG_SLHC) += slhc.o
 
 obj-$(CONFIG_DUMMY) += dummy.o
+ifeq ($(CONFIG_NET_NS),y)
+obj-m += veth.o
+endif
 obj-$(CONFIG_IFB) += ifb.o
 obj-$(CONFIG_DE600) += de600.o
 obj-$(CONFIG_DE620) += de620.o
--- /dev/null
+++ linux-2.6.20-rc4-mm1.net_ns/drivers/net/veth.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright (C) 2006  SWsoft
+ *
+ * Written by Andrey Savochkin [EMAIL PROTECTED],
+ * reusing code by Andrey Mirkin [EMAIL PROTECTED].
+ */
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/ctype.h>
+#include <asm/semaphore.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/dst.h>
+#include <net/xfrm.h>
+
+struct veth_struct
+{
+   struct net_device   *pair;
+   struct net_device_stats stats;
+};
+
+#define veth_from_netdev(dev) ((struct veth_struct *)(netdev_priv(dev)))
+
+/* --- *
+ *
+ * Device functions
+ *
+ * --- */
+
+static struct net_device_stats *get_stats(struct net_device *dev);
+static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+   struct net_device_stats *stats;
+   struct veth_struct *entry;
+   struct net_device *rcv;
+   struct net_namespace *orig_net_ns;
+   int length;
+
+   stats = get_stats(dev);
+   entry = veth_from_netdev(dev);
+   rcv = entry->pair;
+
+   if (!(rcv->flags & IFF_UP))
+   /* Target namespace does not want to receive packets */
+   goto outf;
+
+   dst_release(skb->dst);
+   skb->dst = NULL;
+   secpath_reset(skb);
+   skb_orphan(skb);
+   nf_reset(skb);
+
+   orig_net_ns = push_net_ns(rcv->net_ns);
+   skb->dev = rcv;
+   skb->pkt_type = PACKET_HOST;
+   skb->protocol = eth_type_trans(skb, rcv);
+
+   length = skb->len;
+   stats->tx_bytes += length;
+   stats->tx_packets++;
+   stats = get_stats(rcv);
+   stats->rx_bytes += length;
+   stats->rx_packets++;
+
+   netif_rx(skb);
+   pop_net_ns(orig_net_ns);
+   return 0;
+
+outf:
+   stats->tx_dropped++;
+   kfree_skb(skb);
+   return 0;
+}
+
+static int veth_open(struct net_device *dev)
+{
+   return 0;
+}
+
+static int veth_close(struct net_device *dev)
+{
+   return 0;
+}
+
+static void veth_destructor(struct net_device *dev)
+{
+   free_netdev(dev);
+}
+
+static struct net_device_stats *get_stats(struct net_device *dev)
+{
+   return &veth_from_netdev(dev)->stats;
+}
+
+int veth_init_dev(struct net_device *dev)
+{
+   dev->hard_start_xmit = veth_xmit;
+   dev->open = veth_open;
+   dev->stop = veth_close;
+   dev->destructor = veth_destructor;
+   dev->get_stats = get_stats;
+
+   ether_setup(dev);
+
+   dev->tx_queue_len = 0;
+   return 0;
+}
+
+static void veth_setup(struct net_device *dev)
+{
+   dev->init = veth_init_dev;
+}
+
+static inline int is_veth_dev(struct net_device *dev)
+{
+   return dev->init == veth_init_dev;
+}
+
+/* --- *
+ *
+ * Management interface
+ *
+ * --- */
+
+struct net_device *veth_dev_alloc(char *name, char *addr)
+{
+   struct net_device *dev;
+
+   dev = alloc_netdev(sizeof(struct veth_struct), name, veth_setup);
+   if (dev != NULL) {
+   memcpy(dev->dev_addr, addr, ETH_ALEN);
+   dev->addr_len = ETH_ALEN;
+   }
+   return dev;
+}
+
+int veth_entry_add(char *parent_name, char *parent_addr,
+   struct net_namespace *parent_ns, char *child_name, char *child_addr,
+   struct net_namespace *child_ns)
+{
+   struct net_device *parent_dev, *child_dev;
+   int err;
+
+   err = -ENOMEM;
+   if ((parent_dev = veth_dev_alloc(parent_name, parent_addr)) == NULL)
+   goto out_alocp;
+   if ((child_dev = veth_dev_alloc(child_name, child_addr)) == NULL)
+   goto out_alocc;
+   veth_from_netdev(parent_dev)->pair = child_dev;
+   veth_from_netdev(child_dev)->pair = parent_dev;
+
+   /*
+* About serialization, see comments to veth_pair_del().
+*/
+   rtnl_lock();
+   /* refcounts should be already upped, so, just put old ones */
+   put_net_ns(parent_dev->net_ns);
+   parent_dev->net_ns = parent_ns;
+   if ((err = register_netdevice

[PATCH 10/12] L2 network namespace (v3): playing with pass-through device

2007-01-17 Thread Dmitry Mishin
Temporary code to debug and play with the pass-through device.
Create a device pair with
    modprobe veth
    echo 'add veth1 0:1:2:3:4:1 eth0 0:1:2:3:4:2' > /proc/net/veth_ctl
and your shell will be moved into a new namespace with an `eth0' device.
Configure the device in this namespace
    ip l s eth0 up
    ip a a 1.2.3.4/24 dev eth0
and in the root namespace
    ip l s veth1 up
    ip a a 1.2.3.1/24 dev veth1
to establish a communication channel between the root namespace and the
newly created one.
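
The hardware addresses in the control string use one or two hex digits per
byte, separated by punctuation. A minimal userspace sketch of parsing that
format follows; parse_mac(), hex_val() and MAC_LEN are hypothetical names,
this is not the parser from the patch, and it combines the two hex digits
of a byte with a shift by 4.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAC_LEN 6

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_val(int c)
{
	return isdigit(c) ? c - '0' : toupper(c) - 'A' + 10;
}

/*
 * Parse MAC_LEN bytes of one or two hex digits each, separated by a
 * single punctuation character. Returns a pointer just past the
 * address, or NULL on malformed input.
 */
const char *parse_mac(const char *s, unsigned char *addr)
{
	int i;

	assert(s != NULL && addr != NULL);
	for (i = 0; i < MAC_LEN; i++) {
		int v;

		if (!isxdigit((unsigned char)*s))
			return NULL;
		v = hex_val((unsigned char)*s++);
		if (isxdigit((unsigned char)*s))
			v = (v << 4) | hex_val((unsigned char)*s++);
		addr[i] = (unsigned char)v;
		if (i < MAC_LEN - 1 && ispunct((unsigned char)*s))
			s++;
	}
	return s;
}
```

The returned pointer lets the caller continue with the next field of the
command, for example the child device name that follows the parent
address.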

The code was written by Andrey Savochkin and ported by me onto Cedric's patchset.

Signed-off-by: Dmitry Mishin [EMAIL PROTECTED]

---
 drivers/net/veth.c   |  121 +++
 fs/proc/array.c  |8 +++
 kernel/fork.c|1 
 kernel/nsproxy.c |1 
 net/core/net_namespace.c |3 +
 5 files changed, 134 insertions(+)

--- linux-2.6.20-rc4-mm1.net_ns.orig/drivers/net/veth.c
+++ linux-2.6.20-rc4-mm1.net_ns/drivers/net/veth.c
@@ -12,6 +12,7 @@
 #include <linux/etherdevice.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/syscalls.h>
 #include <net/dst.h>
 #include <net/xfrm.h>
 
@@ -245,6 +246,123 @@ void veth_entry_del_all(void)
 
 /* --- *
  *
+ * Temporary interface to create veth devices
+ *
+ * --- */
+
+#ifdef CONFIG_PROC_FS
+
+static int veth_debug_open(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static char *parse_addr(char *s, char *addr)
+{
+   int i, v;
+
+   for (i = 0; i < ETH_ALEN; i++) {
+   if (!isxdigit(*s))
+   return NULL;
+   *addr = 0;
+   v = isdigit(*s) ? *s - '0' : toupper(*s) - 'A' + 10;
+   s++;
+   if (isxdigit(*s)) {
+   *addr += v << 16;
+   v = isdigit(*s) ? *s - '0' : toupper(*s) - 'A' + 10;
+   s++;
+   }
+   *addr++ += v;
+   if (i < ETH_ALEN - 1 && ispunct(*s))
+   s++;
+   }
+   return s;
+}
+
+static ssize_t veth_debug_write(struct file *file, const char __user *user_buf,
+   size_t size, loff_t *ppos)
+{
+   char buf[128], *s, *parent_name, *child_name;
+   char parent_addr[ETH_ALEN], child_addr[ETH_ALEN];
+   struct net_namespace *parent_ns, *child_ns;
+   int err;
+
+   s = buf;
+   err = -EINVAL;
+   if (size >= sizeof(buf))
+   goto out;
+   err = -EFAULT;
+   if (copy_from_user(buf, user_buf, size))
+   goto out;
+   buf[size] = 0;
+
+   err = -EBADRQC;
+   if (!strncmp(buf, "add ", 4)) {
+   parent_name = buf + 4;
+   if ((s = strchr(parent_name, ' ')) == NULL)
+   goto out;
+   *s = 0;
+   if ((s = parse_addr(s + 1, parent_addr)) == NULL)
+   goto out;
+   if (!*s)
+   goto out;
+   child_name = s + 1;
+   if ((s = strchr(child_name, ' ')) == NULL)
+   goto out;
+   *s = 0;
+   if ((s = parse_addr(s + 1, child_addr)) == NULL)
+   goto out;
+
+   get_net_ns(current_net_ns);
+   parent_ns = current_net_ns;
+   if (*s == ' ') {
+   unsigned int id;
+   id = simple_strtoul(s + 1, &s, 0);
+   err = sys_bind_ns(id, NS_ALL);
+   } else
+   err = sys_unshare(CLONE_NEWNET2);
+   if (err)
+   goto out;
+   /* after bind_ns() or unshare_ns() namespace is changed */
+   get_net_ns(current_net_ns);
+   child_ns = current_net_ns;
+   err = veth_entry_add(parent_name, parent_addr, parent_ns,
+   child_name, child_addr, child_ns);
+   if (err) {
+   put_net_ns(child_ns);
+   put_net_ns(parent_ns);
+   } else
+   err = size;
+   }
+out:
+   return err;
+}
+
+static struct file_operations veth_debug_ops = {
+   .open   = veth_debug_open,
+   .write  = veth_debug_write,
+};
+
+static int veth_debug_create(void)
+{
+   proc_net_fops_create("veth_ctl", 0200, &veth_debug_ops);
+   return 0;
+}
+
+static void veth_debug_remove(void)
+{
+   proc_net_remove("veth_ctl");
+}
+
+#else
+
+static int veth_debug_create(void) { return -1; }
+static void veth_debug_remove(void) { }
+
+#endif
+
+/* --- *
+ *
  * Information in proc
  *
  * --- */
@@ -304,12 +422,15 @@ static inline void veth_proc_remove(void
 
 int __init veth_init

[PATCH 11/12] L2 network namespace (v3): sockets proc view virtualization

2007-01-17 Thread Dmitry Mishin
Only sockets of the current net namespace, or all sockets in the case of
init_net_ns, should be visible through the proc interface.
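
The visibility rule can be written as a single predicate. The userspace
sketch below only illustrates that rule: struct net_ns, struct sock_stub,
sock_visible() and count_visible() are hypothetical names, and namespaces
are compared by pointer identity.

```c
#include <assert.h>
#include <stddef.h>

struct net_ns { int id; };		/* hypothetical stand-in */

struct sock_stub {			/* hypothetical stand-in */
	const struct net_ns *ns;	/* namespace that owns the socket */
};

/*
 * A socket is listed if it belongs to the viewer's namespace, or if
 * the viewer is in the initial namespace, which sees everything.
 */
int sock_visible(const struct sock_stub *sk, const struct net_ns *viewer,
		 const struct net_ns *init_ns)
{
	assert(sk != NULL && viewer != NULL);
	return sk->ns == viewer || viewer == init_ns;
}

/* Count the sockets in socks[0..n) that viewer is allowed to see. */
size_t count_visible(const struct sock_stub *socks, size_t n,
		     const struct net_ns *viewer, const struct net_ns *init_ns)
{
	size_t i, seen = 0;

	for (i = 0; i < n; i++)
		if (sock_visible(&socks[i], viewer, init_ns))
			seen++;
	return seen;
}
```

Each proc iterator then skips the sockets for which the predicate is
false, which is the check the hunks below add to the unix, TCP and UDP
walkers.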

Signed-off-by: Dmitry Mishin [EMAIL PROTECTED]

---
 include/net/af_unix.h |   21 +
 net/ipv4/tcp_ipv4.c   |9 +
 net/ipv4/udp.c|   13 +++--
 3 files changed, 37 insertions(+), 6 deletions(-)

--- linux-2.6.20-rc4-mm1.net_ns.orig/include/net/af_unix.h
+++ linux-2.6.20-rc4-mm1.net_ns/include/net/af_unix.h
@@ -19,9 +19,13 @@ extern atomic_t unix_tot_inflight;
 
 static inline struct sock *first_unix_socket(int *i)
 {
+   struct sock *sk;
+
for (*i = 0; *i <= UNIX_HASH_SIZE; (*i)++) {
-   if (!hlist_empty(&unix_socket_table[*i]))
-   return __sk_head(&unix_socket_table[*i]);
+   for (sk = sk_head(&unix_socket_table[*i]); sk; sk = sk_next(sk))
+   if (net_ns_match(sk->sk_net_ns, current_net_ns) ||
+   net_ns_match(current_net_ns, &init_net_ns))
+   return sk;
}
return NULL;
 }
@@ -32,10 +36,19 @@ static inline struct sock *next_unix_soc
/* More in this chain? */
if (next)
return next;
+   for (; next != NULL; next = sk_next(next)) {
+   if (!net_ns_match(next->sk_net_ns, current_net_ns) &&
+   !net_ns_match(current_net_ns, &init_net_ns))
+   continue;
+   return next;
+   }
/* Look for next non-empty chain. */
for ((*i)++; *i <= UNIX_HASH_SIZE; (*i)++) {
-   if (!hlist_empty(&unix_socket_table[*i]))
-   return __sk_head(&unix_socket_table[*i]);
+   for (next = sk_head(&unix_socket_table[*i]); next;
+   next = sk_next(next))
+   if (net_ns_match(next->sk_net_ns, current_net_ns) ||
+   net_ns_match(current_net_ns, &init_net_ns))
+   return next;
}
return NULL;
 }
--- linux-2.6.20-rc4-mm1.net_ns.orig/net/ipv4/tcp_ipv4.c
+++ linux-2.6.20-rc4-mm1.net_ns/net/ipv4/tcp_ipv4.c
@@ -1992,6 +1992,9 @@ get_req:
}
 get_sk:
sk_for_each_from(sk, node) {
+   if (!net_ns_match(sk->sk_net_ns, current_net_ns) &&
+   !net_ns_match(current_net_ns, &init_net_ns))
+   continue;
if (sk->sk_family == st->family) {
cur = sk;
goto out;
@@ -2043,6 +2046,9 @@ static void *established_get_first(struc
 
read_lock(&tcp_hashinfo.ehash[st->bucket].lock);
sk_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
+   if (!net_ns_match(sk->sk_net_ns, current_net_ns) &&
+   !net_ns_match(current_net_ns, &init_net_ns))
+   continue;
if (sk->sk_family != st->family) {
continue;
}
@@ -2102,6 +2108,9 @@ get_tw:
sk = sk_next(sk);
 
sk_for_each_from(sk, node) {
+   if (!net_ns_match(sk->sk_net_ns, current_net_ns) &&
+   !net_ns_match(current_net_ns, &init_net_ns))
+   continue;
if (sk->sk_family == st->family)
goto found;
}
--- linux-2.6.20-rc4-mm1.net_ns.orig/net/ipv4/udp.c
+++ linux-2.6.20-rc4-mm1.net_ns/net/ipv4/udp.c
@@ -1549,6 +1549,9 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
sk_for_each(sk, node, state->hashtable + state->bucket) {
+   if (!net_ns_match(sk->sk_net_ns, current_net_ns) &&
+   !net_ns_match(current_net_ns, &init_net_ns))
+   continue;
if (sk->sk_family == state->family)
goto found;
}
@@ -1565,8 +1568,14 @@ static struct sock *udp_get_next(struct 
do {
sk = sk_next(sk);
 try_again:
-   ;
-   } while (sk && sk->sk_family != state->family);
+   if (!sk)
+   break;
+   if (sk->sk_family != state->family)
+   continue;
+   if (net_ns_match(sk->sk_net_ns, current_net_ns) ||
+   net_ns_match(current_net_ns, &init_net_ns))
+   break;
+   } while (1);
 
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
sk = sk_head(state->hashtable + state->bucket);


[PATCH 12/12] L2 network namespace (v3): L3 network namespace intro

2007-01-17 Thread Dmitry Mishin
 Introduce two kinds of network namespaces: level 2 and level 3. The first is
 a namespace with a full set of networking objects, while the second is
 socket-level, with a restricted set.
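
The relationship between the two levels can be sketched in userspace C.
This is a simplified illustration, not the patch's clone_net_ns(): enum
ns_level, struct net_ns and ns_clone() are hypothetical names, and the
"full set of networking objects" is reduced to a single flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum ns_level { NS_LEVEL2 = 1, NS_LEVEL3 = 2 };

struct net_ns {			/* hypothetical stand-in */
	enum ns_level level;
	int has_own_routes;	/* only level 2 gets its own routing state */
};

/*
 * Fill in child as a new namespace created from parent.
 * Returns 0 on success, or -EPERM when parent is level 3, since a
 * socket-level namespace is too incomplete to have children.
 */
int ns_clone(const struct net_ns *parent, enum ns_level level,
	     struct net_ns *child)
{
	assert(parent != NULL && child != NULL);
	if (parent->level == NS_LEVEL3)
		return -EPERM;
	child->level = level;
	child->has_own_routes = (level == NS_LEVEL2);
	return 0;
}
```

The per-level setup and teardown has to stay symmetric: whatever is
initialized only for level 2 must also be cleaned up only for level 2.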

 Signed-off-by: Dmitry Mishin [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |3 +++
 net/core/net_namespace.c  |   40 
 2 files changed, 31 insertions(+), 12 deletions(-)

--- linux-2.6.20-rc4-mm1.net_ns.orig/include/linux/net_namespace.h
+++ linux-2.6.20-rc4-mm1.net_ns/include/linux/net_namespace.h
@@ -24,6 +24,9 @@ struct net_namespace {
int fib4_trie_last_dflt;
 #endif
unsigned int hash;
+#define NET_NS_LEVEL2  1
+#define NET_NS_LEVEL3  2
+   unsigned int level;
 };
 
 extern struct net_namespace init_net_ns;
--- linux-2.6.20-rc4-mm1.net_ns.orig/net/core/net_namespace.c
+++ linux-2.6.20-rc4-mm1.net_ns/net/core/net_namespace.c
@@ -30,13 +30,19 @@ EXPORT_PER_CPU_SYMBOL_GPL(exec_net_ns);
 
 /*
  * Clone a new ns copying an original net ns, setting refcount to 1
+ * @level: level of namespace to create
  * @old_ns: namespace to clone
- * Return NULL on error (failure to kmalloc), new ns otherwise
+ * Return ERR_PTR on error, new ns otherwise
  */
-static struct net_namespace *clone_net_ns(struct net_namespace *old_ns)
+static struct net_namespace *clone_net_ns(unsigned int level,
+   struct net_namespace *old_ns)
 {
struct net_namespace *ns;
 
+   /* level 3 namespaces are incomplete in order to have childs */
+   if (current_net_ns->level == NET_NS_LEVEL3)
+   return ERR_PTR(-EPERM);
+
ns = kzalloc(sizeof(struct net_namespace), GFP_KERNEL);
if (!ns)
return NULL;
@@ -48,20 +54,25 @@ static struct net_namespace *clone_net_n
 
if ((push_net_ns(ns)) != old_ns)
BUG();
+   if (level ==  NET_NS_LEVEL2) {
 #ifdef CONFIG_IP_MULTIPLE_TABLES
-   INIT_LIST_HEAD(&ns->fib_rules_ops_list);
+   INIT_LIST_HEAD(&ns->fib_rules_ops_list);
 #endif
-   if (ip_fib_struct_init())
-   goto out_fib4;
+   if (ip_fib_struct_init())
+   goto out_fib4;
+   }
+   ns->level = level;
if (loopback_init())
goto out_loopback;
pop_net_ns(old_ns);
-   printk(KERN_DEBUG "NET_NS: created new netcontext %p for %s "
-   "(pid=%d)\n", ns, current->comm, current->tgid);
+   printk(KERN_DEBUG "NET_NS: created new netcontext %p, level %u, "
+   "for %s (pid=%d)\n", ns, (ns->level == NET_NS_LEVEL2) ?
+   2 : 3, current->comm, current->tgid);
return ns;
 
 out_loopback:
-   ip_fib_struct_cleanup(ns);
+   if (level ==  NET_NS_LEVEL2)
+   ip_fib_struct_cleanup(ns);
 out_fib4:
pop_net_ns(old_ns);
BUG_ON(atomic_read(&ns->kref.refcount) != 1);
@@ -75,13 +86,17 @@ out_fib4:
 int unshare_net_ns(unsigned long unshare_flags,
   struct net_namespace **new_net)
 {
+   unsigned int level;
+
if (unshare_flags & (CLONE_NEWNET2|CLONE_NEWNET3)) {
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
-   *new_net = clone_net_ns(current->nsproxy->net_ns);
-   if (!*new_net)
-   return -ENOMEM;
+   level = (unshare_flags & CLONE_NEWNET2) ? NET_NS_LEVEL2 :
+   NET_NS_LEVEL3;
+   *new_net = clone_net_ns(level, current->nsproxy->net_ns);
+   if (IS_ERR(*new_net))
+   return PTR_ERR(*new_net);
}
 
return 0;
@@ -110,7 +125,8 @@ void free_net_ns(struct kref *kref)
ns, atomic_read(&ns->kref.refcount));
return;
}
-   ip_fib_struct_cleanup(ns);
+   if (ns->level == NET_NS_LEVEL2)
+   ip_fib_struct_cleanup(ns);
printk(KERN_DEBUG "NET_NS: net namespace %p destroyed\n", ns);
kfree(ns);
 }


Re: [Devel] Re: Network virtualization/isolation

2006-12-09 Thread Dmitry Mishin
On Saturday 09 December 2006 09:35, Herbert Poetzl wrote:
 On Fri, Dec 08, 2006 at 10:13:48PM -0800, Andrew Morton wrote:
  On Sat, 9 Dec 2006 04:50:02 +0100
  Herbert Poetzl [EMAIL PROTECTED] wrote:
  
   On Fri, Dec 08, 2006 at 12:57:49PM -0700, Eric W. Biederman wrote:
Herbert Poetzl [EMAIL PROTECTED] writes:

 But, ok, it is not the real point to argue so much imho 
 and waste our time instead of doing things.
   
 well, IMHO better talk (and think) first, then implement
 something ... not the other way round, and then start
 fixing up the mess ...

Well we need a bit of both.
   
   hmm, are 'we' in a hurry here?
   
   until recently, 'Linux' (mainline) didn't even want
   to hear about OS Level virtualization, now there
   is a rush to quickly get 'something' in, not knowing
   or caring if it is usable at all?
  
  It's actually happening quite gradually and carefully.
 
 hmm, I must have missed a testing phase for the
 IPC namespace then, not that I think it is broken
 (well, maybe it is, we do not know yet)
Herbert,

you know that this code is used in our product, and our product is in
turn tested both internally and by a community. We have no reports of
bugs in this code. If you have something more specific to say,
please say it.

 
   I think there are a lot of 'potential users' for
   this kind of virtualization, and so 'we' can test
   almost all aspects outside of mainline, and once
   we know the stuff works as expected, then we can
   integrate it ...
   
   the UTS namespace was something 'we all' had already
   implemented in this (or a very similar) way, and in
   one or two iterations, it should actually work as
   expected. nevertheless, it was one of the simplest
   spaces ...
   
   we do not yet know the details for the IPC namespace,
   as IPC is not that easy to check as UTS, and 'we'
   haven't gotten real world feedback on that yet ...
  
  We are very dependent upon all stakeholders including yourself 
  to review, test and comment upon this infrastructure as it is 
  proposed and merged. If something is proposed which will not 
  suit your requirements then it is important that we hear about 
  it, in detail, at the earliest possible time.
 
 okay, good to hear that I'm still considered a stakeholder 
 
 will try to focus the feedback and cc as many folks
 as possible, as it seems that some feedback is lost
 on the way upstream ...
 
 best,
 Herbert
 
  Thanks.
 

-- 
Thanks,
Dmitry.


Re: Network virtualization/isolation

2006-12-04 Thread Dmitry Mishin
On Sunday 03 December 2006 19:00, Eric W. Biederman wrote:
 Ok.  Just a quick summary of where I see the discussion.

 We all agree that L2 isolation is needed at some point.
As we all agree on this, maybe it is time to send patches one by one?
To begin with, I propose to resend Cedric's empty namespace patch as a base 
for the others - it is really empty, but necessary in order to move further.

Once this patch and the following net namespace unshare patch are 
accepted, I could send the network device virtualization patches for review and 
discussion.

What do you think?


 The approaches discussed for L2 and L3 are sufficiently orthogonal
 that we can implement them in either order.  You would need to
 unshare L3 to unshare L2, but if we think of them as two separate
 namespaces we are likely to be in better shape.

 The L3 discussion still has the problem that there has not been
 agreement on all of the semantics yet.

 More comments after I get some sleep.

 Eric

-- 
Thanks,
Dmitry.


Re: Network virtualization/isolation

2006-12-04 Thread Dmitry Mishin
On Monday 04 December 2006 18:35, Eric W. Biederman wrote:
[skip]
 Where and when you look to find the network namespace that applies to
 a packet is the primary difference between the OpenVZ L2
 implementation and my L2 implementation.

 If there is a better and less intrusive while still being obvious
 method I am all for it.  I do not like the OpenVZ thing of doing the
 lookup once and then stashing the value in current and then special
 casing the exceptions.
Why?

-- 
Thanks,
Dmitry.


Re: Network virtualization/isolation

2006-12-04 Thread Dmitry Mishin
On Monday 04 December 2006 19:43, Herbert Poetzl wrote:
 On Mon, Dec 04, 2006 at 06:19:00PM +0300, Dmitry Mishin wrote:
  On Sunday 03 December 2006 19:00, Eric W. Biederman wrote:
   Ok.  Just a quick summary of where I see the discussion.
  
   We all agree that L2 isolation is needed at some point.
 
  As we all agree on this, maybe it is time to send patches
  one by one? To begin with, I propose to resend Cedric's
  empty namespace patch as a base for the others - it is really empty,
  but necessary in order to move further.
 
  Once this patch and the following net namespace unshare
  patch are accepted,

 well, I have not seen any performance tests showing
 that the following is true:

  - no change in network performance without the
    space enabled
  - no change in network performance on the host
    with the network namespaces enabled
  - no measurable overhead inside the network
    namespace
  - good scalability for a larger number of network
    namespaces
These questions apply to the complete L2 implementation, not to these two empty 
patches. If you need data on Andrey's implementation, I'll get 
it. Which tests would you accept?
 

  I could send network devices virtualization patches for
  review and discussion.

 that won't hurt ...

 best,
 Herbert

  What do you think?
 
   The approaches discussed for L2 and L3 are sufficiently orthogonal
   that we can implement them in either order.  You would need to
   unshare L3 to unshare L2, but if we think of them as two separate
   namespaces we are likely to be in better shape.
  
   The L3 discussion still has the problem that there has not been
   agreement on all of the semantics yet.
  
   More comments after I get some sleep.
  
   Eric
 
  --
  Thanks,
  Dmitry.
  ___
  Containers mailing list
  [EMAIL PROTECTED]
  https://lists.osdl.org/mailman/listinfo/containers

-- 
Thanks,
Dmitry.


[PATCH] add ndisc_netdev_notifier unregister

2006-11-03 Thread Dmitry Mishin
If inet6_init() fails after the ndisc_init() call, or the IPv6 module is 
unloaded, ndisc_netdev_notifier remains in the notifier list and will cause an 
oops later.

Signed-off-by: Dmitry Mishin [EMAIL PROTECTED]
---
 ndisc.c |1 +
 1 file changed, 1 insertion(+)
---
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 41a8a5f..73eb8c3 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1742,6 +1742,7 @@ #endif
 
 void ndisc_cleanup(void)
 {
+	unregister_netdevice_notifier(&ndisc_netdev_notifier);
 #ifdef CONFIG_SYSCTL
 	neigh_sysctl_unregister(&nd_tbl.parms);
 #endif
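
To illustrate why a missing unregister ends in an oops, here is a minimal
userspace sketch of a notifier chain. It is not the kernel's implementation:
struct notifier, chain_register(), chain_unregister(), chain_call() and
count_event() are hypothetical stand-ins for the kernel notifier API, used
only to show the register/unregister pairing the patch restores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct notifier_block. */
struct notifier {
    int (*call)(struct notifier *self, unsigned long event);
    struct notifier *next;
};

/* Head of a singly linked chain (plays the role of the netdevice chain). */
static struct notifier *chain_head;

/* Add a notifier to the front of the chain. */
static void chain_register(struct notifier *n)
{
    assert(n != NULL);
    n->next = chain_head;
    chain_head = n;
}

/* Remove a notifier; returns 0 on success, -1 if it was not registered. */
static int chain_unregister(struct notifier *n)
{
    struct notifier **pp;

    for (pp = &chain_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Invoke every registered callback; returns how many were called. */
static int chain_call(unsigned long event)
{
    struct notifier *n;
    int called = 0;

    for (n = chain_head; n != NULL; n = n->next) {
        n->call(n, event);
        called++;
    }
    return called;
}

/* Example callback that only counts the events it receives. */
static int events_seen;

static int count_event(struct notifier *self, unsigned long event)
{
    (void)self;
    (void)event;
    events_seen++;
    return 0;
}
```

If the code that owns a registered struct notifier goes away without calling
chain_unregister(), the next chain_call() follows a pointer into memory that
is no longer valid. That is the situation the one-line patch above avoids.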


Bridge it's MAC address question

2006-10-30 Thread Dmitry Mishin
Hi,

Could somebody explain why the bridge uses the lowest MAC address of the attached devices?
This makes the address unstable: it can change during the bridge's lifetime, which is 
not good for DHCP. For example, I want to attach multiple virtual devices to 
one physical device. I then need to make sure that after each virtual device is 
added, the bridge address does not change and is still the address of the physical device.  
Why not use the MAC of the first attached device?
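
The difference between the two policies can be sketched in a few lines of
userspace C. This is only an illustration of the question, not the bridge
code: struct mac, min_mac() and first_mac() are hypothetical names, and the
byte-wise memcmp() ordering is an assumption about what "minimal MAC" means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical container for one port's hardware address. */
struct mac {
    unsigned char addr[MAC_LEN];
};

/* Policy described above: pick the lowest address among all ports,
 * compared byte by byte. The result can change when a port is added. */
static const struct mac *min_mac(const struct mac *ports, size_t n)
{
    const struct mac *best;
    size_t i;

    assert(n > 0);
    best = &ports[0];
    for (i = 1; i < n; i++)
        if (memcmp(ports[i].addr, best->addr, MAC_LEN) < 0)
            best = &ports[i];
    return best;
}

/* Proposed alternative: always keep the first attached port's address,
 * so later additions never change the result. */
static const struct mac *first_mac(const struct mac *ports, size_t n)
{
    assert(n > 0);
    return &ports[0];
}
```

With a physical device attached first and a virtual device with a lower
address attached second, min_mac() switches to the virtual device's address
while first_mac() stays on the physical one.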

-- 
Thanks,
Dmitry.


Re: Network virtualization/isolation

2006-10-27 Thread Dmitry Mishin
On Thursday 26 October 2006 19:56, Stephen Hemminger wrote:
 On Thu, 26 Oct 2006 11:44:55 +0200

 Daniel Lezcano [EMAIL PROTECTED] wrote:
  Stephen Hemminger wrote:
   On Wed, 25 Oct 2006 17:51:28 +0200
  
   Daniel Lezcano [EMAIL PROTECTED] wrote:
  Hi Stephen,
  
  currently the work to enable containers in the kernel is
  making good progress. The ipc, pid, utsname and filesystem system
  resources are isolated/virtualized relying on the namespaces concept.
  
  But network virtualization/isolation is still missing. Two
  approaches are proposed: doing the isolation at layer 2 and at
  layer 3.
  
  The first one instantiates a network device per namespace and adds a peer
  network device to the root namespace; all the routing resources
   are relative to the namespace. This work is done by Andrey Savochkin
   from the openvz project.
  
  The second relies on the routes and associates the network namespace
  pointer with each route. When traffic is incoming, the packet
  follows an input route and retrieves the associated network namespace.
  When traffic is outgoing, the packet, identified by the network
  namespace it comes from, follows only the routes matching the same
  network namespace. This work is done by me.
  
  IMHO, we need both approaches: layer 2 to be able to bring *very*
  strong isolation for system containers at a performance cost, and
  layer 3 to be able to have good isolation for lightweight containers or
  application containers when performance is more important.
  
  Do you have some suggestions ? What is your point of view on that ?
  
  Thanks in advance.
  
 -- Daniel
  
   Any solution should allow both and it should build on the existing
   netfilter infrastructure.
 
  The problem is that netfilter cannot give good isolation, e.g. how can the
  netstat command be handled? Or how to avoid seeing IP addresses assigned to
  another container when running ifconfig? Furthermore, one of the biggest
  benefits of network isolation is to bring mobility to a container,
  and that can only be done if the network resources inside the kernel
  can be identified by container in order to checkpoint/restart them.
 
  The all-in-namespace solution, i.e. at layer 2, is very good in terms of
  isolation but it adds a non-negligible overhead. The layer 3 isolation
  has an insignificant overhead and good isolation, well suited to
  application containers.
 
  Unfortunately, from the point of view of implementation, layer 3 cannot
  be a subset of layer 2 isolation when using all-in-namespace, and layer
  2 isolation cannot be an extension of the layer 3 isolation.
 
  I think the layer 2 and the layer 3 implementations can coexist. You
  can for example create a system container with layer 2 isolation and
  add layer 3 isolation inside it.
 
  Does that make sense ?
 
  -- Daniel

 Assuming you are talking about pseudo-virtualized environments,
 there are several different discussions.

 1. How should the namespace be isolated for the virtualized containerized
    applications?

 2. How should traffic be restricted into/out of those containers? This
    is where existing netfilter, classification, etc., should be used.
    The network code is overly rich as it is; we don't need another
    abstraction.

 3. Can the virtualized containers be secure? No. We really can't keep
    a hostile root in a container from killing the system without going to
    a hypervisor.
Stephen, 

A virtualized container can be secure if it is complete system virtualization, 
not just an application container. OpenVZ implements this and is widely used 
around the world. And of course, we take great care to keep a hostile root from
killing the whole system.
 
OpenVZ uses virtualization at the IP level (implemented by Andrey Savochkin, 
http://marc.theaimsgroup.com/?l=linux-netdev&m=115572448503723), with all
necessary network objects isolated/virtualized, such as sockets, devices, 
routes, netfilter tables, etc.

-- 
Thanks,
Dmitry.


Re: [RFC] network namespaces

2006-09-12 Thread Dmitry Mishin
Sorry, I didn't understand your proposal correctly in the previous discussion. :)
But...

On Tuesday 12 September 2006 07:28, Eric W. Biederman wrote:
 Do you have some concrete arguments against the proposal?
Yes, I have. I think it is an unnecessary complication, and it will 
result in additional bugs, especially if we accept rule creation in 
userspace. Why do we need a complex solution if there are only two approaches to  
socket binding - isolation and virtualization? These approaches could coexist 
without hooks. Or do you have other approaches in mind?

-- 
Thanks,
Dmitry.


Re: [Devel] Re: [RFC] network namespaces

2006-09-11 Thread Dmitry Mishin
On Monday 11 September 2006 18:57, Herbert Poetzl wrote:
 I completely agree here, we need a separate namespace
 for that, so that we can combine isolation and virtualization
 as needed, unless the bind restrictions can be completely
 expressed with an additional mangle or filter table (as
 was suggested)
iptables is designed for packet flow decisions and filtering; it has nothing 
in common with bind restrictions. So it can only do packet flow 
scheduling/filtering, and it will not help to resolve bind-time IP conflicts.
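
To make the distinction concrete, here is a rough userspace sketch of a
bind-time check. It is not taken from OpenVZ or VServer: struct net_ctx,
ADDR_ANY and resolve_bind() are hypothetical, and narrowing the wildcard
address to the context's first address is a simplifying assumption. The point
is only that the decision happens when a socket binds, before any packet
exists for a filter table to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wildcard address (0.0.0.0), kept in host byte order for simplicity. */
#define ADDR_ANY 0u

/* Hypothetical per-guest context: the IPv4 addresses it may bind to. */
struct net_ctx {
    const uint32_t *addrs;
    size_t naddrs;
};

/* Decide at bind time which address the socket really gets.
 * Returns 0 and stores the effective address on success, or -1 if the
 * requested address does not belong to the context. */
static int resolve_bind(const struct net_ctx *ctx, uint32_t addr,
                        uint32_t *effective)
{
    size_t i;

    assert(ctx != NULL && effective != NULL);
    if (ctx->naddrs == 0)
        return -1;

    /* Assumption: a wildcard bind is narrowed to the first address,
     * so two guests binding 0.0.0.0 on the same port do not collide. */
    if (addr == ADDR_ANY) {
        *effective = ctx->addrs[0];
        return 0;
    }

    for (i = 0; i < ctx->naddrs; i++) {
        if (ctx->addrs[i] == addr) {
            *effective = addr;
            return 0;
        }
    }
    return -1;
}
```

Two guests that both bind the wildcard address end up on different effective
addresses, and a guest asking for another guest's address is refused outright.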

-- 
Thanks,
Dmitry.


Re: [Devel] Re: [RFC] network namespaces

2006-09-10 Thread Dmitry Mishin
On Sunday 10 September 2006 06:47, Herbert Poetzl wrote:
 well, I think it would be best to have both, as
 they are complementary to some degree, and IMHO
 both, the full virtualization _and_ the isolation
 will require a separate namespace to work,   
[snip]
 I do not think that folks would want to recompile
 their kernel just to get a light-weight guest or
 a fully virtualized one
In this case the light-weight guest will have unnecessary overhead.
For example, instead of using a static pointer, we first have to look up the 
required common namespace. And such a guest will have no advantages over a 
full-featured one.


 best,
 Herbert

  --
  Thanks,
  Dmitry.

-- 
Thanks,
Dmitry.


Re: [Devel] Re: [RFC] network namespaces

2006-09-10 Thread Dmitry Mishin
On Sunday 10 September 2006 07:41, Eric W. Biederman wrote:
 I certainly agree that we are not at a point where a final decision
 can be made.  A major piece of that is that a layer 2 approach has
 not shown to be without a performance penalty.
But it is required. Why limit the possible uses?
 
 A practical question.  Do the IPs assigned to guests ever get used
 by anything besides the guest?
In the case of layer 2 virtualization - no.

-- 
Thanks,
Dmitry.


Re: [Devel] Re: [RFC] network namespaces

2006-09-09 Thread Dmitry Mishin
On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
 actually the light-weight ip isolation runs perfectly
 fine _without_ CAP_NET_ADMIN, as you do not want the
 guest to be able to mess with the 'configured' ips at
 all (not to speak of interfaces here)
It was only an example. I'm thinking about how to implement a flexible solution 
that permits light-weight IP isolation as well as full-fledged network 
virtualization. Another solution is to split CONFIG_NET_NAMESPACE. Would that 
work for you?

-- 
Thanks,
Dmitry.


Re: [Devel] Re: [RFC] network namespaces

2006-09-08 Thread Dmitry Mishin
On Thursday 07 September 2006 21:27, Herbert Poetzl wrote:
 well, who said that you need to have things like RAW sockets
 or other protocols except IP, not to speak of iptable and
 routing entries ...

 folks who _want_ full network virtualization can use the
 more complete virtual setup and be happy ...
Let's think about how to implement this.
As I understand VServer's design, your proposal is to split CAP_NET_ADMIN into
multiple capabilities and use them as required. So for your light-weight 
container it is enough to implement context isolation for the code protected by 
the CAP_NET_IP capability (for example) and put 'if (!capable(CAP_NET_*))' 
checks in all other places. But this could easily be implemented on top of the 
OpenVZ code by splitting CAP_VE_NET_ADMIN.
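
A rough userspace sketch of such a split is below. It is an illustration
only, not OpenVZ or VServer code: CAP_NET_IP is the example name used above,
while CAP_NET_ROUTE, CAP_NET_FILTER, struct container and the op_*()
functions are hypothetical names invented for this sketch. The capable()
helper here is a local function, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fine-grained bits obtained by splitting CAP_NET_ADMIN. */
#define CAP_NET_IP     (1u << 0)  /* bind to and use assigned IPs    */
#define CAP_NET_ROUTE  (1u << 1)  /* change routing entries          */
#define CAP_NET_FILTER (1u << 2)  /* change packet filter rules      */

/* Hypothetical guest: just the set of network capabilities it holds. */
struct container {
    unsigned int caps;
};

/* Local stand-in for a capability check. */
static int capable(const struct container *c, unsigned int cap)
{
    assert(c != NULL);
    return (c->caps & cap) == cap;
}

/* Each operation is guarded by the one capability it needs and
 * returns 0 when permitted or -1 when refused. */
static int op_bind_ip(const struct container *c)
{
    return capable(c, CAP_NET_IP) ? 0 : -1;
}

static int op_add_route(const struct container *c)
{
    return capable(c, CAP_NET_ROUTE) ? 0 : -1;
}

static int op_add_rule(const struct container *c)
{
    return capable(c, CAP_NET_FILTER) ? 0 : -1;
}
```

A light-weight guest would hold only CAP_NET_IP, so binding works while
routing and filter changes are refused; a fully virtualized guest would hold
all three bits.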

So, the question is:
Could you point out the places in Andrey's implementation of network 
namespaces that would prevent you from adding CAP_NET_ADMIN separation later?

-- 
Thanks,
Dmitry.