[NET-2.6.24][VETH][patch 0/1] fix kernel Oops for veth

2007-09-19 Thread dlezcano
When I tried the veth driver, I ran into a kernel oops.

qemu login: Oops:  [#1]
Modules linked in:
CPU:0
EIP:0060:[c0265c9e]Not tainted VLI
EFLAGS: 0202   (2.6.23-rc6-g754f885d-dirty #33)
EIP is at __linkwatch_run_queue+0x6a/0x175
eax: c7fc9550   ebx: 6b6b6b6b   ecx: c3360c80   edx: 0246
esi: 0001   edi: 6b6b6b6b   ebp: c7fd9f7c   esp: c7fd9f5c
ds: 007b   es: 007b   fs:   gs:   ss: 0068
Process events/0 (pid: 5, ti=c7fd8000 task=c7fc9550 task.ti=c7fd8000)
Stack: c7fee5a8 c0387680 c7fd9f74 c02e1aaa 4f732564 c0387684 c7fee5a8 c0387680 
   c7fd9f84 c0265dc9 c7fd9fac c011fb3c c7fd9f94 c02e277e c7fd9fac c02e1166 
   c0265da9 c7fee5a8 c0120203 c7fd9fc8 c7fd9fd0 c01202ba  c7fc9550 
Call Trace:
 [c0102c69] show_trace_log_lvl+0x1a/0x2f
 [c0102d1b] show_stack_log_lvl+0x9d/0xa5
 [c0102ee1] show_registers+0x1be/0x28f
 [c010309a] die+0xe8/0x208
 [c010d5a1] do_page_fault+0x4ba/0x595
 [c02e2842] error_code+0x6a/0x70
 [c0265dc9] linkwatch_event+0x20/0x27
 [c011fb3c] run_workqueue+0x7c/0x102
 [c01202ba] worker_thread+0xb7/0xc5
 [c012270c] kthread+0x39/0x61
 [c0102913] kernel_thread_helper+0x7/0x10
 ===
Code: b8 60 76 38 c0 e8 e3 ca 07 00 b8 60 76 38 c0 8b 1d 78 a7 3d c0 c7 05 78 
a7 3d c0 00 00 00 00 e8 df ca 07 00 e9 ed 00 00 00 85 f6 8b bb f4 01 00 00 74 
17 89 d8 e8 73 fe ff ff 85 c0 75 0c 89 d8 
EIP: [c0265c9e] __linkwatch_run_queue+0x6a/0x175 SS:ESP 0068:c7fd9f5c
Slab corruption: size-2048 start=c473eac8, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c025be72](free_netdev+0x1f/0x41)
200: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b c0 e2 73 c4
Prev obj: start=c473e2b0, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [c025bed0](alloc_netdev_mq+0x3c/0xa1)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 76 65 74 68 30 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=c473f2e0, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c0260e69](neigh_sysctl_unregister+0x2b/0x2e)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

That happens when trying to add a veth device using the ip command:

ip link add veth0 

which fails.

It appears that netif_carrier_off is called from the setup function,
and the setup function runs before register_netdevice.

The register_netdevice function does a lot of initialization of the netdev,
so if netif_carrier_off is called before register_netdevice,
it triggers an event for an uninitialized netdev.
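
The ordering problem described above can be sketched in userspace. This is a minimal model, not kernel code: struct toy_netdev, toy_setup, toy_register and toy_carrier_off are hypothetical names, and the guard on the registration state is an assumption made for illustration, not the fix proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a net device; not the kernel struct. */
enum toy_reg_state { TOY_UNINITIALIZED, TOY_REGISTERED };

struct toy_netdev {
    enum toy_reg_state reg_state;
    bool carrier_ok;
    int link_events;    /* events queued for the toy link watcher */
};

/* Setup callback: only fills in fields, it must not fire events. */
void toy_setup(struct toy_netdev *dev)
{
    assert(dev != NULL);
    dev->reg_state = TOY_UNINITIALIZED;
    dev->carrier_ok = true;
    dev->link_events = 0;
}

/* Registration completes the initialization the event path relies on. */
int toy_register(struct toy_netdev *dev)
{
    assert(dev != NULL);
    dev->reg_state = TOY_REGISTERED;
    return 0;
}

/*
 * Returns true if a link event was queued. Refusing to queue an event
 * for an unregistered device is this sketch's own guard (an assumption).
 */
bool toy_carrier_off(struct toy_netdev *dev)
{
    assert(dev != NULL);
    dev->carrier_ok = false;
    if (dev->reg_state != TOY_REGISTERED)
        return false;
    dev->link_events++;
    return true;
}
```

With this guard, calling toy_carrier_off from the setup path only records the carrier state; the event is queued only once the device has been registered.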

-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[net-2.6.24][patch 0/2] Dynamically allocate the loopback

2007-09-17 Thread dlezcano
This patchset allows the loopback to be allocated dynamically,
like a usual network device.

The global static variable loopback_dev has been replaced by a
netdev pointer, and the init function does the usual allocation,
initialization and registration of the loopback.

This patchset is split into two parts. The first one is a big but
trivial patch which replaces the usage of the static variable loopback_dev
with the usage of a pointer. The second patch is the interesting part, where
the loopback is dynamically allocated.
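
The allocate, set up, register sequence with goto-based cleanup can be sketched in userspace as follows. All toy_* names are hypothetical stand-ins, and the fail_register parameter exists only to simulate a registration failure; this is an illustration under those assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct net_device. */
struct toy_netdev {
    char name[16];
    unsigned int mtu;
    int registered;
};

/* A global pointer takes the place of a statically allocated instance. */
struct toy_netdev *toy_loopback_dev;

/* Setup callback: fills in the device fields after allocation. */
void toy_loopback_setup(struct toy_netdev *dev)
{
    dev->mtu = (16 * 1024) + 20 + 20 + 12;
}

/* Allocate a zeroed device, name it, and run the setup callback. */
struct toy_netdev *toy_alloc_netdev(const char *name,
                                    void (*setup)(struct toy_netdev *))
{
    struct toy_netdev *dev;

    assert(name != NULL && setup != NULL);
    dev = calloc(1, sizeof(*dev));
    if (!dev)
        return NULL;
    strncpy(dev->name, name, sizeof(dev->name) - 1);
    setup(dev);
    return dev;
}

/* A non-zero 'fail' simulates a registration failure. */
int toy_register_netdev(struct toy_netdev *dev, int fail)
{
    if (fail)
        return -EBUSY;
    dev->registered = 1;
    return 0;
}

/* Publish the global pointer only after registration succeeded. */
int toy_loopback_init(int fail_register)
{
    struct toy_netdev *dev;
    int err;

    err = -ENOMEM;
    dev = toy_alloc_netdev("lo", toy_loopback_setup);
    if (!dev)
        goto out;

    err = toy_register_netdev(dev, fail_register);
    if (err)
        goto out_free_netdev;

    toy_loopback_dev = dev;
    return 0;

out_free_netdev:
    free(dev);
out:
    return err;
}
```

Publishing the pointer last means other code never sees a device that is allocated but not yet registered.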



[net-2.6.24][patch 2/2] Dynamically allocate the loopback device

2007-09-17 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Doing this makes loopback.c a better example of how to do a
simple network device, and it removes the special case
single static allocation of a struct net_device, hopefully
making maintenance easier.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-By: Kirill Korotaev [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 drivers/net/loopback.c |   69 ++---
 1 file changed, 43 insertions(+), 26 deletions(-)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -202,44 +202,61 @@ static const struct ethtool_ops loopback
  * The loopback device is special. There is only one instance and
  * it is statically allocated. Don't do this for other devices.
  */
-struct net_device __loopback_dev = {
-   .name                = "lo",
-   .get_stats           = &get_stats,
-   .mtu                 = (16 * 1024) + 20 + 20 + 12,
-   .hard_start_xmit     = loopback_xmit,
-   .hard_header         = eth_header,
-   .hard_header_cache   = eth_header_cache,
-   .header_cache_update = eth_header_cache_update,
-   .hard_header_len     = ETH_HLEN, /* 14 */
-   .addr_len            = ETH_ALEN, /* 6 */
-   .tx_queue_len        = 0,
-   .type                = ARPHRD_LOOPBACK,  /* 0x0001 */
-   .rebuild_header      = eth_rebuild_header,
-   .flags               = IFF_LOOPBACK,
-   .features            = NETIF_F_SG | NETIF_F_FRAGLIST
+static void loopback_setup(struct net_device *dev)
+{
+   dev->get_stats           = &get_stats;
+   dev->mtu                 = (16 * 1024) + 20 + 20 + 12;
+   dev->hard_start_xmit     = loopback_xmit;
+   dev->hard_header         = eth_header;
+   dev->hard_header_cache   = eth_header_cache;
+   dev->header_cache_update = eth_header_cache_update;
+   dev->hard_header_len     = ETH_HLEN; /* 14 */
+   dev->addr_len            = ETH_ALEN; /* 6 */
+   dev->tx_queue_len        = 0;
+   dev->type                = ARPHRD_LOOPBACK;  /* 0x0001 */
+   dev->rebuild_header      = eth_rebuild_header;
+   dev->flags               = IFF_LOOPBACK;
+   dev->features            = NETIF_F_SG | NETIF_F_FRAGLIST
 #ifdef LOOPBACK_TSO
- | NETIF_F_TSO
+   | NETIF_F_TSO
 #endif
- | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA
- | NETIF_F_LLTX
- | NETIF_F_NETNS_LOCAL,
-   .ethtool_ops         = &loopback_ethtool_ops,
-   .nd_net              = &init_net,
-};
-
-struct net_device *loopback_dev = &__loopback_dev;
+   | NETIF_F_NO_CSUM
+   | NETIF_F_HIGHDMA
+   | NETIF_F_LLTX
+   | NETIF_F_NETNS_LOCAL,
+   dev->ethtool_ops         = &loopback_ethtool_ops;
+}
 
 /* Setup and register the loopback device. */
 static int __init loopback_init(void)
 {
-   int err = register_netdev(loopback_dev);
+   struct net_device *dev;
+   int err;
+
+   err = -ENOMEM;
+   dev = alloc_netdev(0, "lo", loopback_setup);
+   if (!dev)
+   goto out;
 
+   err = register_netdev(dev);
+   if (err)
+   goto out_free_netdev;
+
+   err = 0;
+   loopback_dev = dev;
+
+out:
    if (err)
        panic("loopback: Failed to register netdevice: %d\n", err);
+   return err;
 
+out_free_netdev:
+   free_netdev(dev);
+   goto out;
return err;
 };
 
-module_init(loopback_init);
+fs_initcall(loopback_init);
 
+struct net_device *loopback_dev;
 EXPORT_SYMBOL(loopback_dev);



[net-2.6.24][patch 1/2] Dynamically allocate the loopback device - mindless changes

2007-09-17 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

This patch replaces all occurrences of the static variable
loopback_dev with a pointer loopback_dev. It provides the
mindless, trivial, uninteresting part of the change for the dynamic
allocation of the loopback.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-By: Kirill Korotaev [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 drivers/net/loopback.c   |6 --
 include/linux/netdevice.h|2 +-
 net/core/dst.c   |8 
 net/decnet/dn_dev.c  |4 ++--
 net/decnet/dn_route.c|   14 +++---
 net/ipv4/devinet.c   |6 +++---
 net/ipv4/ipconfig.c  |6 +++---
 net/ipv4/ipvs/ip_vs_core.c   |2 +-
 net/ipv4/route.c |   18 +-
 net/ipv4/xfrm4_policy.c  |2 +-
 net/ipv6/addrconf.c  |   15 +--
 net/ipv6/ip6_input.c |2 +-
 net/ipv6/netfilter/ip6t_REJECT.c |2 +-
 net/ipv6/route.c |   15 ++-
 net/ipv6/xfrm6_policy.c  |2 +-
 net/xfrm/xfrm_policy.c   |4 ++--
 16 files changed, 55 insertions(+), 53 deletions(-)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -202,7 +202,7 @@ static const struct ethtool_ops loopback
  * The loopback device is special. There is only one instance and
  * it is statically allocated. Don't do this for other devices.
  */
-struct net_device loopback_dev = {
+struct net_device __loopback_dev = {
    .name                = "lo",
    .get_stats           = &get_stats,
    .mtu                 = (16 * 1024) + 20 + 20 + 12,
@@ -227,10 +227,12 @@ struct net_device loopback_dev = {
    .nd_net              = &init_net,
 };
 
+struct net_device *loopback_dev = &__loopback_dev;
+
 /* Setup and register the loopback device. */
 static int __init loopback_init(void)
 {
-   int err = register_netdev(&loopback_dev);
+   int err = register_netdev(loopback_dev);
 
    if (err)
        panic("loopback: Failed to register netdevice: %d\n", err);
Index: net-2.6.24/include/linux/netdevice.h
===
--- net-2.6.24.orig/include/linux/netdevice.h
+++ net-2.6.24/include/linux/netdevice.h
@@ -742,7 +742,7 @@ struct packet_type {
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 
-extern struct net_device   loopback_dev;   /* The loopback */
+extern struct net_device   *loopback_dev;  /* The loopback */
 extern rwlock_t            dev_base_lock;  /* Device list lock */
 
 
Index: net-2.6.24/net/core/dst.c
===
--- net-2.6.24.orig/net/core/dst.c
+++ net-2.6.24/net/core/dst.c
@@ -278,13 +278,13 @@ static inline void dst_ifdown(struct dst
    if (!unregister) {
        dst->input = dst->output = dst_discard;
    } else {
-       dst->dev = &loopback_dev;
-       dev_hold(&loopback_dev);
+       dst->dev = loopback_dev;
+       dev_hold(dst->dev);
        dev_put(dev);
        if (dst->neighbour && dst->neighbour->dev == dev) {
-           dst->neighbour->dev = &loopback_dev;
+           dst->neighbour->dev = loopback_dev;
            dev_put(dev);
-           dev_hold(&loopback_dev);
+           dev_hold(dst->neighbour->dev);
        }
    }
 }
Index: net-2.6.24/net/decnet/dn_dev.c
===
--- net-2.6.24.orig/net/decnet/dn_dev.c
+++ net-2.6.24/net/decnet/dn_dev.c
@@ -869,10 +869,10 @@ last_chance:
        rv = dn_dev_get_first(dev, addr);
        read_unlock(&dev_base_lock);
        dev_put(dev);
-       if (rv == 0 || dev == &loopback_dev)
+       if (rv == 0 || dev == loopback_dev)
            return rv;
    }
-   dev = &loopback_dev;
+   dev = loopback_dev;
    dev_hold(dev);
    goto last_chance;
 }
Index: net-2.6.24/net/decnet/dn_route.c
===
--- net-2.6.24.orig/net/decnet/dn_route.c
+++ net-2.6.24/net/decnet/dn_route.c
@@ -887,7 +887,7 @@ static int dn_route_output_slow(struct d
.scope = RT_SCOPE_UNIVERSE,
 } },
            .mark = oldflp->mark,
-           .iif = loopback_dev.ifindex,
+           .iif = loopback_dev->ifindex,
            .oif = oldflp->oif };
struct dn_route *rt = NULL;
struct net_device *dev_out = NULL, *dev;

[net-2.6.24][NETNS][patch 0/1] fix allnoconfig compilation error

2007-09-13 Thread dlezcano
Fixes a compilation issue when allnoconfig is used:
 - init_net is unresolved.



[net-2.6.24][NETNS][patch 1/1] fix allnoconfig compilation error

2007-09-13 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

When CONFIG_NET=no, init_net is unresolved because net_namespace.c
is not compiled, while the include still pulls in the init_net declaration.

This problem is very similar to the one with the ipc namespace, where the
kernel can be compiled without SYSV ipc.

This patch fixes that by defining a macro which simply removes the init_net
initialization from the nsproxy namespace aggregator.

Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/init_task.h   |2 +-
 include/net/net_namespace.h |7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: net-2.6.24/include/linux/init_task.h
===
--- net-2.6.24.orig/include/linux/init_task.h
+++ net-2.6.24/include/linux/init_task.h
@@ -79,7 +79,7 @@ extern struct nsproxy init_nsproxy;
.nslock = __SPIN_LOCK_UNLOCKED(nsproxy.nslock), \
    .uts_ns = &init_uts_ns, \
    .mnt_ns = NULL, \
-   .net_ns = &init_net,\
+   INIT_NET_NS(net_ns) \
    INIT_IPC_NS(ipc_ns) \
    .user_ns= &init_user_ns,\
 }
Index: net-2.6.24/include/net/net_namespace.h
===
--- net-2.6.24.orig/include/net/net_namespace.h
+++ net-2.6.24/include/net/net_namespace.h
@@ -28,7 +28,14 @@ struct net {
struct hlist_head   *dev_index_head;
 };
 
+#ifdef CONFIG_NET
+/* Init's network namespace */
 extern struct net init_net;
+#define INIT_NET_NS(net_ns) .net_ns = &init_net,
+#else
+#define INIT_NET_NS(net_ns)
+#endif
+
 extern struct list_head net_namespace_list;
 
 extern void __put_net(struct net *net);
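
The conditional-initializer technique used by this patch can be sketched in plain C. The TOY_* names, struct toy_net and struct toy_nsproxy are hypothetical; TOY_CONFIG_NET stands in for the kernel config option, and the field layout is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Remove this define to mimic a build without networking. */
#define TOY_CONFIG_NET 1

struct toy_net {
    int id;
};

/* The field always exists; only its initializer is conditional. */
struct toy_nsproxy {
    int count;
    struct toy_net *net_ns;
    int user_ns;
};

#ifdef TOY_CONFIG_NET
struct toy_net toy_init_net = { .id = 1 };
/* Expands to a complete designated initializer, trailing comma included. */
#define TOY_INIT_NET_NS(field) .field = &toy_init_net,
#else
/* Expands to nothing, so toy_init_net is never referenced. */
#define TOY_INIT_NET_NS(field)
#endif

struct toy_nsproxy toy_init_nsproxy = {
    .count = 1,
    TOY_INIT_NET_NS(net_ns)
    .user_ns = 0,
};

/* Non-zero when the proxy carries a network namespace. */
int toy_has_net(const struct toy_nsproxy *ns)
{
    assert(ns != NULL);
    return ns->net_ns != NULL;
}
```

When the option is off, the member is simply left zero-initialized, which avoids an unresolved reference to a symbol that was never compiled.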



[net-2.6.24][NETNS][patch 0/3] fixes for the core network namespace

2007-09-12 Thread dlezcano
The following patches fix some compilation errors and boot problems
related to the network namespace patchset.

They apply to net-2.6.24


[net-2.6.24][NETNS][patch 3/3] fix bad macro definition

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

The macro definition is bad. When next_net_device is called with a
parameter named dev, the resulting code is:
  struct net_device *dev = dev
which leads to unexpected behavior. In particular, when llc_core is
compiled in, the kernel panics at boot time.
This patch replaces the macro definitions with static inline functions, as
they were defined before.

Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/netdevice.h |   35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

Index: net-2.6.24/include/linux/netdevice.h
===
--- net-2.6.24.orig/include/linux/netdevice.h
+++ net-2.6.24/include/linux/netdevice.h
@@ -41,7 +41,8 @@
 #include <linux/dmaengine.h>
 #include <linux/workqueue.h>
 
-struct net;
+#include <net/net_namespace.h>
+
 struct vlan_group;
 struct ethtool_ops;
 struct netpoll_info;
@@ -739,23 +740,21 @@
        list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
 #define net_device_entry(lh)   list_entry(lh, struct net_device, dev_list)
 
-#define next_net_device(d) \
-({ \
-   struct net_device *dev = d; \
-   struct list_head *lh;   \
-   struct net *net;\
-   \
-   net = dev->nd_net;  \
-   lh = dev->dev_list.next;\
-   lh == &net->dev_base_head ? NULL : net_device_entry(lh);\
-})
-
-#define first_net_device(N)\
-({ \
-   struct net *NET = (N);  \
-   list_empty(&NET->dev_base_head) ? NULL :\
-   net_device_entry(NET->dev_base_head.next);  \
-})
+static inline struct net_device *next_net_device(struct net_device *dev)
+{
+   struct list_head *lh;
+   struct net *net;
+
+   net = dev->nd_net;
+   lh = dev->dev_list.next;
+   return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
+}
+
+static inline struct net_device *first_net_device(struct net *net)
+{
+   return list_empty(&net->dev_base_head) ? NULL :
+       net_device_entry(net->dev_base_head.next);
+}
 
 extern int netdev_boot_setup_check(struct net_device *dev);
 extern unsigned long   netdev_boot_base(const char *prefix, int unit);
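
The name-capture problem fixed by this patch can be illustrated with a small userspace sketch. struct toy_dev and the toy_* functions are hypothetical; the problematic macro is shown only in a comment because using it would read an uninitialized variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked device list; NULL marks the end. */
struct toy_dev {
    int ifindex;
    struct toy_dev *next;
};

/*
 * Problematic shape (GNU statement expression, shown as a comment only):
 *
 *   #define toy_next_dev(d) ({ struct toy_dev *dev = d; dev->next; })
 *
 * toy_next_dev(dev) expands to "struct toy_dev *dev = dev;", so the new
 * local is initialized from itself and the caller's pointer is never read.
 */

/* A function parameter has its own scope, so the caller's name is safe. */
static inline struct toy_dev *toy_next_dev(struct toy_dev *dev)
{
    assert(dev != NULL);
    return dev->next;
}

/* Count the devices reachable from 'dev', walking with the helper. */
int toy_count_from(struct toy_dev *dev)
{
    int n = 0;

    while (dev) {
        n++;
        dev = toy_next_dev(dev);
    }
    return n;
}
```

The inline function also gives the compiler a real prototype to type-check against, which the macro form does not.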



[net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

The core network namespace patchset sent by
Eric Biederman does not create the loopback dynamically.
So there is no call to alloc_netdev_mq, which fills in the
network namespace field of the netdevice.

This patch assigns the loopback to the init network namespace.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 drivers/net/loopback.c |1 +
 1 file changed, 1 insertion(+)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -225,6 +225,7 @@
  | NETIF_F_LLTX
  | NETIF_F_NETNS_LOCAL,
    .ethtool_ops= &loopback_ethtool_ops,
+   .nd_net = &init_net,
 };
 
 /* Setup and register the loopback device. */



[net-2.6.24][NETNS][patch 1/3] fix export symbols

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Add the appropriate EXPORT_SYMBOL_GPL entries for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig.

Signed-off-by: Mark Nelson [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 fs/proc/proc_net.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: net-2.6.24/fs/proc/proc_net.c
===
--- net-2.6.24.orig/fs/proc/proc_net.c
+++ net-2.6.24/fs/proc/proc_net.c
@@ -31,6 +31,7 @@
 {
    return create_proc_info_entry(name,mode, net->proc_net, get_info);
 }
+EXPORT_SYMBOL_GPL(proc_net_create);
 
 struct proc_dir_entry *proc_net_fops_create(struct net *net,
const char *name, mode_t mode, const struct file_operations *fops)
@@ -42,12 +43,13 @@
        res->proc_fops = fops;
return res;
 }
+EXPORT_SYMBOL_GPL(proc_net_fops_create);
 
 void proc_net_remove(struct net *net, const char *name)
 {
    remove_proc_entry(name, net->proc_net);
 }
-
+EXPORT_SYMBOL_GPL(proc_net_remove);
 
 static struct proc_dir_entry *proc_net_shadow;
 



[net-2.6.24][XFRM][patch 0/1] fix allmodconfig

2007-09-12 Thread dlezcano
Fixes missing export symbols



[net-2.6.24][XFRM][patch 1/1] fix xfrm audit export symbol for allmodconfig

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

This patch adds the missing symbol exports for:
xfrm_audit_policy_add
xfrm_audit_policy_delete
xfrm_audit_state_add
xfrm_audit_state_delete
That allows xfrm_user and af_key to be compiled as modules.

I didn't use EXPORT_SYMBOL_GPL, to be consistent with the rest
of the code.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 net/xfrm/xfrm_policy.c |2 ++
 net/xfrm/xfrm_state.c  |3 +++
 2 files changed, 5 insertions(+)

Index: net-2.6.24/net/xfrm/xfrm_policy.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_policy.c
+++ net-2.6.24/net/xfrm/xfrm_policy.c
@@ -2341,6 +2341,7 @@
xfrm_audit_common_policyinfo(xp, audit_buf);
audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_policy_add);
 
 void
 xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
@@ -2357,6 +2358,7 @@
xfrm_audit_common_policyinfo(xp, audit_buf);
audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_policy_delete);
 #endif
 
 #ifdef CONFIG_XFRM_MIGRATE
Index: net-2.6.24/net/xfrm/xfrm_state.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_state.c
+++ net-2.6.24/net/xfrm/xfrm_state.c
@@ -1865,6 +1865,7 @@
                 (unsigned long)x->id.spi, (unsigned long)x->id.spi);
audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_state_add);
 
 void
 xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
@@ -1883,4 +1884,6 @@
                 (unsigned long)x->id.spi, (unsigned long)x->id.spi);
audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_state_delete);
+
 #endif /* CONFIG_AUDITSYSCALL */



[patch 0/1] [PATCH] Fix Kconfigs for net-2.6.24

2007-09-05 Thread dlezcano
Fixes for 3 typos in Kconfig files



[patch 1/1] Fix some Kconfigs on net-2.6.24

2007-09-05 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Three fixes for Kconfigs.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 drivers/input/misc/Kconfig |2 +-
 drivers/leds/Kconfig   |2 +-
 drivers/telephony/Kconfig  |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Index: net-2.6.24/drivers/input/misc/Kconfig
===
--- net-2.6.24.orig/drivers/input/misc/Kconfig
+++ net-2.6.24/drivers/input/misc/Kconfig
@@ -152,7 +152,7 @@
 
 config INPUT_YEALINK
    tristate "Yealink usb-p1k voip phone"
-   depends EXPERIMENTAL
+   depends on EXPERIMENTAL
depends on USB_ARCH_HAS_HCD
select USB
help
Index: net-2.6.24/drivers/leds/Kconfig
===
--- net-2.6.24.orig/drivers/leds/Kconfig
+++ net-2.6.24/drivers/leds/Kconfig
@@ -83,7 +83,7 @@
 
 config LEDS_H1940
    tristate "LED Support for iPAQ H1940 device"
-   depends LEDS_CLASS && ARCH_H1940
+   depends on LEDS_CLASS && ARCH_H1940
help
  This option enables support for the LEDs on the h1940.
 
Index: net-2.6.24/drivers/telephony/Kconfig
===
--- net-2.6.24.orig/drivers/telephony/Kconfig
+++ net-2.6.24/drivers/telephony/Kconfig
@@ -19,7 +19,7 @@
 
 config PHONE_IXJ
    tristate "QuickNet Internet LineJack/PhoneJack support"
-   depends ISA || PCI
+   depends on ISA || PCI
---help---
  Say M if you have a telephony card manufactured by Quicknet
  Technologies, Inc.  These include the Internet PhoneJACK and



[patch 1/1][RFC] add a private field to the sock structure

2007-08-29 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Store private information for a socket

This patch adds a field to the common socket structure. This field
is an anonymous pointer which allows storing information about
the socket.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/net/inet_timewait_sock.h |1 +
 include/net/sock.h   |3 +++
 net/ipv4/inet_timewait_sock.c|1 +
 3 files changed, 5 insertions(+)

Index: net-2.6.24-bf/include/net/sock.h
===
--- net-2.6.24-bf.orig/include/net/sock.h
+++ net-2.6.24-bf/include/net/sock.h
@@ -106,6 +106,7 @@
  * @skc_refcnt: reference count
  * @skc_hash: hash value used with various protocol lookup tables
  * @skc_prot: protocol handlers inside a network family
+ *  @skc_private: field used to store private data
  *
  * This is the minimal network layer representation of sockets, the header
  * for struct sock and struct inet_timewait_sock.
@@ -120,6 +121,7 @@
    atomic_t        skc_refcnt;
    unsigned int    skc_hash;
    struct proto    *skc_prot;
+   void            *skc_private;
 };
 
 /**
@@ -196,6 +198,7 @@
 #define sk_refcnt  __sk_common.skc_refcnt
 #define sk_hash    __sk_common.skc_hash
 #define sk_prot    __sk_common.skc_prot
+#define sk_private __sk_common.skc_private
unsigned char   sk_shutdown : 2,
sk_no_check : 2,
sk_userlocks : 4;
Index: net-2.6.24-bf/net/ipv4/inet_timewait_sock.c
===
--- net-2.6.24-bf.orig/net/ipv4/inet_timewait_sock.c
+++ net-2.6.24-bf/net/ipv4/inet_timewait_sock.c
@@ -108,6 +108,7 @@
        tw->tw_hash = sk->sk_hash;
        tw->tw_ipv6only = 0;
        tw->tw_prot = sk->sk_prot_creator;
+       tw->tw_private  = sk->sk_private;
        atomic_set(&tw->tw_refcnt, 1);
        inet_twsk_dead_node_init(tw);
        __module_get(tw->tw_prot->owner);
Index: net-2.6.24-bf/include/net/inet_timewait_sock.h
===
--- net-2.6.24-bf.orig/include/net/inet_timewait_sock.h
+++ net-2.6.24-bf/include/net/inet_timewait_sock.h
@@ -115,6 +115,7 @@
 #define tw_refcnt  __tw_common.skc_refcnt
 #define tw_hash    __tw_common.skc_hash
 #define tw_prot    __tw_common.skc_prot
+#define tw_private  __tw_common.skc_private
volatile unsigned char  tw_substate;
/* 3 bits hole, try to pack */
unsigned char   tw_rcv_wscale;



[patch 0/1][RFC] add a private field to the sock structure

2007-08-29 Thread dlezcano
When a socket is created, it is sometimes useful to store specific
information for this socket.

This information can be, for example:
* a creation time
* a pid
* a uid/gid
* a container identifier
* a pointer to a more specific structure
* ...

The following patch is a proposal to add a private anonymous pointer
field to the common part of the sock structure.
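
The idea of an opaque per-socket pointer can be sketched in userspace as follows. The toy_* structures and helpers are hypothetical, and struct toy_container_info is just one example of what the pointer might refer to; none of this is the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common part shared by both socket representations. */
struct toy_sock_common {
    unsigned int hash;
    void *private_data;     /* opaque; the owner decides what it points to */
};

struct toy_sock {
    struct toy_sock_common common;
};

struct toy_timewait_sock {
    struct toy_sock_common common;
};

/* Example payload: a container identifier plus a creation time. */
struct toy_container_info {
    int container_id;
    long created_at;
};

void toy_sock_set_private(struct toy_sock *sk, void *data)
{
    assert(sk != NULL);
    sk->common.private_data = data;
}

void *toy_sock_get_private(const struct toy_sock *sk)
{
    assert(sk != NULL);
    return sk->common.private_data;
}

/* Carry the pointer over when the socket enters its time-wait form. */
void toy_timewait_init(struct toy_timewait_sock *tw, const struct toy_sock *sk)
{
    assert(tw != NULL && sk != NULL);
    tw->common.hash = sk->common.hash;
    tw->common.private_data = sk->common.private_data;
}
```

An untyped pointer leaves ownership open: whoever stores the data must also decide who frees it once both representations are gone. That is a design question this sketch does not answer.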



[PATCH 1/1] Dynamically allocate the loopback device

2007-08-24 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Doing this makes loopback.c a better example of how to do a
simple network device, and it removes the special case
single static allocation of a struct net_device, hopefully
making maintenance easier.

Applies against net-2.6.24

Tested on i386, x86_64
Compiled on ia64, sparc

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-By: Kirill Korotaev [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 drivers/net/loopback.c   |   63 +++---
 include/linux/netdevice.h|2 +-
 net/core/dst.c   |8 ++--
 net/decnet/dn_dev.c  |4 +-
 net/decnet/dn_route.c|   14 
 net/ipv4/devinet.c   |6 ++--
 net/ipv4/ipconfig.c  |6 ++--
 net/ipv4/ipvs/ip_vs_core.c   |2 +-
 net/ipv4/route.c |   18 +-
 net/ipv4/xfrm4_policy.c  |2 +-
 net/ipv6/addrconf.c  |   15 +---
 net/ipv6/ip6_input.c |2 +-
 net/ipv6/netfilter/ip6t_REJECT.c |2 +-
 net/ipv6/route.c |   15 +++-
 net/ipv6/xfrm6_policy.c  |2 +-
 net/xfrm/xfrm_policy.c   |4 +-
 16 files changed, 89 insertions(+), 76 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 5106c23..3642aff 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -199,44 +199,57 @@ static const struct ethtool_ops loopback_ethtool_ops = {
.get_rx_csum= always_on,
 };
 
-/*
- * The loopback device is special. There is only one instance and
- * it is statically allocated. Don't do this for other devices.
- */
-struct net_device loopback_dev = {
-   .name                = "lo",
-   .get_stats           = &get_stats,
-   .mtu                 = (16 * 1024) + 20 + 20 + 12,
-   .hard_start_xmit     = loopback_xmit,
-   .hard_header         = eth_header,
-   .hard_header_cache   = eth_header_cache,
-   .header_cache_update = eth_header_cache_update,
-   .hard_header_len     = ETH_HLEN, /* 14 */
-   .addr_len            = ETH_ALEN, /* 6 */
-   .tx_queue_len        = 0,
-   .type                = ARPHRD_LOOPBACK,  /* 0x0001 */
-   .rebuild_header      = eth_rebuild_header,
-   .flags               = IFF_LOOPBACK,
-   .features            = NETIF_F_SG | NETIF_F_FRAGLIST
+static void loopback_setup(struct net_device *dev)
+{
+   dev->get_stats           = &get_stats;
+   dev->mtu                 = (16 * 1024) + 20 + 20 + 12;
+   dev->hard_start_xmit     = loopback_xmit;
+   dev->hard_header         = eth_header;
+   dev->hard_header_cache   = eth_header_cache;
+   dev->header_cache_update = eth_header_cache_update;
+   dev->hard_header_len     = ETH_HLEN; /* 14 */
+   dev->addr_len            = ETH_ALEN; /* 6 */
+   dev->tx_queue_len        = 0;
+   dev->type                = ARPHRD_LOOPBACK;  /* 0x0001 */
+   dev->rebuild_header      = eth_rebuild_header;
+   dev->flags               = IFF_LOOPBACK;
+   dev->features            = NETIF_F_SG | NETIF_F_FRAGLIST
 #ifdef LOOPBACK_TSO
        | NETIF_F_TSO
 #endif
        | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA
-       | NETIF_F_LLTX,
-   .ethtool_ops         = &loopback_ethtool_ops,
-};
+       | NETIF_F_LLTX;
+   dev->ethtool_ops         = &loopback_ethtool_ops;
+}
 
 /* Setup and register the loopback device. */
 static int __init loopback_init(void)
 {
-   int err = register_netdev(&loopback_dev);
+   struct net_device *dev;
+   int err;
+   
+   err = -ENOMEM;
+   dev = alloc_netdev(0, "lo", loopback_setup);
+   if (!dev)
+   goto out;
+
+   err = register_netdev(dev);
+   if (err)
+   goto out_free_netdev;
 
+   err = 0;
+   loopback_dev = dev;
+
+out:
    if (err)
        panic("loopback: Failed to register netdevice: %d\n", err);
-
return err;
+out_free_netdev:
+   free_netdev(dev);
+   goto out;
 };
 
-module_init(loopback_init);
+fs_initcall(loopback_init);
 
+struct net_device *loopback_dev;
 EXPORT_SYMBOL(loopback_dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8d12f02..7cd0641 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -680,7 +680,7 @@ struct packet_type {
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 
-extern struct net_device   loopback_dev;   /* The loopback */
+extern struct net_device   *loopback_dev;  /* The loopback */
 extern struct list_head    dev_base_head;  /* All devices */
 extern rwlock_t            dev_base_lock;

[patch 01/12] net namespace : initialize init process to level 2

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Initialize the init's network namespace to level 2

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 net/core/net_namespace.c |1 +
 1 file changed, 1 insertion(+)

Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -21,6 +21,7 @@
    .dev_tail_p = &init_net_ns.dev_base_p,
.loopback_dev_p = NULL,
.pcpu_lstats_p  = NULL,
+   .level  = NET_NS_LEVEL2,
 };
 
 #ifdef CONFIG_NET_NS



[patch 00/12] net namespace : L3 namespace - introduction

2007-01-19 Thread dlezcano
This patchset provides network isolation similar to what
Linux-Vserver provides. It is based on the L2 namespaces and relies on
the mechanisms provided by the namespace. The L3 namespace does not
aim to bring full virtualization of the network; it provides IP
isolation which can be reused for Linux-Vserver, jailed applications or
application containers.

L3 namespaces are always children of an L2 namespace and they cannot
create more network namespaces; furthermore, they lose their NET_ADMIN
capability. They share their parent's network resources. From the
parent namespace, IP addresses are created and assigned to the
different L3 children. From this point, L3 namespaces can use their
assigned IP address and all computed broadcast addresses.

Because the L3 namespace relies on the L2 virtualization mechanisms,
it is possible to have several L3 namespaces listening on
INADDR_ANY:port without conflict, which allows running several servers
without modifying the network configuration.

The loopback is a device shared between all L3 namespaces. To ensure
the isolation of the 127.0.0.1 address, the sender stores its namespace in
the packet, so when the packet arrives, the destination namespace is
already set, because source == destination. This way, it is
easy to disable the loopback isolation and let an application talk
with applications outside of the namespace via 127.0.0.1 because we
consider them trusted (like portmap).

The ifconfig / ip commands will only show the IP addresses assigned to the
L3 namespace. When an L3 namespace dies, the assigned IP address is
released to its parent.

At the IP level, when a packet arrives, the destination L3 network
namespace is retrieved from the destination address.

At bind time, the address is checked against the assigned IP
address.
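
The two checks described last (destination lookup on receive, address check on bind) can be sketched as follows. This is a simplified userspace model under stated assumptions: each toy L3 namespace owns exactly one IPv4 address, addresses are plain host-order integers, and binding to INADDR_ANY is always allowed. The toy_* names are hypothetical and do not come from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INADDR_ANY 0u

/* Hypothetical L3 namespace with a single assigned IPv4 address. */
struct toy_l3_ns {
    int id;
    uint32_t addr;  /* host byte order, for simplicity */
};

/* Receive path: find the L3 namespace that owns the destination address. */
const struct toy_l3_ns *toy_l3_lookup(const struct toy_l3_ns *tbl, size_t n,
                                      uint32_t dst)
{
    assert(tbl != NULL);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == dst)
            return &tbl[i];
    return NULL;
}

/* Bind path: allow only INADDR_ANY or the namespace's assigned address. */
int toy_l3_bind_allowed(const struct toy_l3_ns *ns, uint32_t addr)
{
    assert(ns != NULL);
    return addr == TOY_INADDR_ANY || addr == ns->addr;
}
```

A linear scan is enough for a sketch; a real implementation would more likely hang the namespace off the existing address lookup structures.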



[patch 02/12] net namespace : store L2 parent namespace

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

All L3 namespaces are the leaf nodes of the L2 namespace
tree. Because they share some resources coming from the L2
namespace, the L2 parent namespace should be stored in the L3 child
when it is created.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |1 +
 net/core/net_namespace.c  |   11 +++
 2 files changed, 12 insertions(+)

Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -27,6 +27,7 @@
 #define NET_NS_LEVEL2  1
 #define NET_NS_LEVEL3  2
 	unsigned int		level;
+	struct net_namespace	*parent;
 };
 
 extern struct net_namespace init_net_ns;
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -22,6 +22,7 @@
.loopback_dev_p = NULL,
.pcpu_lstats_p  = NULL,
.level  = NET_NS_LEVEL2,
+   .parent = NULL,
 };
 
 #ifdef CONFIG_NET_NS
@@ -62,6 +63,12 @@
 		if (ip_fib_struct_init())
 			goto out_fib4;
 	}
+
+	if (level == NET_NS_LEVEL3) {
+		get_net_ns(old_ns);
+		ns->parent = old_ns;
+	}
+
 	ns->level = level;
 	if (loopback_init())
 		goto out_loopback;
@@ -126,8 +133,12 @@
 			ns, atomic_read(&ns->kref.refcount));
 		return;
 	}
+
 	if (ns->level == NET_NS_LEVEL2)
 		ip_fib_struct_cleanup(ns);
+	if (ns->level == NET_NS_LEVEL3)
+		put_net_ns(ns->parent);
+
 	printk(KERN_DEBUG "NET_NS: net namespace %p destroyed\n", ns);
 	kfree(ns);
 }



[patch 12/12] net namespace : Add broadcasting

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Broadcast packets should be delivered to the l2 namespace and to all
of its l3 children.
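
The visibility rule reduces both sides to their l2 namespace and
compares the results. A user-space sketch, where struct ns, ns_l2()
and sock_is_visible() are hypothetical stand-ins for the kernel types
and helper touched by this patch:

```c
#include <assert.h>
#include <stddef.h>

enum { NS_LEVEL2 = 1, NS_LEVEL3 = 2 };

struct ns {
	int level;
	const struct ns *parent;	/* set for l3 namespaces only */
};

/* Map a namespace to the l2 namespace it belongs to. */
static const struct ns *ns_l2(const struct ns *ns)
{
	assert(ns != NULL);
	return ns->level == NS_LEVEL3 ? ns->parent : ns;
}

/*
 * A socket is visible from the receiving namespace when both belong
 * to the same l2 namespace, so an l2 namespace and all of its l3
 * children see each other's broadcast listeners.
 */
static int sock_is_visible(const struct ns *sock_ns, const struct ns *rx_ns)
{
	return ns_l2(sock_ns) == ns_l2(rx_ns);
}
```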

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |   11 +++
 net/core/net_namespace.c  |   27 +++
 net/ipv4/udp.c|3 ++-
 3 files changed, 40 insertions(+), 1 deletion(-)

Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -9,6 +9,7 @@
 
 struct in_ifaddr;
 struct sk_buff;
+struct sock;
 
 struct net_namespace {
struct kref kref;
@@ -109,6 +110,9 @@
 
 extern void net_ns_tag_sk_buff(struct sk_buff *skb);
 
+extern int net_ns_sock_is_visible(const struct sock *sk,
+ const struct net_namespace *net_ns);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -192,6 +196,13 @@
 {
;
 }
+
+static inline int net_ns_sock_is_visible(const struct sock *sk,
+const struct net_namespace *net_ns)
+{
+   return 1;
+}
+
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -17,6 +17,7 @@
 #include <linux/ip.h>
 
 #include <net/ip_fib.h>
+#include <net/sock.h>
 
 struct net_namespace init_net_ns = {
.kref = {
@@ -464,4 +465,30 @@
 	struct net_namespace *net_ns = current_net_ns;
 	skb->net_ns = net_ns;
 }
+
+/*
+ * This function checks whether the socket is visible from the
+ * specified namespace. This is needed so that broadcast and multicast
+ * packets are delivered correctly when several l2 and l3 network
+ * namespaces are involved. If an l3 namespace and its parent (l2
+ * namespace) are both listening on a broadcast address, the packet
+ * should be delivered to both. That is done by the udp_v4_mcast_next
+ * function, but it needs a common point between sockets that are
+ * related to a namespace. For l3 network namespaces, the common point
+ * is that they have the same parent.
+ * @sk : the socket to be checked
+ * @net_ns : the receiving network namespace
+ * Returns: 1 if the socket is visible from the namespace, 0 otherwise.
+ */
+int net_ns_sock_is_visible(const struct sock *sk,
+			   const struct net_namespace *net_ns)
+{
+	if (net_ns->level == NET_NS_LEVEL3)
+		net_ns = net_ns->parent;
+
+	if (sk->sk_net_ns->level == NET_NS_LEVEL3)
+		return sk->sk_net_ns->parent == net_ns;
+	else
+		return sk->sk_net_ns == net_ns;
+}
 #endif /* CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/ipv4/udp.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/udp.c
+++ 2.6.20-rc4-mm1/net/ipv4/udp.c
@@ -309,9 +309,10 @@
 		    (inet->dport != rmt_port && inet->dport)		||
 		    (inet->rcv_saddr && inet->rcv_saddr != loc_addr)	||
 		    ipv6_only_sock(s)					||
-		    !net_ns_match(sk->sk_net_ns, ns)			||
 		    (s->sk_bound_dev_if && s->sk_bound_dev_if != dif))
 			continue;
+		if (!net_ns_sock_is_visible(sk, ns))
+			continue;
 		if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif))
 			continue;
 		goto found;



[patch 08/12] net namespace : find namespace by addr

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Switch to the l3 namespace using the destination address.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |7 +++
 net/core/net_namespace.c  |   35 +++
 net/ipv4/ip_input.c   |   16 +++-
 3 files changed, 57 insertions(+), 1 deletion(-)

Index: 2.6.20-rc4-mm1/net/ipv4/ip_input.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/ip_input.c
+++ 2.6.20-rc4-mm1/net/ipv4/ip_input.c
@@ -374,6 +374,9 @@
 {
 	struct iphdr *iph;
 	u32 len;
+	int err;
+	struct net_namespace *net_ns = current_net_ns;
+	struct net_namespace *dst_net_ns = NULL;
 
 	/* When the interface is in promisc. mode, drop all the crap
 	 * that it receives, do not try to analyse it.
@@ -393,6 +396,9 @@
 
 	iph = skb->nh.iph;
 
+	dst_net_ns = net_ns_find_from_dest_addr(iph->daddr);
+	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
+		push_net_ns(dst_net_ns);
 	/*
 	 *	RFC1122: 3.1.2.2 MUST silently discard any IP frame that fails the checksum.
 	 *
@@ -431,10 +437,18 @@
 	/* Remove any debris in the socket control block */
 	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
 
-	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
+	err =  NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
 		       ip_rcv_finish);
 
+	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
+		pop_net_ns(net_ns);
+
+	return err;
+
 inhdr_error:
+	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
+		pop_net_ns(net_ns);
+
 	IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
 drop:
 	kfree_skb(skb);
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -99,6 +99,8 @@
 extern __be32 net_ns_select_source_address(const struct net_device *dev,
   u32 dst, int scope);
 
+extern struct net_namespace *net_ns_find_from_dest_addr(u32 daddr);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -167,6 +169,11 @@
return 0;
 }
 
+static inline struct net_namespace *net_ns_find_from_dest_addr(u32 daddr)
+{
+   return NULL;
+}
+
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -385,4 +385,39 @@
 out:
 	return addr;
 }
+
+/*
+ * This function finds the network namespace destination deduced from
+ * the destination address. The network namespace is retrieved from
+ * the ifaddr owned by a network namespace
+ * @daddr  : destination
+ * Returns : the network namespace destination or NULL if not found
+ */
+struct net_namespace *net_ns_find_from_dest_addr(u32 daddr)
+{
+	struct net_namespace *net_ns = NULL;
+	struct net_device *dev;
+	struct in_device *in_dev;
+
+	if (LOOPBACK(daddr))
+		return current_net_ns;
+
+	read_lock(&dev_base_lock);
+	rcu_read_lock();
+	for (dev = dev_base; dev; dev = dev->next) {
+		if ((in_dev = __in_dev_get_rcu(dev)) == NULL)
+			continue;
+		for_ifa(in_dev) {
+			if (ifa->ifa_local == daddr) {
+				net_ns = ifa->ifa_net_ns;
+				goto out_unlock_both;
+			}
+		} endfor_ifa(in_dev);
+	}
+out_unlock_both:
+	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
+
+	return net_ns;
+}
 #endif /* CONFIG_NET_NS */



[patch 09/12] net namespace : make loopback address always visible

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Add a specific condition when listing inet interfaces so that the
loopback address is always visible.
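
The resulting rule is small enough to state as a predicate. The sketch
below is a user-space illustration with addresses in host byte order.
struct ifaddr and ifa_is_visible() are hypothetical names, not the
kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address entry: the address and the namespace owning it. */
struct ifaddr {
	uint32_t local;
	const void *owner;
};

/* True for any address in 127.0.0.0/8. */
static int is_loopback(uint32_t a)
{
	return (a >> 24) == 127;
}

/*
 * An address is listed when it is a loopback address (shared by all
 * namespaces) or when it is owned by the namespace doing the listing.
 */
static int ifa_is_visible(const struct ifaddr *ifa, const void *current_ns)
{
	assert(ifa != NULL);
	return is_loopback(ifa->local) || ifa->owner == current_ns;
}
```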

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |9 +
 net/core/net_namespace.c  |   22 ++
 net/ipv4/devinet.c|   12 +---
 3 files changed, 36 insertions(+), 7 deletions(-)

Index: 2.6.20-rc4-mm1/net/ipv4/devinet.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/devinet.c
+++ 2.6.20-rc4-mm1/net/ipv4/devinet.c
@@ -695,8 +695,7 @@
for (ifap = in_dev-ifa_list; (ifa = *ifap) != NULL;
 ifap = ifa-ifa_next) {
if (!strcmp(ifr.ifr_name, ifa-ifa_label) 
-   net_ns_match(ifa-ifa_net_ns,
-current_net_ns) 
+   net_ns_ifa_is_visible(ifa) 
sin_orig.sin_addr.s_addr ==
ifa-ifa_address) {
break; /* found */
@@ -710,13 +709,12 @@
for (ifap = in_dev-ifa_list; (ifa = *ifap) != NULL;
 ifap = ifa-ifa_next)
if (!strcmp(ifr.ifr_name, ifa-ifa_label) 
-net_ns_match(ifa-ifa_net_ns,
- current_net_ns))
+net_ns_ifa_is_visible(ifa))
break;
}
}
 
-   if (ifa  !net_ns_match(ifa-ifa_net_ns, current_net_ns))
+   if (ifa  !net_ns_ifa_is_visible(ifa))
goto done;
 
ret = -EADDRNOTAVAIL;
@@ -868,7 +866,7 @@
goto out;
 
for (; ifa; ifa = ifa-ifa_next) {
-   if (!net_ns_match(ifa-ifa_net_ns, current_net_ns))
+   if (!net_ns_ifa_is_visible(ifa))
continue;
if (!buf) {
done += sizeof(ifr);
@@ -1216,7 +1214,7 @@
 
for (ifa = in_dev-ifa_list, ip_idx = 0; ifa;
 ifa = ifa-ifa_next, ip_idx++) {
-   if (!net_ns_match(ifa-ifa_net_ns, current_net_ns))
+   if (!net_ns_ifa_is_visible(ifa))
continue;
if (ip_idx  s_ip_idx)
continue;
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -7,6 +7,8 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
+struct in_ifaddr;
+
 struct net_namespace {
struct kref kref;
struct net_device   *dev_base_p, **dev_tail_p;
@@ -101,6 +103,8 @@
 
 extern struct net_namespace *net_ns_find_from_dest_addr(u32 daddr);
 
+extern int net_ns_ifa_is_visible(const struct in_ifaddr *ifa);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -174,6 +178,11 @@
return NULL;
 }
 
+static inline int net_ns_ifa_is_visible(const struct in_ifaddr *ifa)
+{
+   return 1;
+}
+
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -420,4 +420,26 @@
 
 	return net_ns;
 }
+
+/*
+ * This function checks if the ifaddr is visible from the
+ * current network namespace. This is true if the ifaddr is
+ * the loopback address or if the ifaddr is owned by the network
+ * namespace.
+ * @ifa : the ifaddr
+ * Returns : 1 if visible, 0 otherwise
+ */
+int net_ns_ifa_is_visible(const struct in_ifaddr *ifa)
+{
+	struct net_namespace *net_ns = current_net_ns;
+
+	if (LOOPBACK(ifa->ifa_local))
+		return 1;
+
+	if (net_ns_match(ifa->ifa_net_ns, net_ns))
+		return 1;
+
+	return 0;
+}
+
 #endif /* CONFIG_NET_NS */



[patch 03/12] net namespace : share network resources L2 with L3

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

An L3 namespace will use the routes and devices belonging to its
parent, so the old network namespace structure is copied when
allocating a new one. This way, the hash value, device list and routes
are accessible from the L3 namespaces. In the case of an L2 namespace,
these values are overwritten by the newly allocated values.
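
The copy-then-override scheme can be sketched in user space with
memcpy() standing in for kmemdup(). struct ns and ns_clone() are
hypothetical and carry only a few representative fields, and the new
hash is passed in by the caller here instead of being drawn at random.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { NS_LEVEL2 = 1, NS_LEVEL3 = 2 };

/* Hypothetical, trimmed-down namespace. */
struct ns {
	int level;
	struct ns *parent;
	unsigned int hash;
	void *dev_list;
};

/*
 * Start from a byte-for-byte copy of the old namespace so that an L3
 * child sees its parent's hash and device list. An L2 namespace then
 * replaces the inherited values with fresh ones.
 */
static struct ns *ns_clone(struct ns *old, int level, unsigned int new_hash)
{
	struct ns *ns;

	assert(old != NULL);
	ns = malloc(sizeof(*ns));
	if (!ns)
		return NULL;
	memcpy(ns, old, sizeof(*ns));

	ns->level = level;
	if (level == NS_LEVEL2) {
		ns->parent = NULL;
		ns->hash = new_hash;
		ns->dev_list = NULL;	/* starts with its own, empty list */
	} else {
		ns->parent = old;	/* shares everything else with old */
	}
	return ns;
}
```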

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |   14 ++
 net/core/dev.c|4 ++--
 net/core/net_namespace.c  |   33 ++---
 3 files changed, 34 insertions(+), 17 deletions(-)

Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -37,7 +37,7 @@
  * Return ERR_PTR on error, new ns otherwise
  */
 static struct net_namespace *clone_net_ns(unsigned int level,
-   struct net_namespace *old_ns)
+ struct net_namespace *old_ns)
 {
struct net_namespace *ns;
 
@@ -45,23 +45,26 @@
 	if (current_net_ns->level == NET_NS_LEVEL3)
 		return ERR_PTR(-EPERM);
 
-	ns = kzalloc(sizeof(struct net_namespace), GFP_KERNEL);
+	ns = kmemdup(old_ns, sizeof(struct net_namespace), GFP_KERNEL);
 	if (!ns)
 		return NULL;
 
 	kref_init(&ns->kref);
-	ns->dev_base_p = NULL;
-	ns->dev_tail_p = &ns->dev_base_p;
-	ns->hash = net_random();
-
 	if ((push_net_ns(ns)) != old_ns)
+
 		BUG();
 	if (level ==  NET_NS_LEVEL2) {
+		ns->dev_base_p = NULL;
+		ns->dev_tail_p = &ns->dev_base_p;
+		ns->hash = net_random();
+
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 		INIT_LIST_HEAD(&ns->fib_rules_ops_list);
 #endif
 		if (ip_fib_struct_init())
 			goto out_fib4;
+		if (loopback_init())
+			goto out_loopback;
 	}
 
if (level == NET_NS_LEVEL3) {
@@ -70,8 +73,6 @@
}
 
ns-level = level;
-   if (loopback_init())
-   goto out_loopback;
pop_net_ns(old_ns);
printk(KERN_DEBUG NET_NS: created new netcontext %p, level %u, 
for %s (pid=%d)\n, ns, (ns-level == NET_NS_LEVEL2) ?
@@ -127,15 +128,17 @@
 	struct net_namespace *ns;
 
 	ns = container_of(kref, struct net_namespace, kref);
-	unregister_netdev(ns->loopback_dev_p);
-	if (ns->dev_base_p != NULL) {
-		printk("NET_NS: BUG: namespace %p has devices! ref %d\n",
-			ns, atomic_read(&ns->kref.refcount));
-		return;
-	}
 
-	if (ns->level == NET_NS_LEVEL2)
+	if (ns->level == NET_NS_LEVEL2) {
 		ip_fib_struct_cleanup(ns);
+		unregister_netdev(ns->loopback_dev_p);
+		if (ns->dev_base_p != NULL) {
+			printk("NET_NS: BUG: namespace %p has devices! ref %d\n",
+			       ns, atomic_read(&ns->kref.refcount));
+			return;
+		}
+	}
+
 	if (ns->level == NET_NS_LEVEL3)
 		put_net_ns(ns->parent);
 
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -56,6 +56,15 @@
 DECLARE_PER_CPU(struct net_namespace *, exec_net_ns);
 #define current_net_ns (__get_cpu_var(exec_net_ns))
 
+static inline struct net_namespace *net_ns_l2(void)
+{
+	struct net_namespace *net_ns = current_net_ns;
+
+	if (net_ns->level == NET_NS_LEVEL3)
+		return net_ns->parent;
+	return net_ns;
+}
+
 static inline void init_current_net_ns(int cpu)
 {
 	get_net_ns(&init_net_ns);
@@ -110,6 +119,11 @@
 
 #define current_net_ns NULL
 
+static inline struct net_namespace *net_ns_l2(void)
+{
+   return NULL;
+}
+
 static inline void init_current_net_ns(int cpu)
 {
 }
Index: 2.6.20-rc4-mm1/net/core/dev.c
===
--- 2.6.20-rc4-mm1.orig/net/core/dev.c
+++ 2.6.20-rc4-mm1/net/core/dev.c
@@ -485,7 +485,7 @@
 struct net_device *__dev_get_by_name(const char *name)
 {
struct hlist_node *p;
-   struct net_namespace *ns = current_net_ns;
+   struct net_namespace *ns = net_ns_l2();
 
hlist_for_each(p, dev_name_hash(name, ns)) {
struct net_device *dev
@@ -768,7 +768,7 @@
if (!err) {
hlist_del(dev-name_hlist);
hlist_add_head(dev-name_hlist, dev_name_hash(dev-name,
-   current_net_ns));
+  net_ns_l2()));

[patch 06/12] net namespace : check bind address

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Check that the bind address is allowed. It must match an ifaddr
assigned to the namespace or one of the addresses derived from it.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |7 +
 net/core/net_namespace.c  |   54 ++
 net/ipv4/af_inet.c|2 +
 net/ipv4/raw.c|3 ++
 4 files changed, 66 insertions(+)

Index: 2.6.20-rc4-mm1/net/ipv4/af_inet.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/af_inet.c
+++ 2.6.20-rc4-mm1/net/ipv4/af_inet.c
@@ -433,6 +433,8 @@
 	 *  is temporarily down)
 	 */
 	err = -EADDRNOTAVAIL;
+	if (net_ns_check_bind(chk_addr_ret, addr->sin_addr.s_addr))
+		goto out;
 	if (!sysctl_ip_nonlocal_bind &&
 	    !inet->freebind &&
 	    addr->sin_addr.s_addr != INADDR_ANY &&
Index: 2.6.20-rc4-mm1/net/ipv4/raw.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/raw.c
+++ 2.6.20-rc4-mm1/net/ipv4/raw.c
@@ -559,7 +559,10 @@
 	if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_in))
 		goto out;
 	chk_addr_ret = inet_addr_type(addr->sin_addr.s_addr);
+
 	ret = -EADDRNOTAVAIL;
+	if (net_ns_check_bind(chk_addr_ret, addr->sin_addr.s_addr))
+		goto out;
 	if (addr->sin_addr.s_addr && chk_addr_ret != RTN_LOCAL &&
 	    chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST)
 		goto out;
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -93,6 +93,8 @@
 
 extern int net_ns_ioctl(unsigned int cmd, void __user *arg);
 
+extern int net_ns_check_bind(int addr_type, u32 addr);
+
 #else /* CONFIG_NET_NS */
 
 #define INIT_NET_NS(net_ns)
@@ -148,6 +150,11 @@
return -ENOSYS;
 }
 
+static inline int net_ns_check_bind(int addr_type, u32 addr)
+{
+   return 0;
+}
+
 #endif /* !CONFIG_NET_NS */
 
 #endif /* _LINUX_NET_NAMESPACE_H */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -263,4 +263,58 @@
 	return err;
 }
 
+/*
+ * This function checks if the specified bind address is allowed.
+ * The bind is allowed if the address is:
+ * - 127.0.0.1
+ * - INADDR_ANY
+ * - INADDR_BROADCAST
+ * - a multicast address
+ * - an address matching an ifaddr owned by the current network
+ *   namespace, that is, its local address or an address computed
+ *   from the netmask
+ * @addr_type : an addr type
+ * @addr : the requested bind address
+ * Returns: -EPERM on failure, 0 on success
+ */
+int net_ns_check_bind(int addr_type, u32 addr)
+{
+	int ret = -EPERM;
+	struct net_device *dev;
+	struct in_device *in_dev;
+	struct net_namespace *net_ns = current_net_ns;
+
+	if (LOOPBACK(addr) ||
+	    MULTICAST(addr) ||
+	    INADDR_ANY == addr ||
+	    INADDR_BROADCAST == addr)
+		return 0;
+
+	read_lock(&dev_base_lock);
+	rcu_read_lock();
+	for (dev = dev_base; dev; dev = dev->next) {
+		in_dev = __in_dev_get_rcu(dev);
+		if (!in_dev)
+			continue;
+
+		for_ifa(in_dev) {
+			if (ifa->ifa_net_ns != net_ns)
+				continue;
+			if (addr == ifa->ifa_local ||
+			    addr == ifa->ifa_broadcast ||
+			    addr == (ifa->ifa_local & ifa->ifa_mask) ||
+			    addr == ((ifa->ifa_address & ifa->ifa_mask)|
+				     ~ifa->ifa_mask)) {
+				ret = 0;
+				goto out;
+			}
+		} endfor_ifa(in_dev);
+	}
+out:
+	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 #endif /* CONFIG_NET_NS */



[patch 07/12] net namespace: set source address

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

When no source address is specified, search the dev list for an
ifaddr that is allowed to be used as the source address.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |   14 
 net/core/net_namespace.c  |   68 ++
 net/ipv4/route.c  |   28 +++--
 3 files changed, 100 insertions(+), 10 deletions(-)

Index: 2.6.20-rc4-mm1/net/ipv4/route.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/route.c
+++ 2.6.20-rc4-mm1/net/ipv4/route.c
@@ -2475,17 +2475,17 @@
 
if (LOCAL_MCAST(oldflp-fl4_dst) || oldflp-fl4_dst == 
htonl(0x)) {
if (!fl.fl4_src)
-   fl.fl4_src = inet_select_addr(dev_out, 0,
- RT_SCOPE_LINK);
+   fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0,
+RT_SCOPE_LINK);
goto make_route;
}
if (!fl.fl4_src) {
if (MULTICAST(oldflp-fl4_dst))
-   fl.fl4_src = inet_select_addr(dev_out, 0,
- fl.fl4_scope);
+   fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0,
+fl.fl4_scope);
else if (!oldflp-fl4_dst)
-   fl.fl4_src = inet_select_addr(dev_out, 0,
- RT_SCOPE_HOST);
+   fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0,
+RT_SCOPE_HOST);
}
}
 
@@ -2525,8 +2525,8 @@
 */
 
if (fl.fl4_src == 0)
-   fl.fl4_src = inet_select_addr(dev_out, 0,
- RT_SCOPE_LINK);
+   fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0,
+RT_SCOPE_LINK);
res.type = RTN_UNICAST;
goto make_route;
}
@@ -2539,7 +2539,13 @@
 
if (res.type == RTN_LOCAL) {
if (!fl.fl4_src)
+#ifdef CONFIG_NET_NS
+   fl.fl4_src = net_ns_select_source_address(dev_out,
+ fl.fl4_dst,
+ 
RT_SCOPE_LINK);
+#else
fl.fl4_src = fl.fl4_dst;
+#endif
if (dev_out)
dev_put(dev_out);
dev_out = loopback_dev;
@@ -2561,8 +2567,10 @@
fib_select_default(fl, res);
 
if (!fl.fl4_src)
-   fl.fl4_src = FIB_RES_PREFSRC(res);
-
+   fl.fl4_src = res.fi-fib_prefsrc ? :
+   SELECT_SRC_ADDR(FIB_RES_DEV(res),
+   FIB_RES_GW(res),
+   res.scope);
if (dev_out)
dev_put(dev_out);
dev_out = FIB_RES_DEV(res);
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -5,6 +5,7 @@
 #include <linux/kref.h>
 #include <linux/nsproxy.h>
 #include <linux/errno.h>
+#include <linux/types.h>
 
 struct net_namespace {
struct kref kref;
@@ -95,6 +96,11 @@
 
 extern int net_ns_check_bind(int addr_type, u32 addr);
 
+extern __be32 net_ns_select_source_address(const struct net_device *dev,
+  u32 dst, int scope);
+
+#define SELECT_SRC_ADDR net_ns_select_source_address
+
 #else /* CONFIG_NET_NS */
 
 #define INIT_NET_NS(net_ns)
@@ -155,6 +161,14 @@
return 0;
 }
 
+static inline __be32 net_ns_select_source_address(struct net_device *dev,
+ u32 dst, int scope)
+{
+   return 0;
+}
+
+#define SELECT_SRC_ADDR inet_select_addr
+
 #endif /* !CONFIG_NET_NS */
 
 #endif /* _LINUX_NET_NAMESPACE_H */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -317,4 +317,72 @@
return ret;
 }
 
+/*
+ * This function choose the source address from the network device,
+ * destination and the scope. The function will browse the ifaddr
+ * owned by network namespace and choose the most adapted for the
+ * dst address and dev.
+ * @dev : the network device where the 

[patch 10/12] net namespace : add the loopback isolation

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

When a packet is outgoing, the source namespace is stored in the
skbuff. Because the address is the loopback address, source ==
destination, so when the packet comes in it already has the
destination namespace set.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |   13 +++--
 include/linux/skbuff.h|5 -
 net/core/net_namespace.c  |   32 +++-
 net/ipv4/ip_input.c   |2 +-
 net/ipv4/ip_output.c  |1 +
 5 files changed, 44 insertions(+), 9 deletions(-)

Index: 2.6.20-rc4-mm1/include/linux/skbuff.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/skbuff.h
+++ 2.6.20-rc4-mm1/include/linux/skbuff.h
@@ -225,6 +225,7 @@
  * @dma_cookie: a cookie to one of several possible DMA operations
  * done by skb DMA functions
  * @secmark: security marking
+ *  @net_ns: namespace destination
  */
 
 struct sk_buff {
@@ -309,7 +310,9 @@
 #ifdef CONFIG_NETWORK_SECMARK
__u32   secmark;
 #endif
-
+#ifdef CONFIG_NET_NS
+   struct net_namespace*net_ns;
+#endif
__u32   mark;
 
/* These elements must be at the end, see alloc_skb() for details.  */
Index: 2.6.20-rc4-mm1/net/ipv4/ip_input.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/ip_input.c
+++ 2.6.20-rc4-mm1/net/ipv4/ip_input.c
@@ -396,7 +396,7 @@
 
 	iph = skb->nh.iph;
 
-	dst_net_ns = net_ns_find_from_dest_addr(iph->daddr);
+	dst_net_ns = net_ns_find_from_dest_addr(skb);
 	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
 		push_net_ns(dst_net_ns);
 	/*
Index: 2.6.20-rc4-mm1/net/ipv4/ip_output.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/ip_output.c
+++ 2.6.20-rc4-mm1/net/ipv4/ip_output.c
@@ -272,6 +272,7 @@
 
 	IP_INC_STATS(IPSTATS_MIB_OUTREQUESTS);
 
+	net_ns_tag_sk_buff(skb);
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
 
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -8,6 +8,7 @@
 #include <linux/types.h>
 
 struct in_ifaddr;
+struct sk_buff;
 
 struct net_namespace {
struct kref kref;
@@ -101,10 +102,13 @@
 extern __be32 net_ns_select_source_address(const struct net_device *dev,
   u32 dst, int scope);
 
-extern struct net_namespace *net_ns_find_from_dest_addr(u32 daddr);
+extern struct net_namespace
+*net_ns_find_from_dest_addr(const struct sk_buff *skb);
 
 extern int net_ns_ifa_is_visible(const struct in_ifaddr *ifa);
 
+extern void net_ns_tag_sk_buff(struct sk_buff *skb);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -173,7 +177,8 @@
return 0;
 }
 
-static inline struct net_namespace *net_ns_find_from_dest_addr(u32 daddr)
+static inline struct net_namespace
+*net_ns_find_from_dest_addr(const struct sk_buff *skb)
 {
return NULL;
 }
@@ -183,6 +188,10 @@
return 1;
 }
 
+static inline void net_ns_tag_sk_buff(struct sk_buff *skb)
+{
+   ;
+}
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -13,6 +13,9 @@
 #include <linux/in.h>
 #include <linux/netdevice.h>
 #include <linux/inetdevice.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+
 #include <net/ip_fib.h>
 
 struct net_namespace init_net_ns = {
@@ -389,18 +392,25 @@
 /*
  * This function finds the network namespace destination deduced from
  * the destination address. The network namespace is retrieved from
- * the ifaddr owned by a network namespace
- * @daddr  : destination
+ * the ifaddr owned by a network namespace. If the packet is for the
+ * loopback address, we assume the destination namespace has already
+ * been filled in by the sender, which is the same as the receiver.
+ * @skb : the packet to be delivered
  * Returns : the network namespace destination or NULL if not found
  */
-struct net_namespace *net_ns_find_from_dest_addr(u32 daddr)
+struct net_namespace *net_ns_find_from_dest_addr(const struct sk_buff *skb)
 {
 	struct net_namespace *net_ns = NULL;
 	struct net_device *dev;
 	struct in_device *in_dev;
+	struct iphdr *iph;
+	__be32 daddr;
+
+	iph = skb->nh.iph;
+	daddr = iph->daddr;
 
-	if (LOOPBACK(daddr))
-		return current_net_ns;
+	if (LOOPBACK(daddr))
+		return skb->net_ns;
 
 	read_lock(&dev_base_lock);
 

[patch 05/12] net namespace : ioctl to push ifa to net namespace l3

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

New ioctl to push an ifaddr to a container. Actually, the push is done
from the current namespace, so the right word is pull. That will be
changed to move the ifaddr from the l2 network namespace to the l3.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |7 ++
 include/linux/sockios.h   |4 +
 net/core/net_namespace.c  |  118 +-
 net/ipv4/af_inet.c|4 +
 4 files changed, 132 insertions(+), 1 deletion(-)

Index: 2.6.20-rc4-mm1/include/linux/sockios.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/sockios.h
+++ 2.6.20-rc4-mm1/include/linux/sockios.h
@@ -122,6 +122,10 @@
 #define SIOCBRADDIF0x89a2  /* add interface to bridge  */
 #define SIOCBRDELIF0x89a3  /* remove interface from bridge */
 
+/* Container calls */
+#define SIOCNETNSPUSHIF  0x89b0 /* add ifaddr to namespace  */
+#define SIOCNETNSPULLIF  0x89b1 /* remove ifaddr to namespace   */
+
 /* Device private ioctl calls */
 
 /*
Index: 2.6.20-rc4-mm1/net/ipv4/af_inet.c
===
--- 2.6.20-rc4-mm1.orig/net/ipv4/af_inet.c
+++ 2.6.20-rc4-mm1/net/ipv4/af_inet.c
@@ -789,6 +789,10 @@
case SIOCSIFFLAGS:
err = devinet_ioctl(cmd, (void __user *)arg);
break;
+   case SIOCNETNSPUSHIF:
+   case SIOCNETNSPULLIF:
+   err = net_ns_ioctl(cmd, (void __user *)arg);
+   break;
default:
if (sk-sk_prot-ioctl)
err = sk-sk_prot-ioctl(sk, cmd, arg);
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -91,6 +91,8 @@
 
 #define net_ns_hash(ns)((ns)-hash)
 
+extern int net_ns_ioctl(unsigned int cmd, void __user *arg);
+
 #else /* CONFIG_NET_NS */
 
 #define INIT_NET_NS(net_ns)
@@ -141,6 +143,11 @@
 
 #define net_ns_hash(ns)(0)
 
+static inline int net_ns_ioctl(unsigned int cmd, void __user *arg)
+{
+   return -ENOSYS;
+}
+
 #endif /* !CONFIG_NET_NS */
 
 #endif /* _LINUX_NET_NAMESPACE_H */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -10,7 +10,9 @@
 #include <linux/nsproxy.h>
 #include <linux/net_namespace.h>
 #include <linux/net.h>
+#include <linux/in.h>
 #include <linux/netdevice.h>
+#include <linux/inetdevice.h>
 #include <net/ip_fib.h>
 
 struct net_namespace init_net_ns = {
@@ -123,6 +125,33 @@
 	return err;
 }
 
+/*
+ * The function will move the ifaddr to the l2 network namespace
+ * parent.
+ * @net_ns: the related network namespace
+ */
+static void release_ifa_to_parent(const struct net_namespace* net_ns)
+{
+	struct net_device *dev;
+	struct in_device *in_dev;
+
+	read_lock(&dev_base_lock);
+	rcu_read_lock();
+	for (dev = dev_base; dev; dev = dev->next) {
+		in_dev = __in_dev_get_rcu(dev);
+		if (!in_dev)
+			continue;
+
+		for_ifa(in_dev) {
+			if (ifa->ifa_net_ns != net_ns)
+				continue;
+			ifa->ifa_net_ns = net_ns->parent;
+		} endfor_ifa(in_dev);
+	}
+	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
+}
+
 void free_net_ns(struct kref *kref)
 {
 	struct net_namespace *ns;
@@ -139,12 +168,99 @@
 		}
 	}
 
-	if (ns->level == NET_NS_LEVEL3)
+	if (ns->level == NET_NS_LEVEL3) {
+		release_ifa_to_parent(ns);
 		put_net_ns(ns->parent);
+	}
 
 	printk(KERN_DEBUG "NET_NS: net namespace %p destroyed\n", ns);
 	kfree(ns);
 }
 EXPORT_SYMBOL_GPL(free_net_ns);
 
+/*
+ * This function assigns an IP address from an l2 network namespace
+ * to one of its l3 children, or releases it from an l3 network
+ * namespace to its l2 network namespace parent.
+ * @cmd: a push / pull command
+ * @arg: a userspace buffer containing an ifreq structure
+ * Returns:
+ * - EPERM : if the caller has no CAP_NET_ADMIN capability or the
+ *   current level of network namespace is not layer 2
+ * - EFAULT : if arg is an invalid buffer
+ * - EADDRNOTAVAIL : if the specified ifaddr does not exist
+ * - EINVAL : if cmd is unknown
+ * - zero on success
+ */
+int net_ns_ioctl(unsigned int cmd, void __user *arg)
+{
+	struct ifreq ifr;
+	struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;
+	struct net_namespace *net_ns = current_net_ns;
+   

[patch 11/12] net namespace : debugfs - add net_ns debugfs

2007-01-19 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

For debugging purposes only; this is not intended to be included.
Add /sys/kernel/debug/net_ns.

Creation of a network namespace:

echo <level> > /sys/kernel/debug/net_ns/start

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 fs/debugfs/Makefile |2 
 fs/debugfs/net_ns.c |  335 
 net/Kconfig |4 
 3 files changed, 340 insertions(+), 1 deletion(-)

Index: 2.6.20-rc4-mm1/fs/debugfs/Makefile
===
--- 2.6.20-rc4-mm1.orig/fs/debugfs/Makefile
+++ 2.6.20-rc4-mm1/fs/debugfs/Makefile
@@ -1,4 +1,4 @@
 debugfs-objs   := inode.o file.o
 
 obj-$(CONFIG_DEBUG_FS) += debugfs.o
-
+obj-$(CONFIG_NET_NS_DEBUG) += net_ns.o
Index: 2.6.20-rc4-mm1/fs/debugfs/net_ns.c
===
--- /dev/null
+++ 2.6.20-rc4-mm1/fs/debugfs/net_ns.c
@@ -0,0 +1,335 @@
+/*
+ *  net_ns.c - adds a net_ns/ directory to debug NET namespaces
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/pagemap.h>
+#include <linux/debugfs.h>
+#include <linux/sched.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/syscalls.h>
+#include <linux/net_namespace.h>
+#include <linux/rtnetlink.h>
+
+static struct dentry *net_ns_dentry;
+static struct dentry *net_ns_dentry_dev;
+static struct dentry *net_ns_dentry_start;
+static struct dentry *net_ns_dentry_info;
+
+static ssize_t net_ns_dev_read_file(struct file *file, char __user *user_buf,
+   size_t count, loff_t *ppos)
+{
+   return 0;
+}
+
+static ssize_t net_ns_dev_write_file(struct file *file,
+const char __user *user_buf,
+size_t count, loff_t *ppos)
+{
+   return 0;
+}
+
+static int net_ns_dev_open_file(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static int net_ns_start_open_file(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static ssize_t net_ns_start_read_file(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+   return 0;
+}
+
+static ssize_t net_ns_start_write_file(struct file *file,
+  const char __user *user_buf,
+  size_t count, loff_t *ppos)
+{
+   int err;
+   size_t len;
+   const char __user *p;
+   char c;
+   unsigned long flags;
+   struct net_namespace *net, *new_net;
+   struct nsproxy *new_nsproxy = NULL, *old_nsproxy = NULL;
+
+   if (current_net_ns != init_net_ns)
+   return -EBUSY;
+
+   len = 0;
+   p = user_buf;
+   while (len < count) {
+   if (get_user(c, p++))
+   return -EFAULT;
+   if (c == 0 || c == '\n')
+   break;
+   len++;
+   }
+
+   if (len  1)
+   return -EINVAL;
+
+   if (copy_from_user(&c, user_buf, sizeof(c)))
+   return -EFAULT;
+
+   if (c != '2' && c != '3')
+   return -EINVAL;
+
+   flags = (c == '2' ? CLONE_NEWNET2 : CLONE_NEWNET3);
+   err = unshare_net_ns(flags, &new_net);
+   if (err)
+   return err;
+
+   old_nsproxy = current->nsproxy;
+   new_nsproxy = dup_namespaces(old_nsproxy);
+
+   if (!new_nsproxy) {
+   put_net_ns(new_net);
+   task_unlock(current);
+   return -ENOMEM;
+   }
+
+   task_lock(current);
+
+   if (new_nsproxy) {
+   current->nsproxy = new_nsproxy;
+   new_nsproxy = old_nsproxy;
+   }
+
+   net = current->nsproxy->net_ns;
+   current->nsproxy->net_ns = new_net;
+   pop_net_ns(new_net);
+   new_net = net;
+
+   task_unlock(current);
+
+   put_nsproxy(new_nsproxy);
+   put_net_ns(new_net);
+
+   return count;
+}
+
+static int net_ns_info_open_file(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static ssize_t net_ns_info_read_file(struct file *file, char __user *user_buf,
+size_t count, loff_t *ppos)
+{
+   const unsigned int length = 256;
+   size_t len;
+   char buff[length];
+   char *level;
+   struct net_namespace *net_ns = current_net_ns;
+   struct nsproxy *ns = current->nsproxy;
+
+   if (*ppos < 0)
+   return -EINVAL;
+   if (*ppos >= count)
+   return 0;
+   if (!count)
+   return 0;
+
+   switch (net_ns->level) {
+   case NET_NS_LEVEL2:
+   level = "layer 2";
+   

[RFC] [patch 4/6] [Network namespace] Network inet devices isolation

2006-06-09 Thread dlezcano
The network isolation relies on the fact that an application cannot
use IP addresses that do not belong to the container in which it is
running. This patch isolates the inet device level by adding a
network namespace pointer to struct in_ifaddr. When an IP
address is set inside a network namespace, the in_ifaddr structure is
filled with the current namespace pointer. The loopback address is a
special case: it belongs to all namespaces, so its network namespace
pointer is set to NULL.
This patch isolates the ifconfig and ip addr commands, so an IP
address set in one namespace is not visible from other network
namespaces.
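
The visibility rule described above can be sketched in plain userspace C. This is only an illustration and not code from the patch: `struct ifaddr_view`, `ifa_visible()` and `count_visible()` are hypothetical names, and the `struct net_namespace` here is a stand-in. Only the rule itself comes from the description: a NULL namespace pointer marks the shared loopback address, and any other address is visible only from its own namespace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net_namespace. */
struct net_namespace {
	int id;
};

/* Hypothetical, reduced stand-in for struct in_ifaddr. */
struct ifaddr_view {
	unsigned int address;             /* IPv4 address, host byte order */
	struct net_namespace *ifa_net_ns; /* NULL means "shared by all" */
};

/*
 * Return 1 if the address may be seen from namespace "cur", 0 otherwise.
 * A NULL owner marks the loopback address, which every namespace sees.
 */
static int ifa_visible(const struct ifaddr_view *ifa,
		       const struct net_namespace *cur)
{
	assert(ifa != NULL && cur != NULL);
	return ifa->ifa_net_ns == NULL || ifa->ifa_net_ns == cur;
}

/* Count the addresses in "addrs" that are visible from "cur". */
static size_t count_visible(const struct ifaddr_view *addrs, size_t n,
			    const struct net_namespace *cur)
{
	size_t i, visible = 0;

	for (i = 0; i < n; i++)
		visible += (size_t)ifa_visible(&addrs[i], cur);
	return visible;
}
```

With one loopback address and one private address per namespace, each namespace would see exactly two of the three addresses.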

Replace-Subject: [Network namespace] Network inet devices isolation 
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] 
--
 include/linux/inetdevice.h |1 +
 net/ipv4/devinet.c |   28 +++-
 2 files changed, 28 insertions(+), 1 deletion(-)

Index: 2.6-mm/include/linux/inetdevice.h
===
--- 2.6-mm.orig/include/linux/inetdevice.h
+++ 2.6-mm/include/linux/inetdevice.h
@@ -99,6 +99,7 @@
unsigned char   ifa_flags;
unsigned char   ifa_prefixlen;
charifa_label[IFNAMSIZ];
+   struct net_namespace*ifa_net_ns;
 };
 
 extern int register_inetaddr_notifier(struct notifier_block *nb);
Index: 2.6-mm/net/ipv4/devinet.c
===
--- 2.6-mm.orig/net/ipv4/devinet.c
+++ 2.6-mm/net/ipv4/devinet.c
@@ -54,6 +54,7 @@
 #include <linux/notifier.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
+#include <linux/net_ns.h>
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 #endif
@@ -257,6 +258,7 @@
 
if (!(ifa->ifa_flags & IFA_F_SECONDARY) ||
ifa1->ifa_mask != ifa->ifa_mask ||
+   ifa->ifa_net_ns != net_ns() ||
!inet_ifa_match(ifa1->ifa_address, ifa)) {
ifap1 = &ifa->ifa_next;
prev_prom = ifa;
@@ -317,6 +319,8 @@
if (destroy) {
inet_free_ifa(ifa1);
 
+   put_net_ns(ifa1->ifa_net_ns);
+
if (!in_dev->ifa_list)
inetdev_destroy(in_dev);
}
@@ -343,6 +347,7 @@
ifa->ifa_scope <= ifa1->ifa_scope)
last_primary = &ifa1->ifa_next;
if (ifa1->ifa_mask == ifa->ifa_mask &&
+   ifa1->ifa_net_ns == ifa->ifa_net_ns &&
inet_ifa_match(ifa1->ifa_address, ifa)) {
if (ifa1->ifa_local == ifa->ifa_local) {
inet_free_ifa(ifa);
@@ -437,6 +442,8 @@
 
for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 ifap = &ifa->ifa_next) {
+   if (ifa->ifa_net_ns != net_ns())
+   continue;
if ((rta[IFA_LOCAL - 1] &&
 memcmp(RTA_DATA(rta[IFA_LOCAL - 1]),
&ifa->ifa_local, 4)) ||
@@ -497,6 +504,9 @@
ifa->ifa_scope = ifm->ifa_scope;
in_dev_hold(in_dev);
ifa->ifa_dev   = in_dev;
+   ifa->ifa_net_ns = net_ns();
+   get_net_ns(net_ns());
+
if (rta[IFA_LABEL - 1])
rtattr_strlcpy(ifa->ifa_label, rta[IFA_LABEL - 1], IFNAMSIZ);
else
@@ -631,10 +641,15 @@
for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 ifap = &ifa->ifa_next)
if (!strcmp(ifr.ifr_name, ifa->ifa_label))
-   break;
+   if (!ifa->ifa_net_ns ||
+   ifa->ifa_net_ns == net_ns())
+   break;
}
}
 
+   if (ifa && ifa->ifa_net_ns && ifa->ifa_net_ns != net_ns())
+   goto done;
+
ret = -EADDRNOTAVAIL;
if (!ifa && cmd != SIOCSIFADDR && cmd != SIOCSIFFLAGS)
goto done;
@@ -678,6 +693,12 @@
ret = -ENOBUFS;
if ((ifa = inet_alloc_ifa()) == NULL)
break;
+   if (!LOOPBACK(sin->sin_addr.s_addr)) {
+   ifa->ifa_net_ns = net_ns();
+   get_net_ns(net_ns());
+   } else
+   ifa->ifa_net_ns = NULL;
+
if (colon)
memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
else
@@ -782,6 +803,8 @@
goto out;
 
for (; ifa; ifa = ifa->ifa_next) {
+   if (ifa->ifa_net_ns && ifa->ifa_net_ns != net_ns())
+   continue;
if (!buf) {
done += sizeof(ifr);
continue;
@@ -1012,6 

[RFC] [patch 2/6] [Network namespace] Network device sharing by view

2006-06-09 Thread dlezcano
Adds a device list view to the network namespace. This view is emptied
when the namespace is unshared. The view is filled and emptied by a set
of functions that an external module can call.
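
As a rough illustration of such a view, here is a minimal userspace sketch, assuming a simple singly linked list in place of the kernel's `list_head` and no locking. All names (`struct device`, `struct dev_view`, `view_register()`, `view_unregister()`, `view_free()`) are hypothetical and only mirror the idea of the patch: each entry holds a reference on a shared device, and the view can be emptied as a whole.

```c
#include <assert.h>
#include <stdlib.h>

#define DEV_NAME_LEN 16 /* same size as the kernel's IFNAMSIZ */

/* Hypothetical, reduced stand-in for struct net_device. */
struct device {
	char name[DEV_NAME_LEN];
	int refcount; /* plain counter standing in for dev_hold/dev_put */
};

/* One entry of the view: a link plus a pointer to the shared device. */
struct view_entry {
	struct view_entry *next;
	struct device *dev;
};

/* The per-namespace view; it starts out empty. */
struct dev_view {
	struct view_entry *head;
};

/* Add "dev" to the view and take a reference. Returns 0, or -1 if out of memory. */
static int view_register(struct dev_view *view, struct device *dev)
{
	struct view_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	dev->refcount++;
	e->dev = dev;
	e->next = view->head; /* insertion order is not preserved here */
	view->head = e;
	return 0;
}

/* Remove "dev" from the view and drop the reference. Returns 0, or -1 if absent. */
static int view_unregister(struct dev_view *view, struct device *dev)
{
	struct view_entry **pp, *e;

	for (pp = &view->head; (e = *pp) != NULL; pp = &e->next) {
		if (e->dev != dev)
			continue;
		*pp = e->next;
		assert(dev->refcount > 0);
		dev->refcount--;
		free(e);
		return 0;
	}
	return -1;
}

/* Empty the whole view, dropping every reference it holds. */
static void view_free(struct dev_view *view)
{
	while (view->head)
		view_unregister(view, view->head->dev);
}
```

The devices themselves stay shared; only the small entries are allocated per view, which is what lets two namespaces see different subsets of the same device set.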

Replace-Subject: [Network namespace] Network device sharing by view
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] 
--
 include/linux/net_ns.h |2 
 include/linux/net_ns_dev.h |   32 +++
 init/version.c |4 
 net/core/Makefile  |2 
 net/core/net_ns_dev.c  |  205 +
 net/net_ns.c   |6 +
 6 files changed, 250 insertions(+), 1 deletion(-)

Index: 2.6-mm/include/linux/net_ns_dev.h
===
--- /dev/null
+++ 2.6-mm/include/linux/net_ns_dev.h
@@ -0,0 +1,32 @@
+#ifndef _LINUX_NET_NS_DEV_H
+#define _LINUX_NET_NS_DEV_H
+
+struct net_device;
+
+struct net_ns_dev {
+   struct list_head list;
+   struct net_device *dev;
+};
+
+struct net_ns_dev_list {
+   struct list_head list;
+   rwlock_t lock;
+};
+
+extern int net_ns_dev_unregister(struct net_device *dev,
+struct net_ns_dev_list *devlist);
+
+extern int net_ns_dev_register(struct net_device *dev,
+  struct net_ns_dev_list *devlist);
+
+extern struct net_device *net_ns_dev_find_by_name(const char *devname,
+ struct net_ns_dev_list *devlist);
+extern int net_ns_dev_remove(const char *devname,
+struct net_ns_dev_list *devlist);
+
+extern int net_ns_dev_add(const char *devname,
+ struct net_ns_dev_list *devlist);
+
+extern int free_net_ns_dev(struct net_ns_dev_list *devlist);
+
+#endif
Index: 2.6-mm/include/linux/net_ns.h
===
--- 2.6-mm.orig/include/linux/net_ns.h
+++ 2.6-mm/include/linux/net_ns.h
@@ -4,9 +4,11 @@
 #include <linux/kref.h>
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
+#include <linux/net_ns_dev.h>
 
 struct net_namespace {
struct kref kref;
+   struct net_ns_dev_list dev_list;
 };
 
 extern struct net_namespace init_net_ns;
Index: 2.6-mm/net/core/net_ns_dev.c
===
--- /dev/null
+++ 2.6-mm/net/core/net_ns_dev.c
@@ -0,0 +1,205 @@
+/*
+ *  net_ns_dev.c - adds namespace network device view
+ *
+ *  Copyright (C) 2006 IBM
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ */
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/netdevice.h>
+#include <linux/net_ns_dev.h>
+
+int free_net_ns_dev(struct net_ns_dev_list *devlist)
+{
+   struct list_head *l, *next;
+   struct net_ns_dev *db;
+   struct net_device *dev;
+
+   write_lock(&devlist->lock);
+   list_for_each_safe(l, next, &devlist->list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   dev = db->dev;
+   list_del(&db->list);
+   dev_put(dev);
+   kfree(db);
+   }
+   write_unlock(&devlist->lock);
+
+   return 0;
+}
+
+/*
+ * Remove a device from the namespace network devices list
+ * when unregistered from a namespace
+ * @dev : network device
+ * @devlist: network namespace devices
+ * Return -ENODEV if the device does not exist, 0 on success
+ */
+int net_ns_dev_unregister(struct net_device *dev,
+ struct net_ns_dev_list *devlist)
+{
+   struct net_ns_dev *db;
+   struct list_head *l;
+   int ret = -ENODEV;
+
+   write_lock(&devlist->lock);
+   list_for_each(l, &devlist->list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   if (dev != db->dev)
+   continue;
+
+   list_del(&db->list);
+   dev_put(dev);
+   kfree(db);
+   ret = 0;
+   break;
+   }
+   write_unlock(&devlist->lock);
+   return ret;
+}
+
+EXPORT_SYMBOL_GPL(net_ns_dev_unregister);
+
+/*
+ * Add a device to the namespace network devices list
+ * when registered from a namespace
+ * @dev : network device
+ * @dev_list: network namespace devices
+ * Return ENOMEM if allocation fails, 0 on success
+ */
+int net_ns_dev_register(struct net_device *dev,
+   struct net_ns_dev_list *devlist)
+{
+   struct net_ns_dev *db;
+
+   db = kmalloc(sizeof(*db), GFP_KERNEL);
+   if (!db)
+   return -ENOMEM;
+
+   write_lock(&devlist->lock);
+   dev_hold(dev);
+   db->dev = dev;
+   list_add_tail(&db->list, &devlist->list);
+   write_unlock(&devlist->lock);
+
+   return 0;
+}
+
+EXPORT_SYMBOL_GPL(net_ns_dev_register);
+
+/*
+ * Add a device to the namespace network devices list
+ * 

[RFC] [patch 3/6] [Network namespace] Network devices isolation

2006-06-09 Thread dlezcano
The dev list view is filled and used from here. The dev_base list has
been replaced by the dev list view, and a device can be accessed only
if it is in the view's list. All calls from userspace (ioctls, netlink
and procfs) use the network device view instead of the global network
device list.
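
To make the effect on lookups concrete, here is a small userspace sketch, assuming a fixed-size array of pointers as the view instead of the kernel list and ignoring locking. The names `struct dev_view`, `view_get_by_name()` and `view_get_by_index()` are hypothetical; they only mirror the idea that `__dev_get_by_name()` and `__dev_get_by_index()` search the current namespace's view rather than a global table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VIEW_MAX 8 /* arbitrary capacity for this sketch */

/* Hypothetical, reduced stand-in for struct net_device. */
struct device {
	const char *name;
	int ifindex;
};

/* A namespace's view: pointers to the devices it is allowed to see. */
struct dev_view {
	struct device *devs[VIEW_MAX];
	size_t count;
};

/* Look a device up by name, searching only the given view. */
static struct device *view_get_by_name(const struct dev_view *view,
				       const char *name)
{
	size_t i;

	assert(view->count <= VIEW_MAX);
	for (i = 0; i < view->count; i++)
		if (strcmp(view->devs[i]->name, name) == 0)
			return view->devs[i];
	return NULL;
}

/* Look a device up by interface index, searching only the given view. */
static struct device *view_get_by_index(const struct dev_view *view,
					int ifindex)
{
	size_t i;

	assert(view->count <= VIEW_MAX);
	for (i = 0; i < view->count; i++)
		if (view->devs[i]->ifindex == ifindex)
			return view->devs[i];
	return NULL;
}
```

One consequence visible even in this sketch is that the hashed lookups of the original code become linear scans of the view, which is a trade-off the RFC accepts.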

Replace-Subject: [Network namespace] Network devices isolation 
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] 
--
 net/core/dev.c   |  147 ++-
 net/core/rtnetlink.c |   21 +--
 2 files changed, 126 insertions(+), 42 deletions(-)

Index: 2.6-mm/net/core/dev.c
===
--- 2.6-mm.orig/net/core/dev.c
+++ 2.6-mm/net/core/dev.c
@@ -115,6 +115,7 @@
 #include <net/iw_handler.h>
 #include <asm/current.h>
 #include <linux/audit.h>
+#include <linux/net_ns.h>
 #include <linux/dmaengine.h>
 
 /*
@@ -474,13 +475,16 @@
 
 struct net_device *__dev_get_by_name(const char *name)
 {
-   struct hlist_node *p;
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+   struct list_head *l, *list = &dev_list->list;
+   struct net_ns_dev *db;
+   struct net_device *dev;
 
-   hlist_for_each(p, dev_name_hash(name)) {
-   struct net_device *dev
-   = hlist_entry(p, struct net_device, name_hlist);
+   list_for_each(l, list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   dev = db->dev;
if (!strncmp(dev->name, name, IFNAMSIZ))
-   return dev;
+   return dev;
}
return NULL;
 }
@@ -498,13 +502,14 @@
 
 struct net_device *dev_get_by_name(const char *name)
 {
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
struct net_device *dev;
 
-   read_lock(&dev_base_lock);
+   read_lock(&dev_list->lock);
dev = __dev_get_by_name(name);
if (dev)
dev_hold(dev);
-   read_unlock(&dev_base_lock);
+   read_unlock(&dev_list->lock);
return dev;
 }
 
@@ -521,11 +526,14 @@
 
 struct net_device *__dev_get_by_index(int ifindex)
 {
-   struct hlist_node *p;
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+   struct list_head *l, *list = &dev_list->list;
+   struct net_ns_dev *db;
+   struct net_device *dev;
 
-   hlist_for_each(p, dev_index_hash(ifindex)) {
-   struct net_device *dev
-   = hlist_entry(p, struct net_device, index_hlist);
+   list_for_each(l, list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   dev = db->dev;
if (dev->ifindex == ifindex)
return dev;
}
@@ -545,13 +553,14 @@
 
 struct net_device *dev_get_by_index(int ifindex)
 {
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
struct net_device *dev;
 
-   read_lock(&dev_base_lock);
+   read_lock(&dev_list->lock);
dev = __dev_get_by_index(ifindex);
if (dev)
dev_hold(dev);
-   read_unlock(&dev_base_lock);
+   read_unlock(&dev_list->lock);
return dev;
 }
 
@@ -571,14 +580,24 @@
 
 struct net_device *dev_getbyhwaddr(unsigned short type, char *ha)
 {
-   struct net_device *dev;
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+   struct list_head *l, *list = &dev_list->list;
+   struct net_ns_dev *db;
+   struct net_device *dev = NULL;
 
ASSERT_RTNL();
 
-   for (dev = dev_base; dev; dev = dev->next)
+   read_lock(&dev_list->lock);
+   list_for_each(l, list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   dev = db->dev;
if (dev->type == type &&
!memcmp(dev->dev_addr, ha, dev->addr_len))
-   break;
+   goto out;
+   }
+   dev = NULL;
+out:
+   read_unlock(&dev_list->lock);
return dev;
 }
 
@@ -586,15 +605,25 @@
 
 struct net_device *dev_getfirstbyhwtype(unsigned short type)
 {
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+   struct list_head *l, *list = &dev_list->list;
+   struct net_ns_dev *db;
struct net_device *dev;
 
rtnl_lock();
-   for (dev = dev_base; dev; dev = dev->next) {
+
+   read_lock(&dev_list->lock);
+   list_for_each(l, list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   dev = db->dev;
if (dev->type == type) {
dev_hold(dev);
-   break;
+   goto out;
}
}
+   dev = NULL;
+out:
+   read_unlock(&dev_list->lock);
rtnl_unlock();
return dev;
 }
@@ -614,16 +643,23 @@
 
 struct net_device * dev_get_by_flags(unsigned short if_flags, unsigned short mask)
 {
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+   struct list_head *l, *list = 

[RFC] [patch 6/6] [Network namespace] Network namespace debugfs

2006-06-09 Thread dlezcano
This patch is for testing purposes. It allows reading which network
devices are accessible and adding a network device to the view.
This RFC hack is purely for discussing the best way to do that.

After unsharing with the CLONE_NEWNET flag:
--
 To see which devices are accessible:
 cat /sys/kernel/debug/net_ns/dev

 To add a device:
 echo eth1 > /sys/kernel/debug/net_ns/dev

This functionality is intended to be implemented in a higher-level
container configuration.
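
The write side of this debugfs file has to turn the bytes written by `echo` into a device name. A minimal userspace sketch of that step follows, assuming the name ends at the first NUL or newline and must fit in an IFNAMSIZ-sized buffer (16 bytes). The function name `parse_devname()` is hypothetical, and rejecting an empty name is an assumption of this sketch rather than something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEV_NAME_SIZE 16 /* same value as the kernel's IFNAMSIZ */

/*
 * Extract a device name from "buf" (at most "count" bytes).
 * The name ends at the first NUL or newline. Returns 0 and fills
 * "name" on success, or -EINVAL if the name is empty or too long.
 */
static int parse_devname(const char *buf, size_t count,
			 char name[DEV_NAME_SIZE])
{
	size_t len = 0;

	assert(buf != NULL && name != NULL);
	while (len < count && buf[len] != '\0' && buf[len] != '\n')
		len++;

	/* Leave room for the terminating NUL. */
	if (len == 0 || len >= DEV_NAME_SIZE)
		return -EINVAL;

	memcpy(name, buf, len);
	name[len] = '\0';
	return 0;
}
```

For the `echo eth1` case the buffer is "eth1" followed by a newline, so the parsed name is "eth1".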

Replace-Subject: [Network namespace] Network namespace debugfs
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] 
--
 fs/debugfs/Makefile |2 
 fs/debugfs/net_ns.c |  141 
 net/Kconfig |4 +
 3 files changed, 146 insertions(+), 1 deletion(-)

Index: 2.6-mm/fs/debugfs/Makefile
===
--- 2.6-mm.orig/fs/debugfs/Makefile
+++ 2.6-mm/fs/debugfs/Makefile
@@ -1,4 +1,4 @@
 debugfs-objs   := inode.o file.o
 
 obj-$(CONFIG_DEBUG_FS) += debugfs.o
-
+obj-$(CONFIG_NET_NS_DEBUG) += net_ns.o
Index: 2.6-mm/fs/debugfs/net_ns.c
===
--- /dev/null
+++ 2.6-mm/fs/debugfs/net_ns.c
@@ -0,0 +1,141 @@
+/*
+ *  net_ns.c - adds a net_ns/ directory to debug NET namespaces
+ *
+ *  Copyright (C) 2006 IBM
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/pagemap.h>
+#include <linux/debugfs.h>
+#include <linux/sched.h>
+#include <linux/netdevice.h>
+#include <linux/net_ns.h>
+
+static struct dentry *net_ns_dentry;
+static struct dentry *net_ns_dentry_dev;
+
+static ssize_t net_ns_dev_read_file(struct file *file, char __user *user_buf,
+   size_t count, loff_t *ppos)
+{
+   size_t len;
+   char *buf;
+   struct net_ns_dev_list *devlist = &(net_ns()->dev_list);
+   struct net_ns_dev *db;
+   struct net_device *dev;
+   struct list_head *l;
+
+   if (*ppos < 0)
+   return -EINVAL;
+   if (*ppos >= count)
+   return 0;
+
+   /* It's for debug, everything should fit */
+   buf = kmalloc(4096, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+   buf[0] = '\0';
+
+   read_lock(&devlist->lock);
+   list_for_each(l, &devlist->list) {
+   db = list_entry(l, struct net_ns_dev, list);
+   dev = db->dev;
+   strcat(buf, dev->name);
+   strcat(buf, "\n");
+   }
+   read_unlock(&devlist->lock);
+
+   len = strlen(buf);
+
+   if (len > count)
+   len = count;
+
+   if (copy_to_user(user_buf, buf, len)) {
+   kfree(buf);
+   return -EFAULT;
+   }
+
+   *ppos += count;
+   kfree(buf);
+
+   return count;
+}
+
+static ssize_t net_ns_dev_write_file(struct file *file,
+const char __user *user_buf,
+size_t count, loff_t *ppos)
+{
+   int ret;
+   size_t len;
+   const char __user *p;
+   char c;
+   char devname[IFNAMSIZ];
+   struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+
+   len = 0;
+   p = user_buf;
+   while (len < count) {
+   if (get_user(c, p++))
+   return -EFAULT;
+   if (c == 0 || c == '\n')
+   break;
+   len++;
+   }
+
+   if (len >= IFNAMSIZ)
+   return -EINVAL;
+
+   if (copy_from_user(devname, user_buf, len))
+   return -EFAULT;
+
+   devname[len] = '\0';
+
+   ret = net_ns_dev_add(devname, dev_list);
+   if (ret)
+   return ret;
+
+   *ppos += count;
+   return count;
+}
+
+static int net_ns_dev_open_file(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static struct file_operations net_ns_dev_fops = {
+   .read = net_ns_dev_read_file,
+   .write =net_ns_dev_write_file,
+   .open = net_ns_dev_open_file,
+};
+
+static int __init net_ns_init(void)
+{
+   net_ns_dentry = debugfs_create_dir("net_ns", NULL);
+
+   net_ns_dentry_dev = debugfs_create_file("dev", 0666,
+   net_ns_dentry,
+   NULL,
+   &net_ns_dev_fops);
+   return 0;
+}
+
+static void __exit net_ns_exit(void)
+{
+   debugfs_remove(net_ns_dentry_dev);
+   debugfs_remove(net_ns_dentry);
+}
+
+module_init(net_ns_init);
+module_exit(net_ns_exit);
+
+MODULE_DESCRIPTION("NET namespace debugfs");
+MODULE_AUTHOR(Daniel Lezcano [EMAIL 

[RFC] [patch 5/6] [Network namespace] ipv4 isolation

2006-06-09 Thread dlezcano
This patch partially isolates ipv4 by adding the network namespace
structure to struct sock, the bind bucket and the skbuff. When a socket
is created, the pointer to the network namespace is stored in
struct sock, which is how the socket comes to belong to the namespace.
That makes it possible to identify the sockets related to a namespace
for lookup and procfs.

The lookup is extended with a network namespace pointer in
order to distinguish listen points bound to the same port. That allows
several applications to be bound to INADDR_ANY:port in different
network namespaces without conflicting. The bind is checked against
both the port and the network namespace.
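
The bind check described above can be sketched in userspace C. This is an illustration under simplifying assumptions, not the patch's code: a flat array replaces the kernel's bind hash table, addresses and SO_REUSEADDR are ignored, and the names `struct bind_table` and `bind_port()` are hypothetical. What it shows is only the rule that a port conflicts when both the port and the namespace match.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BIND_TABLE_SIZE 16 /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for the kernel's struct net_namespace. */
struct net_namespace {
	int id;
};

/* Reduced bind bucket: a port plus the namespace that owns it. */
struct bind_bucket {
	unsigned short port;
	const struct net_namespace *net_ns;
};

struct bind_table {
	struct bind_bucket buckets[BIND_TABLE_SIZE];
	size_t used;
};

/*
 * Record that "port" is bound in namespace "ns".
 * Returns 0 on success, -EADDRINUSE if the same port is already
 * bound in the same namespace, -ENOBUFS if the table is full.
 */
static int bind_port(struct bind_table *t, unsigned short port,
		     const struct net_namespace *ns)
{
	size_t i;

	assert(t != NULL && ns != NULL);
	for (i = 0; i < t->used; i++)
		if (t->buckets[i].port == port && t->buckets[i].net_ns == ns)
			return -EADDRINUSE;

	if (t->used == BIND_TABLE_SIZE)
		return -ENOBUFS;

	t->buckets[t->used].port = port;
	t->buckets[t->used].net_ns = ns;
	t->used++;
	return 0;
}
```

Binding port 80 in two different namespaces succeeds twice, while a second bind of port 80 in the same namespace is refused.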

When an outgoing packet has a loopback destination address, the
skbuff is filled with the network namespace, so loopback packets
never leave the namespace. This approach facilitates the migration
of loopback because identification is done by network namespace and
not by address. The loopback has been benchmarked with tbench and the
overhead is roughly 1.5%.
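
A minimal sketch of the loopback tagging idea, in userspace C, assuming addresses in host byte order and 32-bit `unsigned int`. The names `struct packet`, `struct endpoint`, `tag_on_output()` and `may_deliver()` are hypothetical; the sketch only illustrates that a loopback packet carries the sender's namespace and is delivered only to a receiver in that same namespace. Non-loopback traffic is deliberately left unfiltered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net_namespace. */
struct net_namespace {
	int id;
};

/* Reduced stand-in for an skbuff. */
struct packet {
	unsigned int daddr;                 /* destination, host byte order */
	const struct net_namespace *net_ns; /* set only for loopback traffic */
};

/* Reduced stand-in for a socket: only its namespace matters here. */
struct endpoint {
	const struct net_namespace *net_ns;
};

/* 127.0.0.0/8 test on a host-byte-order IPv4 address. */
static int is_loopback(unsigned int addr)
{
	return (addr >> 24) == 127u;
}

/* On output, tag loopback packets with the sender's namespace. */
static void tag_on_output(struct packet *pkt, const struct endpoint *sender)
{
	assert(pkt != NULL && sender != NULL);
	pkt->net_ns = is_loopback(pkt->daddr) ? sender->net_ns : NULL;
}

/* On input, a loopback packet is delivered only inside its own namespace. */
static int may_deliver(const struct packet *pkt,
		       const struct endpoint *receiver)
{
	assert(pkt != NULL && receiver != NULL);
	if (!is_loopback(pkt->daddr))
		return 1; /* not filtered in this sketch */
	return pkt->net_ns == receiver->net_ns;
}
```

Because the decision uses the namespace tag and not the address, every namespace can keep using 127.0.0.1 unchanged.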

Replace-Subject: [Network namespace] ipv4 isolation
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] 
--
 include/linux/skbuff.h   |2 ++
 include/net/inet_hashtables.h|   34 --
 include/net/inet_timewait_sock.h |1 +
 include/net/sock.h   |4 
 net/dccp/ipv4.c  |7 ---
 net/ipv4/af_inet.c   |2 ++
 net/ipv4/inet_connection_sock.c  |3 ++-
 net/ipv4/inet_diag.c |3 ++-
 net/ipv4/inet_hashtables.c   |6 +-
 net/ipv4/inet_timewait_sock.c|1 +
 net/ipv4/ip_output.c |4 
 net/ipv4/tcp_ipv4.c  |   25 -
 net/ipv4/udp.c   |7 +--
 13 files changed, 72 insertions(+), 27 deletions(-)

Index: 2.6-mm/include/linux/skbuff.h
===
--- 2.6-mm.orig/include/linux/skbuff.h
+++ 2.6-mm/include/linux/skbuff.h
@@ -27,6 +27,7 @@
 #include <linux/poll.h>
 #include <linux/net.h>
 #include <linux/textsearch.h>
+#include <linux/net_ns.h>
 #include <net/checksum.h>
 #include <linux/dmaengine.h>
 
@@ -301,6 +302,7 @@
*data,
*tail,
*end;
+   struct net_namespace*net_ns;
 };
 
 #ifdef __KERNEL__
Index: 2.6-mm/include/net/inet_hashtables.h
===
--- 2.6-mm.orig/include/net/inet_hashtables.h
+++ 2.6-mm/include/net/inet_hashtables.h
@@ -23,6 +23,8 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/in.h>
+#include <linux/net_ns.h>
 
 #include <net/inet_connection_sock.h>
 #include <net/inet_sock.h>
@@ -78,6 +80,7 @@
signed shortfastreuse;
struct hlist_node   node;
struct hlist_head   owners;
+   struct net_namespace*net_ns;
 };
 
 #define inet_bind_bucket_for_each(tb, node, head) \
@@ -274,13 +277,15 @@
 extern struct sock *__inet_lookup_listener(const struct hlist_head *head,
   const u32 daddr,
   const unsigned short hnum,
-  const int dif);
+  const int dif,
+  const struct net_namespace *net_ns);
 
 /* Optimize the common listener case. */
 static inline struct sock *
inet_lookup_listener(struct inet_hashinfo *hashinfo,
 const u32 daddr,
-const unsigned short hnum, const int dif)
+const unsigned short hnum, const int dif,
+const struct net_namespace *net_ns)
 {
struct sock *sk = NULL;
const struct hlist_head *head;
@@ -294,8 +299,9 @@
(!inet->rcv_saddr || inet->rcv_saddr == daddr) &&
(sk->sk_family == PF_INET || !ipv6_only_sock(sk)) &&
!sk->sk_bound_dev_if)
-   goto sherry_cache;
-   sk = __inet_lookup_listener(head, daddr, hnum, dif);
+   if (sk->sk_net_ns == net_ns && LOOPBACK(daddr))
+   goto sherry_cache;
+   sk = __inet_lookup_listener(head, daddr, hnum, dif, net_ns);
}
if (sk) {
 sherry_cache:
@@ -358,7 +364,8 @@
__inet_lookup_established(struct inet_hashinfo *hashinfo,
  const u32 saddr, const u16 sport,
  const u32 daddr, const u16 hnum,
- const int dif)
+ const int dif,
+ const struct net_namespace *net_ns)
 {
INET_ADDR_COOKIE(acookie, saddr, daddr)
const __u32 ports = INET_COMBINED_PORTS(sport, hnum);
@@ -373,12 

[RFC] [patch 1/6] [Network namespace] Network namespace structure

2006-06-09 Thread dlezcano
This patch adds the network namespace to the nsproxy, along with a set
of functions to unshare it. The network namespace structure is meant to
be filled in later with the network resources that need further
isolation.
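
The lifetime rule behind `get_net_ns()` and `put_net_ns()` can be illustrated with a small userspace sketch. It assumes a plain, non-atomic integer in place of `struct kref`, so it is not safe for concurrent use, and the names `ns_alloc()`, `ns_get()`, `ns_put()` and `freed_count` are hypothetical. The point shown is that a namespace starts with one reference and is freed when the last reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced namespace: a plain int stands in for struct kref. */
struct net_namespace {
	int refcount;
};

/* Counts releases so the behaviour can be observed from outside. */
static int freed_count;

/* Allocate a namespace holding one reference, or return NULL. */
static struct net_namespace *ns_alloc(void)
{
	struct net_namespace *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;
	return ns;
}

/* Take an additional reference on a live namespace. */
static void ns_get(struct net_namespace *ns)
{
	assert(ns != NULL && ns->refcount > 0);
	ns->refcount++;
}

/* Drop one reference; the namespace is freed with the last one. */
static void ns_put(struct net_namespace *ns)
{
	assert(ns != NULL && ns->refcount > 0);
	if (--ns->refcount == 0) {
		free(ns);
		freed_count++;
	}
}
```

Every holder (a task, an address, a socket) pairs one get with one put, so the structure goes away only after the last user is done with it.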

Replace-Subject: [Network namespace] Network namespace structure
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] 
--
 include/linux/init_task.h |2 
 include/linux/net_ns.h|   59 
 include/linux/nsproxy.h   |2 
 include/linux/sched.h |1 
 init/version.c|8 +++
 kernel/fork.c |   24 +--
 kernel/nsproxy.c  |   38 +++---
 net/Kconfig   |9 
 net/Makefile  |1 
 net/net_ns.c  |   96 ++
 10 files changed, 222 insertions(+), 18 deletions(-)

Index: 2.6-mm/include/linux/net_ns.h
===
--- /dev/null
+++ 2.6-mm/include/linux/net_ns.h
@@ -0,0 +1,59 @@
+#ifndef _LINUX_NET_NS_H
+#define _LINUX_NET_NS_H
+
+#include <linux/kref.h>
+#include <linux/sched.h>
+#include <linux/nsproxy.h>
+
+struct net_namespace {
+   struct kref kref;
+};
+
+extern struct net_namespace init_net_ns;
+
+#ifdef CONFIG_NET_NS
+
+extern int unshare_network(unsigned long unshare_flags,
+  struct net_namespace **new_net);
+
+extern int copy_network(int flags, struct task_struct *tsk);
+
+static inline void get_net_ns(struct net_namespace *ns)
+{
+   kref_get(&ns->kref);
+}
+
+void free_net_ns(struct kref *kref);
+
+static inline void put_net_ns(struct net_namespace *ns)
+{
+   kref_put(&ns->kref, free_net_ns);
+}
+
+static inline void exit_network(struct task_struct *p)
+{
+   struct net_namespace *net_ns = p->nsproxy->net_ns;
+   if (net_ns)
+   put_net_ns(net_ns);
+}
+#else /* !CONFIG_NET_NS */
+static inline int unshare_network(unsigned long unshare_flags,
+ struct net_namespace **new_net)
+{
+   return -EINVAL;
+}
+static inline int copy_network(int flags, struct task_struct *tsk)
+{
+   return 0;
+}
+static inline void get_net_ns(struct net_namespace *ns) {}
+static inline void put_net_ns(struct net_namespace *ns) {}
+static inline void exit_network(struct task_struct *p) {}
+#endif /* CONFIG_NET_NS */
+
+static inline struct net_namespace *net_ns(void)
+{
+   return current->nsproxy->net_ns;
+}
+
+#endif
Index: 2.6-mm/net/net_ns.c
===
--- /dev/null
+++ 2.6-mm/net/net_ns.c
@@ -0,0 +1,96 @@
+/*
+ *  net_ns.c - adds support for network namespace
+ *
+ *  Copyright (C) 2006 IBM
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ */
+
+#include <linux/net_ns.h>
+#include <linux/module.h>
+
+/*
+ * Clone a new ns copying an original, setting refcount to 1
+ * Cloned process will have
+ * @old_ns: namespace to clone
+ * Return NULL on error (failure to kmalloc), new ns otherwise
+ */
+struct net_namespace *clone_net_ns(struct net_namespace *old_ns)
+{
+   struct net_namespace *new_ns;
+
+   new_ns = kmalloc(sizeof(*new_ns), GFP_KERNEL);
+   if (!new_ns)
+   return NULL;
+   kref_init(&new_ns->kref);
+   return new_ns;
+}
+
+/*
+ * unshare the current process' network namespace.
+ * called only in sys_unshare()
+ */
+int unshare_network(unsigned long unshare_flags,
+   struct net_namespace **new_net)
+{
+   if (!(unshare_flags & CLONE_NEWNET))
+   return 0;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   *new_net = clone_net_ns(current->nsproxy->net_ns);
+   if (!*new_net)
+   return -ENOMEM;
+
+   return 0;
+}
+
+/*
+ * Copy task tsk's network namespace, or clone it if flags specifies
+ * CLONE_NEWNET.  In latter case, changes to the network resources of
+ * this process won't be seen by parent, and vice versa.
+ */
+int copy_network(int flags, struct task_struct *tsk)
+{
+   struct net_namespace *old_ns = tsk->nsproxy->net_ns;
+   struct net_namespace *new_ns;
+   int err = 0;
+
+   if (!old_ns)
+   return 0;
+
+   get_net_ns(old_ns);
+
+   if (!(flags & CLONE_NEWNET))
+   return 0;
+
+   if (!capable(CAP_SYS_ADMIN)) {
+   err = -EPERM;
+   goto out;
+   }
+
+   new_ns = clone_net_ns(old_ns);
+   if (!new_ns) {
+   err = -ENOMEM;
+   goto out;
+   }
+   tsk->nsproxy->net_ns = new_ns;
+
+out:
+   put_net_ns(old_ns);
+   return err;
+}
+
+void free_net_ns(struct kref *kref)
+{
+   struct net_namespace *ns;
+
+   ns = container_of(kref, struct net_namespace, kref);
+   kfree(ns);

[RFC] [patch 0/6] [Network namespace] introduction

2006-06-09 Thread dlezcano
The following patches create a private network namespace for use
within containers. This is intended for use with system containers
like vserver, but might also be useful for restricting individual
applications' access to the network stack.

These patches isolate traffic inside the network namespace. The
network resources and the incoming and outgoing packets are
identified as belonging to a namespace.

The patches hide network resources not contained in the current
namespace, but still allow administration of the network with normal
commands like ifconfig.

It applies to the kernel version 2.6.17-rc6-mm1

It provides the following:
-
   - when an application unshares its network namespace, it loses its
 view of all network devices by default. The administrator can
 choose to make any device visible again. The container
 then gains a view of the device, but without the IP address
 configured on it. It is up to the container administrator to use
 the ifconfig or ip command to set up a new IP address. This IP
 address is only visible inside the container.

   - the loopback is isolated inside the container and it is not
 possible to communicate between containers via the
 loopback. 

   - several containers can have an application bind to the same
 address:port without conflicting. 

What is it for?
-
   - security: an application can be confined inside a container
 without interacting with the network used by another container

   - consolidation: several instances of the same application can be
 run in different containers because the network namespace allows
 them to bind to the same addr:port

What could be done?

- because the network resources are related to a namespace, it is
  easy to identify them. That facilitates the implementation of
  network migration

How to use it?

   - do unshare with the CLONE_NEWNET flag as root
   - do echo eth0 > /sys/kernel/debug/net_ns/dev
   - use the ifconfig or ip command to set a new IP address
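
The first step above can be sketched from userspace as follows. This is a hedged example that assumes a present-day Linux system where `unshare(2)` and `CLONE_NEWNET` are exposed by glibc through `<sched.h>`; the flag and its behaviour in this RFC patch set may differ, and the debugfs step is specific to these patches and is not covered. The helper name `enter_new_net_namespace()` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Try to move the calling process into a fresh network namespace.
 * Returns 0 on success, or a negative errno value on failure, for
 * example -EPERM when the caller lacks the required privilege.
 */
static int enter_new_net_namespace(void)
{
	if (unshare(CLONE_NEWNET) == -1) {
		assert(errno > 0);
		return -errno;
	}
	return 0;
}
```

A caller would typically run this as root, check for a zero return, and then configure the devices made visible in the new namespace.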

What is missing?
-
The routes are not yet isolated, which implies:

   - binding to another container's address is allowed

   - an outgoing packet which has an unset source address can
 potentially get another container's address

   - an incoming packet can be routed to the wrong container if
 several containers are listening on the same addr:port
