Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Eric W. Biederman wrote: Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. [...] static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) { + struct net *net = skb-sk-sk_net; struct net_device *dev; int idx; I've read some code today greping 'init_net.loopback_dev' and found interesting non-trivial for me issue. Network namespace is extracted from the packet in two different ways in TCP. This is a socket for outgoing path and a device for incoming. Though, there are some places called uniformly both from incoming and outgoing path. Typical example is netfilters. They are called uniformly all around the code. The prototype is the following: static unsigned int reject6_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, const void *targinfo); So, we are bound to the following options: - perform additional non-uniform hacks around to place 'struct net' into other and other structures like xt_target - add 7th parameter here and over - introduce an skb_net field in the 'struct sk_buff' making all code uniform, at least when we have an skb I think that this is not the last place with such a parameter list and we should make a decision at this point when the code in not mainline yet. As far as I understand, netfilters are not touched by the Eric and we can face some non-trivial problems there. So, if my point about uniformity is valid, this patchset looks wrong and should be re-worked :( Regards, Den - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Denis V. Lunev wrote: Eric W. Biederman wrote: Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. [...] static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) { + struct net *net = skb-sk-sk_net; struct net_device *dev; int idx; I've read some code today greping 'init_net.loopback_dev' and found interesting non-trivial for me issue. Network namespace is extracted from the packet in two different ways in TCP. This is a socket for outgoing path and a device for incoming. Though, there are some places called uniformly both from incoming and outgoing path. Typical example is netfilters. They are called uniformly all around the code. The prototype is the following: static unsigned int reject6_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, const void *targinfo); Thanks Denis for auditing the code. As far as I see, struct net_device *in is NULL for outgoing traffic and struct net_device *out is NULL for ingress traffic. Except for the FORWARD rules where both are filled. If we are following network namespace semantic, we should not have two network devices belonging to two differents namespaces, right ? In this case, the following line of code should be sufficient to retrieve the network namespace, no ? struct net *net = in?in-nd_net:out-nd_net; So, we are bound to the following options: - perform additional non-uniform hacks around to place 'struct net' into other and other structures like xt_target - add 7th parameter here and over - introduce an skb_net field in the 'struct sk_buff' making all code uniform, at least when we have an skb I think that this is not the last place with such a parameter list and we should make a decision at this point when the code in not mainline yet. As far as I understand, netfilters are not touched by the Eric and we can face some non-trivial problems there. In Eric's git tree: http://git.kernel.org/?p=linux/kernel/git/ebiederm/linux-2.6-netns.git There are some modifications concerning net/ipv4/netfiler/iptable_filter.c and at the ipt_hook function, there is: struct net *net = (in?in:out)-nd_net; So, if my point about uniformity is valid, this patchset looks wrong and should be re-worked :( As Eric said, we want to build the network namespace step by step, taking care of not breaking the init network namespace. If you want to make iptables per namespace or catch problems before the code goes to Dave's tree, IMHO it will be more convenient to post to containers@ the patches against netns49, where the modifications will be in a network namespace big picture. Regards. -- Daniel - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Daniel Lezcano wrote: struct net *net = in?in-nd_net:out-nd_net; So, we are bound to the following options: - perform additional non-uniform hacks around to place 'struct net' into other and other structures like xt_target - add 7th parameter here and over - introduce an skb_net field in the 'struct sk_buff' making all code uniform, at least when we have an skb I think that this is not the last place with such a parameter list and we should make a decision at this point when the code in not mainline yet. As far as I understand, netfilters are not touched by the Eric and we can face some non-trivial problems there. In Eric's git tree: http://git.kernel.org/?p=linux/kernel/git/ebiederm/linux-2.6-netns.git There are some modifications concerning net/ipv4/netfiler/iptable_filter.c and at the ipt_hook function, there is: struct net *net = (in?in:out)-nd_net; So, if my point about uniformity is valid, this patchset looks wrong and should be re-worked :( As Eric said, we want to build the network namespace step by step, taking care of not breaking the init network namespace. If you want to make iptables per namespace or catch problems before the code goes to Dave's tree, IMHO it will be more convenient to post to containers@ the patches against netns49, where the modifications will be in a network namespace big picture. my point is somewhat another. Yes, this is enough for that place. If so, I must scatter these checks all around in the netfilters code. Brr. In forward chain the situation is different for Layer3 switching. Let's assume that we have an OpenVZ scheme, where the packet flows from socket to device and after that from device to device via forwarding path. You can't call skb_orphan on namespace switching as this breaks UDP flow regulation. Virtual network device is fast while real Ethernet is slow, packets will be dropped on queue in real device. So, the situation with packet on send path with a socket from other namespace is possible :( I just pray for uniformity to concentrate on the code rather than on guesses on which path we are :( Regards, Den - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Denis V. Lunev [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. [...] static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) { +struct net *net = skb-sk-sk_net; struct net_device *dev; int idx; I've read some code today greping 'init_net.loopback_dev' and found interesting non-trivial for me issue. Network namespace is extracted from the packet in two different ways in TCP. This is a socket for outgoing path and a device for incoming. Though, there are some places called uniformly both from incoming and outgoing path. Typical example is netfilters. They are called uniformly all around the code. The prototype is the following: static unsigned int reject6_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, const void *targinfo); So, we are bound to the following options: - perform additional non-uniform hacks around to place 'struct net' into other and other structures like xt_target - add 7th parameter here and over - introduce an skb_net field in the 'struct sk_buff' making all code uniform, at least when we have an skb No. That bloats a sk_buff, changes the semantics of moving a skb around, and decreases performance (because we have to maintain the field on a fast path). There will not be a skb_net field. The entire concept of skb_net is a maintenance disaster. I think that this is not the last place with such a parameter list and we should make a decision at this point when the code in not mainline yet. Certainly that is what I have a proof of concept tree for. So we can see how these things look before we merge them. As far as I understand, netfilters are not touched by the Eric and we can face some non-trivial problems there. No. In my proof of concept tree I should have working per network namespace netfilter code. My intention was to just do enough to see what the impact would be so most of the netfilter code (in my tree) insists on running in the initial network namespace. But there are a few pieces that are fully converted. Please take a look. So, if my point about uniformity is valid, this patchset looks wrong and should be re-worked :( This patchset does need to get rebased on top of net-2.6.25 when it opens and hopefully your patchset to remove the unnecessary work in rtnl_unlock, and to really process netlink requests in process context. I see a need for the more fundamental change you seem to be advocating. Differentiating between the incoming and the outgoing code paths is something we already do permission checking, for locking, for sleeping, etc. Modifying the code requires reading and understanding it in context. That is the nature of code. This does make large patches going across the entire networking stack making something a network namespace parameter difficult, but it should not cause any problem for maintenance or other work on the code. As shown by the fact that even outside the tree rebasing my network namespace patches has not been all that difficult. So no I don't think uniformity, or beauty or elegance is what we are after right now. Trying to hard in that direction ultimately obfuscates the code. What we want is something that is simple, straight forward, and doesn't require you to be an expert in network namespaces to understand the code or the patches. In the particular case of the netfilter hooks we don't have a network namespace parameter laying around before we call NF_HOOK, and the idiom net = (in?in:out)-nd_net seems perfectly accurate so it seems reasonable to me to derive the network namespace that way in generic code. Although thinking about this. We know which hooks we are being called from so we may in fact actually know if which of in or out must be valid when we get to the netfilter hook. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html