Re: [PATCH 1/4] veth: move loopback logic to common location
Arnd Bergmann wrote: On Tuesday 24 November 2009, Patrick McHardy wrote: Eric W. Biederman wrote: I don't quite follow what you intend with dev_queue_xmit when the macvlan is in one namespace and the real physical device is in another. Are you mentioning that the packet classifier runs in the namespace where the primary device lives with packets from a different namespace? Exactly. And I think we should make sure that the namespace of the macvlan device can't (deliberately or accidentally) cause misclassification. This is independent of my series and a preexisting problem, right? Correct. Which fields do you think need to be reset to maintain namespace isolation for the outbound path in macvlan? In addition to those already handled, I'd say - priority: affects qdisc classification, may refer to classes of the old namespace - ipvs_property: might cause packets to incorrectly skip netfilter hooks - nf_trace: might trigger packet tracing - nf_bridge: contains references to network devices in the old NS, also indicates packet was bridged - iif: index is only valid in the originating namespace - tc_index: classification result, should only be set in the namespace of the classifier - tc_verd: RTTL etc. should begin at zero again - probably secmark. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Patrick McHardy ka...@trash.net writes: Arnd Bergmann wrote: On Tuesday 24 November 2009, Patrick McHardy wrote: Eric W. Biederman wrote: I don't quite follow what you intend with dev_queue_xmit when the macvlan is in one namespace and the real physical device is in another. Are you mentioning that the packet classifier runs in the namespace where the primary device lives with packets from a different namespace? Exactly. And I think we should make sure that the namespace of the macvlan device can't (deliberately or accidentally) cause misclassification. This is independent of my series and a preexisting problem, right? Correct. Which fields do you think need to be reset to maintain namespace isolation for the outbound path in macvlan? In addition to those already handled, I'd say - priority: affects qdisc classification, may refer to classes of the old namespace - ipvs_property: might cause packets to incorrectly skip netfilter hooks - nf_trace: might trigger packet tracing - nf_bridge: contains references to network devices in the old NS, also indicates packet was bridged - iif: index is only valid in the originating namespace - tc_index: classification result, should only be set in the namespace of the classifier - tc_verd: RTTL etc. should begin at zero again - probably secmark. Wow. I thought we were trying to reduce skbuff, where did all of those fields come from? Regarless that sounds like a good list to get stomped. Eric ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
On Thursday 26 November 2009, Patrick McHardy wrote: In addition to those already handled, I'd say - priority: affects qdisc classification, may refer to classes of the old namespace - ipvs_property: might cause packets to incorrectly skip netfilter hooks - nf_trace: might trigger packet tracing - nf_bridge: contains references to network devices in the old NS, also indicates packet was bridged - iif: index is only valid in the originating namespace - probably secmark. ok - tc_index: classification result, should only be set in the namespace of the classifier - tc_verd: RTTL etc. should begin at zero again Wouldn't that defeat the purpose of RTTL? If you create a loop across two devices in different namespaces, it may no longer get detected. Or is that a different problem again? Arnd --- net: maintain namespace isolation between vlan and real device In the vlan and macvlan drivers, the start_xmit function forwards data to the dev_queue_xmit function for another device, which may potentially belong to a different namespace. To make sure that classification stays within a single namespace, this resets the potentially critical fields. Still needs testing, don't apply Signed-off-by: Arnd Bergmann a...@arndb.de --- drivers/net/macvlan.c |2 +- include/linux/netdevice.h |9 + net/8021q/vlan_dev.c |2 +- net/core/dev.c| 37 + 4 files changed, 44 insertions(+), 6 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 322112c..edcebf1 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -269,7 +269,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) } xmit_world: - skb-dev = vlan-lowerdev; + skb_set_dev(skb, vlan-lowerdev); return dev_queue_xmit(skb); } diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9428793..fdf4a1a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1009,6 +1009,15 @@ static inline bool netdev_uses_dsa_tags(struct net_device *dev) return 0; } +#ifdef CONFIG_NET_NS +static inline void skb_set_dev(struct sk_buff *skb, struct net_device *dev) +{ + skb-dev = dev; +} +#else /* CONFIG_NET_NS */ +void skb_set_dev(struct sk_buff *skb, struct net_device *dev); +#endif + static inline bool netdev_uses_trailer_tags(struct net_device *dev) { #ifdef CONFIG_NET_DSA_TAG_TRAILER diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index de0dc6b..51fcfff 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -323,7 +323,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb, } - skb-dev = vlan_dev_info(dev)-real_dev; + skb_set_dev(skb, vlan_dev_info(dev)-real_dev); len = skb-len; ret = dev_queue_xmit(skb); diff --git a/net/core/dev.c b/net/core/dev.c index f8baa15..220d4e4 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1448,13 +1448,10 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb) if (skb-len (dev-mtu + dev-hard_header_len)) return NET_RX_DROP; - skb_dst_drop(skb); + skb_set_dev(skb, dev); skb-tstamp.tv64 = 0; skb-pkt_type = PACKET_HOST; skb-protocol = eth_type_trans(skb, dev); - skb-mark = 0; - secpath_reset(skb); - nf_reset(skb); return netif_rx(skb); } EXPORT_SYMBOL_GPL(dev_forward_skb); @@ -1614,6 +1611,39 @@ static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb) return false; } +/** + * skb_dev_set -- assign a buffer to a new device + * @skb: buffer for the new device + * @dev: network device + * + * If an skb is owned by a device already, we have to reset + * all data private to the namespace a device belongs to + * before assigning it a new device. + */ +void skb_set_dev(struct sk_buff *skb, struct net_device *dev) +{ + if (skb-dev !net_eq(dev_net(skb-dev), dev_net(dev))) { + secpath_reset(skb); + skb_dst_drop(skb); + nf_reset(skb); + skb_init_secmark(skb); + skb-mark = 0; + skb-priority = 0; + skb-nf_trace = 0; + skb-ipvs_property = 0; +#ifdef CONFIG_NET_SCHED + skb-tc_index = 0; +#ifdef CONFIG_NET_CLS_ACT + skb-tc_verd = SET_TC_VERD(skb-tc_verd, 0); + skb-tc_verd = SET_TC_RTTL(skb-tc_verd, 0); +#endif +#endif + } + skb-dev = dev; + skb-skb_iif = skb-dev-ifindex; +} +EXPORT_SYMBOL(skb_set_dev); + /* * Invalidate hardware checksum when packet is to be mangled, and * complete checksum manually on outgoing path. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Arnd Bergmann wrote: On Thursday 26 November 2009, Patrick McHardy wrote: In addition to those already handled, I'd say - priority: affects qdisc classification, may refer to classes of the old namespace - ipvs_property: might cause packets to incorrectly skip netfilter hooks - nf_trace: might trigger packet tracing - nf_bridge: contains references to network devices in the old NS, also indicates packet was bridged - iif: index is only valid in the originating namespace - probably secmark. ok - tc_index: classification result, should only be set in the namespace of the classifier - tc_verd: RTTL etc. should begin at zero again Wouldn't that defeat the purpose of RTTL? If you create a loop across two devices in different namespaces, it may no longer get detected. Or is that a different problem again? Mhh good point, that would indeed be possible. OTOH using ingress filtering in one namespace currently might cause the packet to get dropped in a different namespace because the ttl runs out. For now I'd suggest to go the safe route and keep the TTL intact until we can come up with something better. +void skb_set_dev(struct sk_buff *skb, struct net_device *dev) +{ + if (skb-dev !net_eq(dev_net(skb-dev), dev_net(dev))) { + secpath_reset(skb); + skb_dst_drop(skb); + nf_reset(skb); + skb_init_secmark(skb); + skb-mark = 0; + skb-priority = 0; + skb-nf_trace = 0; + skb-ipvs_property = 0; +#ifdef CONFIG_NET_SCHED + skb-tc_index = 0; +#ifdef CONFIG_NET_CLS_ACT + skb-tc_verd = SET_TC_VERD(skb-tc_verd, 0); + skb-tc_verd = SET_TC_RTTL(skb-tc_verd, 0); +#endif +#endif This makes we wonder which ones we actually should keep. Most of the others get reinitialized anyways, so maybe its better to simply clear the entire area up until -tail like f.i. skb_recycle_check(). + } + skb-dev = dev; + skb-skb_iif = skb-dev-ifindex; This doesn't seem necessary, if the packet goes through netif_receive_skb, it will be set anyways. +} +EXPORT_SYMBOL(skb_set_dev); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Arnd Bergmann wrote: +int dev_forward_skb(struct net_device *dev, struct sk_buff *skb) +{ + skb_orphan(skb); + + if (!(dev-flags IFF_UP)) + return NET_RX_DROP; + + if (skb-len (dev-mtu + dev-hard_header_len)) + return NET_RX_DROP; + + skb_dst_drop(skb); + skb-tstamp.tv64 = 0; + skb-pkt_type = PACKET_HOST; + skb-protocol = eth_type_trans(skb, dev); + skb-mark = 0; skb-mark clearing should stay private to veth since its usually supposed to stay intact. The only exception is packets crossing namespaces, where they should appear like a freshly received skbs. + secpath_reset(skb); + nf_reset(skb); + return netif_rx(skb); +} +EXPORT_SYMBOL_GPL(dev_forward_skb); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote: + skb_dst_drop(skb); + skb-tstamp.tv64 = 0; + skb-pkt_type = PACKET_HOST; + skb-protocol = eth_type_trans(skb, dev); + skb-mark = 0; skb-mark clearing should stay private to veth since its usually supposed to stay intact. The only exception is packets crossing namespaces, where they should appear like a freshly received skbs. But isn't that what we want in macvlan as well when we're forwarding from one downstream interface to another? I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Maybe I should make the documentation clearer? --- net: clarify documentation of dev_forward_skb Signed-off-by: Arnd Bergmann a...@arndb.de --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1433,6 +1433,10 @@ static inline void net_timestamp(struct sk_buff *skb) * dev_forward_skb can be used for injecting an skb from the * start_xmit function of one device into the receive queue * of another device. + * + * The receiving device may be in another namespace, so + * we have to clear all information in the skb that could + * impact namespace isolation. */ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb) { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Arnd Bergmann wrote: On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote: + skb_dst_drop(skb); + skb-tstamp.tv64 = 0; + skb-pkt_type = PACKET_HOST; + skb-protocol = eth_type_trans(skb, dev); + skb-mark = 0; skb-mark clearing should stay private to veth since its usually supposed to stay intact. The only exception is packets crossing namespaces, where they should appear like a freshly received skbs. But isn't that what we want in macvlan as well when we're forwarding from one downstream interface to another? In the TX direction you can use the mark for TC classification on the underlying device. I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Good point, in that case we probably should clear it as well. But in the non-namespace case the TC classification currently works and this is consistent with any other virtual device driver, so it should continue to work. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote: Arnd Bergmann wrote: On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote: + skb_dst_drop(skb); + skb-tstamp.tv64 = 0; + skb-pkt_type = PACKET_HOST; + skb-protocol = eth_type_trans(skb, dev); + skb-mark = 0; skb-mark clearing should stay private to veth since its usually supposed to stay intact. The only exception is packets crossing namespaces, where they should appear like a freshly received skbs. But isn't that what we want in macvlan as well when we're forwarding from one downstream interface to another? In the TX direction you can use the mark for TC classification on the underlying device. I don't use dev_forward_skb for the case where the data is sent to the underlying device, so the TC classification should stay intact. I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Good point, in that case we probably should clear it as well. But in the non-namespace case the TC classification currently works and this is consistent with any other virtual device driver, so it should continue to work. Do you think we should be able to use TC to direct traffic between macvlans on the same underlying device in bridge mode? It does sound useful, but I'm not sure how to implement that or if you'd expect it to work with the current code. If we support that, it should probably also work with namespaces, by consuming the mark in the macvlan and veth drivers. Arnd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Arnd Bergmann wrote: On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote: Arnd Bergmann wrote: On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote: + skb_dst_drop(skb); + skb-tstamp.tv64 = 0; + skb-pkt_type = PACKET_HOST; + skb-protocol = eth_type_trans(skb, dev); + skb-mark = 0; skb-mark clearing should stay private to veth since its usually supposed to stay intact. The only exception is packets crossing namespaces, where they should appear like a freshly received skbs. But isn't that what we want in macvlan as well when we're forwarding from one downstream interface to another? In the TX direction you can use the mark for TC classification on the underlying device. I don't use dev_forward_skb for the case where the data is sent to the underlying device, so the TC classification should stay intact. Right, I see. This looks fine. I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Good point, in that case we probably should clear it as well. But in the non-namespace case the TC classification currently works and this is consistent with any other virtual device driver, so it should continue to work. Do you think we should be able to use TC to direct traffic between macvlans on the same underlying device in bridge mode? It does sound useful, but I'm not sure how to implement that or if you'd expect it to work with the current code. If we support that, it should probably also work with namespaces, by consuming the mark in the macvlan and veth drivers. I don't think its necessary, we bypass outgoing queuing anyways. But if you'd want to add it, just keeping the skb-mark clearing in veth should work from what I can tell. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
On Tuesday 24 November 2009, Patrick McHardy wrote: I don't think its necessary, we bypass outgoing queuing anyways. But if you'd want to add it, just keeping the skb-mark clearing in veth should work from what I can tell. Ok, I won't bother with it for now then. Arnd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Patrick McHardy ka...@trash.net writes: I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Good point, in that case we probably should clear it as well. But in the non-namespace case the TC classification currently works and this is consistent with any other virtual device driver, so it should continue to work. Do you think we should be able to use TC to direct traffic between macvlans on the same underlying device in bridge mode? It does sound useful, but I'm not sure how to implement that or if you'd expect it to work with the current code. If we support that, it should probably also work with namespaces, by consuming the mark in the macvlan and veth drivers. I don't think its necessary, we bypass outgoing queuing anyways. But if you'd want to add it, just keeping the skb-mark clearing in veth should work from what I can tell. veth doesn't have an outgoing queue. The reason we clear skb-mark in veth is because when reentering the networking stack the packet needs to be reclassified. At the point of loopback we are talking a packet that has at least logically gone out of the machine on a wire and come back into the machine on another physical interface. So it seems to me we should have consistent handling for macvlans, veth, for the cases where we are looping packets back around. In practice I expect all of those cases are going to be cross namespace as otherwise we would have intercepted the packet before going out a physical interface. Eric ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Eric W. Biederman wrote: Patrick McHardy ka...@trash.net writes: I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Good point, in that case we probably should clear it as well. But in the non-namespace case the TC classification currently works and this is consistent with any other virtual device driver, so it should continue to work. Do you think we should be able to use TC to direct traffic between macvlans on the same underlying device in bridge mode? It does sound useful, but I'm not sure how to implement that or if you'd expect it to work with the current code. If we support that, it should probably also work with namespaces, by consuming the mark in the macvlan and veth drivers. I don't think its necessary, we bypass outgoing queuing anyways. But if you'd want to add it, just keeping the skb-mark clearing in veth should work from what I can tell. veth doesn't have an outgoing queue. The reason we clear skb-mark in veth is because when reentering the networking stack the packet needs to be reclassified. At the point of loopback we are talking a packet that has at least logically gone out of the machine on a wire and come back into the machine on another physical interface. So it seems to me we should have consistent handling for macvlans, veth, for the cases where we are looping packets back around. In practice I expect all of those cases are going to be cross namespace as otherwise we would have intercepted the packet before going out a physical interface. Agreed on the looping case, that's what we're doing now. In the layered case (macvlan - eth0) its common behaviour to keep the mark however. But in case of different namespaces, I think macvlan should also clear the mark on the dev_queue_xmit() path since this is just a shortcut to looping the packets through veth. In fact probably both of them should also clear skb-priority so other namespaces don't accidentally misclassify packets. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Patrick McHardy ka...@trash.net writes: Eric W. Biederman wrote: Patrick McHardy ka...@trash.net writes: I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb-mark and skb-dst in this function. Good point, in that case we probably should clear it as well. But in the non-namespace case the TC classification currently works and this is consistent with any other virtual device driver, so it should continue to work. Do you think we should be able to use TC to direct traffic between macvlans on the same underlying device in bridge mode? It does sound useful, but I'm not sure how to implement that or if you'd expect it to work with the current code. If we support that, it should probably also work with namespaces, by consuming the mark in the macvlan and veth drivers. I don't think its necessary, we bypass outgoing queuing anyways. But if you'd want to add it, just keeping the skb-mark clearing in veth should work from what I can tell. veth doesn't have an outgoing queue. The reason we clear skb-mark in veth is because when reentering the networking stack the packet needs to be reclassified. At the point of loopback we are talking a packet that has at least logically gone out of the machine on a wire and come back into the machine on another physical interface. So it seems to me we should have consistent handling for macvlans, veth, for the cases where we are looping packets back around. In practice I expect all of those cases are going to be cross namespace as otherwise we would have intercepted the packet before going out a physical interface. Agreed on the looping case, that's what we're doing now. In the layered case (macvlan - eth0) its common behaviour to keep the mark however. But in case of different namespaces, I think macvlan should also clear the mark on the dev_queue_xmit() path since this is just a shortcut to looping the packets through veth. In fact probably both of them should also clear skb-priority so other namespaces don't accidentally misclassify packets. That is why I pushed for what is becoming dev_forward_skb. So that we have one place where we can make all of those tweaks. It seems like in every review we find another field that should be cleared/handled specially. I don't quite follow what you intend with dev_queue_xmit when the macvlan is in one namespace and the real physical device is in another. Are you mentioning that the packet classifier runs in the namespace where the primary device lives with packets from a different namespace? Eric ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
On Tuesday 24 November 2009, Eric W. Biederman wrote: I don't quite follow what you intend with dev_queue_xmit when the macvlan is in one namespace and the real physical device is in another. Are you mentioning that the packet classifier runs in the namespace where the primary device lives with packets from a different namespace? I treat internal and external delivery very differently, the three cases are: 1. skb from real device to macvlan (macvlan_handle_frame): basically unchanged from before, except avoiding duplicate broadcasts. All skbs end up in netif_rx(vlan-dev) without clearing any data. We catch the frame in netif_receive_skb before it interacts with the namespace of the real device. 2. skb to external device (macvlan_start_xmit): if the destination is external, we just end up in dev_queue_xmit, with skb-dev set to the external device but no other changes. The data is already on the way out at this stage, so the namespace should not matter any more. 3. internal delivery: an skb from one macvlan to another gets always sent through dev_forward_skb, which is supposed to clear anything that must not leave the namespace. Does this make sense? Arnd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] veth: move loopback logic to common location
Eric W. Biederman wrote: Patrick McHardy ka...@trash.net writes: In the layered case (macvlan - eth0) its common behaviour to keep the mark however. But in case of different namespaces, I think macvlan should also clear the mark on the dev_queue_xmit() path since this is just a shortcut to looping the packets through veth. In fact probably both of them should also clear skb-priority so other namespaces don't accidentally misclassify packets. That is why I pushed for what is becoming dev_forward_skb. So that we have one place where we can make all of those tweaks. It seems like in every review we find another field that should be cleared/handled specially. I don't quite follow what you intend with dev_queue_xmit when the macvlan is in one namespace and the real physical device is in another. Are you mentioning that the packet classifier runs in the namespace where the primary device lives with packets from a different namespace? Exactly. And I think we should make sure that the namespace of the macvlan device can't (deliberately or accidentally) cause misclassification. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 1/4] veth: move loopback logic to common location
The veth driver contains code to forward an skb from the start_xmit function of one network device into the receive path of another device. Moving that code into a common location lets us reuse the code for direct forwarding of data between macvlan ports, and possibly in other drivers. Signed-off-by: Arnd Bergmann a...@arndb.de --- drivers/net/veth.c| 17 +++-- include/linux/netdevice.h |2 ++ net/core/dev.c| 36 3 files changed, 41 insertions(+), 14 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 2d657f2..6c4b5a2 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -155,8 +155,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) struct veth_net_stats *stats, *rcv_stats; int length, cpu; - skb_orphan(skb); - priv = netdev_priv(dev); rcv = priv-peer; rcv_priv = netdev_priv(rcv); @@ -168,20 +166,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) if (!(rcv-flags IFF_UP)) goto tx_drop; - if (skb-len (rcv-mtu + MTU_PAD)) - goto rx_drop; - -skb-tstamp.tv64 = 0; - skb-pkt_type = PACKET_HOST; - skb-protocol = eth_type_trans(skb, rcv); if (dev-features NETIF_F_NO_CSUM) skb-ip_summed = rcv_priv-ip_summed; - skb-mark = 0; - secpath_reset(skb); - nf_reset(skb); - - length = skb-len; + length = skb-len + ETH_HLEN; + if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS) + goto rx_drop; stats-tx_bytes += length; stats-tx_packets++; @@ -189,7 +179,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) rcv_stats-rx_bytes += length; rcv_stats-rx_packets++; - netif_rx(skb); return NETDEV_TX_OK; tx_drop: diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 97873e3..9428793 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1562,6 +1562,8 @@ extern intdev_set_mac_address(struct net_device *, extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq); +extern int dev_forward_skb(struct net_device *dev, + struct sk_buff *skb); extern int netdev_budget; diff --git a/net/core/dev.c b/net/core/dev.c index ccefa24..ba18e82 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -105,6 +105,7 @@ #include net/dst.h #include net/pkt_sched.h #include net/checksum.h +#include net/xfrm.h #include linux/highmem.h #include linux/init.h #include linux/kmod.h @@ -1419,6 +1420,41 @@ static inline void net_timestamp(struct sk_buff *skb) skb-tstamp.tv64 = 0; } +/** + * dev_forward_skb - loopback an skb to another netif + * + * @dev: destination network device + * @skb: buffer to forward + * + * return values: + * NET_RX_SUCCESS (no congestion) + * NET_RX_DROP (packet was dropped) + * + * dev_forward_skb can be used for injecting an skb from the + * start_xmit function of one device into the receive queue + * of another device. + */ +int dev_forward_skb(struct net_device *dev, struct sk_buff *skb) +{ + skb_orphan(skb); + + if (!(dev-flags IFF_UP)) + return NET_RX_DROP; + + if (skb-len (dev-mtu + dev-hard_header_len)) + return NET_RX_DROP; + + skb_dst_drop(skb); + skb-tstamp.tv64 = 0; + skb-pkt_type = PACKET_HOST; + skb-protocol = eth_type_trans(skb, dev); + skb-mark = 0; + secpath_reset(skb); + nf_reset(skb); + return netif_rx(skb); +} +EXPORT_SYMBOL_GPL(dev_forward_skb); + /* * Support routine. Sends outgoing frames to any network * taps currently in use. -- 1.6.3.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization