Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Patrick McHardy
Arnd Bergmann wrote:
 On Tuesday 24 November 2009, Patrick McHardy wrote:
 Eric W. Biederman wrote:
 I don't quite follow what you intend with dev_queue_xmit when the macvlan
 is in one namespace and the real physical device is in another.  Are
 you mentioning that the packet classifier runs in the namespace where
 the primary device lives with packets from a different namespace?
 Exactly. And I think we should make sure that the namespace of
 the macvlan device can't (deliberately or accidentally) cause
 misclassification.
 
 This is independent of my series and a preexisting problem, right?

Correct.

 Which fields do you think need to be reset to maintain namespace
 isolation for the outbound path in macvlan?

In addition to those already handled, I'd say

- priority: affects qdisc classification, may refer to classes of the
  old namespace
- ipvs_property: might cause packets to incorrectly skip netfilter hooks
- nf_trace: might trigger packet tracing
- nf_bridge: contains references to network devices in the old NS,
  also indicates packet was bridged
- iif: index is only valid in the originating namespace
- tc_index: classification result, should only be set in the namespace
  of the classifier
- tc_verd: RTTL etc. should begin at zero again
- probably secmark.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Eric W. Biederman
Patrick McHardy ka...@trash.net writes:

 Arnd Bergmann wrote:
 On Tuesday 24 November 2009, Patrick McHardy wrote:
 Eric W. Biederman wrote:
 I don't quite follow what you intend with dev_queue_xmit when the macvlan
 is in one namespace and the real physical device is in another.  Are
 you mentioning that the packet classifier runs in the namespace where
 the primary device lives with packets from a different namespace?
 Exactly. And I think we should make sure that the namespace of
 the macvlan device can't (deliberately or accidentally) cause
 misclassification.
 
 This is independent of my series and a preexisting problem, right?

 Correct.

 Which fields do you think need to be reset to maintain namespace
 isolation for the outbound path in macvlan?

 In addition to those already handled, I'd say

 - priority: affects qdisc classification, may refer to classes of the
   old namespace
 - ipvs_property: might cause packets to incorrectly skip netfilter hooks
 - nf_trace: might trigger packet tracing
 - nf_bridge: contains references to network devices in the old NS,
   also indicates packet was bridged
 - iif: index is only valid in the originating namespace
 - tc_index: classification result, should only be set in the namespace
   of the classifier
 - tc_verd: RTTL etc. should begin at zero again
 - probably secmark.

Wow.  I thought we were trying to reduce skbuff, where did all of those
fields come from?  Regarless that sounds like a good list to get stomped.

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Arnd Bergmann
On Thursday 26 November 2009, Patrick McHardy wrote:
 In addition to those already handled, I'd say
 
 - priority: affects qdisc classification, may refer to classes of the
   old namespace
 - ipvs_property: might cause packets to incorrectly skip netfilter hooks
 - nf_trace: might trigger packet tracing
 - nf_bridge: contains references to network devices in the old NS,
   also indicates packet was bridged
 - iif: index is only valid in the originating namespace
 - probably secmark.

ok

 - tc_index: classification result, should only be set in the namespace
   of the classifier
 - tc_verd: RTTL etc. should begin at zero again

Wouldn't that defeat the purpose of RTTL? If you create a loop
across two devices in different namespaces, it may no longer get
detected. Or is that a different problem again?

Arnd 

---

net: maintain namespace isolation between vlan and real device

In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Still needs testing, don't apply

Signed-off-by: Arnd Bergmann a...@arndb.de
---
 drivers/net/macvlan.c |2 +-
 include/linux/netdevice.h |9 +
 net/8021q/vlan_dev.c  |2 +-
 net/core/dev.c|   37 +
 4 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 322112c..edcebf1 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -269,7 +269,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct 
net_device *dev)
}
 
 xmit_world:
-   skb-dev = vlan-lowerdev;
+   skb_set_dev(skb, vlan-lowerdev);
return dev_queue_xmit(skb);
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9428793..fdf4a1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,15 @@ static inline bool netdev_uses_dsa_tags(struct 
net_device *dev)
return 0;
 }
 
+#ifdef CONFIG_NET_NS
+static inline void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+   skb-dev = dev;
+}
+#else /* CONFIG_NET_NS */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev);
+#endif
+
 static inline bool netdev_uses_trailer_tags(struct net_device *dev)
 {
 #ifdef CONFIG_NET_DSA_TAG_TRAILER
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index de0dc6b..51fcfff 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -323,7 +323,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff 
*skb,
}
 
 
-   skb-dev = vlan_dev_info(dev)-real_dev;
+   skb_set_dev(skb, vlan_dev_info(dev)-real_dev);
len = skb-len;
ret = dev_queue_xmit(skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index f8baa15..220d4e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1448,13 +1448,10 @@ int dev_forward_skb(struct net_device *dev, struct 
sk_buff *skb)
if (skb-len  (dev-mtu + dev-hard_header_len))
return NET_RX_DROP;
 
-   skb_dst_drop(skb);
+   skb_set_dev(skb, dev);
skb-tstamp.tv64 = 0;
skb-pkt_type = PACKET_HOST;
skb-protocol = eth_type_trans(skb, dev);
-   skb-mark = 0;
-   secpath_reset(skb);
-   nf_reset(skb);
return netif_rx(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
@@ -1614,6 +1611,39 @@ static bool dev_can_checksum(struct net_device *dev, 
struct sk_buff *skb)
return false;
 }
 
+/**
+ * skb_dev_set -- assign a buffer to a new device
+ * @skb: buffer for the new device
+ * @dev: network device
+ *
+ * If an skb is owned by a device already, we have to reset
+ * all data private to the namespace a device belongs to
+ * before assigning it a new device.
+ */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+   if (skb-dev  !net_eq(dev_net(skb-dev), dev_net(dev))) {
+   secpath_reset(skb);
+   skb_dst_drop(skb);
+   nf_reset(skb);
+   skb_init_secmark(skb);
+   skb-mark = 0;
+   skb-priority = 0;
+   skb-nf_trace = 0;
+   skb-ipvs_property = 0;
+#ifdef CONFIG_NET_SCHED
+   skb-tc_index = 0;
+#ifdef CONFIG_NET_CLS_ACT
+   skb-tc_verd = SET_TC_VERD(skb-tc_verd, 0);
+   skb-tc_verd = SET_TC_RTTL(skb-tc_verd, 0);
+#endif
+#endif
+   }
+   skb-dev = dev;
+   skb-skb_iif = skb-dev-ifindex;
+}
+EXPORT_SYMBOL(skb_set_dev);
+
 /*
  * Invalidate hardware checksum when packet is to be mangled, and
  * complete checksum manually on outgoing path.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Patrick McHardy
Arnd Bergmann wrote:
 On Thursday 26 November 2009, Patrick McHardy wrote:
 In addition to those already handled, I'd say

 - priority: affects qdisc classification, may refer to classes of the
   old namespace
 - ipvs_property: might cause packets to incorrectly skip netfilter hooks
 - nf_trace: might trigger packet tracing
 - nf_bridge: contains references to network devices in the old NS,
   also indicates packet was bridged
 - iif: index is only valid in the originating namespace
 - probably secmark.
 
 ok
 
 - tc_index: classification result, should only be set in the namespace
   of the classifier
 - tc_verd: RTTL etc. should begin at zero again
 
 Wouldn't that defeat the purpose of RTTL? If you create a loop
 across two devices in different namespaces, it may no longer get
 detected. Or is that a different problem again?

Mhh good point, that would indeed be possible. OTOH using ingress
filtering in one namespace currently might cause the packet to get
dropped in a different namespace because the ttl runs out. For now
I'd suggest to go the safe route and keep the TTL intact until we
can come up with something better.

 +void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
 +{
 + if (skb-dev  !net_eq(dev_net(skb-dev), dev_net(dev))) {
 + secpath_reset(skb);
 + skb_dst_drop(skb);
 + nf_reset(skb);
 + skb_init_secmark(skb);
 + skb-mark = 0;
 + skb-priority = 0;
 + skb-nf_trace = 0;
 + skb-ipvs_property = 0;
 +#ifdef CONFIG_NET_SCHED
 + skb-tc_index = 0;
 +#ifdef CONFIG_NET_CLS_ACT
 + skb-tc_verd = SET_TC_VERD(skb-tc_verd, 0);
 + skb-tc_verd = SET_TC_RTTL(skb-tc_verd, 0);
 +#endif
 +#endif

This makes we wonder which ones we actually should keep. Most of the
others get reinitialized anyways, so maybe its better to simply clear
the entire area up until -tail like f.i. skb_recycle_check().

 + }
 + skb-dev = dev;
 + skb-skb_iif = skb-dev-ifindex;

This doesn't seem necessary, if the packet goes through
netif_receive_skb, it will be set anyways.

 +}
 +EXPORT_SYMBOL(skb_set_dev);
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Patrick McHardy
Arnd Bergmann wrote:
 +int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 +{
 + skb_orphan(skb);
 +
 + if (!(dev-flags  IFF_UP))
 + return NET_RX_DROP;
 +
 + if (skb-len  (dev-mtu + dev-hard_header_len))
 + return NET_RX_DROP;
 +
 + skb_dst_drop(skb);
 + skb-tstamp.tv64 = 0;
 + skb-pkt_type = PACKET_HOST;
 + skb-protocol = eth_type_trans(skb, dev);
 + skb-mark = 0;

skb-mark clearing should stay private to veth since its usually
supposed to stay intact. The only exception is packets crossing
namespaces, where they should appear like a freshly received skbs.

 + secpath_reset(skb);
 + nf_reset(skb);
 + return netif_rx(skb);
 +}
 +EXPORT_SYMBOL_GPL(dev_forward_skb);
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Arnd Bergmann
On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
  + skb_dst_drop(skb);
  + skb-tstamp.tv64 = 0;
  + skb-pkt_type = PACKET_HOST;
  + skb-protocol = eth_type_trans(skb, dev);
  + skb-mark = 0;
 
 skb-mark clearing should stay private to veth since its usually
 supposed to stay intact. The only exception is packets crossing
 namespaces, where they should appear like a freshly received skbs.

But isn't that what we want in macvlan as well when we're
forwarding from one downstream interface to another?

I did all my testing with macvlan interfaces in separate namespaces
communicating with each other, so I'd assume that we should always
clear skb-mark and skb-dst in this function. Maybe I should make
the documentation clearer?

---
net: clarify documentation of dev_forward_skb

Signed-off-by: Arnd Bergmann a...@arndb.de

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1433,6 +1433,10 @@ static inline void net_timestamp(struct sk_buff *skb)
  * dev_forward_skb can be used for injecting an skb from the
  * start_xmit function of one device into the receive queue
  * of another device.
+ *
+ * The receiving device may be in another namespace, so
+ * we have to clear all information in the skb that could
+ * impact namespace isolation.
  */
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Patrick McHardy
Arnd Bergmann wrote:
 On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
 + skb_dst_drop(skb);
 + skb-tstamp.tv64 = 0;
 + skb-pkt_type = PACKET_HOST;
 + skb-protocol = eth_type_trans(skb, dev);
 + skb-mark = 0;
 skb-mark clearing should stay private to veth since its usually
 supposed to stay intact. The only exception is packets crossing
 namespaces, where they should appear like a freshly received skbs.
 
 But isn't that what we want in macvlan as well when we're
 forwarding from one downstream interface to another?

In the TX direction you can use the mark for TC classification
on the underlying device.

 I did all my testing with macvlan interfaces in separate namespaces
 communicating with each other, so I'd assume that we should always
 clear skb-mark and skb-dst in this function.

Good point, in that case we probably should clear it as well. But
in the non-namespace case the TC classification currently works and
this is consistent with any other virtual device driver, so it
should continue to work.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Arnd Bergmann
On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
 Arnd Bergmann wrote:
  On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
  + skb_dst_drop(skb);
  + skb-tstamp.tv64 = 0;
  + skb-pkt_type = PACKET_HOST;
  + skb-protocol = eth_type_trans(skb, dev);
  + skb-mark = 0;
  skb-mark clearing should stay private to veth since its usually
  supposed to stay intact. The only exception is packets crossing
  namespaces, where they should appear like a freshly received skbs.
  
  But isn't that what we want in macvlan as well when we're
  forwarding from one downstream interface to another?
 
 In the TX direction you can use the mark for TC classification
 on the underlying device.

I don't use dev_forward_skb for the case where the data is sent
to the underlying device, so the TC classification should stay
intact.
 
  I did all my testing with macvlan interfaces in separate namespaces
  communicating with each other, so I'd assume that we should always
  clear skb-mark and skb-dst in this function.
 
 Good point, in that case we probably should clear it as well. But
 in the non-namespace case the TC classification currently works and
 this is consistent with any other virtual device driver, so it
 should continue to work.

Do you think we should be able to use TC to direct traffic between
macvlans on the same underlying device in bridge mode? It does sound
useful, but I'm not sure how to implement that or if you'd expect
it to work with the current code. If we support that, it should probably
also work with namespaces, by consuming the mark in the macvlan
and veth drivers.

Arnd 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Patrick McHardy
Arnd Bergmann wrote:
 On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
 Arnd Bergmann wrote:
 On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
 + skb_dst_drop(skb);
 + skb-tstamp.tv64 = 0;
 + skb-pkt_type = PACKET_HOST;
 + skb-protocol = eth_type_trans(skb, dev);
 + skb-mark = 0;
 skb-mark clearing should stay private to veth since its usually
 supposed to stay intact. The only exception is packets crossing
 namespaces, where they should appear like a freshly received skbs.
 But isn't that what we want in macvlan as well when we're
 forwarding from one downstream interface to another?
 In the TX direction you can use the mark for TC classification
 on the underlying device.
 
 I don't use dev_forward_skb for the case where the data is sent
 to the underlying device, so the TC classification should stay
 intact.

Right, I see. This looks fine.

 I did all my testing with macvlan interfaces in separate namespaces
 communicating with each other, so I'd assume that we should always
 clear skb-mark and skb-dst in this function.
 Good point, in that case we probably should clear it as well. But
 in the non-namespace case the TC classification currently works and
 this is consistent with any other virtual device driver, so it
 should continue to work.
 
 Do you think we should be able to use TC to direct traffic between
 macvlans on the same underlying device in bridge mode? It does sound
 useful, but I'm not sure how to implement that or if you'd expect
 it to work with the current code. If we support that, it should probably
 also work with namespaces, by consuming the mark in the macvlan
 and veth drivers.

I don't think its necessary, we bypass outgoing queuing anyways.
But if you'd want to add it, just keeping the skb-mark clearing
in veth should work from what I can tell.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Arnd Bergmann
On Tuesday 24 November 2009, Patrick McHardy wrote:
 I don't think its necessary, we bypass outgoing queuing anyways.
 But if you'd want to add it, just keeping the skb-mark clearing
 in veth should work from what I can tell.

Ok, I won't bother with it for now then.

Arnd 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Eric W. Biederman
Patrick McHardy ka...@trash.net writes:

 I did all my testing with macvlan interfaces in separate namespaces
 communicating with each other, so I'd assume that we should always
 clear skb-mark and skb-dst in this function.
 Good point, in that case we probably should clear it as well. But
 in the non-namespace case the TC classification currently works and
 this is consistent with any other virtual device driver, so it
 should continue to work.
 
 Do you think we should be able to use TC to direct traffic between
 macvlans on the same underlying device in bridge mode? It does sound
 useful, but I'm not sure how to implement that or if you'd expect
 it to work with the current code. If we support that, it should probably
 also work with namespaces, by consuming the mark in the macvlan
 and veth drivers.

 I don't think its necessary, we bypass outgoing queuing anyways.
 But if you'd want to add it, just keeping the skb-mark clearing
 in veth should work from what I can tell.

veth doesn't have an outgoing queue.  The reason we clear skb-mark
in veth is because when reentering the networking stack the packet
needs to be reclassified.  At the point of loopback we are talking
a packet that has at least logically gone out of the machine on a
wire and come back into the machine on another physical interface.

So it seems to me we should have consistent handling for macvlans,
veth, for the cases where we are looping packets back around.  In
practice I expect all of those cases are going to be cross namespace
as otherwise we would have intercepted the packet before going
out a physical interface.

Eric




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Patrick McHardy
Eric W. Biederman wrote:
 Patrick McHardy ka...@trash.net writes:
 
 I did all my testing with macvlan interfaces in separate namespaces
 communicating with each other, so I'd assume that we should always
 clear skb-mark and skb-dst in this function.
 Good point, in that case we probably should clear it as well. But
 in the non-namespace case the TC classification currently works and
 this is consistent with any other virtual device driver, so it
 should continue to work.
 Do you think we should be able to use TC to direct traffic between
 macvlans on the same underlying device in bridge mode? It does sound
 useful, but I'm not sure how to implement that or if you'd expect
 it to work with the current code. If we support that, it should probably
 also work with namespaces, by consuming the mark in the macvlan
 and veth drivers.
 I don't think its necessary, we bypass outgoing queuing anyways.
 But if you'd want to add it, just keeping the skb-mark clearing
 in veth should work from what I can tell.
 
 veth doesn't have an outgoing queue.  The reason we clear skb-mark
 in veth is because when reentering the networking stack the packet
 needs to be reclassified.  At the point of loopback we are talking
 a packet that has at least logically gone out of the machine on a
 wire and come back into the machine on another physical interface.
 
 So it seems to me we should have consistent handling for macvlans,
 veth, for the cases where we are looping packets back around.  In
 practice I expect all of those cases are going to be cross namespace
 as otherwise we would have intercepted the packet before going
 out a physical interface.

Agreed on the looping case, that's what we're doing now.

In the layered case (macvlan - eth0) its common behaviour to
keep the mark however. But in case of different namespaces,
I think macvlan should also clear the mark on the dev_queue_xmit()
path since this is just a shortcut to looping the packets
through veth. In fact probably both of them should also clear
skb-priority so other namespaces don't accidentally misclassify
packets.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Eric W. Biederman
Patrick McHardy ka...@trash.net writes:

 Eric W. Biederman wrote:
 Patrick McHardy ka...@trash.net writes:
 
 I did all my testing with macvlan interfaces in separate namespaces
 communicating with each other, so I'd assume that we should always
 clear skb-mark and skb-dst in this function.
 Good point, in that case we probably should clear it as well. But
 in the non-namespace case the TC classification currently works and
 this is consistent with any other virtual device driver, so it
 should continue to work.
 Do you think we should be able to use TC to direct traffic between
 macvlans on the same underlying device in bridge mode? It does sound
 useful, but I'm not sure how to implement that or if you'd expect
 it to work with the current code. If we support that, it should probably
 also work with namespaces, by consuming the mark in the macvlan
 and veth drivers.
 I don't think its necessary, we bypass outgoing queuing anyways.
 But if you'd want to add it, just keeping the skb-mark clearing
 in veth should work from what I can tell.
 
 veth doesn't have an outgoing queue.  The reason we clear skb-mark
 in veth is because when reentering the networking stack the packet
 needs to be reclassified.  At the point of loopback we are talking
 a packet that has at least logically gone out of the machine on a
 wire and come back into the machine on another physical interface.
 
 So it seems to me we should have consistent handling for macvlans,
 veth, for the cases where we are looping packets back around.  In
 practice I expect all of those cases are going to be cross namespace
 as otherwise we would have intercepted the packet before going
 out a physical interface.

 Agreed on the looping case, that's what we're doing now.

 In the layered case (macvlan - eth0) its common behaviour to
 keep the mark however. But in case of different namespaces,
 I think macvlan should also clear the mark on the dev_queue_xmit()
 path since this is just a shortcut to looping the packets
 through veth. In fact probably both of them should also clear
 skb-priority so other namespaces don't accidentally misclassify
 packets.

That is why I pushed for what is becoming dev_forward_skb.  So that
we have one place where we can make all of those tweaks.  It seems
like in every review we find another field that should be cleared/handled
specially.

I don't quite follow what you intend with dev_queue_xmit when the macvlan
is in one namespace and the real physical device is in another.  Are
you mentioning that the packet classifier runs in the namespace where
the primary device lives with packets from a different namespace?

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Arnd Bergmann
On Tuesday 24 November 2009, Eric W. Biederman wrote:
 I don't quite follow what you intend with dev_queue_xmit when the macvlan
 is in one namespace and the real physical device is in another.  Are
 you mentioning that the packet classifier runs in the namespace where
 the primary device lives with packets from a different namespace?

I treat internal and external delivery very differently, the three
cases are:

1. skb from real device to macvlan (macvlan_handle_frame): basically
unchanged from before, except avoiding duplicate broadcasts. All
skbs end up in netif_rx(vlan-dev) without clearing any data.
We catch the frame in netif_receive_skb before it interacts with the
namespace of the real device.

2. skb to external device (macvlan_start_xmit): if the destination
is external, we just end up in dev_queue_xmit, with skb-dev set to
the external device but no other changes. The data is already on the
way out at this stage, so the namespace should not matter any more.

3. internal delivery: an skb from one macvlan to another gets always
sent through dev_forward_skb, which is supposed to clear anything
that must not leave the namespace.

Does this make sense?

Arnd 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Patrick McHardy
Eric W. Biederman wrote:
 Patrick McHardy ka...@trash.net writes:
 
 In the layered case (macvlan - eth0) its common behaviour to
 keep the mark however. But in case of different namespaces,
 I think macvlan should also clear the mark on the dev_queue_xmit()
 path since this is just a shortcut to looping the packets
 through veth. In fact probably both of them should also clear
 skb-priority so other namespaces don't accidentally misclassify
 packets.
 
 That is why I pushed for what is becoming dev_forward_skb.  So that
 we have one place where we can make all of those tweaks.  It seems
 like in every review we find another field that should be cleared/handled
 specially.
 
 I don't quite follow what you intend with dev_queue_xmit when the macvlan
 is in one namespace and the real physical device is in another.  Are
 you mentioning that the packet classifier runs in the namespace where
 the primary device lives with packets from a different namespace?

Exactly. And I think we should make sure that the namespace of
the macvlan device can't (deliberately or accidentally) cause
misclassification.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 1/4] veth: move loopback logic to common location

2009-11-23 Thread Arnd Bergmann
The veth driver contains code to forward an skb
from the start_xmit function of one network
device into the receive path of another device.

Moving that code into a common location lets us
reuse the code for direct forwarding of data
between macvlan ports, and possibly in other
drivers.

Signed-off-by: Arnd Bergmann a...@arndb.de
---
 drivers/net/veth.c|   17 +++--
 include/linux/netdevice.h |2 ++
 net/core/dev.c|   36 
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2d657f2..6c4b5a2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -155,8 +155,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct veth_net_stats *stats, *rcv_stats;
int length, cpu;
 
-   skb_orphan(skb);
-
priv = netdev_priv(dev);
rcv = priv-peer;
rcv_priv = netdev_priv(rcv);
@@ -168,20 +166,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
if (!(rcv-flags  IFF_UP))
goto tx_drop;
 
-   if (skb-len  (rcv-mtu + MTU_PAD))
-   goto rx_drop;
-
-skb-tstamp.tv64 = 0;
-   skb-pkt_type = PACKET_HOST;
-   skb-protocol = eth_type_trans(skb, rcv);
if (dev-features  NETIF_F_NO_CSUM)
skb-ip_summed = rcv_priv-ip_summed;
 
-   skb-mark = 0;
-   secpath_reset(skb);
-   nf_reset(skb);
-
-   length = skb-len;
+   length = skb-len + ETH_HLEN;
+   if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)
+   goto rx_drop;
 
stats-tx_bytes += length;
stats-tx_packets++;
@@ -189,7 +179,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
rcv_stats-rx_bytes += length;
rcv_stats-rx_packets++;
 
-   netif_rx(skb);
return NETDEV_TX_OK;
 
 tx_drop:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97873e3..9428793 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1562,6 +1562,8 @@ extern intdev_set_mac_address(struct 
net_device *,
 extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev,
struct netdev_queue *txq);
+extern int dev_forward_skb(struct net_device *dev,
+   struct sk_buff *skb);
 
 extern int netdev_budget;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index ccefa24..ba18e82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -105,6 +105,7 @@
 #include net/dst.h
 #include net/pkt_sched.h
 #include net/checksum.h
+#include net/xfrm.h
 #include linux/highmem.h
 #include linux/init.h
 #include linux/kmod.h
@@ -1419,6 +1420,41 @@ static inline void net_timestamp(struct sk_buff *skb)
skb-tstamp.tv64 = 0;
 }
 
+/**
+ * dev_forward_skb - loopback an skb to another netif
+ *
+ * @dev: destination network device
+ * @skb: buffer to forward
+ *
+ * return values:
+ * NET_RX_SUCCESS  (no congestion)
+ * NET_RX_DROP (packet was dropped)
+ *
+ * dev_forward_skb can be used for injecting an skb from the
+ * start_xmit function of one device into the receive queue
+ * of another device.
+ */
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+{
+   skb_orphan(skb);
+
+   if (!(dev-flags  IFF_UP))
+   return NET_RX_DROP;
+
+   if (skb-len  (dev-mtu + dev-hard_header_len))
+   return NET_RX_DROP;
+
+   skb_dst_drop(skb);
+   skb-tstamp.tv64 = 0;
+   skb-pkt_type = PACKET_HOST;
+   skb-protocol = eth_type_trans(skb, dev);
+   skb-mark = 0;
+   secpath_reset(skb);
+   nf_reset(skb);
+   return netif_rx(skb);
+}
+EXPORT_SYMBOL_GPL(dev_forward_skb);
+
 /*
  * Support routine. Sends outgoing frames to any network
  * taps currently in use.
-- 
1.6.3.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization