Re: [PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode

2017-02-10 Thread Roopa Prabhu
On 2/10/17, 8:05 PM, Joe Stringer wrote:
> On 31 January 2017 at 22:59, Roopa Prabhu  wrote:
>> @@ -1289,7 +1331,12 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
>> if (!vs)
>> goto drop;
>>
>> -   vxlan = vxlan_vs_find_vni(vs, vxlan_vni(vxlan_hdr(skb)->vx_vni));
>> +   vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
>> +
>> +   if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
>> +   goto drop;
>> +
>> +   vxlan = vxlan_vs_find_vni(vs, vni);
>> if (!vxlan)
>> goto drop;
> Hi Roopa,
>
> We've noticed a failure in OVS system-traffic kmod test cases and
> bisected it down to this commit. It seems that it's related to this
> new drop condition here. Can you explain what's meant to be special
> about VNI 0? I can't see anything mentioned about it in RFC7348, so I
> don't see why it should be dropped.
>
> In the OVS testsuite, we configure OVS in the root namespace with an
> OVS vxlan device (which has VXLAN_F_COLLECT_METADATA set), with vni 0.
> Then, we configure a veth pair into another namespace where we have
> the other end of the tunnel configured using a regular native linux
> vxlan device on vni 0. Prior to this commit, the test worked; after
> this commit it failed. If we manually change to use a nonzero VNI, it
> works. The test is here:
To be honest, I thought vni 0 was only used for the collect metadata device, for
lookup of the device until a real vni was derived. And since I moved the line
that gets the vni from the packet up, I ended up adding that check. I did not
realize vni 0 could be a valid vni in the packet.
>
> https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L218
>
> Jarno also tried setting up two namespaces with regular vxlan devices
> and VNI 0, and this worked too. Presumably this is because this would
> not use VXLAN_F_COLLECT_METADATA.
yeah, that should be it.

I will send a patch in a few hours. Thanks for reporting. I am glad you ran
these tests, as I was not able to completely verify all cases for ovs.




Re: [PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode

2017-02-10 Thread Joe Stringer
On 31 January 2017 at 22:59, Roopa Prabhu  wrote:
> @@ -1289,7 +1331,12 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> if (!vs)
> goto drop;
>
> -   vxlan = vxlan_vs_find_vni(vs, vxlan_vni(vxlan_hdr(skb)->vx_vni));
> +   vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
> +
> +   if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
> +   goto drop;
> +
> +   vxlan = vxlan_vs_find_vni(vs, vni);
> if (!vxlan)
> goto drop;

Hi Roopa,

We've noticed a failure in OVS system-traffic kmod test cases and
bisected it down to this commit. It seems that it's related to this
new drop condition here. Can you explain what's meant to be special
about VNI 0? I can't see anything mentioned about it in RFC7348, so I
don't see why it should be dropped.

In the OVS testsuite, we configure OVS in the root namespace with an
OVS vxlan device (which has VXLAN_F_COLLECT_METADATA set), with vni 0.
Then, we configure a veth pair into another namespace where we have
the other end of the tunnel configured using a regular native linux
vxlan device on vni 0. Prior to this commit, the test worked; after
this commit it failed. If we manually change to use a nonzero VNI, it
works. The test is here:

https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L218

Jarno also tried setting up two namespaces with regular vxlan devices
and VNI 0, and this worked too. Presumably this is because this would
not use VXLAN_F_COLLECT_METADATA.


[PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode

2017-01-31 Thread Roopa Prabhu
From: Roopa Prabhu 

Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information database to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (e.g. where the vxlan device is part of a bridge).
For all this to work, the vxlan driver will need to now support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.

use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will lookup forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.

Changes introduced by this patch:
- allow learning and forwarding database state in vxlan netdev in
  COLLECT_METADATA mode. Current behaviour is not changed
  by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
  to support the new bridge friendly mode.
- A single fdb table hashed by (mac, vni) to allow fdb entries with
  multiple vnis in the same fdb table
- rx path already has the vni
- tx path expects a vni in the packet with dst_metadata
- prior to this series, fdb remote_dsts carried remote vni and
  the vxlan device carrying the fdb table represented the
  source vni. With the vxlan device now representing multiple vnis,
  this patch adds a src vni attribute to the fdb entry. The remote
  vni already uses NDA_VNI attribute. This patch introduces
  NDA_SRC_VNI netlink attribute to represent the src vni in a multi
  vni fdb table.

iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
The example shows the same host mac learnt on two vnis.

before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self

after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
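For reference, a configuration sketch that could produce entries like the ones
above (device names and addresses are made up, and this assumes an iproute2
patched to understand src_vni, plus a kernel with this series applied):

```shell
# Create a collect-metadata ("external") vxlan device and add
# static fdb entries for the same mac on two source vnis.
ip link add vxlan0 type vxlan external dstport 4789
bridge fdb add 00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
bridge fdb add 00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
bridge fdb show dev vxlan0
```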

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c|  196 +---
 include/uapi/linux/neighbour.h |1 +
 2 files changed, 126 insertions(+), 71 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 19b1653..6f16882 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,6 +75,7 @@ struct vxlan_fdb {
struct list_head  remotes;
u8eth_addr[ETH_ALEN];
u16   state;/* see ndm_state */
+   __be32vni;
u8flags;/* see ndm_flags */
 };
 
@@ -302,6 +303,10 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
goto nla_put_failure;
+   if ((vxlan->flags & VXLAN_F_COLLECT_METADATA) && fdb->vni &&
+   nla_put_u32(skb, NDA_SRC_VNI,
+   be32_to_cpu(fdb->vni)))
+   goto nla_put_failure;
if (rdst->remote_ifindex &&
nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
goto nla_put_failure;
@@ -400,34 +405,51 @@ static u32 eth_hash(const unsigned char *addr)
return hash_64(value, FDB_HASH_BITS);
 }
 
+static u32 eth_vni_hash(const unsigned char *addr, __be32 vni)
+{
+   /* use 1 byte of OUI and 3 bytes of NIC */
+   u32 key = get_unaligned((u32 *)(addr + 2));
+
+   return jhash_2words(key, vni, vxlan_salt) & (FDB_HASH_SIZE - 1);
+}
+
 /* Hash chain to use given mac address */
 static inline struct hlist_head *vxlan_fdb_head(struct vxlan_dev *vxlan,
-   const u8 *mac)
+   const u8 *mac, __be32 vni)
 {
-   return &vxlan->fdb_head[eth_hash(mac)];
+   if (vxlan->flags & VXLAN_F_COLLECT_METADATA)
+   return &vxlan->fdb_head[eth_vni_hash(mac, vni)];
+   else
+   return >fdb_head[eth_hash(mac)];
 }
 
 /* Look up Ethernet address in forwarding table */
 static struct vxlan_fdb *__vxlan_find_mac(struct vxlan_dev *vxlan,
-   const u8 *mac)
+ const u8 *mac, __be32 vni)
 {
-   struct hlist_head *head =