
Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-25 Thread Roopa Prabhu

On 1/24/17, 7:47 AM, Stephen Hemminger wrote:
> On Fri, 20 Jan 2017 21:46:51 -0800
> Roopa Prabhu  wrote:
>
>> From: Roopa Prabhu 
>>
>> High level summary:
>> lwt and dst_metadata/collect_metadata have enabled vxlan l3 deployments
>> to use a single vxlan netdev for multiple vnis eliminating the scalability
>> problem with using a single vxlan netdev per vni. This series tries to
>> do the same for vxlan netdevs in pure l2 bridged networks.
>> Use-case/deployment and details are below.
>>
>> Deployment scenario details:
>> As we know VXLAN is used to build layer 2 virtual networks across the
>> underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
>> originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
>> or a vswitch in the hypervisor. This patch series mainly
>> focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
>> along with vlan id is used to identify layer 2 segments in a vxlan
>> overlay network. Vxlan bridging is the function provided by Vteps to terminate
>> vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
>> covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
>> To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
>> says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
>> the original Layer 2 packet if there is one before encapsulating the packet
>> into the VXLAN format to transmit it through the underlay network. The remote
>> VTEP devices have information about the VLAN in which the packet will be
>> placed based on their own VLAN-to-VXLAN VNI mapping configurations.
>>
>> Existing solution:
>> Without this patch series one can deploy such a vtep configuration by
>> adding the local ports and vxlan netdevs into a vlan filtering bridge.
>> The local ports are configured as trunk ports carrying all vlans.
>> A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
>> achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
>> The vxlan netdev only receives traffic corresponding to the vlan it is mapped
>> to. This configuration maps traffic belonging to a vlan to the corresponding
>> vxlan segment.
>>
>>   ------------------------------------
>>  |               bridge               |
>>  |                                    |
>>   ------------------------------------
>>     |100,200      |100 (pvid)    |200 (pvid)
>>     |             |              |
>>    swp1       vxlan1000      vxlan2000
>> 
>> This provides the required vxlan bridging function but poses a
>> scalability problem with using a single vxlan netdev for each vni.
>>
>> Solution in this patch series:
>> The Goal is to use a single vxlan device to carry all vnis similar
>> to the vxlan collect metadata mode but vxlan driver still carrying all
>> the forwarding information.
>> - vxlan driver changes:
>>     - enable collect metadata mode device to be used with learning,
>>       replication, fdb
>>     - A single fdb table hashed by (mac, vni)
>>     - rx path already has the vni
>>     - tx path expects a vni in the packet with dst_metadata and vxlan
>>       driver has all the forwarding information for the vni in the
>>       dst_metadata.
>>
>> - Bridge driver changes: per vlan LWT and dst_metadata support:
>>     - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
>>       kept the api generic for any tunnel info
>>     - Uapi to configure/unconfigure/dump per vlan tunnel data
>>     - new bridge port flag to turn this feature on/off. off by default
>>     - ingress hook:
>>         - if port is a lwt tunnel port, use tunnel info in
>>           attached dst_metadata to map it to a local vlan
>>     - egress hook:
>>         - if port is a lwt tunnel port, use tunnel info attached to vlan
>>           to set dst_metadata on the skb
>>
>> Other approaches tried and vetoed:
>> - tc vlan push/pop and tunnel metadata dst:
>>     - poses a tc rule scalability problem (2 rules per vni)
>>     - cannot handle the case where a packet needs to be replicated to
>>       multiple vxlan remote tunnel end-points.. which the vxlan driver
>>       can do today by having multiple remote destinations per fdb.
>> - making vxlan driver understand vlan-vni mapping:
>>     - I had a series almost ready with this one but soon realized
>>       it duplicated a lot of vlan handling code in the vxlan driver
>> This series is briefly tested for functionality. Sending it out as RFC while
>> I continue to test it more. There are some rough edges which I am in the process
>> of fixing.
>>
>> Signed-off-by: Roopa Prabhu 
>>
>> Roopa Prabhu (5):
>>   ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info mode
>>   vxlan: make COLLECT_METADATA mode bridge friendly
>>   bridge:

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-24 Thread Stephen Hemminger

On Fri, 20 Jan 2017 21:46:51 -0800
Roopa Prabhu  wrote:

> From: Roopa Prabhu 
> 
> High level summary:
> lwt and dst_metadata/collect_metadata have enabled vxlan l3 deployments
> to use a single vxlan netdev for multiple vnis eliminating the scalability
> problem with using a single vxlan netdev per vni. This series tries to
> do the same for vxlan netdevs in pure l2 bridged networks.
> Use-case/deployment and details are below.
> 
> Deployment scenario details:
> As we know VXLAN is used to build layer 2 virtual networks across the
> underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
> originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
> or a vswitch in the hypervisor. This patch series mainly
> focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
> along with vlan id is used to identify layer 2 segments in a vxlan
> overlay network. Vxlan bridging is the function provided by Vteps to terminate
> vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
> covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
> To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
> says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
> the original Layer 2 packet if there is one before encapsulating the packet
> into the VXLAN format to transmit it through the underlay network. The remote
> VTEP devices have information about the VLAN in which the packet will be
> placed based on their own VLAN-to-VXLAN VNI mapping configurations.
> 
> Existing solution:
> Without this patch series one can deploy such a vtep configuration by
> adding the local ports and vxlan netdevs into a vlan filtering bridge.
> The local ports are configured as trunk ports carrying all vlans.
> A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
> achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
> The vxlan netdev only receives traffic corresponding to the vlan it is mapped
> to. This configuration maps traffic belonging to a vlan to the corresponding
> vxlan segment.
> 
>   ------------------------------------
>  |               bridge               |
>  |                                    |
>   ------------------------------------
>     |100,200      |100 (pvid)    |200 (pvid)
>     |             |              |
>    swp1       vxlan1000      vxlan2000
> 
> This provides the required vxlan bridging function but poses a
> scalability problem with using a single vxlan netdev for each vni.
> 
> Solution in this patch series:
> The Goal is to use a single vxlan device to carry all vnis similar
> to the vxlan collect metadata mode but vxlan driver still carrying all
> the forwarding information.
> - vxlan driver changes:
>     - enable collect metadata mode device to be used with learning,
>       replication, fdb
>     - A single fdb table hashed by (mac, vni)
>     - rx path already has the vni
>     - tx path expects a vni in the packet with dst_metadata and vxlan
>       driver has all the forwarding information for the vni in the
>       dst_metadata.
> 
> - Bridge driver changes: per vlan LWT and dst_metadata support:
>     - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
>       kept the api generic for any tunnel info
>     - Uapi to configure/unconfigure/dump per vlan tunnel data
>     - new bridge port flag to turn this feature on/off. off by default
>     - ingress hook:
>         - if port is a lwt tunnel port, use tunnel info in
>           attached dst_metadata to map it to a local vlan
>     - egress hook:
>         - if port is a lwt tunnel port, use tunnel info attached to vlan
>           to set dst_metadata on the skb
> 
> Other approaches tried and vetoed:
> - tc vlan push/pop and tunnel metadata dst:
>     - poses a tc rule scalability problem (2 rules per vni)
>     - cannot handle the case where a packet needs to be replicated to
>       multiple vxlan remote tunnel end-points.. which the vxlan driver
>       can do today by having multiple remote destinations per fdb.
> - making vxlan driver understand vlan-vni mapping:
>     - I had a series almost ready with this one but soon realized
>       it duplicated a lot of vlan handling code in the vxlan driver
> 
> This series is briefly tested for functionality. Sending it out as RFC while
> I continue to test it more. There are some rough edges which I am in the process
> of fixing.
> 
> Signed-off-by: Roopa Prabhu 
> 
> Roopa Prabhu (5):
>   ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info mode
>   vxlan: make COLLECT_METADATA mode bridge friendly
>   bridge: uapi: add per vlan tunnel info
>   bridge: vlan lwt and dst_metadata netlink support
>   bridge: vlan lwt dst_metadata hooks in ingress and

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-23 Thread Roopa Prabhu

On 1/23/17, 9:03 AM, Or Gerlitz wrote:
> On Mon, Jan 23, 2017 at 6:13 PM, Roopa Prabhu 
> wrote:
>
>> Also, the goal is to reduce the number of vxlan devices from say 4k to 1.
>> I don't think replacing it with 8k (egress + ingress) rules is going in the
>> right direction.
>>
> Can't you take advantage of the shared vxlan device configuration
> introduced throughout the LWT work such that you have single device dealing
> with many tunnels? why?
>
I tried to cover this in the opening paragraph of the cover letter:
"lwt and dst_metadata/collect_metadata have enabled vxlan l3 deployments to use
a 'single vxlan netdev for multiple vnis' eliminating the scalability problem
with using a 'single vxlan netdev per vni'. This series tries to do the same
for vxlan netdevs in pure l2 bridged networks. Use-case/deployment and details
are below." There is more in the cover letter on this.

There is no route pointing to the vxlan device here. The vxlan device is a
bridged port, and it bridges local host ports to remote vxlan tunnels
(vlan-to-vxlan).

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-23 Thread Roopa Prabhu

On 1/23/17, 8:24 AM, Jiri Benc wrote:
> On Mon, 23 Jan 2017 08:13:30 -0800, Roopa Prabhu wrote:
>> And, a 'vlan-to-tunid' mapping is a very common configuration in L2 ethernet 
>> vpn configurations.
> You have one particular and narrow use case in mind and are proposing a
> rather large patchset to add support for that (and only that) single
> use case, while we already have a generic mechanism in place to address
> this and many similar (and dissimilar, too) use cases. That doesn't
> sound right.
Let me clarify:
The generic mechanism you are talking about is the dst_metadata infra. Any
subsystem can use it. The tc vlan and dst_metadata wrapper/filter provide a
creative way to use it inside the tc subsystem and are very useful for people
using tc all around.
What I am proposing here is hooks in the bridge to use dst_metadata for pure L2
networks that use the bridge driver. This is similar to how we have lwt plugged
into the L3 (routing) code.
If you are using the bridge driver for vlan config and filtering, I don't see
why one has to duplicate vlan config using tc. It's painful trying to deploy l2
networks with vlan config spanning multiple subsystems and apis.

Regarding the patch-set size, let me give you a breakdown:
If I used tc for passing dst_metadata (assume 4k vlans that are participating
in l2 ethernet vpn):
(a) configure bridging/vlan filtering using bridge driver (4k vlans)
(b) configure tc rules to map vlans to tunnel-id (additional patch to tc to
    only allow tunnel-id in dst_metadata: ingress + egress = 8k tc rules)
(c) vxlan driver patch to make it bridge friendly (my patch in this series is
    required regardless of whether I use tc or the bridge driver for
    dst_metadata, because the vxlan driver learns and needs to carry the
    forwarding information database)
(d) ethernet vpn controller (quagga bgp) looks at 'bridge api + vxlan api + tc
    filtering rules'
   

My current series:
(a) configure bridging/vlan filtering using bridge driver (4k vlans with tunnel
    info)
(b) vxlan driver patch to make it bridge friendly (my patch in this series is
    required regardless of whether I use tc or the bridge driver for
    dst_metadata, because the vxlan driver learns and needs to carry the
    forwarding information database)
(c) ethernet vpn controller (quagga bgp) looks at 'bridge api + vxlan api'
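
To make the two breakdowns above easy to compare, here is a small sketch of the arithmetic. It is illustrative only: the function name and the dictionary layout are hypothetical, and "4k" is taken as 4000 for round numbers.

```python
def config_footprint(num_vlans, use_tc):
    """Count what has to be configured and read back for one bridge.

    Hypothetical helper: it only restates the breakdown in this mail.
    """
    footprint = {
        "bridge_vlans": num_vlans,      # vlan filtering config, needed either way
        "tc_rules": 0,
        "apis": ["bridge", "vxlan"],    # what the evpn controller has to look at
    }
    if use_tc:
        # one ingress rule plus one egress rule per vlan-to-tunnel-id mapping
        footprint["tc_rules"] = 2 * num_vlans
        footprint["apis"].append("tc")
    return footprint


with_tc = config_footprint(4000, use_tc=True)
with_bridge = config_footprint(4000, use_tc=False)
print(with_tc["tc_rules"], with_tc["apis"])
print(with_bridge["tc_rules"], with_bridge["apis"])
```

The vxlan driver change is left out of the count because it is needed in both cases.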


And btw, most of the functions that I am adding in the bridge driver are
related to vlan range handling. The vlan ranges code is tricky, and I am also
trying to support vlan-tunnelid mapping in ranges, so I have tried to rewrite
my own vlan range code (added long back) to include tunnel info. The rest is
just use of the dst_metadata infra to store and use dst_metadata per vlan.
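
As a rough picture of what range handling with tunnel info involves, the sketch below collapses a per-vlan mapping into ranges. It is not the code from the series: the function name and the tuple layout are made up for illustration, and it assumes a run can only be merged when the vlan id and the tunnel id both advance by one.

```python
def compress_vlan_tunnel_ranges(mapping):
    """Collapse {vid: tunnel_id} into (vid_start, vid_end, tun_start, tun_end).

    Assumption of this sketch: consecutive vids merge into one range only
    when their tunnel ids are consecutive as well.
    """
    ranges = []
    for vid in sorted(mapping):
        tunid = mapping[vid]
        if ranges:
            v0, v1, t0, t1 = ranges[-1]
            if vid == v1 + 1 and tunid == t1 + 1:
                # extend the current run by one vlan and one tunnel id
                ranges[-1] = (v0, vid, t0, tunid)
                continue
        # start a new run of length one
        ranges.append((vid, vid, tunid, tunid))
    return ranges


mapping = {100: 1000, 101: 1001, 102: 1002, 200: 2000, 201: 5000}
for entry in compress_vlan_tunnel_ranges(mapping):
    print(entry)
```

The last two vlans stay separate even though their vids are adjacent, because their tunnel ids are not; that extra condition is what makes ranges with tunnel info harder than plain vlan ranges.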


>
> If the current generic mechanisms have bottlenecks for your use case,
> let's work on removing those bottlenecks. That way, everybody benefits,
> not just a single use case.
For people using tc everywhere, the tc wrapper for dst_metadata is a good fit.
I see my series as still using the generic 'dst_metadata' mechanism/infra for a
newer use case. As I say above, I see this as similar to how we have plugged
dst_metadata into the L3 (routing) code. This does it in the bridging code
(for L2 networks).

Thanks,
Roopa

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-23 Thread Jiri Benc

On Mon, 23 Jan 2017 08:13:30 -0800, Roopa Prabhu wrote:
> And, a 'vlan-to-tunid' mapping is a very common configuration in L2 ethernet 
> vpn configurations.

You have one particular and narrow use case in mind and are proposing a
rather large patchset to add support for that (and only that) single
use case, while we already have a generic mechanism in place to address
this and many similar (and dissimilar, too) use cases. That doesn't
sound right.

If the current generic mechanisms have bottlenecks for your use case,
let's work on removing those bottlenecks. That way, everybody benefits,
not just a single use case.

Thanks,

 Jiri

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-23 Thread Roopa Prabhu

On 1/23/17, 12:51 AM, Jiri Benc wrote:
> On Mon, 23 Jan 2017 09:08:05 +0100, Jiri Pirko wrote:
>> Sat, Jan 21, 2017 at 06:46:51AM CET, ro...@cumulusnetworks.com wrote:
>>> Other approaches tried and vetoed:
>>> - tc vlan push/pop and tunnel metadata dst:
>>>     - poses a tc rule scalability problem (2 rules per vni)
>> Why is it a problem?
> Wanted to ask exactly the same question.
>
>>>     - cannot handle the case where a packet needs to be replicated to
>>>       multiple vxlan remote tunnel end-points.. which the vxlan driver
>>>       can do today by having multiple remote destinations per fdb.
>> Can't you just extend the tc to support this?
> +1
>
>> To me, looks like the tc is the correct place to handle this. Then, the
>> user can use it for multiple cases of forwarding, including bridge,
>> tc-mirred, ovs and others. Putting this in bridge somehow seems wrong in
>> this light. Also, the bridge code is polluted enough as it is. I think we
>> should be super-picky about adding more code there.
> Completely agreed.
>

The problem is, when you use the Linux bridge for vlan configuration and vlan
filtering, having additional vlan config in some other subsystem is a bit
awkward. It's the same argument for why the tc and netfilter subsystems have so
much overlap: each subsystem has to have the missing functionality for
completeness, because you cannot expect the user to configure a few rules in tc
and a few others in netfilter. In this case, I cannot expect the user/app to
configure vlan filtering in one place and have additional vlan-to-tunnel
filtering in another subsystem. It's duplicating vlan configuration in multiple
places.

Also, the goal is to reduce the number of vxlan devices from say 4k to 1. I
don't think replacing it with 8k (egress + ingress) rules is going in the right
direction.

Bigger picture/context: with bgp now being deployed as a controller for l2
ethernet vpn solutions
(https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay-07), popular routing
suites like quagga are looking at using the Linux api for L2 configuration.
And a 'vlan-to-tunid' mapping is a very common configuration in L2 ethernet vpn
configurations. With the bridge driver being the center of vlan configuration
in such bridged networks, having all vlan configuration in one place makes
sense. Also, quagga now has a single api to get the 'vlan-to-tunid' mapping.
Telling quagga to look at tc filtering rules to derive this mapping is not in
line with the rest of the L2 api (when you use the Linux bridge).

Regarding piling this on to the bridge driver:
- It is using existing dst metadata infra + two hooks disabled by default.
- I started this with a vlan-to-vxlan map in the vxlan driver (regret spending
  time on it). I ended up duplicating in the vxlan driver a lot of vlan
  handling code that the bridge driver already had. Hence the bridge driver is
  the right place for this when you are using the bridge driver for vlan
  filtering.
- Besides, having it in the bridge driver enables the bridge driver for other
  future l2 evpn dataplanes (vxlan just happens to be the one I am working on
  currently).

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-23 Thread Jiri Benc

On Mon, 23 Jan 2017 09:08:05 +0100, Jiri Pirko wrote:
> Sat, Jan 21, 2017 at 06:46:51AM CET, ro...@cumulusnetworks.com wrote:
> >Other approaches tried and vetoed:
> >- tc vlan push/pop and tunnel metadata dst:
> >    - poses a tc rule scalability problem (2 rules per vni)
> 
> Why is it a problem?

Wanted to ask exactly the same question.

> >    - cannot handle the case where a packet needs to be replicated to
> >      multiple vxlan remote tunnel end-points.. which the vxlan driver
> >      can do today by having multiple remote destinations per fdb.
> 
> Can't you just extend the tc to support this?

+1

> To me, looks like the tc is the correct place to handle this. Then, the
> user can use it for multiple cases of forwarding, including bridge,
> tc-mirred, ovs and others. Putting this in bridge somehow seems wrong in
> this light. Also, the bridge code is polluted enough as it is. I think we
> should be super-picky about adding more code there.

Completely agreed.

 Jiri

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-23 Thread Jiri Pirko

Sat, Jan 21, 2017 at 06:46:51AM CET, ro...@cumulusnetworks.com wrote:
>From: Roopa Prabhu 
>
>High level summary:
>lwt and dst_metadata/collect_metadata have enabled vxlan l3 deployments
>to use a single vxlan netdev for multiple vnis eliminating the scalability
>problem with using a single vxlan netdev per vni. This series tries to
>do the same for vxlan netdevs in pure l2 bridged networks.
>Use-case/deployment and details are below.
>
>Deployment scenario details:
>As we know VXLAN is used to build layer 2 virtual networks across the
>underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
>originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
>or a vswitch in the hypervisor. This patch series mainly
>focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
>along with vlan id is used to identify layer 2 segments in a vxlan
>overlay network. Vxlan bridging is the function provided by Vteps to terminate
>vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
>covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
>To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
>says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
>the original Layer 2 packet if there is one before encapsulating the packet
>into the VXLAN format to transmit it through the underlay network. The remote
>VTEP devices have information about the VLAN in which the packet will be
>placed based on their own VLAN-to-VXLAN VNI mapping configurations.
>
>Existing solution:
>Without this patch series one can deploy such a vtep configuration by
>adding the local ports and vxlan netdevs into a vlan filtering bridge.
>The local ports are configured as trunk ports carrying all vlans.
>A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
>achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
>The vxlan netdev only receives traffic corresponding to the vlan it is mapped
>to. This configuration maps traffic belonging to a vlan to the corresponding
>vxlan segment.
>
>  ------------------------------------
> |               bridge               |
> |                                    |
>  ------------------------------------
>    |100,200      |100 (pvid)    |200 (pvid)
>    |             |              |
>   swp1       vxlan1000      vxlan2000
>
>This provides the required vxlan bridging function but poses a
>scalability problem with using a single vxlan netdev for each vni.
>
>Solution in this patch series:
>The Goal is to use a single vxlan device to carry all vnis similar
>to the vxlan collect metadata mode but vxlan driver still carrying all
>the forwarding information.
>- vxlan driver changes:
>    - enable collect metadata mode device to be used with learning,
>      replication, fdb
>    - A single fdb table hashed by (mac, vni)
>    - rx path already has the vni
>    - tx path expects a vni in the packet with dst_metadata and vxlan
>      driver has all the forwarding information for the vni in the
>      dst_metadata.
>
>- Bridge driver changes: per vlan LWT and dst_metadata support:
>    - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
>      kept the api generic for any tunnel info
>    - Uapi to configure/unconfigure/dump per vlan tunnel data
>    - new bridge port flag to turn this feature on/off. off by default
>    - ingress hook:
>        - if port is a lwt tunnel port, use tunnel info in
>          attached dst_metadata to map it to a local vlan
>    - egress hook:
>        - if port is a lwt tunnel port, use tunnel info attached to vlan
>          to set dst_metadata on the skb
>
>Other approaches tried and vetoed:
>- tc vlan push/pop and tunnel metadata dst:
>    - poses a tc rule scalability problem (2 rules per vni)

Why is it a problem?


>    - cannot handle the case where a packet needs to be replicated to
>      multiple vxlan remote tunnel end-points.. which the vxlan driver
>      can do today by having multiple remote destinations per fdb.

Can't you just extend the tc to support this?


To me, looks like the tc is the correct place to handle this. Then, the
user can use it for multiple cases of forwarding, including bridge,
tc-mirred, ovs and others. Putting this in bridge somehow seems wrong in
this light. Also, the bridge code is polluted enough as it is. I think we
should be super-picky about adding more code there.

Can you please elaborate more on why we can't have this as a re-usable TC solution?


Thanks.

[RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-20 Thread Roopa Prabhu

From: Roopa Prabhu 

High level summary:
lwt and dst_metadata/collect_metadata have enabled vxlan l3 deployments
to use a single vxlan netdev for multiple vnis eliminating the scalability
problem with using a single vxlan netdev per vni. This series tries to
do the same for vxlan netdevs in pure l2 bridged networks.
Use-case/deployment and details are below.

Deployment scenario details:
As we know VXLAN is used to build layer 2 virtual networks across the
underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
or a vswitch in the hypervisor. This patch series mainly
focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
along with vlan id is used to identify layer 2 segments in a vxlan
overlay network. Vxlan bridging is the function provided by Vteps to terminate
vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
the original Layer 2 packet if there is one before encapsulating the packet
into the VXLAN format to transmit it through the underlay network. The remote
VTEP devices have information about the VLAN in which the packet will be
placed based on their own VLAN-to-VXLAN VNI mapping configurations.

Existing solution:
Without this patch series one can deploy such a vtep configuration by
adding the local ports and vxlan netdevs into a vlan filtering bridge.
The local ports are configured as trunk ports carrying all vlans.
A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
The vxlan netdev only receives traffic corresponding to the vlan it is mapped
to. This configuration maps traffic belonging to a vlan to the corresponding
vxlan segment.

  ------------------------------------
 |               bridge               |
 |                                    |
  ------------------------------------
    |100,200      |100 (pvid)    |200 (pvid)
    |             |              |
   swp1       vxlan1000      vxlan2000

This provides the required vxlan bridging function but poses a
scalability problem with using a single vxlan netdev for each vni.

Solution in this patch series:
The Goal is to use a single vxlan device to carry all vnis similar
to the vxlan collect metadata mode but vxlan driver still carrying all
the forwarding information.
- vxlan driver changes:
    - enable collect metadata mode device to be used with learning,
      replication, fdb
    - A single fdb table hashed by (mac, vni)
    - rx path already has the vni
    - tx path expects a vni in the packet with dst_metadata and vxlan
      driver has all the forwarding information for the vni in the
      dst_metadata.

- Bridge driver changes: per vlan LWT and dst_metadata support:
    - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
      kept the api generic for any tunnel info
    - Uapi to configure/unconfigure/dump per vlan tunnel data
    - new bridge port flag to turn this feature on/off. off by default
    - ingress hook:
        - if port is a lwt tunnel port, use tunnel info in
          attached dst_metadata to map it to a local vlan
    - egress hook:
        - if port is a lwt tunnel port, use tunnel info attached to vlan
          to set dst_metadata on the skb
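
The two bridge hooks listed above can be pictured with a toy model. This is a sketch under stated assumptions, not the kernel code: frames are plain dicts, the class and field names (TunnelPort, vlan_tunnel, tun_id) are invented for illustration, and a frame with no mapping is simply dropped.

```python
class TunnelPort:
    """Toy bridge port carrying an optional per-vlan tunnel id table."""

    def __init__(self, name, vlan_tunnel=False):
        self.name = name
        self.vlan_tunnel = vlan_tunnel   # models the new port flag, off by default
        self.vlan_to_tunid = {}
        self.tunid_to_vlan = {}

    def add_vlan_tunnel(self, vid, tunid):
        # 1-1 mapping between a local vlan and a tunnel id (vni for vxlan)
        self.vlan_to_tunid[vid] = tunid
        self.tunid_to_vlan[tunid] = vid


def ingress(port, frame):
    """Frame arriving from the tunnel port: tunnel id in metadata -> local vlan."""
    if not port.vlan_tunnel:
        return frame
    vid = port.tunid_to_vlan.get(frame.get("tun_id"))
    if vid is None:
        return None                      # sketch choice: drop unmapped frames
    out = {k: v for k, v in frame.items() if k != "tun_id"}
    out["vlan"] = vid
    return out


def egress(port, frame):
    """Frame leaving via the tunnel port: vlan -> tunnel id set as metadata."""
    if not port.vlan_tunnel:
        return frame
    tunid = port.vlan_to_tunid.get(frame.get("vlan"))
    if tunid is None:
        return None
    out = {k: v for k, v in frame.items() if k != "vlan"}   # vlan tag is not carried
    out["tun_id"] = tunid
    return out


vxlan0 = TunnelPort("vxlan0", vlan_tunnel=True)
vxlan0.add_vlan_tunnel(100, 1000)
vxlan0.add_vlan_tunnel(200, 2000)
print(egress(vxlan0, {"dst": "52:54:00:aa:bb:01", "vlan": 100}))
print(ingress(vxlan0, {"dst": "52:54:00:aa:bb:02", "tun_id": 2000}))
```

With the flag off, both hooks pass the frame through untouched, which matches the "off by default" behaviour described in the list.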

Other approaches tried and vetoed:
- tc vlan push/pop and tunnel metadata dst:
    - poses a tc rule scalability problem (2 rules per vni)
    - cannot handle the case where a packet needs to be replicated to
      multiple vxlan remote tunnel end-points.. which the vxlan driver
      can do today by having multiple remote destinations per fdb.
- making vxlan driver understand vlan-vni mapping:
    - I had a series almost ready with this one but soon realized
      it duplicated a lot of vlan handling code in the vxlan driver
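
The replication case mentioned above, combined with the (mac, vni) fdb key from the solution summary, can be sketched as follows. This is a toy model, not the vxlan driver's data structure: the class and function names are hypothetical, and using the all-zeros mac as the per-vni flood entry is an assumption of the sketch.

```python
from collections import defaultdict

ALL_ZEROS = "00:00:00:00:00:00"   # assumed flood/default entry for a vni


class VxlanFdb:
    """Toy forwarding database keyed by (mac, vni)."""

    def __init__(self):
        # (mac, vni) -> list of remote VTEP addresses, in insertion order
        self._entries = defaultdict(list)

    def add(self, mac, vni, remote_ip):
        remotes = self._entries[(mac, vni)]
        if remote_ip not in remotes:
            remotes.append(remote_ip)

    def lookup(self, mac, vni):
        # .get avoids creating an empty entry on a miss
        return list(self._entries.get((mac, vni), []))


def tx_remotes(fdb, dst_mac, vni):
    """Remotes for one frame: exact (mac, vni) hit, else that vni's flood list."""
    return fdb.lookup(dst_mac, vni) or fdb.lookup(ALL_ZEROS, vni)


fdb = VxlanFdb()
fdb.add("52:54:00:aa:bb:01", 1000, "10.0.0.2")
fdb.add("52:54:00:aa:bb:01", 2000, "10.0.0.3")   # same mac, different vni
fdb.add(ALL_ZEROS, 1000, "10.0.0.2")
fdb.add(ALL_ZEROS, 1000, "10.0.0.4")

print(tx_remotes(fdb, "52:54:00:aa:bb:01", 1000))   # known mac in vni 1000
print(tx_remotes(fdb, "52:54:00:ff:ff:ff", 1000))   # unknown mac: replicate
```

Because the vni is part of the key, one device can hold the same mac in several segments, and an unknown destination is replicated to every remote listed for that vni.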

This series is briefly tested for functionality. Sending it out as RFC while
I continue to test it more. There are some rough edges which I am in the process
of fixing.

Signed-off-by: Roopa Prabhu 

Roopa Prabhu (5):
  ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info mode
  vxlan: make COLLECT_METADATA mode bridge friendly
  bridge: uapi: add per vlan tunnel info
  bridge: vlan lwt and dst_metadata netlink support
  bridge: vlan lwt dst_metadata hooks in ingress and egress paths

 drivers/net/vxlan.c            |  209
 include/linux/if_bridge.h      |    1 +
 include/net/ip_tunnels.h       |    1 +
 include/uapi/linux/if_bridge.h |   11 ++
 include/uapi/linux/if_link.h   |    1 +
 include/uapi/linux/neighbour.h |    1 +
