Re: [ovs-dev] [PATCH net-next 00/22 v2] Lightweight flow based encapsulation

2015-07-22 Thread roopa

On 7/22/15, 1:58 AM, thomas.mo...@orange.com wrote:

> Hi Thomas,
>
> This looks promising.
>
> One question: will this approach allow MPLS-in-GRE and MPLS-in-UDP?

The current series focuses on IP-to-MPLS tunnels, but the
infrastructure allows associating encap state with routes and calling
the respective encapsulation output handlers.

For MPLS-in-GRE and MPLS-in-UDP, it appears that MPLS LSP routes
(af_mpls.c) could carry GRE and UDP encap info (RTA_ENCAP and
RTA_ENCAP_TYPE) just as IP routes do. The redirection from MPLS to GRE
or from MPLS to UDP could then be achieved in the same way, using the
same infrastructure.
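
For readers unfamiliar with the two encapsulations in question, here is a
rough userspace sketch (not part of the series) of what ends up on the
wire. It assumes the usual framing: a 32-bit label stack entry per
RFC 3032, GRE protocol type 0x8847 for MPLS-in-GRE (RFC 4023) and UDP
destination port 6635 for MPLS-in-UDP (RFC 7510). The function names, the
default TTL and the default source port are made up for illustration.

```python
import struct

MPLS_ETHERTYPE = 0x8847  # MPLS unicast, carried as the GRE protocol type
MPLS_UDP_PORT = 6635     # MPLS-in-UDP destination port


def mpls_entry(label, tc=0, bos=True, ttl=64):
    """Pack one 32-bit MPLS label stack entry: label, TC, S bit, TTL."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bos) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def mpls_in_gre(label, payload):
    """Basic GRE header (no checksum, key or sequence) followed by MPLS."""
    gre = struct.pack("!HH", 0, MPLS_ETHERTYPE)
    return gre + mpls_entry(label) + payload


def mpls_in_udp(label, payload, src_port=49152):
    """UDP header followed by MPLS; checksum left at zero for brevity."""
    body = mpls_entry(label) + payload
    udp = struct.pack("!HHHH", src_port, MPLS_UDP_PORT, 8 + len(body), 0)
    return udp + body
```

Both helpers return only the GRE or UDP header plus the MPLS payload; the
outer IP header would come from the second routing lookup.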


Thanks,
Roopa

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ovs-dev] [PATCH net-next 00/22 v2] Lightweight flow based encapsulation

2015-07-22 Thread thomas.morin

Hi Thomas,

This looks promising.

One question: will this approach allow MPLS-in-GRE and MPLS-in-UDP?

-Thomas


Re: [PATCH net-next 00/22 v2] Lightweight flow based encapsulation

2015-07-21 Thread David Miller
From: Thomas Graf tg...@suug.ch
Date: Tue, 21 Jul 2015 10:43:44 +0200

> This series combines the work previously posted by Roopa, Robert and
> myself. It's according to what we discussed at NFWS. The motivation
> of this series is to:
>
>  * Consolidate code between OVS and the rest of the kernel, getting
>    rid of OVS vports and representing them as pure net_devices instead.
>  * Introduce a lightweight tunneling mechanism which enables flow
>    based encapsulation to improve scalability on both RX and TX.
>  * Do the above in an encapsulation-agnostic way so that the
>    encapsulation type is eventually abstracted away from the user.
>  * Use the same forwarding decision for both native forwarding and
>    encapsulation, which allows switching between native IPv6 and
>    UDP encapsulation based on the endpoint without requiring
>    additional logic.
>
> The fundamental changes introduced in this series are:
>  * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
>    instructions. Depending on the specified type, the instructions
>    apply to UDP encapsulations, MPLS and possibly others in the future.
>  * Depending on the encapsulation type, either the output function of
>    the dst is overwritten directly, or the dst merely attaches metadata
>    and relies on a subsequent net_device to apply it to the packet. The
>    latter is typically used if an inner and an outer IP header exist,
>    which requires two subsequent routing lookups to be performed.
>  * A new metadata_dst structure which can be attached to skbs to
>    carry metadata between subsystems. This new metadata transport
>    is used to provide a single interface for VXLAN, routing and OVS
>    to communicate through metadata.

Series applied, but please take Alexei's endianness feedback into
consideration.

Thanks!


[PATCH net-next 00/22 v2] Lightweight flow based encapsulation

2015-07-21 Thread Thomas Graf
This series combines the work previously posted by Roopa, Robert and
myself. It's according to what we discussed at NFWS. The motivation
of this series is to:

 * Consolidate code between OVS and the rest of the kernel, getting
   rid of OVS vports and representing them as pure net_devices instead.
 * Introduce a lightweight tunneling mechanism which enables flow
   based encapsulation to improve scalability on both RX and TX.
 * Do the above in an encapsulation-agnostic way so that the
   encapsulation type is eventually abstracted away from the user.
 * Use the same forwarding decision for both native forwarding and
   encapsulation, which allows switching between native IPv6 and
   UDP encapsulation based on the endpoint without requiring
   additional logic.

The fundamental changes introduced in this series are:
 * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
   instructions. Depending on the specified type, the instructions
   apply to UDP encapsulations, MPLS and possibly others in the future.
 * Depending on the encapsulation type, either the output function of
   the dst is overwritten directly, or the dst merely attaches metadata
   and relies on a subsequent net_device to apply it to the packet. The
   latter is typically used if an inner and an outer IP header exist,
   which requires two subsequent routing lookups to be performed.
 * A new metadata_dst structure which can be attached to skbs to
   carry metadata between subsystems. This new metadata transport
   is used to provide a single interface for VXLAN, routing and OVS
   to communicate through metadata.
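
The two output modes described in the list above can be pictured with a
small userspace model. This is only a sketch of the idea, not the kernel
code: Packet, Route, ENCAP_OUTPUT and the handler names are hypothetical
stand-ins for the skb, the route/dst and the per-type lwtunnel output
hooks, and the "metadata" field stands in for a metadata_dst.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    metadata: Optional[dict] = None  # stands in for an attached metadata_dst
    log: list = field(default_factory=list)


@dataclass
class Route:
    prefix: str
    dev: str
    encap_type: Optional[str] = None  # models RTA_ENCAP_TYPE
    encap: Optional[dict] = None      # models the RTA_ENCAP payload


def native_output(pkt, route):
    # Plain forwarding: hand the packet to the output device.
    pkt.log.append(f"xmit on {route.dev}")


def mpls_output(pkt, route):
    # Mode 1: the encap handler takes over the output function and
    # modifies the packet itself before transmitting.
    pkt.log.append(f"push label {route.encap['label']}")
    native_output(pkt, route)


def ip_tunnel_output(pkt, route):
    # Mode 2: only attach tunnel parameters; the subsequent net_device
    # (e.g. vxlan0) is expected to apply them to the packet.
    pkt.metadata = dict(route.encap)
    native_output(pkt, route)


# Hypothetical registry keyed by encapsulation type.
ENCAP_OUTPUT = {"mpls": mpls_output, "vxlan": ip_tunnel_output}


def dst_output(pkt, route):
    # Same forwarding decision for native and encapsulated traffic:
    # only the selected output handler differs.
    handler = ENCAP_OUTPUT.get(route.encap_type, native_output)
    handler(pkt, route)
    return pkt
```

For instance, dst_output(Packet(b"x"), Route("40.1.1.1/32", "vxlan0",
"vxlan", {"id": 10, "dst": "50.1.1.2"})) leaves the payload untouched and
only sets the metadata, while an "mpls" route logs the label push before
the transmit.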

The OVS interfaces remain as-is but will transparently create a real
VXLAN net_device in the background. iproute2 is extended with new
use cases:

  VXLAN:
  ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0

  MPLS:
  ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1
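
To give an idea of what the MPLS command above might place in the route
message, here is a rough sketch of the two attributes as rtattr TLVs. The
numeric constants are assumptions and should be checked against
include/uapi/linux/rtnetlink.h, lwtunnel.h and mpls_iptunnel.h; the label
payload layout (32-bit entries in network order, TTL and TC zero, S bit on
the last label) and the plain nesting without NLA_F_NESTED are likewise
assumptions. The helper names are made up.

```python
import struct

# Assumed attribute and type numbers; verify against the uapi headers.
RTA_ENCAP_TYPE = 21
RTA_ENCAP = 22
LWTUNNEL_ENCAP_MPLS = 1
MPLS_IPTUNNEL_DST = 1


def rtattr(rta_type, payload):
    """One attribute: u16 length, u16 type (host order), payload,
    padded with zeros to a 4-byte boundary."""
    length = 4 + len(payload)
    pad = (4 - length % 4) % 4
    return struct.pack("=HH", length, rta_type) + payload + b"\0" * pad


def mpls_label_payload(labels):
    """Label stack as 32-bit entries; S bit set on the last label only."""
    out = b""
    for i, label in enumerate(labels):
        bos = 1 if i == len(labels) - 1 else 0
        out += struct.pack("!I", (label << 12) | (bos << 8))
    return out


def mpls_encap_attrs(labels):
    """RTA_ENCAP_TYPE (u16) followed by RTA_ENCAP nesting the label stack."""
    nested = rtattr(MPLS_IPTUNNEL_DST, mpls_label_payload(labels))
    return (rtattr(RTA_ENCAP_TYPE, struct.pack("=H", LWTUNNEL_ENCAP_MPLS))
            + rtattr(RTA_ENCAP, nested))
```

mpls_encap_attrs([200]) yields 20 bytes: an 8-byte RTA_ENCAP_TYPE
attribute (including padding) and a 12-byte RTA_ENCAP attribute wrapping
the single label entry.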

Performance implications:
  The additional memory allocation in the receive path is expected to
  have a performance cost, although it is not observable in standard
  throughput tests when GRO is done properly. The benefit of a correct
  net_device model outweighs the additional cost of the allocation.
  Furthermore, this cost can be reduced by reintroducing a direct
  unqueued path from a software device to a consumer such as bridge or
  OVS if needed.

$ netperf -t TCP_STREAM -H 15.1.1.201
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
15.1.1.201 (15.1.1.201) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    9118.17

Changes since v1:
 * Properly initialize tun_id as reported by Julian
 * Drop duplicate netif_keep_dst() as reported by Alexei

Roopa Prabhu (9):
  rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes
  lwtunnel: infrastructure for handling light weight tunnels like mpls
  ipv4: support for fib route lwtunnel encap attributes
  ipv6: support for fib route lwtunnel encap attributes
  lwtunnel: support dst output redirect function
  ipv4: redirect dst output to lwtunnel output
  ipv6: rt6_info output redirect to tunnel output
  mpls: export mpls functions for use by mpls iptunnels
  mpls: ip tunnel support

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  icmp: Don't leak original dst into ip_route_input()
  dst: Metadata destinations
  arp: Inherit metadata dst when creating ARP requests
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route IP tunnel metadata via lightweight tunnel
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Make tunnel set action attach a metadata dst
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device

 drivers/net/vxlan.c                | 672 +--
 include/linux/lwtunnel.h           |   6 +
 include/linux/mpls_iptunnel.h      |   6 +
 include/linux/skbuff.h             |   1 +
 include/net/dst.h                  |   6 +-
 include/net/dst_metadata.h         |  55 +++
 include/net/fib_rules.h            |   1 +
 include/net/flow.h                 |   8 +
 include/net/ip6_fib.h              |   3 +
 include/net/ip_fib.h               |   5 +-
 include/net/ip_tunnels.h           |  95 -
 include/net/lwtunnel.h             | 144
 include/net/mpls_iptunnel.h        |  29 ++
 include/net/route.h                |   1 +
 include/net/rtnetlink.h            |   1 +
 include/net/vxlan.h                |  85 -
 include/uapi/linux/fib_rules.h     |   2 +-
 include/uapi/linux/if_link.h       |   1 +
 include/uapi/linux/lwtunnel.h      |  16 +
 include/uapi/linux/mpls_iptunnel.h |  28 ++