Unless I am missing something, this can be handled by working in
dragonflow.

** Changed in: neutron
       Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1509184

Title:
  Enable openflow based dvr routing for east/west traffic

Status in neutron:
  Won't Fix

Bug description:
  In the Juno cycle dvr support was added to neutron to decentralise routing
to the compute nodes. This RFE bug proposes the introduction of a new dvr
mode (dvr_local_openflow) to optimise the datapath of east/west traffic.

  -----------------------------------------------High level description-------------------------------
  The current implementation of DVR with ovs utilises linux network
namespaces to instantiate l3 routers, the details of which are described
here: http://docs.openstack.org/networking-guide/scenario_dvr_ovs.html

  Fundamentally, a neutron router comprises 3 elements:
  - a router instance (network namespace)
  - a router interface (tap device)
  - a set of routing rules (kernel ip routes)

  In the special case of routing east/west traffic, both the source and
destination interfaces are known to neutron. Because of that fact, neutron
contains all the information required to logically route traffic from its
origin to its destination, enabling the path to be established proactively.
This proposal suggests moving the instantiation of the dvr local router from
the kernel ip stack to Open vSwitch (ovs) for east/west traffic.

  Open vSwitch provides a logical programmable interface (OpenFlow) to
configure traffic forwarding and modification actions on arbitrary packet
streams. When managed by the neutron Open vSwitch l2 agent, ovs operates as a
simple mac learning switch with limited utilisation of its programmable
dataplane. To utilise ovs to create an l3 router, the following mappings from
the 3 fundamental elements can be made:
  - a router instance (network namespace + an ovs bridge)
  - a router interface (tap device + patch port pair)
  - a set of routing rules (kernel ip routes + openflow rules)
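  The router-instance and router-interface halves of this mapping could be
realised with ovs-vsctl calls along these lines. This is only a sketch: the
bridge and patch port names (br-router-*, rpatch-*, ipatch-*) are
hypothetical, not taken from the proposal.

```python
# Sketch: ovs-vsctl commands that would realise the mappings above.
# Bridge/port names are illustrative assumptions, not from the proposal.

def router_bridge_cmds(router_id):
    """Commands to create a per-router bridge and patch it to br-int."""
    br = "br-router-%s" % router_id
    rpatch = "rpatch-%s" % router_id  # routing-bridge side of the pair
    ipatch = "ipatch-%s" % router_id  # br-int side of the pair
    return [
        # router instance: a dedicated ovs bridge
        ["ovs-vsctl", "add-br", br],
        # router interface: a patch port pair interconnecting the bridges
        ["ovs-vsctl", "add-port", br, rpatch,
         "--", "set", "interface", rpatch,
         "type=patch", "options:peer=%s" % ipatch],
        ["ovs-vsctl", "add-port", "br-int", ipatch,
         "--", "set", "interface", ipatch,
         "type=patch", "options:peer=%s" % rpatch],
    ]

cmds = router_bridge_cmds("1")
```

  The kernel ip routes half of the mapping becomes openflow rules on the
routing bridge, described in the proposed change section below.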

  ----------------------------------------background context---------------------------------------------
  TL;DR
  Basic explanation of openflow, ovs bridges and patch ports.
  Skip to the implementation section if familiar.

  ovs implementation background:
  In Open vSwitch, at the control layer, an ovs bridge is a unique logical
domain of interfaces and flow rules. Similarly, at the control layer, a patch
port pair is a logical entity that interconnects two bridges (or logical
domains).

  From a dataplane perspective, each ovs bridge is first created as a
separate instance of a dataplane. If these separate bridges/dataplanes are
interconnected by patch ports, ovs will collapse the independent dataplanes
into a single ovs dataplane instance. As a direct result of this
implementation, a logical topology of 1 bridge with two interfaces is
realised at the dataplane level identically to 2 bridges each with 1
interface interconnected by patch ports. This translates to zero dataplane
overhead for the creation of multiple bridges, allowing arbitrary numbers of
router instances to be created.

  Openflow capability background:
  The openflow protocol provides many capabilities which can be generally
summarised as packet match criteria and actions to apply when the criteria
are satisfied. In the case of l3 routing the match criteria of relevance are
the ethernet type and the destination ip address; similarly, the openflow
actions required are mod_dl_dst, set_field, move, dec_ttl, output and drop.
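  Put together, that match/action pairing is a single ovs-ofctl style flow.
A minimal sketch of building such a flow string follows; the priority, ip
address, mac address and output port number are illustrative assumptions.

```python
# Sketch: an ovs-ofctl style l3 forwarding flow using the match criteria
# and actions named above: match ethernet type ip + destination ip, then
# rewrite the destination mac, decrement the ttl and output.
# Priority, addresses and port number are illustrative only.

def l3_forward_flow(dst_ip, dst_mac, out_port, priority=100):
    match = "priority=%d,ip,nw_dst=%s" % (priority, dst_ip)
    actions = "mod_dl_dst:%s,dec_ttl,output:%d" % (dst_mac, out_port)
    return "%s,actions=%s" % (match, actions)

flow = l3_forward_flow("10.0.0.5", "fa:16:3e:00:00:05", 2)
# such a flow would be installed with e.g.:
#   ovs-ofctl add-flow <bridge> "<flow>"
```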

  logical packet flow for a ping between two vms on the same host:
  In the l2 case, if a vm tries to ping another vm in the same subnet there
are 4 stages.
  - first, it sends a broadcast arp request to learn the mac address for the
destination ip of the remote vm.
  - second, the destination vm receives the arp request, learns the source
vm's mac, then replies as follows:
      a.) swap the source and destination ip of the arp packet
      b.) copy the source mac address to the destination mac address and set
the source mac address to the local interface mac
      c.) set the arp opcode from request to reply
      d.) transmit the reply via the received interface
  - third, on receiving the arp reply, the source vm transmits the icmp
packet to the destination vm with the learned mac address.
  - fourth, on receiving the icmp packet, the destination vm replies.
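  Steps a.) to d.) amount to turning the request into a reply in place. A
minimal sketch of that transformation, modelling the packet as a plain dict
(the field names are illustrative):

```python
# Sketch: the arp request -> reply rewrite described in steps a.) to d.).
# The packet is modelled as a plain dict; field names are illustrative.

ARP_REQUEST, ARP_REPLY = 1, 2

def arp_reply(pkt, local_mac):
    reply = dict(pkt)
    # a.) swap the source and destination ip
    reply["src_ip"], reply["dst_ip"] = pkt["dst_ip"], pkt["src_ip"]
    # b.) copy source mac to destination mac, source mac = local interface
    reply["dst_mac"] = pkt["src_mac"]
    reply["src_mac"] = local_mac
    # c.) set the arp opcode from request to reply
    reply["op"] = ARP_REPLY
    # d.) transmit the reply via the received interface
    reply["out_port"] = pkt["in_port"]
    return reply

request = {"op": ARP_REQUEST,
           "src_ip": "10.0.0.3", "dst_ip": "10.0.0.5",
           "src_mac": "fa:16:3e:00:00:03",
           "dst_mac": "ff:ff:ff:ff:ff:ff",
           "in_port": 1}
reply = arp_reply(request, "fa:16:3e:00:00:05")
```

  The dvr_local_openflow proposal below performs exactly this rewrite, but
inside ovs with openflow actions rather than in the kernel stack.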

  In the l3 case the packet flow is similar but slightly different.
  - first, the source vm sends an arp request to the subnet gateway.
  - second, the gateway router responds with its mac address.
  - third, the source vm sends the icmp packet to the router.
  - fourth, the router receives the icmp packet and sends an arp request to
the destination vm.
  - fifth, the destination vm sends an arp reply to the gateway.
  - sixth, the router forwards the icmp packet to the destination vm.
  - seventh, the destination vm replies to the router.
  - eighth, the reply is received by the source vm.

  ----------------------------------current implementation---------------------------------------------------

  l3 ping packet flow in dvr_local mode (simplified to ignore broadcast):
  logical:
  - the arp packet is received from the source vm and logically vlan
tagged (tenant isolation)
  - the arp packet is output to the router tap device (tap1), the vlan is
stripped, and the packet is copied from the ovs dataplane to the kernel
networking stack in the router's linux namespace.
  - the kernel network stack replies to the arp; the reply packet is copied
to the ovs dataplane and logically vlan tagged
  - the vlan is logically stripped and the arp reply switched to the source
vm interface.
  - the icmp packet is received from the source vm and logically vlan
tagged (tenant isolation)
  - the icmp packet is output to the router tap device, the vlan is
stripped, and the packet is copied from the ovs dataplane to the kernel
networking stack in the router's linux namespace.
  - the kernel generates an arp request to the destination vm, which follows
the same path as the arp described above
  - the kernel modifies the destination mac address, decrements the ttl and
routes the packet to the appropriate tap device (tap2), where the packet is
copied to the ovs dataplane and logically vlan tagged
  - the vlan is logically stripped and the icmp packet switched to the
destination vm interface.
  - the reply path is similar and is summarised as follows:
     dest vm -> vlan tagged -> vlan stripped -> copied to kernel namespace
via tap2 -> copied to ovs dataplane via tap1 -> vlan tagged -> vlan
stripped -> received by source vm.

  actual:
  - arp from source vm -> tap1 (vlan tagging skipped) + broadcast to other
ports
  - tap1 -> kernel network stack
  - kernel sends arp reply via tap1
  - tap1 -> source vm (vlan tagging skipped)
  - icmp from source vm -> tap1 (vlan tagging skipped)
  - kernel receives icmp on tap1 and sends arp request to dest vm via
tap2 (broadcast)
  - arp via tap2 -> dest vm (vlan tagging skipped)
  - dest vm replies -> tap2
  - kernel updates dest mac, decrements ttl, then forwards icmp packet to
tap2
  - tap2 -> dest vm -> dest vm replies -> tap2 (vlan tagging skipped)
  - kernel updates dest mac, decrements ttl, then forwards icmp reply packet
to tap1
  - tap1 -> source vm

  -------------------------------------proposed change----------------------------------------------------------
  Proposed change:
  - a new class will be added to implement the new mode, subclassing the
existing dvr_local router class.
  - if the mode is dvr_local_openflow, a routing bridge will be created for
each dvr router.
  - when an internal network is added to the router, the following actions
will be performed:
    a.) the tap interface will be created in the router network namespace as
normal but added to the routing bridge instead of the br-int (tap devices
are only used for north/south traffic)
    b.) a patch port pair will be created between the br-int and the routing
bridge
    c.) the attached-mac, iface-id and iface-status will be populated in the
external-ids field of the br-int side of the patch port. This will enable
the unmodified neutron l2 agent to correctly manage the patch port.
    d.) a low priority rule that sends all traffic from the patch port to
the tap device will be added to the routing bridge.
    e.) a medium priority rule that replies to all arp requests to the
router will be added to the routing bridge. This rule will use openflow's
move and set_field actions to rewrite the arp request into a reply and
output=in_port.
    f.) a high priority dest mac update and ttl decrement rule will be added
to the routing bridge for each port on the internal network.
  - when an external network is added to the router, the workflow is
unchanged and is inherited from the dvr_local implementation.
  - the _update_arp_entry function will be extended to additionally populate
and delete the high priority dest mac update rules as neutron ports are
added/removed from connected networks.
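  The low priority fallback rule from step d.) and the per-port high
priority rules from step f.) could be sketched as flow strings like the
following. The priorities, port numbers and addresses are illustrative
assumptions; the medium priority arp responder rule from step e.) needs the
nicira extensions and is sketched separately in the considerations section.

```python
# Sketch of the routing-bridge rules from steps d.) and f.).
# Priorities, port numbers and addresses are illustrative assumptions.

PATCH_PORT, TAP_PORT = 1, 2  # hypothetical ofport numbers

# d.) low priority: send all traffic from the patch port to the tap device
default_flow = ("priority=1,in_port=%d,actions=output:%d"
                % (PATCH_PORT, TAP_PORT))

# f.) high priority: dest mac update + ttl decrement, one flow per neutron
#     port on the internal network (maintained by _update_arp_entry)
def dest_flow(dst_ip, dst_mac):
    return ("priority=100,ip,nw_dst=%s,"
            "actions=mod_dl_dst:%s,dec_ttl,output:%d"
            % (dst_ip, dst_mac, PATCH_PORT))

flows = [default_flow, dest_flow("10.0.0.5", "fa:16:3e:00:00:05")]
```

  With these two priorities, any traffic without a known destination (e.g.
north/south) falls through to the tap device and the kernel path, while
known east/west destinations are routed entirely within ovs.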

  l3 packet flow in dvr_local_openflow mode:

  logical:
  - the arp packet is received from the source vm and logically vlan
tagged (tenant isolation)
  - the arp packet is output to the router bridge patch port, the vlan is
stripped
  - the arp request is rewritten into a reply, sent back to the br-int and
logically vlan tagged
  - the vlan is logically stripped and the arp reply switched to the source
vm interface.
  - the icmp packet is received from the source vm and logically vlan
tagged (tenant isolation)
  - the icmp packet is output to the router bridge patch port, the vlan is
stripped.
  - the icmp packet matches the high priority rule, its destination mac is
updated, then it is output to the second patch port and logically vlan
tagged
  - the vlan is logically stripped and the icmp packet switched to the
destination vm interface.
  - the reply path is similar and is summarised as follows:
     dest vm -> vlan tagged -> vlan stripped -> router bridge via patch 2 ->
dest mac and ttl updated, then output patch 1 -> vlan tagged -> vlan
stripped -> received by source vm.

  actual:
  - arp from source vm -> arp rewritten to reply -> sent to source
vm (single openflow action).
  - icmp from source vm -> destination mac updated, ttl decremented -> dest
vm (single openflow action)
  - icmp from dest vm -> destination mac updated, ttl decremented -> source
vm (single openflow action)

  other considerations:

  - north/south
      as ovs cannot look up the destination mac dynamically via arp, it is
not possible to optimise the north/south path as described above.

  - Open vSwitch support
      this mechanism is compatible with both kernel and dpdk ovs.
      this mechanism requires the nicira extensions for arp rewrite.
      arp rewrite can be skipped for broader compatibility if required, as
it will fall back to the tap device and kernel.
      icmp traffic addressed to the router interface will be handled by the
tap device, as ovs currently does not support setting the icmp type code
via the set_field or load openflow actions.
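  The nicira-extension arp rewrite mentioned above is commonly expressed
with move/load actions, in the style of neutron's existing l2pop arp
responder. A sketch of building such a flow follows; the router address and
priority are illustrative, and the mac/ip must be encoded as integers for
the load actions.

```python
# Sketch: an arp responder flow for the router ip using nicira
# move/load extensions, modelled on the style of neutron's l2pop arp
# responder. Router address, mac and priority are illustrative.

import socket
import struct

def arp_responder_flow(router_ip, router_mac):
    # load actions take the mac/ip as integer values
    mac_int = int(router_mac.replace(":", ""), 16)
    ip_int = struct.unpack("!I", socket.inet_aton(router_ip))[0]
    actions = (
        # answer back to the requester, from the router's mac
        "move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],"
        "mod_dl_src:%(mac)s,"
        # flip the arp opcode from request (1) to reply (2)
        "load:0x2->NXM_OF_ARP_OP[],"
        # swap sender/target hardware and protocol addresses
        "move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],"
        "move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],"
        "load:%(mac_x)#x->NXM_NX_ARP_SHA[],"
        "load:%(ip_x)#x->NXM_OF_ARP_SPA[],"
        # send the reply back out the port it arrived on
        "in_port"
    ) % {"mac": router_mac, "mac_x": mac_int, "ip_x": ip_int}
    return "priority=50,arp,arp_tpa=%s,actions=%s" % (router_ip, actions)

flow = arp_responder_flow("10.0.0.1", "fa:16:3e:aa:bb:cc")
```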

  - performance
     the performance of l3 routing is expected to approach l2 performance
for east/west traffic.
     performance is not expected to change for north/south.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1509184/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
