On Thu, Aug 01, 2024 at 04:45:26PM +0200, Frode Nordahl wrote:
> On Mon, Jul 29, 2024 at 2:17 PM Felix Huettner
> <felix.huettner@mail.schwarz> wrote:
> >
> > Hi everyone,
> >
> > I have built a very first, ugly prototype of such a feature, which
> > brought interesting insights that I want to share.
> >
> > # Test implementation
> >
> > I first want to briefly share the setup I implemented this on:
> > The test setup consists of 3 OVN nodes, one representing a compute node
> > while the other two serve as gateways. The gateways also each have a
> > point-to-point interface to an additional machine that represents a
> > leaf-spine architecture using network namespaces and static routes.
> >
> > For the OVN northbound content we have:
> > * a normal Neutron project setup with:
> >   * an LSP for a VM (LSP-VM)
> >   * an LS for the network (LS-internal)
> >   * an LR for the router (R1)
> >   * an LSP to the router (LSP-internal-R1)
> >   * an LRP to the network (LRP-R1-internal)
> >   * a NAT rule on R1 representing a floating IP
> > * The router R1 has an LRP (LRP-R1-public) with an ha_chassis_group
> >   configured to point to both gateways with different priorities
> > * There is an integration LR (public) that serves as the integration
> >   point of different projects. It replaces the LS normally used for this.
> > * The LR public has options:chassis configured to "gtw01,gtw02" (thereby
> >   making it an l3gateway)
> > * LR public has an LRP (LRP-public-R1)
> > * The LRPs LRP-public-R1 and LRP-R1-public are configured as each
> >   other's peers
> > * There is a logical switch (LS-public-for-real)
> > * LS-public-for-real has an LSP (physnet) of type localnet with
> >   network_name set
> > * LR public has an LRP (LRP-public-for-real)
> > * LS-public-for-real has an LSP (LSP-public)
> > * LSP-public and LRP-public-for-real are connected
> >
> > This setup contains two things that are currently not possible:
> > 1. l3gateways can not be bound to more than 1 chassis
> > 2. l3gateway LRPs can not be directly connected to a distributed gateway
> >    port
> >
> > Supporting an l3gateway that runs on more than 1 chassis can be done
> > quite easily. This basically creates an active-active router on
> > multiple chassis where each chassis does not know anything about any of
> > the other instances (so e.g. no conntrack syncing).
> > However this also means that LRP-public-for-real only exists once
> > from the OVN perspective while actually residing on multiple nodes. This
> > e.g. means that there is only a single static route for the LR pointing
> > to an IP behind LRP-public-for-real. This IP must be the same on all
> > chassis implementing the LR and it also must have the same MAC address.
> >
> > Directly connecting an l3gateway LRP to a distributed gateway port gets
> > a little more ugly. The current implementation relies on the fact that
> > the chassis set of the l3gateway must be a superset of the chassis in
> > the distributed gateway port. In this case I decided to tunnel to the
> > appropriate chassis between the ingress and egress pipelines of the
> > l3gateway.
> >
> > The implementation of this is available here:
> > https://github.com/ovn-org/ovn/compare/main...felixhuettner:ovn:test_active_active_routing
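
For reference, with the prototype patches applied, the unusual northbound
pieces of this setup might be expressed roughly as follows. This is only a
sketch: the MAC addresses and IP networks are made-up placeholders, the
gateway-chassis form is shown instead of an ha_chassis_group for brevity, and
the multi-chassis options:chassis value as well as the peering of an
l3gateway LRP with a distributed gateway port are exactly the parts that
stock OVN does not support:

  # l3gateway router "public" pinned to both gateway chassis (prototype only)
  ovn-nbctl lr-add public
  ovn-nbctl set Logical_Router public options:chassis='"gtw01,gtw02"'
  # peer the l3gateway LRP with the distributed gateway port on R1
  ovn-nbctl lrp-add public LRP-public-R1 00:00:00:02:01:01 192.0.2.1/30 \
      peer=LRP-R1-public
  ovn-nbctl lrp-add R1 LRP-R1-public 00:00:00:02:01:02 192.0.2.2/30 \
      peer=LRP-public-R1
  # make LRP-R1-public a gateway port on both gateway chassis
  ovn-nbctl lrp-set-gateway-chassis LRP-R1-public gtw01 20
  ovn-nbctl lrp-set-gateway-chassis LRP-R1-public gtw02 10
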
> >
> > It allows the above setup to send ICMP packets between LSP-VM and the
> > external system. The external system can send packets through both of
> > the gateway chassis and they will be forwarded appropriately. Reply
> > traffic is always sent from the chassis that is currently active for
> > LRP-R1-public to the external system.
> >
> > ## Current Limitations
> >
> > * An outage of the link between LRP-public-for-real and the next hop
> >   is not handled.
> > * It breaks some test cases. I did not investigate them yet.
Hi everyone,

I now have a new and completely different version, built based on the
learnings from the last one and on internal discussions.

The code builds upon Numan's patch for centralized routing of distributed
gateway ports:
https://patchwork.ozlabs.org/project/ovn/patch/20240730023850.1671255-1-num...@ovn.org/

It is available here:
https://github.com/ovn-org/ovn/compare/main...felixhuettner:ovn:test_active_active_routing_v3?expand=1
(Yes, it is named v3, don't ask about v2). Only the last 5 patches are
actually specific to this change. All previous ones are cleanups to make
this work.

The chassis setup itself is the same as above. The new OVN northbound
content now is:
* a normal Neutron project setup with (same as above):
  * an LSP for a VM (LSP-VM)
  * an LS for the network (LS-internal)
  * an LR for the router (R1)
  * an LSP to the router (LSP-internal-R1)
  * an LRP to the network (LRP-R1-internal)
  * a NAT rule on R1 representing a floating IP
* The router R1 has an LRP (LRP-R1-public) with an ha_chassis_group
  configured to point to both gateways with different priorities (same as
  above)
* We now have an integration LS (public), which is just a normal logical
  switch. This is the Neutron external network
* LS public has an LSP (LSP-public-R1) that connects to LRP-R1-public
* There is an LR (magic-router)
* LR magic-router is connected to LS public via a normal LSP/LRP combination
* There is a logical switch (LS-public-for-real)
* LS-public-for-real has an LSP (physnet) of type localnet with network_name
  set to "phys"
* LS-public-for-real has an LSP (LSP-public-for-real-magic-router)
* LR magic-router has an LRP (LRP-magic-router-public-for-real). It has:
  * an ha_chassis_group set to point to both gateway chassis with the same
    or different priorities
  * mac set to "active-active"
  * networks set to "[active-active]"
  * options:active-active-lrp=true
* LSP-public-for-real-magic-router and LRP-magic-router-public-for-real are
  connected

Additionally, in the local ovsdb of gtw01 we have configured:
* ovn-aa-port-mappings="phys;00:fe:fe:fe:fe:01,172.16.0.10/25"
* ovn-bridge-mappings="phys:physnet"

Additionally, in the local ovsdb of gtw02 we have configured:
* ovn-aa-port-mappings="phys;00:fe:fe:fe:fe:11,172.16.0.139/25"
* ovn-bridge-mappings="phys:physnet"

The magic that makes this setup work comes from:
1. Numan's patch for centralized routing, which I patched to no longer need
   a setting
2. Logic in northd to generate derived Port_Binding entries for
   LSP-public-for-real-magic-router and LRP-magic-router-public-for-real for
   each entry in ovn-aa-port-mappings if the port is an active-active-lrp

As a result this behaves very similarly to the implementation of Frode in
https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415343.html
The major change is that there is no need to specify all ports in the
northbound. This allows chassis-specific settings to be configured on the
chassis itself, instead of configuring them via the northbound.

The setting has the format "<network_name>;<mac>,<ip>;<mac>,<ip>". It
thereby allows multiple ports for a single network per chassis. OVN will
generate multiple LRPs for this. However, they are connected to the same
localnet port, and therefore to the same external bridge. It is up to the
user to then split them out to real physical ports.
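
Concretely, the per-chassis configuration above could be applied with plain
ovs-vsctl. A minimal sketch (ovn-aa-port-mappings is the new external-id
introduced by this prototype; the extra inner quoting is only there so that
the ';' and ',' separators survive shell and ovsdb value parsing):

  # on gtw01
  ovs-vsctl set open_vswitch . \
      external-ids:ovn-bridge-mappings="phys:physnet" \
      external-ids:ovn-aa-port-mappings='"phys;00:fe:fe:fe:fe:01,172.16.0.10/25"'

  # on gtw02
  ovs-vsctl set open_vswitch . \
      external-ids:ovn-bridge-mappings="phys:physnet" \
      external-ids:ovn-aa-port-mappings='"phys;00:fe:fe:fe:fe:11,172.16.0.139/25"'
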
The above also means that this feature should work very well with existing
CMSs like Neutron. There are just a few things that would need to change in
comparison to a normal Neutron setup:
1. The external network needs to be switched out for a new overlay-only
   network with an additional router behind it. This should be easily
   doable, it is just not common at the moment.
2. Setting options:active-active-lrp=true on
   LRP-magic-router-public-for-real
3. Setting the ovn-aa-port-mappings

This means that such active-active routing should be quite easily adoptable
by existing deployments (in comparison to all other approaches).

## Improvements over the previous version

* Matches more closely what a general public cloud deployment looks like
  today and what existing Neutron can do
* There is no longer a need for the external MAC and IP addresses to be
  identical everywhere
* Multiple external ports per chassis can be handled, but not yet nicely.
  They require additional flows on the external bridge
* The routes for each external port can be different. This also allows users
  to disable external traffic on a gateway chassis while still using it for
  OVN-internal traffic
* Uses more of the existing logic on the individual chassis. This is mostly
  a northd change

## Current Limitations

* It breaks some test cases. I did not investigate them yet
* It definitely broke incremental processing, as I did not add anything
  there
* Routing is not yet dynamic between the magic-router and the physical
  world. I wanted to test Martin's and Frode's changes for this
* Communication between all LRs connected to LS public needs manual routes
  to be set everywhere (if you do not do NAT); see the sketch below
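
As a rough illustration of those manual routes (all prefixes and next-hop
addresses here are made up: assume LRP-R1-public holds 203.0.113.10, the
magic-router leg on LS public holds 203.0.113.1, and the project network
behind R1 is 10.0.0.0/24):

  # on magic-router: reach the project network via R1's leg on LS public
  ovn-nbctl lr-route-add magic-router 10.0.0.0/24 203.0.113.10
  # on R1: default route towards magic-router's leg on LS public
  ovn-nbctl lr-route-add R1 0.0.0.0/0 203.0.113.1

With NAT on R1 the first route is not needed, matching the note above.
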
> >
> > # Results and Open questions
> >
> > ## Integrating the project LRs via a Router or Switch
> >
> > The current default setup of at least Neutron is to integrate project
> > LRs via a single switch that also hosts a localnet port to the outside
> > world.
> >
> > In the setup above I tried to use a router instead for this purpose.
> >
> > The differences I see between these setups:
> > * In case an integration switch is used, each project router must hold
> >   the full routing table to all other project routers on the switch. In
> >   case of an integration router, it is the only one that needs the
> >   routing table, while the project routers only need a default route.
> > * A logical switch brings features that are not necessary in this
> >   scenario (like multicast/broadcast). With a lot of LSPs these can
> >   actually generate flows that are too large.
> > * For project-to-project communication there is no difference in the
> >   number of datapaths traversed. For project-to-external communication
> >   the switch adds an extra datapath.
> >
> > So from a greenfield perspective I would see no value in using a logical
> > switch between the projects. However, for existing setups and
> > integrations a logical switch might make everyone's life easier.

I decided to use a logical switch here to preserve existing CMS behaviour.
Potentially we can improve things for logical switches that are only
connected to logical routers by removing flows that we know are unnecessary
(like broadcast).

> >
> > ## Connection from public router to localnet port
> >
> > The setup above uses a single LRP on the public router for the
> > connection to the localnet port. This is quite easy to set up and to
> > extend for newly created chassis.
> >
> > The approach in other examples rather used one LRP per external
> > connection. When a new chassis is added, an additional LRP and external
> > LS are needed (or multiple with multiple NICs).
> >
> > The differences I see are:
> > * A single external LRP requires the side outside of OVN to behave
> >   identically everywhere, meaning it needs the same IP and MAC address.
> >   This might not always be possible depending on the systems there.
> > * With multiple LRPs the CMS needs additional information to configure
> >   them in comparison to what is currently needed (e.g. IPs).
> > * Multiple LRPs currently do not support preferring a node-local route
> >   if available. This is needed to prevent us from sending packets
> >   between gateway chassis for no reason.
> > * Multiple LRPs would allow us to learn different routes (or routes with
> >   different costs) on different chassis. A single LRP means that we have
> >   the same routing table on all chassis.
> > * Having multiple external connections on a single chassis bound to the
> >   same public LR is not easily possible with a single LRP. With multiple
> >   LRPs this natively works.
> >
> > I see points for both approaches. Maybe we can find an alternative that
> > combines the benefits and skips the drawbacks?

I think I found this combination with ovn-aa-port-mappings. That way we have
one real external LRP from the perspective of the CMS, but from the
perspective of northd and the ovn-controllers we have multiple ones, thereby
enabling different MACs/IPs and different routes.

> >
> > # Summary
> >
> > I would like to hear opinions on these topics as I think they are
> > relevant to each potential implementation.
> > Maybe we can also use some time in the community meeting later for
> > these.

I am honestly a lot happier with this approach than with the previous one.
It feels a lot cleaner and less hacked together. Also, from the perspective
of running this in production it looks easier to reason about.

>
> Thanks a lot for contributing work on discovering potential paths
> forward for interaction between gateway routers and distributed
> gateways for the OpenStack use case, Felix, much appreciated!
>
> There is a lot to consider in what you have laid out above, so I'll
> leave you with a comment here for now informing you of my intention to
> review this more thoroughly and respond to you, to ensure you do not
> feel ignored.

Thanks a lot, I guess this is even more to read now.

<SNIP>

> > > > One thing I forgot to mention is that for simplicity the script
> > > > uses a lot of IPv4 addresses. In a final solution I would propose we
> > > > use IPv6 LLAs also for routing between the internal OVN L3
> > > > constructs to avoid this.
> > >
> > > /me goes back to fixing the v4-over-v6 routing patch :)
>
> I did indeed see your proposals for that, and we are very much
> interested in those too! :)

Sorry, I did not really spend time on it. This topic stole all the
attention :)

>
> --
> Frode Nordahl
>

Thanks
Felix