On 11/26/24 3:37 PM, Felix Huettner via dev wrote:
> Hi everyone,
>

Hi Felix,

> this is the current state of the implementation of the discussion of the OVN
> Fabric integration:
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-June/415033.html
>
> This patchset is extremely large and split into multiple parts.
> They are partially cross-dependent on each other which is why this is one big
> patchset instead of multiple small ones.
> To make the work easier for everyone involved we should be able to already
> merge earlier ones if we have an agreement on them.

I started looking at the first part of the series and applied two of the
patches.  I also shared some minor comments on a few of the other
preliminary patches.

However, the rest of the series needs a rebase.  That would make it easier
to review.  I'll continue looking at the rest of the patches until then.

Thanks,
Dumitru

>
> The code is working in my testsetup, but especially the later patches are
> most probably not yet acceptable for inclusion in OVN. They lack testcases
> and documentation of the new features.
> I will continue improving their quality, but i wanted to send this out
> soonish so everyone can have a look.
>
> Larger changes v2->v3:
> * added documentation for all new options
> * added tests
> * allow filtering of receiving routes on multiple LRPs per chassis by
>   ifname
> Larger changes v1->v2:
> * fixed a bunch of issues the ci uncovered
> * included 3 commits for incremental processing in their original
>   changes
> * as the CI does not run with the custom ovs submodule i changed the
>   order of part 3 and 4 to allow for a larger CI coverage
>
>
> Part 1 contains general refactoring for later steps.
> None of these patches should have any effect on the output of northd and
> ovn-controller.
> This is also why they do not touch any tests.
> The patches of this part are:
> * northd: Set southbound mac from lrp_networks.
> * northd: Fix relying on naming coincidences.
> * northd: Find outports based on ovn_port.
> * northd: Store outport of parsed_route.
> * northd: Split out join_logical_ports.
> * northd: Reorder join_logical_ports.
> * northd: Rename en_static_routes to en_routes.
> * northd: Move connected routes to route engine.
>
>
> Part 2 modifies existing features to better fit to the changes here.
> It has small impacts on functionality by allowing previously forbidden
> configurations or removing unnecessary options.
> The patches of this part are:
> * northd: Autodiscover centralize_routing.
> * northd: Routing-protocol-redirect on crps.
>
>
> Part 3 includes the northd side for learning and advertising routes.
> For this we add a new sb-db table to contain routes in either direction.
> After these patches an external system could already use the new table to
> learn and advertise routes from/to OVN.
> The patches of this part are:
> * northd: Add route table to southbound and sync.
> * northd: Add filtering which routes to advertise.
> * northd: Handle learned routes.
> * northd: Remove learned routes if lrp is removed.
> * northd: Allow announcing individual host routes.
> * northd: Sync routing data to pb.
>
>
> Part 4 prepares the CI and our dependencies for the later patches.
> In the patchset submitted here it actually changes the URL of the ovs
> submodule to my fork.
> This is just for making the CI run and allowing everyone easier testing.
> It should actually be replaced by a pointer to the OVS repo with the
> following patchset included:
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-November/418547.html
> The patches of this part are:
> * DO NOT APPLY: Use my ovs repo and bump.
> * ci: Manage host/system level dependencies.
> * system-ovn: Remove route without nexthop.
>
>
> Part 5 includes the ovn-controller features for learning and advertising
> routes.
> With these changes ovn-controller can read and modify the routing tables in
> linux VRFs or network namespaces.
> External tools (like frr) can then use these routing tables to communicate
> these routes to external systems.
> The patches of this part are:
> * controller: Introduce route node.
> * controller: Introduce route-exchange-netlink.
> * controller: Announce routes via route-exchange.
> * controller: Support learning routes.
> * controller: Support receiving routes per iface.
> * controller: Prioritize host routes.
> * controller: Allow network namespaces for routes.
> * controller: Watch for route changes.
> * controller: Cleanup routes on stop.
>
>
> Part 6 introduces active-active LRPs.
> These LRPs allow the northbound database to just contain a single LRP for
> external connections which are then translated to many Port_Bindings on the
> southbound db.
> This allows for an easy integration of this whole featureset with existing
> CMS by keeping potential chassis-local configuration values outside of the
> northbound database.
> The patches of this part are:
> * controller: Publish ovn-active-active-mappings.
> * northd: Support active-active lrps.
> * northd: Support active-active bgp redirects.
> * northd: Support filter routes on active-active.
>
>
> Part 7 optimizes ECMP routes across different chassis.
> This allows keeping traffic on a local chassis instead of sending it to other
> chassis just based on the ecmp hash.
> With the setups the features here encourage, this will be more common than it
> is right now.
> The patches of this part are:
> * northd: ECMP prefer local routes if possible.
>
>
>
>
> Below i will try to point out how to use this patchset.
> However i probably missed something, sorry for that.
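
For readers skimming the cover letter, the Part 7 idea quoted above can be
pictured roughly as follows.  This is only an illustrative sketch and is not
taken from the patches: the NextHop type, its "chassis" field and both
function names are made up here, and the real logic lives in northd logical
flows rather than in Python.  The assumption is simply that, when at least
one ECMP next hop is reachable on the local chassis, only local next hops
are considered, and otherwise the usual hash over all next hops applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NextHop:
    """Hypothetical ECMP next hop; 'chassis' is where its port is bound."""
    ip: str
    chassis: str


def select_ecmp_nexthops(nexthops, local_chassis):
    """Return only the local next hops if any exist, otherwise all of them."""
    local = [nh for nh in nexthops if nh.chassis == local_chassis]
    return local if local else list(nexthops)


def pick_nexthop(nexthops, local_chassis, flow_hash):
    """Hash over the preferred candidates instead of over every next hop."""
    candidates = select_ecmp_nexthops(nexthops, local_chassis)
    return candidates[flow_hash % len(candidates)]


if __name__ == "__main__":
    # Addresses borrowed from the test setup described in this thread.
    hops = [NextHop("172.16.0.1", "gtw1"), NextHop("172.16.0.129", "gtw2")]
    # On gtw1 the hash no longer matters: the local next hop always wins.
    print(pick_nexthop(hops, "gtw1", flow_hash=12345))
    # On a chassis without a local next hop the hash decides as before.
    print(pick_nexthop(hops, "cmp", flow_hash=12345))
```

The fallback to the full list matters: a chassis without any local next hop
must still be able to forward traffic.
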
>
> * This change was tested on a setup using 4 nodes
>   * cmp: running the OVN control plane and ovn-controller
>   * gtw1, gtw2: running ovn-controller and connect to the control plane of
>     cmp
>   * bgp: outside of OVN, simulating a spine-leaf fabric
> * cmp, gtw1, gtw2 are on a shared network which they use for control and
>   overlay traffic
> * setup the local ovsdbs with "system-id", "ovn-encap-ip" and all the other
>   stuff generally needed
> * gtw1 has 2 point-to-point links to bgp
> * gtw2 has 1 point-to-point link to bgp
>
> Create the following northbound setup:
> ```
> switch 6403cdcc-9354-40c3-801c-9eec666ceebf (public)
>     port public-project-router
>         type: router
>         router-port: project-router-public
>     port public-magic-router
>         type: router
>         router-port: magic-router-public
> switch 473d1998-2ded-410c-b42e-286dd22cf758 (intern)
>     port intern-project-router
>         type: router
>         router-port: project-router-intern
>     port intern-port1
>         addresses: ["ca:b7:23:93:fd:3f 192.168.0.2"]
> switch e45213e9-3312-48ad-ba27-0568f11f6939 (public-for-real-1)
>     port public-for-real-1-magic-router
>         type: router
>         router-port: magic-router-public-for-real-1
>     port physnet
>         type: localnet
>         addresses: ["unknown"]
>     port bgp-port
> router 2db59849-45e3-40e7-9ace-b6179ace2cb5 (magic-router)
>     port magic-router-public
>         mac: "00:00:00:00:fe:01"
>         ipv6-lla: "fe80::200:ff:fe00:fe01"
>         networks: ["1.0.0.1/24"]
>     port magic-router-public-for-real-1
>         mac: "active-active"
>         ipv6-lla: "fe80::200:ff:fe00:0"
>         networks: ["active-active"]
> router ffa39f2d-6def-413b-953a-4c2ee5f671fe (project-router)
>     port project-router-public
>         mac: "00:00:00:00:ff:02"
>         ipv6-lla: "fe80::200:ff:fe00:ff02"
>         networks: ["1.0.0.100/24"]
>     port project-router-intern
>         mac: "00:00:00:00:ff:01"
>         ipv6-lla: "fe80::200:ff:fe00:ff01"
>         networks: ["192.168.0.1/24"]
>     nat d924de49-67c0-4b2d-b2cc-dc362f43984d
>         external ip: "1.0.0.150"
>         logical ip: "192.168.0.2"
>         type: "dnat_and_snat"
> ```
>
> * On
"project-router-public" set a ha_chassis_group pointing to gtw1 and gtw2
>   with different prios
> * ensure that "intern-port1" is bound on "cmp" and that you have an interface
>   somewhere responding to pings
>
> Ensure the following options are set on the LRPs "magic-router-public" and
> "magic-router-public-for-real-1":
> ```
> root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public-for-real-1
> _uuid               : de032fe6-5af5-4cbd-85ac-1d19ffc7c8b1
> dhcp_relay          : []
> enabled             : []
> external_ids        : {}
> gateway_chassis     : []
> ha_chassis_group    : 7f808d4b-af12-4291-b249-2b83d248574c
> ipv6_prefix         : []
> ipv6_ra_configs     : {}
> mac                 : active-active
> name                : magic-router-public-for-real-1
> networks            : [active-active]
> options             : {active-active-lrp="true", bgp-mirror=bgp-port,
> routing-protocol-redirect=bgp-port, routing-protocols="BGP,BFD",
> use-netns="true"}
> peer                : []
> status              : {hosting-chassis=ovn-gtw1}
>
> root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public
> _uuid               : 7b3884aa-cde8-4994-ac8c-9f4ed154e4a9
> dhcp_relay          : []
> enabled             : []
> external_ids        : {}
> gateway_chassis     : []
> ha_chassis_group    : []
> ipv6_prefix         : []
> ipv6_ra_configs     : {}
> mac                 : "00:00:00:00:fe:01"
> name                : magic-router-public
> networks            : ["1.0.0.1/24"]
> options             : {dynamic-routing-connected="true",
> dynamic-routing-connected-as-host-routes="true",
> dynamic-routing-static="true"}
> peer                : []
> status              : {}
> ```
>
> * The ha-chassis-group of "magic-router-public-for-real-1" should also point
>   to gtw1 and gtw2, priorities are irrelevant here
> * Set the option:dynamic-routing=true on the magic-router LR
>
> Run on gtw1 and gtw2:
> ```
> ovs-vsctl set Open .
external_ids:ovn-bridge-mappings="phys:physnet"
> ovs-vsctl add-br physnet
> ovs-vsctl add-port physnet <your-point-to-point-links-to-bgp-node>
> ```
>
> * Get the datapath id of the magic-router LR (i assume 4 below)
> * On gtw1 and gtw2 create a netns named "ovnns${datapathid}"
> * On gtw1 and gtw2 create ports on br-int for the bgp connections
>   * On gtw1: "bgp-port-gtw1-0" and "bgp-port-gtw1-1"
>   * On gtw2: "bgp-port-gtw2-0"
>   * Create a veth pair on the linux side matching to these ports and move
>     the other end to the ovnns netns (i below expect the one in the netns to
>     be named "<portname>-ns")
>
> * Set the following via "ovs-vsctl set Open .
>   external_ids:ovn-active-active-mappings="
>   * on gtw1:
>     "phys;00:fe:fe:fe:fe:01,172.16.0.10/25,bgp-port-gtw1-0-ns;00:fe:fe:fe:fe:33,172.20.0.10/25,bgp-port-gtw1-1-ns"
>   * on gtw2: "phys;00:fe:fe:fe:fe:11,172.16.0.139/25,bgp-port-gtw2-0-ns"
>
> * Set the mac and ip address of the bgp ports correctly:
>   * matching to "bgp-port-gtw1-0": 00:fe:fe:fe:fe:01 172.16.0.10/25
>   * matching to "bgp-port-gtw1-1": 00:fe:fe:fe:fe:33 172.20.0.10/25
>   * matching to "bgp-port-gtw2-0": 00:fe:fe:fe:fe:11 172.16.0.139/25
>
> * Spawn frr on gtw1 and gtw2
>   * zebra needs to be started with "-n" as option
>
> Configuration on gtw1:
> ```
> frr version 8.4.4
> frr defaults traditional
> hostname gtw1
> log syslog informational
> no ip forwarding
> no ipv6 forwarding
> service integrated-vtysh-config
> !
> vrf ovnns4
>  netns /run/netns/ovnns4
> exit-vrf
> !
> router bgp 65000 vrf ovnns4
>  bgp router-id 172.16.0.10
>  no bgp hard-administrative-reset
>  no bgp graceful-restart notification
>  neighbor 172.16.0.1 remote-as 64512
>  neighbor 172.16.0.1 bfd
>  neighbor 172.20.0.1 remote-as 64512
>  neighbor 172.20.0.1 bfd
>  !
>  address-family ipv4 unicast
>   redistribute kernel
>   redistribute connected
>   redistribute static
>   neighbor 172.16.0.1 soft-reconfiguration inbound
>   neighbor 172.16.0.1 route-map ALLOW-ALL in
>   neighbor 172.16.0.1 route-map ALLOW-ALL out
>   neighbor 172.20.0.1 soft-reconfiguration inbound
>   neighbor 172.20.0.1 route-map ALLOW-ALL in
>   neighbor 172.20.0.1 route-map ALLOW-ALL out
>  exit-address-family
> exit
> !
> route-map ALLOW-ALL permit 100
> exit
> !
> ```
>
> And for gtw2:
> ```
> frr version 8.4.4
> frr defaults traditional
> hostname gtw2
> log syslog informational
> no ip forwarding
> no ipv6 forwarding
> service integrated-vtysh-config
> !
> vrf ovnns4
>  netns /run/netns/ovnns4
> exit-vrf
> !
> router bgp 65000 vrf ovnns4
>  bgp router-id 172.16.0.139
>  no bgp hard-administrative-reset
>  no bgp graceful-restart notification
>  neighbor 172.16.0.129 remote-as 64512
>  neighbor 172.16.0.129 bfd
>  !
>  address-family ipv4 unicast
>   redistribute kernel
>   redistribute connected
>   redistribute static
>   neighbor 172.16.0.129 soft-reconfiguration inbound
>   neighbor 172.16.0.129 route-map ALLOW-ALL in
>   neighbor 172.16.0.129 route-map ALLOW-ALL out
>  exit-address-family
> exit
> !
> route-map ALLOW-ALL permit 100
> exit
> !
> ```
>
> * Now configure the bgp node
>   * Setup some network local to the node that you want to announce to OVN
>     and that is not overlapping with anything
>   * setup the 3 point to point links with the matching ip addresses
>     * 172.16.0.1/25
>     * 172.16.0.129/25
>     * 172.20.0.1/25
>   * Configure frr for peering with:
>     * 172.16.0.10
>     * 172.16.0.139
>     * 172.20.0.10
>
> If all goes well you should now learn routes to:
> * 1.0.0.1/32
> * 1.0.0.100/32
> * 1.0.0.150/32
>
> pings to 1.0.0.150 should now work and end at cmp at "intern-port1".
> depending on how you prioritized the ha_chassis_group for
> "project-router-public" you will receive different route preferences from
> gtw1 and gtw2.
> If you change the priority order the route preferences should also change.
> If you shut down any of the bgp peerings you should see the ping replies no
> longer taking that path.
>
> To debug which routes should be advertised or have been learned look at
> "ovn-sbctl list Route"
>
> If you now add ports/routers to the LS "public" they should get
> announced automatically.
>
>
> Thanks for sticking through and i hope this was helpful and does not generate
> more confusion.
>
>
> Felix Huettner (32):
>   northd: Set southbound mac from lrp_networks.
>   northd: Fix relying on naming coincidences.
>   northd: Find outports based on ovn_port.
>   northd: Store outport of parsed_route.
>   northd: Split out join_logical_ports.
>   northd: Reorder join_logical_ports.
>   northd: Rename en_static_routes to en_routes.
>   northd: Move connected routes to route engine.
>   northd: Autodiscover centralize_routing.
>   northd: Routing-protocol-redirect on crps.
>   northd: Add route table to southbound and sync.
>   northd: Add filtering which routes to advertise.
>   northd: Handle learned routes.
>   northd: Remove learned routes if lrp is removed.
>   northd: Allow announcing individual host routes.
>   northd: Sync routing data to pb.
>   DO NOT APPLY: Use my ovs repo and bump.
>   system-ovn: Remove route without nexthop.
>   controller: Introduce route node.
>   controller: Introduce route-exchange-netlink.
>   controller: Announce routes via route-exchange.
>   controller: Support learning routes.
>   controller: Support receiving routes per iface.
>   controller: Prioritize host routes.
>   controller: Allow network namespaces for routes.
>   controller: Watch for route changes.
>   controller: Cleanup routes on stop.
>   controller: Publish ovn-active-active-mappings.
>   northd: Support active-active lrps.
>   northd: Support active-active bgp redirects.
>   northd: Support filter routes on active-active.
>   northd: ECMP prefer local routes if possible.
>
> Frode Nordahl (1):
>   ci: Manage host/system level dependencies.
>
>  .github/workflows/test.yml | 5 +
>  .gitmodules | 3 +-
>  NEWS | 28 +
>  configure.ac | 2 +
>  controller/automake.mk | 18 +-
>  controller/chassis.c | 22 +
>  controller/ovn-controller.8.xml | 30 +
>  controller/ovn-controller.c | 337 +++++-
>  controller/route-exchange-netlink.c | 360 +++++++
>  controller/route-exchange-netlink.h | 61 ++
>  controller/route-exchange-stub.c | 36 +
>  controller/route-exchange.c | 330 ++++++
>  controller/route-exchange.h | 38 +
>  controller/route-table-notify-stub.c | 37 +
>  controller/route-table-notify.c | 154 +++
>  controller/route-table-notify.h | 43 +
>  controller/route.c | 233 +++++
>  controller/route.h | 88 ++
>  ic/ovn-ic.c | 21 -
>  lib/automake.mk | 2 +
>  lib/lrp-index.c | 43 +
>  lib/lrp-index.h | 25 +
>  lib/ovn-util.c | 139 +++
>  lib/ovn-util.h | 31 +
>  lib/stopwatch-names.h | 1 +
>  m4/ovn.m4 | 25 +
>  northd/automake.mk | 2 +
>  northd/en-lflow.c | 11 +-
>  northd/en-northd.c | 38 +-
>  northd/en-northd.h | 8 +-
>  northd/en-routes-sync.c | 565 ++++++++++
>  northd/en-routes-sync.h | 46 +
>  northd/inc-proc-northd.c | 31 +-
>  northd/northd.c | 1437 ++++++++++++++++++--------
>  northd/northd.h | 116 ++-
>  ovn-nb.xml | 184 +++-
>  ovn-sb.ovsschema | 19 +-
>  ovn-sb.xml | 74 ++
>  ovs | 2 +-
>  tests/automake.mk | 6 +
>  tests/multinode.at | 7 -
>  tests/ovn-northd.at | 379 +++++--
>  tests/ovn.at | 375 +++++++
>  tests/ovs-macros.at | 11 +
>  tests/system-common-macros.at | 27 +
>  tests/system-ovn.at | 705 ++++++++++++-
>  46 files changed, 5542 insertions(+), 613 deletions(-)
>  create mode 100644 controller/route-exchange-netlink.c
>  create mode 100644 controller/route-exchange-netlink.h
>  create mode 100644 controller/route-exchange-stub.c
>  create mode 100644 controller/route-exchange.c
>  create mode 100644 controller/route-exchange.h
>  create mode 100644 controller/route-table-notify-stub.c
>  create mode 100644 controller/route-table-notify.c
>  create mode 100644 controller/route-table-notify.h
>  create mode 100644 controller/route.c
>  create
mode 100644 controller/route.h
>  create mode 100644 lib/lrp-index.c
>  create mode 100644 lib/lrp-index.h
>  create mode 100644 northd/en-routes-sync.c
>  create mode 100644 northd/en-routes-sync.h
>
>
> base-commit: 12886fb5399c44d9ef40381cd4ac562dab0fdeba
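
As a reading aid for the ovn-active-active-mappings values quoted in the
walkthrough, here is a small sketch that splits such a value into its parts.
The layout is inferred only from the two example strings in this thread
(a physical network name, then ";"-separated "mac,ip/prefix,ifname"
triples); it is an assumption, not the documented format, and the function
and class names are invented for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveActivePort:
    """One 'mac,ip/prefix,ifname' triple (layout assumed from the examples)."""
    mac: str
    address: object  # ipaddress.IPv4Interface or ipaddress.IPv6Interface
    ifname: str


def parse_active_active_mapping(value):
    """Split '<physnet>;<mac>,<cidr>,<ifname>[;...]' into its parts."""
    physnet, *entries = value.split(";")
    ports = []
    for entry in entries:
        mac, cidr, ifname = entry.split(",")
        # ip_interface() keeps both the host address and the prefix length.
        ports.append(ActiveActivePort(mac, ipaddress.ip_interface(cidr), ifname))
    return physnet, ports


if __name__ == "__main__":
    gtw1 = ("phys;00:fe:fe:fe:fe:01,172.16.0.10/25,bgp-port-gtw1-0-ns;"
            "00:fe:fe:fe:fe:33,172.20.0.10/25,bgp-port-gtw1-1-ns")
    physnet, ports = parse_active_active_mapping(gtw1)
    for port in ports:
        print(physnet, port.ifname, port.mac, port.address.network)
```

Printing the derived network per entry makes it easy to check that each BGP
peer address configured on the bgp node falls inside the matching
point-to-point subnet.
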
