Hi everyone,
this is the current state of the implementation resulting from the discussion
about the OVN Fabric integration:
https://mail.openvswitch.org/pipermail/ovs-dev/2024-June/415033.html
This patchset is extremely large and split into multiple parts.
The parts are partially dependent on each other, which is why this is one big
patchset instead of multiple small ones.
To make the work easier for everyone involved, we should be able to merge the
earlier parts already once we have an agreement on them.
The code is working in my test setup, but especially the later patches are most
probably not yet acceptable for inclusion in OVN. They lack test cases and
documentation of the new features.
I will continue improving their quality, but I wanted to send this out soon
so everyone can have a look.
Larger changes v2->v3:
* added documentation for all new options
* added tests
* allow filtering of received routes on multiple LRPs per chassis by
  ifname
Larger changes v1->v2:
* fixed a bunch of issues the CI uncovered
* folded 3 commits for incremental processing into their original
  changes
* as the CI does not run with the custom ovs submodule, I changed the
  order of part 3 and 4 to allow for a larger CI coverage
Part 1 contains general refactoring for later steps.
None of these patches should have any effect on the output of northd and
ovn-controller.
This is also why they do not touch any tests.
The patches of this part are:
* northd: Set southbound mac from lrp_networks.
* northd: Fix relying on naming coincidences.
* northd: Find outports based on ovn_port.
* northd: Store outport of parsed_route.
* northd: Split out join_logical_ports.
* northd: Reorder join_logical_ports.
* northd: Rename en_static_routes to en_routes.
* northd: Move connected routes to route engine.
Part 2 modifies existing features to better fit the changes here.
It has a small impact on functionality by allowing previously forbidden
configurations or removing unnecessary options.
The patches of this part are:
* northd: Autodiscover centralize_routing.
* northd: Routing-protocol-redirect on crps.
Part 3 includes the northd side for learning and advertising routes.
For this we add a new sb-db table to contain routes in either direction.
After these patches an external system could already use the new table to learn
and advertise routes from/to OVN.
The patches of this part are:
* northd: Add route table to southbound and sync.
* northd: Add filtering which routes to advertise.
* northd: Handle learned routes.
* northd: Remove learned routes if lrp is removed.
* northd: Allow announcing individual host routes.
* northd: Sync routing data to pb.
Part 4 prepares the CI and our dependencies for the later patches.
In the patchset submitted here it actually changes the URL of the ovs submodule
to my fork.
This is just to make the CI run and to allow everyone easier testing.
It should actually be replaced by a pointer to the OVS repo with the following
patchset included:
https://mail.openvswitch.org/pipermail/ovs-dev/2024-November/418547.html
The patches of this part are:
* DO NOT APPLY: Use my ovs repo and bump.
* ci: Manage host/system level dependencies.
* system-ovn: Remove route without nexthop.
Part 5 includes the ovn-controller features for learning and advertising routes.
With these changes ovn-controller can read and modify the routing tables in
linux VRFs or network namespaces.
External tools (like frr) can then use these routing tables to communicate
these routes to external systems.
The patches of this part are:
* controller: Introduce route node.
* controller: Introduce route-exchange-netlink.
* controller: Announce routes via route-exchange.
* controller: Support learning routes.
* controller: Support receiving routes per iface.
* controller: Prioritize host routes.
* controller: Allow network namespaces for routes.
* controller: Watch for route changes.
* controller: Cleanup routes on stop.
Part 6 introduces active-active LRPs.
These LRPs allow the northbound database to contain just a single LRP for
external connections, which is then translated to many Port_Bindings in the
southbound db.
This allows for an easy integration of this whole feature set with existing CMS
by keeping potential chassis-local configuration values outside of the
northbound database.
The patches of this part are:
* controller: Publish ovn-active-active-mappings.
* northd: Support active-active lrps.
* northd: Support active-active bgp redirects.
* northd: Support filter routes on active-active.
Part 7 optimizes ECMP routes across different chassis.
This allows keeping traffic on a local chassis instead of sending it to other
chassis just based on the ecmp hash.
With the setups that the features here encourage, this will be more common than
it is right now.
The patches of this part are:
* northd: ECMP prefer local routes if possible.
Below I will try to point out how to use this patchset.
However, I have probably missed something, sorry for that.
* This change was tested on a setup using 4 nodes
  * cmp: running the OVN control plane and ovn-controller
  * gtw1, gtw2: running ovn-controller and connecting to the control plane of
    cmp
  * bgp: outside of OVN, simulating a spine-leaf fabric
* cmp, gtw1, gtw2 are on a shared network which they use for control and
  overlay traffic
* set up the local ovsdbs with "system-id", "ovn-encap-ip" and all the other
  settings generally needed
* gtw1 has 2 point-to-point links to bgp
* gtw2 has 1 point-to-point link to bgp
Create the following northbound setup:
```
switch 6403cdcc-9354-40c3-801c-9eec666ceebf (public)
    port public-project-router
        type: router
        router-port: project-router-public
    port public-magic-router
        type: router
        router-port: magic-router-public
switch 473d1998-2ded-410c-b42e-286dd22cf758 (intern)
    port intern-project-router
        type: router
        router-port: project-router-intern
    port intern-port1
        addresses: ["ca:b7:23:93:fd:3f 192.168.0.2"]
switch e45213e9-3312-48ad-ba27-0568f11f6939 (public-for-real-1)
    port public-for-real-1-magic-router
        type: router
        router-port: magic-router-public-for-real-1
    port physnet
        type: localnet
        addresses: ["unknown"]
    port bgp-port
router 2db59849-45e3-40e7-9ace-b6179ace2cb5 (magic-router)
    port magic-router-public
        mac: "00:00:00:00:fe:01"
        ipv6-lla: "fe80::200:ff:fe00:fe01"
        networks: ["1.0.0.1/24"]
    port magic-router-public-for-real-1
        mac: "active-active"
        ipv6-lla: "fe80::200:ff:fe00:0"
        networks: ["active-active"]
router ffa39f2d-6def-413b-953a-4c2ee5f671fe (project-router)
    port project-router-public
        mac: "00:00:00:00:ff:02"
        ipv6-lla: "fe80::200:ff:fe00:ff02"
        networks: ["1.0.0.100/24"]
    port project-router-intern
        mac: "00:00:00:00:ff:01"
        ipv6-lla: "fe80::200:ff:fe00:ff01"
        networks: ["192.168.0.1/24"]
    nat d924de49-67c0-4b2d-b2cc-dc362f43984d
        external ip: "1.0.0.150"
        logical ip: "192.168.0.2"
        type: "dnat_and_snat"
```
* On "project-router-public" set a ha_chassis_group pointing to gtw1 and gtw2
  with different priorities
* ensure that "intern-port1" is bound on "cmp" and that you have an interface
  somewhere responding to pings
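As a rough sketch, the ha_chassis_group step could be scripted as shown below.
The function name, the group name "project-ha" and the priorities 20 and 10
are assumptions for illustration only. The chassis names must match the
system-id of your gateways. Only standard ovn-nbctl commands are used.

```shell
# Sketch: create an HA chassis group and attach it to an LRP.
# Usage: attach_ha_group <lrp> <group> <chassis:priority>...
# All names passed in are examples, not values required by the patchset.
attach_ha_group() {
    lrp=$1 group=$2; shift 2
    ovn-nbctl ha-chassis-group-add "$group"
    # Every remaining argument is a "chassis:priority" pair.
    for pair in "$@"; do
        ovn-nbctl ha-chassis-group-add-chassis "$group" \
            "${pair%%:*}" "${pair##*:}"
    done
    # Look up the UUID of the new group and reference it from the LRP.
    uuid=$(ovn-nbctl --bare --columns=_uuid find HA_Chassis_Group \
        name="$group")
    ovn-nbctl set Logical_Router_Port "$lrp" ha_chassis_group="$uuid"
}
```

On cmp this could then be called as
"attach_ha_group project-router-public project-ha ovn-gtw1:20 ovn-gtw2:10".
Swapping the two priorities changes which gateway is preferred.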
Ensure the following options are set on the LRPs "magic-router-public" and
"magic-router-public-for-real-1":
```
root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public-for-real-1
_uuid : de032fe6-5af5-4cbd-85ac-1d19ffc7c8b1
dhcp_relay : []
enabled : []
external_ids : {}
gateway_chassis : []
ha_chassis_group : 7f808d4b-af12-4291-b249-2b83d248574c
ipv6_prefix : []
ipv6_ra_configs : {}
mac : active-active
name : magic-router-public-for-real-1
networks : [active-active]
options : {active-active-lrp="true", bgp-mirror=bgp-port,
routing-protocol-redirect=bgp-port, routing-protocols="BGP,BFD",
use-netns="true"}
peer : []
status : {hosting-chassis=ovn-gtw1}
root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public
_uuid : 7b3884aa-cde8-4994-ac8c-9f4ed154e4a9
dhcp_relay : []
enabled : []
external_ids : {}
gateway_chassis : []
ha_chassis_group : []
ipv6_prefix : []
ipv6_ra_configs : {}
mac : "00:00:00:00:fe:01"
name : magic-router-public
networks : ["1.0.0.1/24"]
options : {dynamic-routing-connected="true",
dynamic-routing-connected-as-host-routes="true", dynamic-routing-static="true"}
peer : []
status : {}
```
* The ha_chassis_group of "magic-router-public-for-real-1" should also point to
  gtw1 and gtw2; priorities are irrelevant here
* Set options:dynamic-routing=true on the magic-router LR
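The options from the two listings above could be applied roughly like this.
This is only a sketch using the names of the example topology; the function
name is made up, and the option keys are the ones shown in the listings, which
only have an effect with this patchset applied.

```shell
# Sketch: apply the dynamic-routing options of the example topology.
set_dynamic_routing_options() {
    ovn-nbctl set Logical_Router magic-router options:dynamic-routing=true
    ovn-nbctl set Logical_Router_Port magic-router-public \
        options:dynamic-routing-connected=true \
        options:dynamic-routing-connected-as-host-routes=true \
        options:dynamic-routing-static=true
    # The inner double quotes keep "BGP,BFD" together as a single string.
    ovn-nbctl set Logical_Router_Port magic-router-public-for-real-1 \
        options:active-active-lrp=true \
        options:bgp-mirror=bgp-port \
        options:routing-protocol-redirect=bgp-port \
        options:routing-protocols='"BGP,BFD"' \
        options:use-netns=true
}
```

Defining the function changes nothing by itself; run
"set_dynamic_routing_options" on cmp once the LRPs exist.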
Run on gtw1 and gtw2:
```
ovs-vsctl set Open . external_ids:ovn-bridge-mappings="phys:physnet"
ovs-vsctl add-br physnet
ovs-vsctl add-port physnet <your-point-to-point-links-to-bgp-node>
```
* Get the datapath id of the magic-router LR (I assume 4 below)
* On gtw1 and gtw2 create a netns named "ovnns${datapathid}"
* On gtw1 and gtw2 create ports on br-int for the bgp connections
  * On gtw1: "bgp-port-gtw1-0" and "bgp-port-gtw1-1"
  * On gtw2: "bgp-port-gtw2-0"
* Create a veth pair on the linux side matching these ports and move the
  other end to the ovnns netns (below I expect the one in the netns to be named
  "<portname>-ns")
* Set the following via "ovs-vsctl set Open .
  external_ids:ovn-active-active-mappings=<value>"
  * on gtw1:
    "phys;00:fe:fe:fe:fe:01,172.16.0.10/25,bgp-port-gtw1-0-ns;00:fe:fe:fe:fe:33,172.20.0.10/25,bgp-port-gtw1-1-ns"
  * on gtw2: "phys;00:fe:fe:fe:fe:11,172.16.0.139/25,bgp-port-gtw2-0-ns"
* Set the mac and ip address of the bgp ports correctly:
  * matching "bgp-port-gtw1-0": 00:fe:fe:fe:fe:01 172.16.0.10/25
  * matching "bgp-port-gtw1-1": 00:fe:fe:fe:fe:33 172.20.0.10/25
  * matching "bgp-port-gtw2-0": 00:fe:fe:fe:fe:11 172.16.0.139/25
* Spawn frr on gtw1 and gtw2
  * zebra needs to be started with the "-n" option
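For reference, the mapping values above follow the pattern
"<network>;<mac>,<ip>/<len>,<ifname>[;<mac>,<ip>/<len>,<ifname>...]".
A small helper along the following lines can assemble them together with the
netns name. It is only a sketch to illustrate the format; the function names
are made up and not part of the patchset.

```shell
# Sketch: build the ovn-active-active-mappings value for one chassis.
# Usage: aa_mappings <network> <mac,ip/len,ifname>...
aa_mappings() {
    value=$1; shift
    # Append each port description, separated by ";".
    for port in "$@"; do
        value="$value;$port"
    done
    printf '%s\n' "$value"
}

# Name of the network namespace for a datapath id, e.g. 4 -> ovnns4.
ovn_netns_name() {
    printf 'ovnns%s\n' "$1"
}

# Example for gtw2 (to be run on the gateway itself):
#   ip netns add "$(ovn_netns_name 4)"
#   ovs-vsctl set Open . external_ids:ovn-active-active-mappings="$(
#       aa_mappings phys 00:fe:fe:fe:fe:11,172.16.0.139/25,bgp-port-gtw2-0-ns)"
```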
Configuration on gtw1:
```
frr version 8.4.4
frr defaults traditional
hostname gtw1
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
vrf ovnns4
netns /run/netns/ovnns4
exit-vrf
!
router bgp 65000 vrf ovnns4
bgp router-id 172.16.0.10
no bgp hard-administrative-reset
no bgp graceful-restart notification
neighbor 172.16.0.1 remote-as 64512
neighbor 172.16.0.1 bfd
neighbor 172.20.0.1 remote-as 64512
neighbor 172.20.0.1 bfd
!
address-family ipv4 unicast
redistribute kernel
redistribute connected
redistribute static
neighbor 172.16.0.1 soft-reconfiguration inbound
neighbor 172.16.0.1 route-map ALLOW-ALL in
neighbor 172.16.0.1 route-map ALLOW-ALL out
neighbor 172.20.0.1 soft-reconfiguration inbound
neighbor 172.20.0.1 route-map ALLOW-ALL in
neighbor 172.20.0.1 route-map ALLOW-ALL out
exit-address-family
exit
!
route-map ALLOW-ALL permit 100
exit
!
```
And for gtw2:
```
frr version 8.4.4
frr defaults traditional
hostname gtw2
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
vrf ovnns4
netns /run/netns/ovnns4
exit-vrf
!
router bgp 65000 vrf ovnns4
bgp router-id 172.16.0.139
no bgp hard-administrative-reset
no bgp graceful-restart notification
neighbor 172.16.0.129 remote-as 64512
neighbor 172.16.0.129 bfd
!
address-family ipv4 unicast
redistribute kernel
redistribute connected
redistribute static
neighbor 172.16.0.129 soft-reconfiguration inbound
neighbor 172.16.0.129 route-map ALLOW-ALL in
neighbor 172.16.0.129 route-map ALLOW-ALL out
exit-address-family
exit
!
route-map ALLOW-ALL permit 100
exit
!
```
* Now configure the bgp node
  * Set up some network local to the node that you want to announce to OVN and
    that does not overlap with anything
  * Set up the 3 point-to-point links with the matching ip addresses
    * 172.16.0.1/25
    * 172.16.0.129/25
    * 172.20.0.1/25
  * Configure frr for peering with:
    * 172.16.0.10
    * 172.16.0.139
    * 172.20.0.10
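To double-check that both ends of each point-to-point link really share a /25,
a small helper like the one below can be used. It is a plain shell sketch with
made-up function names and is not part of the patchset.

```shell
# Print the IPv4 network for an "a.b.c.d/len" argument.
network_of() {
    ip=${1%/*} len=${1#*/}
    # Split the address into its four octets.
    old_ifs=$IFS; IFS=.
    set -- $ip
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xffffffff << (32 - $len)) & 0xffffffff ))
    net=$(( $addr & $mask ))
    printf '%d.%d.%d.%d/%d\n' $(( ($net >> 24) & 255 )) \
        $(( ($net >> 16) & 255 )) $(( ($net >> 8) & 255 )) \
        $(( $net & 255 )) "$len"
}

# Succeed if both "ip/len" arguments are in the same network.
same_link() {
    [ "$(network_of "$1")" = "$(network_of "$2")" ]
}
```

For example "same_link 172.16.0.139/25 172.16.0.129/25" succeeds, as both
addresses are in 172.16.0.128/25, while 172.16.0.10/25 belongs to
172.16.0.0/25 and therefore pairs with 172.16.0.1/25.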
If all goes well you should now learn routes to:
* 1.0.0.1/32
* 1.0.0.100/32
* 1.0.0.150/32
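A quick way to check this on the bgp node is to compare the kernel routing
table against the expected prefixes. The helper below is a sketch with a
made-up name; it assumes that frr installs the learned routes into the kernel
table that "ip route show" prints.

```shell
# Read "ip route show" style output on stdin and print every expected
# prefix that is missing. iproute2 prints /32 routes without the "/32"
# suffix, so both spellings are accepted.
missing_routes() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        { seen[$1] = 1 }
        END {
            for (i = 1; i <= n; i++) {
                host = w[i]
                sub(/\/32$/, "", host)
                if (!(w[i] in seen) && !(host in seen)) {
                    print w[i]
                }
            }
        }'
}
```

Running "ip route show | missing_routes 1.0.0.1/32 1.0.0.100/32 1.0.0.150/32"
prints nothing once all three routes have been learned.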
Pings to 1.0.0.150 should now work and end at cmp at "intern-port1".
Depending on how you prioritized the ha_chassis_group for
"project-router-public", you will receive different route preferences from gtw1
and gtw2.
If you change the priority order, the route preferences should also change.
If you shut down any of the bgp peerings, you should see the ping replies no
longer taking that path.
To debug which routes should be advertised or have been learned, look at
"ovn-sbctl list Route".
If you now add more ports/routers to the LS "public", they should get announced
automatically.
Thanks for sticking through. I hope this was helpful and did not generate
more confusion.
Felix Huettner (32):
northd: Set southbound mac from lrp_networks.
northd: Fix relying on naming coincidences.
northd: Find outports based on ovn_port.
northd: Store outport of parsed_route.
northd: Split out join_logical_ports.
northd: Reorder join_logical_ports.
northd: Rename en_static_routes to en_routes.
northd: Move connected routes to route engine.
northd: Autodiscover centralize_routing.
northd: Routing-protocol-redirect on crps.
northd: Add route table to southbound and sync.
northd: Add filtering which routes to advertise.
northd: Handle learned routes.
northd: Remove learned routes if lrp is removed.
northd: Allow announcing individual host routes.
northd: Sync routing data to pb.
DO NOT APPLY: Use my ovs repo and bump.
system-ovn: Remove route without nexthop.
controller: Introduce route node.
controller: Introduce route-exchange-netlink.
controller: Announce routes via route-exchange.
controller: Support learning routes.
controller: Support receiving routes per iface.
controller: Prioritize host routes.
controller: Allow network namespaces for routes.
controller: Watch for route changes.
controller: Cleanup routes on stop.
controller: Publish ovn-active-active-mappings.
northd: Support active-active lrps.
northd: Support active-active bgp redirects.
northd: Support filter routes on active-active.
northd: ECMP prefer local routes if possible.
Frode Nordahl (1):
ci: Manage host/system level dependencies.
.github/workflows/test.yml | 5 +
.gitmodules | 3 +-
NEWS | 28 +
configure.ac | 2 +
controller/automake.mk | 18 +-
controller/chassis.c | 22 +
controller/ovn-controller.8.xml | 30 +
controller/ovn-controller.c | 337 +++++-
controller/route-exchange-netlink.c | 360 +++++++
controller/route-exchange-netlink.h | 61 ++
controller/route-exchange-stub.c | 36 +
controller/route-exchange.c | 330 ++++++
controller/route-exchange.h | 38 +
controller/route-table-notify-stub.c | 37 +
controller/route-table-notify.c | 154 +++
controller/route-table-notify.h | 43 +
controller/route.c | 233 +++++
controller/route.h | 88 ++
ic/ovn-ic.c | 21 -
lib/automake.mk | 2 +
lib/lrp-index.c | 43 +
lib/lrp-index.h | 25 +
lib/ovn-util.c | 139 +++
lib/ovn-util.h | 31 +
lib/stopwatch-names.h | 1 +
m4/ovn.m4 | 25 +
northd/automake.mk | 2 +
northd/en-lflow.c | 11 +-
northd/en-northd.c | 38 +-
northd/en-northd.h | 8 +-
northd/en-routes-sync.c | 565 ++++++++++
northd/en-routes-sync.h | 46 +
northd/inc-proc-northd.c | 31 +-
northd/northd.c | 1437 ++++++++++++++++++--------
northd/northd.h | 116 ++-
ovn-nb.xml | 184 +++-
ovn-sb.ovsschema | 19 +-
ovn-sb.xml | 74 ++
ovs | 2 +-
tests/automake.mk | 6 +
tests/multinode.at | 7 -
tests/ovn-northd.at | 379 +++++--
tests/ovn.at | 375 +++++++
tests/ovs-macros.at | 11 +
tests/system-common-macros.at | 27 +
tests/system-ovn.at | 705 ++++++++++++-
46 files changed, 5542 insertions(+), 613 deletions(-)
create mode 100644 controller/route-exchange-netlink.c
create mode 100644 controller/route-exchange-netlink.h
create mode 100644 controller/route-exchange-stub.c
create mode 100644 controller/route-exchange.c
create mode 100644 controller/route-exchange.h
create mode 100644 controller/route-table-notify-stub.c
create mode 100644 controller/route-table-notify.c
create mode 100644 controller/route-table-notify.h
create mode 100644 controller/route.c
create mode 100644 controller/route.h
create mode 100644 lib/lrp-index.c
create mode 100644 lib/lrp-index.h
create mode 100644 northd/en-routes-sync.c
create mode 100644 northd/en-routes-sync.h
base-commit: 12886fb5399c44d9ef40381cd4ac562dab0fdeba
--
2.47.0