Hi everyone,

This is the initial implementation following the discussion of the OVN Fabric 
integration:
https://mail.openvswitch.org/pipermail/ovs-dev/2024-June/415033.html

This patchset is extremely large and split into multiple parts.
The parts partially depend on each other, which is why this is one big 
patchset instead of multiple small ones.

The code is working in my test setup, but especially the later patches are most 
probably not yet acceptable for inclusion in OVN. They lack test cases and 
documentation of the new features.
I will continue improving their quality, but I wanted to send this out soon 
so everyone can have a look.


Larger changes v1->v2:
   * fixed a bunch of issues the CI uncovered
   * folded 3 commits for incremental processing into their original
     changes
   * as the CI does not run with the custom ovs submodule I changed the
     order of part 3 and 4 to allow for a larger CI coverage


Part 1 contains general refactoring for later steps.
None of these patches should have any effect on the output of northd and 
ovn-controller.
This is also why they do not touch any tests.
The patches of this part are:
* northd: Set southbound mac from lrp_networks.
* northd: Fix relying on naming coincidences.
* northd: Find outports based on ovn_port.
* northd: Store outport of parsed_route.
* northd: Split out join_logical_ports.
* northd: Reorder join_logical_ports.
* northd: Rename en_static_routes to en_routes.
* northd: Move connected routes to route engine.


Part 2 modifies existing features to better fit the changes here.
It has small impacts on functionality by allowing previously forbidden 
configurations or removing unnecessary options.
The patches of this part are:
* northd: Autodiscover centralize_routing.
* northd: Routing-protocol-redirect on crps.


Part 3 includes the northd side for learning and advertising routes.
For this we add a new sb-db table to contain routes in either direction.
After these patches an external system could already use the new table to learn 
and advertise routes from/to OVN.
The patches of this part are:
* northd: Add route table to southbound and sync.
* northd: Add filtering which routes to advertise.
* northd: Handle learned routes.
* northd: Remove learned routes if lrp is removed.
* northd: Allow announcing individual host routes.
* northd: Sync routing data to pb.


Part 4 prepares the CI and our dependencies for the later patches.
In the patchset submitted here it actually changes the URL of the ovs submodule 
to my fork.
This is just to make the CI run and to allow everyone easier testing.
It should actually be replaced by a pointer to the OVS repo with the following 
patchset included:
https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417872.html
The patches of this part are:
* DO NOT APPLY: Use my ovs repo and bump.
* ci: Manage host/system level dependencies.
* system-ovn: Remove route without nexthop.


Part 5 includes the ovn-controller features for learning and advertising routes.
With these changes ovn-controller can read and modify the routing tables in 
Linux VRFs or network namespaces.
External tools (like FRR) can then use these routing tables to communicate 
these routes to external systems.
The patches of this part are:
* controller: Introduce route node.
* controller: Introduce route-exchange-netlink.
* controller: Announce routes via route-exchange.
* controller: Support learning routes.
* controller: Prioritize host routes.
* controller: Allow network namespaces for routes.
* controller: Watch for route changes.
* controller: Cleanup routes on stop.


Part 6 introduces active-active LRPs.
These LRPs allow the northbound database to contain just a single LRP for 
external connections, which is then translated to many Port_Bindings on the 
southbound db.
This allows for an easy integration of this whole featureset with existing CMS 
by keeping potential chassis-local configuration values outside of the 
northbound database.
The patches of this part are:
* controller: Publish ovn-active-active-mappings.
* northd: Support active-active lrps.
* northd: Support active-active bgp redirects.
* controller: Allow filtering routes on ifindex.


Part 7 optimizes ECMP routes across different chassis.
This allows keeping traffic on a local chassis instead of sending it to other 
chassis just based on the ecmp hash.
With the setups the features here encourage, this will be more common than it 
is right now.
The patches of this part are:
* northd: ECMP prefer local routes if possible.




Below I will try to point out how to use this patchset.
However, I probably missed something, sorry for that.

* This change was tested on a setup using 4 nodes
   * cmp: running the OVN control plane and ovn-controller
   * gtw1, gtw2: running ovn-controller and connect to the control plane of cmp
   * bgp: outside of OVN, simulating a spine-leaf fabric
* cmp, gtw1, gtw2 are on a shared network which they use for control and 
overlay traffic
* Set up the local ovsdbs with "system-id", "ovn-encap-ip" and all the other 
stuff generally needed
* gtw1 has 2 point-to-point links to bgp
* gtw2 has 1 point-to-point link to bgp

Create the following northbound setup:
```
switch 6403cdcc-9354-40c3-801c-9eec666ceebf (public)
    port public-project-router
        type: router
        router-port: project-router-public
    port public-magic-router
        type: router
        router-port: magic-router-public
switch 473d1998-2ded-410c-b42e-286dd22cf758 (intern)
    port intern-project-router
        type: router
        router-port: project-router-intern
    port intern-port1
        addresses: ["ca:b7:23:93:fd:3f 192.168.0.2"]
switch e45213e9-3312-48ad-ba27-0568f11f6939 (public-for-real-1)
    port public-for-real-1-magic-router
        type: router
        router-port: magic-router-public-for-real-1
    port physnet
        type: localnet
        addresses: ["unknown"]
    port bgp-port
router 2db59849-45e3-40e7-9ace-b6179ace2cb5 (magic-router)
    port magic-router-public
        mac: "00:00:00:00:fe:01"
        ipv6-lla: "fe80::200:ff:fe00:fe01"
        networks: ["1.0.0.1/24"]
    port magic-router-public-for-real-1
        mac: "active-active"
        ipv6-lla: "fe80::200:ff:fe00:0"
        networks: ["active-active"]
router ffa39f2d-6def-413b-953a-4c2ee5f671fe (project-router)
    port project-router-public
        mac: "00:00:00:00:ff:02"
        ipv6-lla: "fe80::200:ff:fe00:ff02"
        networks: ["1.0.0.100/24"]
    port project-router-intern
        mac: "00:00:00:00:ff:01"
        ipv6-lla: "fe80::200:ff:fe00:ff01"
        networks: ["192.168.0.1/24"]
    nat d924de49-67c0-4b2d-b2cc-dc362f43984d
        external ip: "1.0.0.150"
        logical ip: "192.168.0.2"
        type: "dnat_and_snat"
```

* On "project-router-public" set a ha_chassis_group pointing to gtw1 and gtw2 
with different priorities
* Ensure that "intern-port1" is bound on "cmp" and that you have an interface 
somewhere responding to pings
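
A minimal sketch of the ha_chassis_group step, using the regular ovn-nbctl 
ha-chassis-group commands. The function name, the group name and the 
priorities used in the example call are assumptions for illustration; the 
chassis names must match your system-ids (the listing further down shows 
"ovn-gtw1").

```shell
# NBCTL can be overridden, e.g. NBCTL=echo to only print the commands.
NBCTL=${NBCTL:-ovn-nbctl}

# setup_ha_group LRP GROUP CHASSIS:PRIORITY...
# (hypothetical helper, not part of OVN)
setup_ha_group() {
    lrp=$1; group=$2; shift 2
    $NBCTL ha-chassis-group-add "$group"
    for entry in "$@"; do
        # Split "chassis:priority" at the last colon.
        $NBCTL ha-chassis-group-add-chassis "$group" \
            "${entry%:*}" "${entry##*:}"
    done
    # Attach the group to the LRP by UUID.
    uuid=$($NBCTL --bare --columns=_uuid find HA_Chassis_Group name="$group")
    $NBCTL set Logical_Router_Port "$lrp" ha_chassis_group="$uuid"
}
```

It could then be called as 
`setup_ha_group project-router-public project-router-ha ovn-gtw1:20 ovn-gtw2:10`.
Swapping the two priorities changes which gateway is preferred.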

Ensure the following options are set on the LRPs "magic-router-public" and 
"magic-router-public-for-real-1":
```
root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public-for-real-1
_uuid               : de032fe6-5af5-4cbd-85ac-1d19ffc7c8b1
dhcp_relay          : []
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : 7f808d4b-af12-4291-b249-2b83d248574c
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : active-active
name                : magic-router-public-for-real-1
networks            : [active-active]
options             : {active-active-lrp="true", bgp-mirror=bgp-port, routing-protocol-redirect=bgp-port, routing-protocols="BGP,BFD", use-netns="true"}
peer                : []
status              : {hosting-chassis=ovn-gtw1}

root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public
_uuid               : 7b3884aa-cde8-4994-ac8c-9f4ed154e4a9
dhcp_relay          : []
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "00:00:00:00:fe:01"
name                : magic-router-public
networks            : ["1.0.0.1/24"]
options             : {dynamic-routing-connected="true", dynamic-routing-connected-as-host-routes="true", dynamic-routing-static="true"}
peer                : []
status              : {}
```

* The ha_chassis_group of "magic-router-public-for-real-1" should also point to 
gtw1 and gtw2, priorities are irrelevant here
* Set options:dynamic-routing=true on the magic-router LR
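
The options from the two listings above could be applied roughly like this. 
This is only a sketch: the option names and values are copied from the 
listings, while the wrapper function and the NBCTL variable are assumptions 
for illustration.

```shell
# NBCTL can be overridden, e.g. NBCTL=echo to only print the commands.
NBCTL=${NBCTL:-ovn-nbctl}

# enable_dynamic_routing (hypothetical helper, not part of OVN)
enable_dynamic_routing() {
    # Router-wide switch.
    $NBCTL set Logical_Router magic-router options:dynamic-routing=true
    # LRP towards the LS "public": announce connected and static routes.
    $NBCTL set Logical_Router_Port magic-router-public \
        options:dynamic-routing-connected=true \
        options:dynamic-routing-connected-as-host-routes=true \
        options:dynamic-routing-static=true
    # Active-active LRP towards the fabric.
    $NBCTL set Logical_Router_Port magic-router-public-for-real-1 \
        options:active-active-lrp=true \
        options:bgp-mirror=bgp-port \
        options:routing-protocol-redirect=bgp-port \
        options:routing-protocols=BGP,BFD \
        options:use-netns=true
}
```

Running it with NBCTL=echo set beforehand shows the commands without 
touching the northbound database.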

Run on gtw1 and gtw2:
```
ovs-vsctl set Open . external_ids:ovn-bridge-mappings="phys:physnet"
ovs-vsctl add-br physnet
ovs-vsctl add-port physnet <your-point-to-point-links-to-bgp-node>
```

* Get the datapath id of the magic-router LR (I assume 4 below)
* On gtw1 and gtw2 create a netns named "ovnns${datapathid}"
* On gtw1 and gtw2 create ports on br-int for the bgp connections
   * On gtw1: "bgp-port-gtw1-0" and "bgp-port-gtw1-1"
   * On gtw2: "bgp-port-gtw2-0"
   * Create a veth pair on the linux side matching these ports and move the 
other end to the ovnns netns
   * For the veth ends in the netns get the interface index (the number before 
the colon in "ip a")
   * For the settings below I expect that the corresponding veth ports are 
numbered as follows:
      * matching "bgp-port-gtw1-0": 14
      * matching "bgp-port-gtw1-1": 29
      * matching "bgp-port-gtw2-0": 19
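
The veth steps could look roughly like the following sketch. It assumes the 
netns already exists (for example via "ip netns add ovnns4") and that the 
datapath id is known. The helper names and the name of the veth end inside 
the netns are assumptions for illustration. It only covers the steps listed 
above.

```shell
# Dry run by default (commands are only printed).
# Set RUN= (empty) to actually execute them as root.
RUN=${RUN-echo}

# setup_bgp_port NETNS OVS_PORT NETNS_END
# Creates a veth pair, moves one end into the netns and plugs the
# other end into br-int. (hypothetical helper)
setup_bgp_port() {
    netns=$1; port=$2; peer=$3
    $RUN ip link add "$port" type veth peer name "$peer"
    $RUN ip link set "$peer" netns "$netns"
    $RUN ip link set "$port" up
    $RUN ip -n "$netns" link set "$peer" up
    $RUN ovs-vsctl add-port br-int "$port"
}

# veth_ifindex NETNS NETNS_END
# Prints the interface index, i.e. the number before the first colon.
veth_ifindex() {
    ip -n "$1" -o link show dev "$2" | cut -d: -f1
}
```

On gtw1 this would be used as 
`setup_bgp_port ovnns4 bgp-port-gtw1-0 bgp-gtw1-0-ns` followed by 
`veth_ifindex ovnns4 bgp-gtw1-0-ns`, and the same again for "bgp-port-gtw1-1".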
          
* Set the following via "ovs-vsctl set Open . 
external_ids:ovn-active-active-mappings="
   * on gtw1: 
"phys;00:fe:fe:fe:fe:01,172.16.0.10/25,14;00:fe:fe:fe:fe:33,172.20.0.10/25,29"
   * on gtw2: "phys;00:fe:fe:fe:fe:11,172.16.0.139/25,19"
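
Judging from the two values above, the mapping consists of the network name 
followed by one "mac,ip/prefix,ifindex" entry per port, all separated by 
";". A small sketch for assembling such a string; the helper name is made up 
for illustration and the format is only inferred from these examples.

```shell
# build_aa_mapping NETWORK MAC,CIDR,IFINDEX...
# Joins the network name and all port entries with ";".
build_aa_mapping() {
    mapping=$1; shift
    for port in "$@"; do
        mapping="$mapping;$port"
    done
    printf '%s\n' "$mapping"
}
```

For gtw2 the result of `build_aa_mapping phys 00:fe:fe:fe:fe:11,172.16.0.139/25,19` 
can be passed as the value of external_ids:ovn-active-active-mappings.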

* Set the mac and ip address of the bgp ports correctly:
  * matching to "bgp-port-gtw1-0": 00:fe:fe:fe:fe:01 172.16.0.10/25
  * matching to "bgp-port-gtw1-1": 00:fe:fe:fe:fe:33 172.20.0.10/25
  * matching to "bgp-port-gtw2-0": 00:fe:fe:fe:fe:11 172.16.0.139/25

* Spawn frr on gtw1 and gtw2
   * zebra needs to be started with "-n" as option

Configuration on gtw1:
```
frr version 8.4.4
frr defaults traditional
hostname gtw1
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
vrf ovnns4
 netns /run/netns/ovnns4
exit-vrf
!
router bgp 65000 vrf ovnns4
 bgp router-id 172.16.0.10
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 neighbor 172.16.0.1 remote-as 64512
 neighbor 172.16.0.1 bfd
 neighbor 172.20.0.1 remote-as 64512
 neighbor 172.20.0.1 bfd
 !
 address-family ipv4 unicast
  redistribute kernel
  redistribute connected
  redistribute static
  neighbor 172.16.0.1 soft-reconfiguration inbound
  neighbor 172.16.0.1 route-map ALLOW-ALL in
  neighbor 172.16.0.1 route-map ALLOW-ALL out
  neighbor 172.20.0.1 soft-reconfiguration inbound
  neighbor 172.20.0.1 route-map ALLOW-ALL in
  neighbor 172.20.0.1 route-map ALLOW-ALL out
 exit-address-family
exit
!
route-map ALLOW-ALL permit 100
exit
!
```

And for gtw2:
```
frr version 8.4.4
frr defaults traditional
hostname gtw2
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
vrf ovnns4
 netns /run/netns/ovnns4
exit-vrf
!
router bgp 65000 vrf ovnns4
 bgp router-id 172.16.0.139
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 neighbor 172.16.0.129 remote-as 64512
 neighbor 172.16.0.129 bfd
 !
 address-family ipv4 unicast
  redistribute kernel
  redistribute connected
  redistribute static
  neighbor 172.16.0.129 soft-reconfiguration inbound
  neighbor 172.16.0.129 route-map ALLOW-ALL in
  neighbor 172.16.0.129 route-map ALLOW-ALL out
 exit-address-family
exit
!
route-map ALLOW-ALL permit 100
exit
!
```

* Now configure the bgp node
   * Set up some network local to the node that you want to announce to OVN and 
that is not overlapping with anything
   * Set up the 3 point-to-point links with the matching ip addresses
      * 172.16.0.1/25
      * 172.16.0.129/25
      * 172.20.0.1/25
   * Configure frr for peering with:
      * 172.16.0.10
      * 172.16.0.139
      * 172.20.0.10
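
For the frr side of the bgp node, a config mirroring the gateway configs 
above could be generated as sketched below. The AS numbers 64512 and 65000 
are taken from the gateway configs. Everything else (the helper name, 
running in the default VRF, redistributing connected routes, reusing the 
ALLOW-ALL route-map) is an assumption and may need adjusting.

```shell
# bgp_node_frr_config PEER...
# Prints a "router bgp" section peering with the given gateway addresses.
bgp_node_frr_config() {
    echo "router bgp 64512"
    for peer in "$@"; do
        echo " neighbor $peer remote-as 65000"
        echo " neighbor $peer bfd"
    done
    echo " !"
    echo " address-family ipv4 unicast"
    # Announces the local network, assuming it is a connected route.
    echo "  redistribute connected"
    for peer in "$@"; do
        echo "  neighbor $peer route-map ALLOW-ALL in"
        echo "  neighbor $peer route-map ALLOW-ALL out"
    done
    echo " exit-address-family"
    echo "exit"
    echo "!"
    echo "route-map ALLOW-ALL permit 100"
    echo "exit"
}
```

Calling `bgp_node_frr_config 172.16.0.10 172.16.0.139 172.20.0.10` prints a 
section that can be pasted into the frr config of the bgp node.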
          
If all goes well you should now learn routes to:
   * 1.0.0.1/32
   * 1.0.0.100/32
   * 1.0.0.150/32
   
Pings to 1.0.0.150 should now work and end at cmp at "intern-port1".
Depending on how you prioritized the ha_chassis_group for 
"project-router-public" you will receive different route preferences from gtw1 
and gtw2.
If you change the priority order the route preferences should also change.
If you shut down any of the bgp peerings you should see the ping replies no 
longer taking that path.
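
On the gateways, the kernel routes inside the netns and the BGP session 
state can be checked as well. A small sketch, assuming the netns/VRF name 
"ovnns4" from above. The helper name is made up for illustration. Which 
kernel routing table is used is not stated here, so all tables are listed.

```shell
# Dry run by default (commands are only printed).
# Set RUN= (empty) to actually execute them as root.
RUN=${RUN-echo}
NETNS=${NETNS:-ovnns4}

# show_gateway_routes (hypothetical helper)
show_gateway_routes() {
    # Kernel routes inside the netns.
    $RUN ip -n "$NETNS" route show table all
    # BGP sessions and prefixes as seen by frr.
    $RUN vtysh -c "show bgp vrf $NETNS ipv4 unicast summary"
    $RUN vtysh -c "show bgp vrf $NETNS ipv4 unicast"
}
```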

To debug which routes should be advertised or have been learned look at 
"ovn-sbctl list Route"

If you now add more ports/routers to the LS "public" they should get announced 
automatically.


Thanks for sticking through. I hope this was helpful and does not generate 
more confusion.
All of this will hopefully become part of the documentation as I work further 
on the patches.


Felix Huettner (31):
  northd: Set southbound mac from lrp_networks.
  northd: Fix relying on naming coincidences.
  northd: Find outports based on ovn_port.
  northd: Store outport of parsed_route.
  northd: Split out join_logical_ports.
  northd: Reorder join_logical_ports.
  northd: Rename en_static_routes to en_routes.
  northd: Move connected routes to route engine.
  northd: Autodiscover centralize_routing.
  northd: Routing-protocol-redirect on crps.
  northd: Add route table to southbound and sync.
  northd: Add filtering which routes to advertise.
  northd: Handle learned routes.
  northd: Remove learned routes if lrp is removed.
  northd: Allow announcing individual host routes.
  northd: Sync routing data to pb.
  DO NOT APPLY: Use my ovs repo and bump.
  system-ovn: Remove route without nexthop.
  controller: Introduce route node.
  controller: Introduce route-exchange-netlink.
  controller: Announce routes via route-exchange.
  controller: Support learning routes.
  controller: Prioritize host routes.
  controller: Allow network namespaces for routes.
  controller: Watch for route changes.
  controller: Cleanup routes on stop.
  controller: Publish ovn-active-active-mappings.
  northd: Support active-active lrps.
  northd: Support active-active bgp redirects.
  controller: Allow filtering routes on ifindex.
  northd: ECMP prefer local routes if possible.

Frode Nordahl (1):
  ci: Manage host/system level dependencies.

 .github/workflows/test.yml           |    5 +
 .gitmodules                          |    3 +-
 configure.ac                         |    2 +
 controller/automake.mk               |   18 +-
 controller/chassis.c                 |   22 +
 controller/ovn-controller.c          |  274 ++++-
 controller/route-exchange-netlink.c  |  360 +++++++
 controller/route-exchange-netlink.h  |   60 ++
 controller/route-exchange-stub.c     |   36 +
 controller/route-exchange.c          |  323 ++++++
 controller/route-exchange.h          |   38 +
 controller/route-table-notify-stub.c |   37 +
 controller/route-table-notify.c      |  154 +++
 controller/route-table-notify.h      |   43 +
 controller/route.c                   |  226 ++++
 controller/route.h                   |   76 ++
 ic/ovn-ic.c                          |   21 -
 lib/automake.mk                      |    2 +
 lib/lrp-index.c                      |   43 +
 lib/lrp-index.h                      |   25 +
 lib/ovn-util.c                       |  143 +++
 lib/ovn-util.h                       |   29 +
 lib/stopwatch-names.h                |    1 +
 m4/ovn.m4                            |   25 +
 northd/automake.mk                   |    2 +
 northd/en-lflow.c                    |   11 +-
 northd/en-northd.c                   |   40 +-
 northd/en-northd.h                   |    8 +-
 northd/en-routes-sync.c              |  516 ++++++++++
 northd/en-routes-sync.h              |   44 +
 northd/inc-proc-northd.c             |   32 +-
 northd/northd.c                      | 1413 ++++++++++++++++++--------
 northd/northd.h                      |   82 +-
 ovn-nb.xml                           |    5 +
 ovn-sb.ovsschema                     |   19 +-
 ovn-sb.xml                           |   74 ++
 ovs                                  |    2 +-
 tests/automake.mk                    |    6 +
 tests/ovn-northd.at                  |   96 +-
 tests/system-common-macros.at        |   12 +
 tests/system-ovn.at                  |    1 +
 41 files changed, 3782 insertions(+), 547 deletions(-)
 create mode 100644 controller/route-exchange-netlink.c
 create mode 100644 controller/route-exchange-netlink.h
 create mode 100644 controller/route-exchange-stub.c
 create mode 100644 controller/route-exchange.c
 create mode 100644 controller/route-exchange.h
 create mode 100644 controller/route-table-notify-stub.c
 create mode 100644 controller/route-table-notify.c
 create mode 100644 controller/route-table-notify.h
 create mode 100644 controller/route.c
 create mode 100644 controller/route.h
 create mode 100644 lib/lrp-index.c
 create mode 100644 lib/lrp-index.h
 create mode 100644 northd/en-routes-sync.c
 create mode 100644 northd/en-routes-sync.h


base-commit: 12886fb5399c44d9ef40381cd4ac562dab0fdeba
-- 
2.47.0

