On 11/26/24 3:37 PM, Felix Huettner via dev wrote:
> Hi everyone,
> 

Hi Felix,

> this is the current state of the implementation that came out of the 
> discussion of the OVN Fabric integration:
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-June/415033.html
> 
> This patchset is extremely large and split into multiple parts.
> They are partially cross-dependent on each other which is why this is one big 
> patchset instead of multiple small ones.
> To make the work easier for everyone involved we should be able to already 
> merge earlier ones if we have an agreement on them.

I started looking at the first part of the series and applied two of
the patches.  I also shared some minor comments on a few of the other
preliminary patches.  However, the rest of the series needs a rebase.
That would make it easier to review.

I'll continue looking at the rest of the patches until then.

Thanks,
Dumitru

> 
> The code is working in my testsetup, but especially the later patches are 
> most probably not yet acceptable for inclusion in OVN. They lack testcases 
> and documentation of the new features.
> I will continue improving their quality, but i wanted to send this out 
> soonish so everyone can have a look.
> 
> Larger changes v2->v3:
>    * added documentation for all new options
>    * added tests
>    * allow filtering of receiving routes on multiple LRPs per chassis by
>      ifname
> Larger changes v1->v2:
>    * fixed a bunch of issues the ci uncovered
>    * included 3 commits for incremental processing in their original
>      changes
>    * as the CI does not run with the custom ovs submodule i changed the
>      order of part 3 and 4 to allow for a larger CI coverage
> 
> 
> Part 1 contains general refactoring for later steps.
> None of these patches should have any effect on the output of northd and 
> ovn-controller.
> This is also why they do not touch any tests.
> The patches of this part are:
> * northd: Set southbound mac from lrp_networks.
> * northd: Fix relying on naming coincidences.
> * northd: Find outports based on ovn_port.
> * northd: Store outport of parsed_route.
> * northd: Split out join_logical_ports.
> * northd: Reorder join_logical_ports.
> * northd: Rename en_static_routes to en_routes.
> * northd: Move connected routes to route engine.
> 
> 
> Part 2 modifies existing features to better fit to the changes here.
> It has a small impact on functionality by allowing previously forbidden 
> configurations or removing unnecessary options.
> The patches of this part are:
> * northd: Autodiscover centralize_routing.
> * northd: Routing-protocol-redirect on crps.
> 
> 
> Part 3 includes the northd side for learning and advertising routes.
> For this we add a new sb-db table to contain routes in either direction.
> After these patches an external system could already use the new table to 
> learn and advertise routes from/to OVN.
> The patches of this part are:
> * northd: Add route table to southbound and sync.
> * northd: Add filtering which routes to advertise.
> * northd: Handle learned routes.
> * northd: Remove learned routes if lrp is removed.
> * northd: Allow announcing individual host routes.
> * northd: Sync routing data to pb.
> 
> 
> Part 4 prepares the CI and our dependencies for the later patches.
> In the patchset submitted here it actually changes the URL of the ovs 
> submodule to my fork.
> This is just for making the CI run and allowing everyone to test more easily.
> It should actually be replaced by a pointer to the OVS repo with the 
> following patchset included:
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-November/418547.html
> The patches of this part are:
> * DO NOT APPLY: Use my ovs repo and bump.
> * ci: Manage host/system level dependencies.
> * system-ovn: Remove route without nexthop.
> 
> 
> Part 5 includes the ovn-controller features for learning and advertising 
> routes.
> With these changes ovn-controller can read and modify the routing tables in 
> linux VRFs or network namespaces.
> External tools (like frr) can then use these routing tables to communicate 
> these routes to external systems.
> The patches of this part are:
> * controller: Introduce route node.
> * controller: Introduce route-exchange-netlink.
> * controller: Announce routes via route-exchange.
> * controller: Support learning routes.
> * controller: Support receiving routes per iface.
> * controller: Prioritize host routes.
> * controller: Allow network namespaces for routes.
> * controller: Watch for route changes.
> * controller: Cleanup routes on stop.
> 
> 
> Part 6 introduces active-active LRPs.
> These LRPs allow the northbound database to just contain a single LRP for 
> external connections which are then translated to many Port_Bindings on the 
> southbound db.
> This allows for an easy integration of this whole featureset with existing 
> CMS by keeping potential chassis-local configuration values outside of the 
> northbound database.
> The patches of this part are:
> * controller: Publish ovn-active-active-mappings.
> * northd: Support active-active lrps.
> * northd: Support active-active bgp redirects.
> * northd: Support filter routes on active-active.
> 
> 
> Part 7 optimizes ECMP routes across different chassis.
> This allows keeping traffic on a local chassis instead of sending it to other 
> chassis just based on the ecmp hash.
> With the setups that the features here encourage, this will be more common 
> than it is right now.
> The patches of this part are:
> * northd: ECMP prefer local routes if possible.
> 
> 
> 
> 
> Below i will try to point out how to use this patchset.
> However i probably missed something, sorry for that.
> 
> * This change was tested on a setup using 4 nodes
>    * cmp: running the OVN control plane and ovn-controller
>    * gtw1, gtw2: running ovn-controller and connect to the control plane of 
> cmp
>    * bgp: outside of OVN, simulating a spine-leaf fabric
> * cmp, gtw1, gtw2 are on a shared network which they use for control and 
> overlay traffic
> * setup the local ovsdbs with "system-id", "ovn-encap-ip" and all the other 
> stuff generally needed
> * gtw1 has 2 point-to-point links to bgp
> * gtw2 has 1 point-to-point link to bgp
> 
> Create the following northbound setup:
> ```
> switch 6403cdcc-9354-40c3-801c-9eec666ceebf (public)
>     port public-project-router
>         type: router
>         router-port: project-router-public
>     port public-magic-router
>         type: router
>         router-port: magic-router-public
> switch 473d1998-2ded-410c-b42e-286dd22cf758 (intern)
>     port intern-project-router
>         type: router
>         router-port: project-router-intern
>     port intern-port1
>         addresses: ["ca:b7:23:93:fd:3f 192.168.0.2"]
> switch e45213e9-3312-48ad-ba27-0568f11f6939 (public-for-real-1)
>     port public-for-real-1-magic-router
>         type: router
>         router-port: magic-router-public-for-real-1
>     port physnet
>         type: localnet
>         addresses: ["unknown"]
>     port bgp-port
> router 2db59849-45e3-40e7-9ace-b6179ace2cb5 (magic-router)
>     port magic-router-public
>         mac: "00:00:00:00:fe:01"
>         ipv6-lla: "fe80::200:ff:fe00:fe01"
>         networks: ["1.0.0.1/24"]
>     port magic-router-public-for-real-1
>         mac: "active-active"
>         ipv6-lla: "fe80::200:ff:fe00:0"
>         networks: ["active-active"]
> router ffa39f2d-6def-413b-953a-4c2ee5f671fe (project-router)
>     port project-router-public
>         mac: "00:00:00:00:ff:02"
>         ipv6-lla: "fe80::200:ff:fe00:ff02"
>         networks: ["1.0.0.100/24"]
>     port project-router-intern
>         mac: "00:00:00:00:ff:01"
>         ipv6-lla: "fe80::200:ff:fe00:ff01"
>         networks: ["192.168.0.1/24"]
>     nat d924de49-67c0-4b2d-b2cc-dc362f43984d
>         external ip: "1.0.0.150"
>         logical ip: "192.168.0.2"
>         type: "dnat_and_snat"
> ```
> 
> * On "project-router-public" set a ha_chassis_group pointing to gtw1 and gtw2 
> with different prios
> * ensure that "intern-port1" is bound on "cmp" and that you have an interface 
> somewhere responding to pings
> 
> Ensure the following options are set on the LRPs "magic-router-public" and 
> "magic-router-public-for-real-1":
> ```
> root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port 
> magic-router-public-for-real-1
> _uuid               : de032fe6-5af5-4cbd-85ac-1d19ffc7c8b1
> dhcp_relay          : []
> enabled             : []
> external_ids        : {}
> gateway_chassis     : []
> ha_chassis_group    : 7f808d4b-af12-4291-b249-2b83d248574c
> ipv6_prefix         : []
> ipv6_ra_configs     : {}
> mac                 : active-active
> name                : magic-router-public-for-real-1
> networks            : [active-active]
> options             : {active-active-lrp="true", bgp-mirror=bgp-port, 
> routing-protocol-redirect=bgp-port, routing-protocols="BGP,BFD", 
> use-netns="true"}
> peer                : []
> status              : {hosting-chassis=ovn-gtw1}
> 
> root@ovn-cmp:~# ovn-nbctl list Logical_Router_Port magic-router-public
> _uuid               : 7b3884aa-cde8-4994-ac8c-9f4ed154e4a9
> dhcp_relay          : []
> enabled             : []
> external_ids        : {}
> gateway_chassis     : []
> ha_chassis_group    : []
> ipv6_prefix         : []
> ipv6_ra_configs     : {}
> mac                 : "00:00:00:00:fe:01"
> name                : magic-router-public
> networks            : ["1.0.0.1/24"]
> options             : {dynamic-routing-connected="true", 
> dynamic-routing-connected-as-host-routes="true", 
> dynamic-routing-static="true"}
> peer                : []
> status              : {}
> ```
> 
> * The ha-chassis-group of "magic-router-public-for-real-1" should also point 
> to gtw1 and gtw2, priorities are irrelevant here
> * Set the option:dynamic-routing=true on the magic-router LR
> 
> Run on gtw1 and gtw2:
> ```
> ovs-vsctl set Open . external_ids:ovn-bridge-mappings="phys:physnet"
> ovs-vsctl add-br physnet
> ovs-vsctl add-port physnet <your-point-to-point-links-to-bgp-node>
> ```
> 
> * Get the datapath id of the magic-router LR (i assume 4 below)
> * On gtw1 and gtw2 create a netns named "ovnns${datapathid}"
> * On gtw1 and gtw2 create ports on br-int for the bgp connections
>    * On gtw1: "bgp-port-gtw1-0" and "bgp-port-gtw1-1"
>    * On gtw2: "bgp-port-gtw2-0"
>    * Create a veth pair on the linux side matching these ports and move 
> the other end to the ovnns netns (below i expect the one in the netns to be 
> named "<portname>-ns")
> 
> * Set the following via "ovs-vsctl set Open . 
> external_ids:ovn-active-active-mappings="
>    * on gtw1: 
> "phys;00:fe:fe:fe:fe:01,172.16.0.10/25,bgp-port-gtw1-0-ns;00:fe:fe:fe:fe:33,172.20.0.10/25,bgp-port-gtw1-1-ns"
>    * on gtw2: "phys;00:fe:fe:fe:fe:11,172.16.0.139/25,bgp-port-gtw2-0-ns"
> 
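Judging only from the two example values above, the mapping seems to be the
physnet name followed by one "mac,cidr,ifname" triple per port, all joined
with ";".  A minimal Python sketch of that, assuming the format is exactly
what the examples show (the helper name is made up for illustration and is
not part of the patchset):

```python
def build_active_active_mappings(physnet, ports):
    """Join a physnet name and its (mac, cidr, ifname) ports into one value.

    Assumed format, inferred from the examples in the cover letter:
    <physnet>;<mac>,<cidr>,<ifname>[;<mac>,<cidr>,<ifname>...]
    """
    entries = [",".join((mac, cidr, ifname)) for mac, cidr, ifname in ports]
    return ";".join([physnet, *entries])


gtw1 = build_active_active_mappings("phys", [
    ("00:fe:fe:fe:fe:01", "172.16.0.10/25", "bgp-port-gtw1-0-ns"),
    ("00:fe:fe:fe:fe:33", "172.20.0.10/25", "bgp-port-gtw1-1-ns"),
])
gtw2 = build_active_active_mappings("phys", [
    ("00:fe:fe:fe:fe:11", "172.16.0.139/25", "bgp-port-gtw2-0-ns"),
])

print(gtw1)
print(gtw2)
```

The printed strings are what would go after
external_ids:ovn-active-active-mappings= on the respective chassis.
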
> * Set the mac and ip address of the bgp ports correctly:
>   * matching to "bgp-port-gtw1-0": 00:fe:fe:fe:fe:01 172.16.0.10/25
>   * matching to "bgp-port-gtw1-1": 00:fe:fe:fe:fe:33 172.20.0.10/25
>   * matching to "bgp-port-gtw2-0": 00:fe:fe:fe:fe:11 172.16.0.139/25
> 
> * Spawn frr on gtw1 and gtw2
>    * zebra needs to be started with "-n" as option
> 
> Configuration on gtw1:
> ```
> frr version 8.4.4
> frr defaults traditional
> hostname gtw1
> log syslog informational
> no ip forwarding
> no ipv6 forwarding
> service integrated-vtysh-config
> !
> vrf ovnns4
>  netns /run/netns/ovnns4
> exit-vrf
> !
> router bgp 65000 vrf ovnns4
>  bgp router-id 172.16.0.10
>  no bgp hard-administrative-reset
>  no bgp graceful-restart notification
>  neighbor 172.16.0.1 remote-as 64512
>  neighbor 172.16.0.1 bfd
>  neighbor 172.20.0.1 remote-as 64512
>  neighbor 172.20.0.1 bfd
>  !
>  address-family ipv4 unicast
>   redistribute kernel
>   redistribute connected
>   redistribute static
>   neighbor 172.16.0.1 soft-reconfiguration inbound
>   neighbor 172.16.0.1 route-map ALLOW-ALL in
>   neighbor 172.16.0.1 route-map ALLOW-ALL out
>   neighbor 172.20.0.1 soft-reconfiguration inbound
>   neighbor 172.20.0.1 route-map ALLOW-ALL in
>   neighbor 172.20.0.1 route-map ALLOW-ALL out
>  exit-address-family
> exit
> !
> route-map ALLOW-ALL permit 100
> exit
> !
> ```
> 
> And for gtw2:
> ```
> frr version 8.4.4
> frr defaults traditional
> hostname gtw2
> log syslog informational
> no ip forwarding
> no ipv6 forwarding
> service integrated-vtysh-config
> !
> vrf ovnns4
>  netns /run/netns/ovnns4
> exit-vrf
> !
> router bgp 65000 vrf ovnns4
>  bgp router-id 172.16.0.139
>  no bgp hard-administrative-reset
>  no bgp graceful-restart notification
>  neighbor 172.16.0.129 remote-as 64512
>  neighbor 172.16.0.129 bfd
>  !
>  address-family ipv4 unicast
>   redistribute kernel
>   redistribute connected
>   redistribute static
>   neighbor 172.16.0.129 soft-reconfiguration inbound
>   neighbor 172.16.0.129 route-map ALLOW-ALL in
>   neighbor 172.16.0.129 route-map ALLOW-ALL out
>  exit-address-family
> exit
> !
> route-map ALLOW-ALL permit 100
> exit
> !
> ```
> 
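The two configurations above differ only in hostname, router-id and the list
of neighbors.  A rough Python sketch that renders the per-chassis part from
those values, assuming datapath id 4 and the AS numbers used above (the
function name is hypothetical, and the global lines such as "frr version" or
"log syslog" are left out on purpose):

```python
def render_frr_bgp(hostname, router_id, neighbors, datapath_id=4,
                   local_as=65000, remote_as=64512):
    """Render the VRF and BGP stanzas shown above for one gateway chassis."""
    vrf = f"ovnns{datapath_id}"  # netns/VRF name derived from the datapath id
    lines = [
        f"hostname {hostname}",
        "!",
        f"vrf {vrf}",
        f" netns /run/netns/{vrf}",
        "exit-vrf",
        "!",
        f"router bgp {local_as} vrf {vrf}",
        f" bgp router-id {router_id}",
    ]
    for peer in neighbors:
        lines += [f" neighbor {peer} remote-as {remote_as}",
                  f" neighbor {peer} bfd"]
    lines += [" !",
              " address-family ipv4 unicast",
              "  redistribute kernel",
              "  redistribute connected",
              "  redistribute static"]
    for peer in neighbors:
        lines += [f"  neighbor {peer} soft-reconfiguration inbound",
                  f"  neighbor {peer} route-map ALLOW-ALL in",
                  f"  neighbor {peer} route-map ALLOW-ALL out"]
    lines += [" exit-address-family", "exit", "!",
              "route-map ALLOW-ALL permit 100", "exit", "!"]
    return "\n".join(lines)


gtw1_conf = render_frr_bgp("gtw1", "172.16.0.10", ["172.16.0.1", "172.20.0.1"])
gtw2_conf = render_frr_bgp("gtw2", "172.16.0.139", ["172.16.0.129"])
print(gtw1_conf)
```
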
> * Now configure the bgp node
>    * Setup some network local to the node that you want to announce to OVN 
> and that is not overlapping with anything
>    * setup the 3 point to point links with the matching ip addresses
>       * 172.16.0.1/25
>       * 172.16.0.129/25
>       * 172.20.0.1/25
>    * Configure frr for peering with:
>       * 172.16.0.10
>       * 172.16.0.139
>       * 172.20.0.10
> 
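Since two of the three links share the 172.16.0.0/24 range and are only
separated by the /25 mask, it is easy to pair the wrong addresses.  A small
Python sketch to sanity-check that each bgp-node address sits on the same
subnet as the gateway address it peers with (the pairing in LINKS is my
reading of the addresses above, and the function name is made up):

```python
import ipaddress

# (chassis, gateway-side address, bgp-node-side address)
LINKS = [
    ("gtw1", "172.16.0.10/25", "172.16.0.1"),
    ("gtw1", "172.20.0.10/25", "172.20.0.1"),
    ("gtw2", "172.16.0.139/25", "172.16.0.129"),
]


def peer_on_link(local_cidr, peer_ip):
    """Return True if peer_ip falls inside the subnet of local_cidr."""
    iface = ipaddress.ip_interface(local_cidr)
    return ipaddress.ip_address(peer_ip) in iface.network


for chassis, local, peer in LINKS:
    print(chassis, local, peer, peer_on_link(local, peer))
```
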
> If all goes well you should now learn routes to:
>    * 1.0.0.1/32
>    * 1.0.0.100/32
>    * 1.0.0.150/32
> 
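The three prefixes above line up with the two LRP addresses on the "public"
switch and the NAT external IP from the northbound setup, each announced as a
host route.  A Python sketch of that correspondence only (it mirrors the
addresses in this example and is not a description of what northd does
internally; the function name is hypothetical):

```python
import ipaddress


def as_host_route(address):
    """Turn an address, with or without a prefix length, into a host route."""
    ip = ipaddress.ip_interface(address).ip
    return f"{ip}/{ip.max_prefixlen}"


# magic-router-public, project-router-public, NAT external ip
sources = ["1.0.0.1/24", "1.0.0.100/24", "1.0.0.150"]
expected_routes = [as_host_route(addr) for addr in sources]
print(expected_routes)
```
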
> Pings to 1.0.0.150 should now work and end at cmp at "intern-port1".
> Depending on how you prioritized the ha_chassis_group for 
> "project-router-public" you will receive different route preferences from 
> gtw1 and gtw2.
> If you change the priority order the route preferences should also change.
> If you shut down any of the bgp peerings you should see the ping replies no 
> longer taking that path.
> 
> To debug which routes should be advertised or have been learned look at 
> "ovn-sbctl list Route"
> 
> If you now add more ports/routers to the LS "public" they should get 
> announced automatically.
> 
> 
> Thanks for sticking through and i hope this was helpful and does not 
> generate more confusion.
> 
> 
> Felix Huettner (32):
>   northd: Set southbound mac from lrp_networks.
>   northd: Fix relying on naming coincidences.
>   northd: Find outports based on ovn_port.
>   northd: Store outport of parsed_route.
>   northd: Split out join_logical_ports.
>   northd: Reorder join_logical_ports.
>   northd: Rename en_static_routes to en_routes.
>   northd: Move connected routes to route engine.
>   northd: Autodiscover centralize_routing.
>   northd: Routing-protocol-redirect on crps.
>   northd: Add route table to southbound and sync.
>   northd: Add filtering which routes to advertise.
>   northd: Handle learned routes.
>   northd: Remove learned routes if lrp is removed.
>   northd: Allow announcing individual host routes.
>   northd: Sync routing data to pb.
>   DO NOT APPLY: Use my ovs repo and bump.
>   system-ovn: Remove route without nexthop.
>   controller: Introduce route node.
>   controller: Introduce route-exchange-netlink.
>   controller: Announce routes via route-exchange.
>   controller: Support learning routes.
>   controller: Support receiving routes per iface.
>   controller: Prioritize host routes.
>   controller: Allow network namespaces for routes.
>   controller: Watch for route changes.
>   controller: Cleanup routes on stop.
>   controller: Publish ovn-active-active-mappings.
>   northd: Support active-active lrps.
>   northd: Support active-active bgp redirects.
>   northd: Support filter routes on active-active.
>   northd: ECMP prefer local routes if possible.
> 
> Frode Nordahl (1):
>   ci: Manage host/system level dependencies.
> 
>  .github/workflows/test.yml           |    5 +
>  .gitmodules                          |    3 +-
>  NEWS                                 |   28 +
>  configure.ac                         |    2 +
>  controller/automake.mk               |   18 +-
>  controller/chassis.c                 |   22 +
>  controller/ovn-controller.8.xml      |   30 +
>  controller/ovn-controller.c          |  337 +++++-
>  controller/route-exchange-netlink.c  |  360 +++++++
>  controller/route-exchange-netlink.h  |   61 ++
>  controller/route-exchange-stub.c     |   36 +
>  controller/route-exchange.c          |  330 ++++++
>  controller/route-exchange.h          |   38 +
>  controller/route-table-notify-stub.c |   37 +
>  controller/route-table-notify.c      |  154 +++
>  controller/route-table-notify.h      |   43 +
>  controller/route.c                   |  233 +++++
>  controller/route.h                   |   88 ++
>  ic/ovn-ic.c                          |   21 -
>  lib/automake.mk                      |    2 +
>  lib/lrp-index.c                      |   43 +
>  lib/lrp-index.h                      |   25 +
>  lib/ovn-util.c                       |  139 +++
>  lib/ovn-util.h                       |   31 +
>  lib/stopwatch-names.h                |    1 +
>  m4/ovn.m4                            |   25 +
>  northd/automake.mk                   |    2 +
>  northd/en-lflow.c                    |   11 +-
>  northd/en-northd.c                   |   38 +-
>  northd/en-northd.h                   |    8 +-
>  northd/en-routes-sync.c              |  565 ++++++++++
>  northd/en-routes-sync.h              |   46 +
>  northd/inc-proc-northd.c             |   31 +-
>  northd/northd.c                      | 1437 ++++++++++++++++++--------
>  northd/northd.h                      |  116 ++-
>  ovn-nb.xml                           |  184 +++-
>  ovn-sb.ovsschema                     |   19 +-
>  ovn-sb.xml                           |   74 ++
>  ovs                                  |    2 +-
>  tests/automake.mk                    |    6 +
>  tests/multinode.at                   |    7 -
>  tests/ovn-northd.at                  |  379 +++++--
>  tests/ovn.at                         |  375 +++++++
>  tests/ovs-macros.at                  |   11 +
>  tests/system-common-macros.at        |   27 +
>  tests/system-ovn.at                  |  705 ++++++++++++-
>  46 files changed, 5542 insertions(+), 613 deletions(-)
>  create mode 100644 controller/route-exchange-netlink.c
>  create mode 100644 controller/route-exchange-netlink.h
>  create mode 100644 controller/route-exchange-stub.c
>  create mode 100644 controller/route-exchange.c
>  create mode 100644 controller/route-exchange.h
>  create mode 100644 controller/route-table-notify-stub.c
>  create mode 100644 controller/route-table-notify.c
>  create mode 100644 controller/route-table-notify.h
>  create mode 100644 controller/route.c
>  create mode 100644 controller/route.h
>  create mode 100644 lib/lrp-index.c
>  create mode 100644 lib/lrp-index.h
>  create mode 100644 northd/en-routes-sync.c
>  create mode 100644 northd/en-routes-sync.h
> 
> 
> base-commit: 12886fb5399c44d9ef40381cd4ac562dab0fdeba

