Hi all,

I tested your routing protocol port redirection patch, and I was wondering how you are planning to handle the learning and advertisement of routes between the logical router's routing table and the BGP daemon.
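To make the question concrete, here is a minimal sketch of the two directions as I understand them from the thread: advertisement (stage 1, ovn-controller redistributing FIPs, LB VIPs and GW router IPs into the VRF routing table) and learning (stage 2, picking up routes the BGP daemon installed there). It is only an illustration of the reconciliation I am asking about. The `Route` type, the `origin` labels and both helper functions are hypothetical and do not come from the patch series.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Route:
    prefix: str  # e.g. "192.0.2.10/32"
    origin: str  # hypothetical label: "ovn" (owned by OVN) or "bgp" (from a peer)


def routes_to_advertise(lr_routes, vrf_routes):
    """OVN-owned prefixes that are missing from the VRF routing table."""
    present = {ip_network(r.prefix) for r in vrf_routes}
    return [r for r in lr_routes
            if r.origin == "ovn" and ip_network(r.prefix) not in present]


def routes_to_learn(lr_routes, vrf_routes):
    """BGP-learned prefixes in the VRF that the logical router lacks."""
    present = {ip_network(r.prefix) for r in lr_routes}
    return [r for r in vrf_routes
            if r.origin == "bgp" and ip_network(r.prefix) not in present]


# Example state: one FIP known only to OVN, one prefix known only to the VRF.
lr = [Route("192.0.2.10/32", "ovn"), Route("198.51.100.0/24", "bgp")]
vrf = [Route("198.51.100.0/24", "bgp"), Route("203.0.113.0/24", "bgp")]

to_advertise = routes_to_advertise(lr, vrf)
to_learn = routes_to_learn(lr, vrf)
```

In this toy state, `to_advertise` holds the FIP and `to_learn` holds the prefix only the VRF knows about. What I would like to understand is which component is expected to perform each of these two steps.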
Thanks

Regards,
Tiago Pires

On Tue, Aug 6, 2024 at 9:03 AM Frode Nordahl <fnord...@ubuntu.com> wrote:
> On Mon, Aug 5, 2024 at 12:02 PM Ales Musil <amu...@redhat.com> wrote:
> > On Thu, Aug 1, 2024 at 6:04 PM Frode Nordahl <fnord...@ubuntu.com> wrote:
> >>
> >> Hello, Ales,
> >>
> >> This is a fork of the thread to go back to discuss some of the items
> >> raised in the most recent instance of the OVN A/V Community meeting
> >> [6].
> >
> > Hi Frode,
> >
> > thank you for the followup discussion.
> >
> >>
> >> On Fri, Jun 28, 2024 at 11:03 AM Ales Musil <amu...@redhat.com> wrote:
> >> > On Tue, Jun 25, 2024 at 6:52 PM Frode Nordahl <fnord...@ubuntu.com> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> We are increasingly seeing requests for integration between OVN
> >> >> powered CMSs/workloads and the fabric.
> >> >>
> >> >> As a side note, this is a very interesting topic to me personally, and
> >> >> I think there are opportunities in the long term for this class of
> >> >> software to potentially fill a void for more automated and SDN-like
> >> >> ways of managing the physical network, as previously closed physical
> >> >> switch hardware is increasingly opening up to programmatic extension
> >> >> and control.
> >> >>
> >> >> While very exciting, it will take a while, both in terms of evolving
> >> >> how networking teams are organized, in terms of the longevity of
> >> >> networking gear making entity wide refresh cycles very long, not to
> >> >> mention gathering agreement and momentum to build such a thing from
> >> >> the pieces we have.
> >> >>
> >> >>
> >> >> So to be pragmatic, we need to integrate with something that fabric
> >> >> network engineers are comfortable with, and already available on most
> >> >> networking hardware, be it closed or open, today.
> >> >>
> >> >> The most ubiquitous routing protocol, which has prevailed in modern
> >> >> layer 3 only data center designs [0], is BGP.
> >> >>
> >> >> Use cases:
> >> >> * Allow fabric to locate and direct traffic to reroutable resources
> >> >>   such as IPv4/IPv6 prefixes, Floating IPs (FIPs) and Load Balancer
> >> >>   VIPs.
> >> >>
> >> >> * Use the fabric as a load balancer, announcing the same service IP on
> >> >>   multiple hosts (anycast).
> >> >>
> >> >> * Aggregate announcements from stacked CMSes (i.e. Kubernetes running
> >> >>   on top of OpenStack).
> >> >>
> >> >>
> >> >> Requirements:
> >> >> * Data path must be hardware offloaded, i.e. the next hop address the
> >> >>   peer resolves for announcements of OVN resources needs to be an LRP
> >> >>   IP.
> >> >>
> >> >> * Minimize configuration overhead through the use of IPv6 LLAs for
> >> >>   peering, routing both IPv4 and IPv6 prefixes over an IPv6 BGP session
> >> >>   [1] (aka. “BGP Unnumbered”).
> >> >>
> >> >> * Support ECMP out of the host, i.e. use L3 interfaces potentially
> >> >>   connecting to two different ToRs, instead of bonds, avoiding the
> >> >>   additional complexity of multi-chassis bonds.
> >> >>
> >> >> * Support BGP authentication [2][3], i.e. the source, destination
> >> >>   address and ports in packet headers can not be changed.
> >> >>
> >> >> * Compatibility
> >> >>   * Running a BGP protocol suite on the host is becoming a thing in
> >> >>     its own right, and our users may have requirements of their own that
> >> >>     influence their choice of implementation. We need to take this into
> >> >>     account and choose integration methods that allow OVN to work with
> >> >>     multiple protocol suite implementations.
> >> >>
> >> >>   * While we have the power to change and fix issues in popular
> >> >>     routing protocol suites, such as FRR, we need to be able to integrate
> >> >>     with versions that exist on networking hardware out there today.
> >> >>
> >> >> Limitations that influence/dictate implementation choices:
> >> >> * Peering with IPv6 LLAs to meet the configuration overhead
> >> >>   requirement makes the peering relationship point to point.
> >> >>
> >> >> * Popular BGP implementations, such as FRR which is used as routing
> >> >>   protocol suite by many ToR open source NOSes, do not accept
> >> >>   sending/receiving IPv6 LLA next hop with the route, so the BGP peer
> >> >>   address will be used as next hop. (There are even mentions of 3rd
> >> >>   party nexthop currently not being supported, but not sure if that is
> >> >>   accurate [4]).
> >> >>
> >> >> * As mentioned above, BGP authentication requires IP headers to be
> >> >>   unchanged for the BGP TCP packets going to/from the BGP speaker.
> >> >>
> >> >>
> >> >> Proposed implementation:
> >> >>
> >> >> We are in the process of preparing some RFC/PoC patches that at a high
> >> >> level will:
> >> >> * Manage a VRF in the system serving two purposes:
> >> >>   * Leaking of route information from ovn-controller to the VRF
> >> >>     routing table, which a routing protocol suite can redistribute subject
> >> >>     to configuration.
> >> >>
> >> >>   * Provide an IP endpoint that a VRF aware application, such as FRR,
> >> >>     can bind to serving as a BGP speaker on behalf of an OVN LRP IP.
> >> >>
> >> >> * We will attach an OVN VIF to this VRF that has data path rules that:
> >> >>
> >> >>   * Forward required traffic destined to the OVN LRP IP to the VRF.
> >> >>
> >> >>   * Forward required traffic from the application bound to the VRF as
> >> >>     if it originated from the OVN LRP IP.
> >> >>
> >> >>
> >> >> Hopefully we'll have something up on the list before the end of this
> >> >> week, which makes it real and easier to reason about for further
> >> >> discussion.
> >> >>
> >> >>
> >> >> Prior art:
> >> >>
> >> >> We recognize that there already exists a third party approach to this
> >> >> in the ovn-bgp-agent [5] governed by OpenStack, and our goal with this
> >> >> work is to provide a tighter integration that might cater generically
> >> >> for other CMSes and use cases.
> >> >>
> >> >>
> >> >> 0: https://datatracker.ietf.org/doc/html/rfc7938
> >> >> 1: https://datatracker.ietf.org/doc/html/rfc5549
> >> >> 2: https://datatracker.ietf.org/doc/html/rfc2385
> >> >> 3: https://datatracker.ietf.org/doc/html/rfc5925
> >> >> 4: https://github.com/FRRouting/frr/blob/cc3519f3e6eaa06f762e0d447202df32df66e129/bgpd/bgp_route.c#L2719
> >> >> 5: https://docs.openstack.org/ovn-bgp-agent/latest/
> >>
> >> 6: https://mail.openvswitch.org/pipermail/ovs-dev/2024-August/416209.html
> >>
> >> >
> >> > Hi Frode,
> >> >
> >> > looking forward to the RFC.
> >>
> >> As we agreed, the current set of patches that we have
> >> [7][8][9][10][11][12][13] will not be considered for the 24.09 release
> >> as we would like to make it more feature complete and target the 25.03
> >> release instead. In that context I guess they serve as the RFC
> >> patches.
> >>
> >> 7: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416038.html
> >> 8: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416039.html
> >> 9: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416040.html
> >> 10: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416042.html
> >> 11: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416041.html
> >> 12: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416043.html
> >> 13: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416044.html
> >>
> >> In addition to the above there is the LRP BGP redirect patch from
> >> Martin [14], which could be useful independently.
> >>
> >> 14: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416095.html
> >>
> >> Discussion points from meeting:
> >> 1) OVN and Netlink code
> >> In the meeting you raised some concerns about introducing Netlink code
> >> in the OVN repository. I agree with you 100% that the part of
> >> [10] that vendors code from OVS (contents of
> >> route-exchange-netlink-private.h) should instead be a patch for OVS.
> >>
> >> However, the parts of [10] that provide higher layer helper functions,
> >> consuming OVS library code, do not naturally fit in OVS as OVS itself
> >> has no use for them.
> >
> > After the discussion that we had during the meeting I agree that we should reuse
> > as much OvS code as possible.
> >
> >>
> >> As a quick reminder, we are looking at Netlink because it provides a
> >> simple and established API for exchange of this type of information
> >> which is already supported by all routing protocol suites out there.
> >> It is not tied to any particular data path type, we could
> >> theoretically even use it as IPC between two userspace processes,
> >> removing the kernel from the picture with support on the routing
> >> protocol suite side. (There has been some discussion on this for BIRD:
> >> https://bird.network.cz/pipermail/bird-users/2021-September/015707.html ).
> >>
> >> Would it be possible to reach some compromise to include only the
> >> parts that consume OVS library code (route-exchange-netlink.{c|h})?
> >>
> >> While a plugin based approach was also suggested, and we have prior
> >> examples of successfully using that, it does not come without
> >> substantial cost. So I want to explore what options there are to host
> >> this inside the main repository.
> >
> > We could maintain the netlink plugin in the OVN codebase as an example
> > and still have the plugin system in place.
> > The plugin system has the potential
> > benefit of not locking ourselves to just netlink, but we could potentially use
> > the API regardless of the OVN code. Would that be enough justification
> > to add the plugin system + netlink as default plugin housed
> > in the OVN codebase?
>
> This makes sense and provides a path forward, thanks!
>
> We are at a point of the development cycle where we need to tend to
> some downstream stuff, but we will continue this work. I'll try to get
> patches to make the OVS route-table module consumable/reusable for
> this work posted as soon as possible and start thinking about what's
> needed in the route-exchange provider interface.
>
> >> 2) OVS OpenFlow extensions
> >> One of the counter proposals you brought up was to add OVS OpenFlow
> >> extensions to allow OVN to instruct OVS to insert routes into a system
> >> routing table.
> >>
> >> While I see this could be a clear separation of concerns between OVN
> >> and OVS, and OpenFlow being OVN's native integration language, I
> >> struggle a bit with the general usefulness of such an extension.
> >>
> >> Our use case for inserting routes into a system routing table is
> >> purely for exchange of control plane information with some external
> >> system, such as a routing protocol suite, and we have no interest in
> >> using it for actual data path control. This is in contrast to how OVN
> >> uses OpenFlow generally, which as far as I understand is to control
> >> the data path.
> >
> > In the light of the other discussion it doesn't really make sense to have
> > plugin when most of the netlink code wouldn't be in the OvS codebase
> > anyway.
>
> Assuming you are referring to the OpenFlow extension here, and yes, I
> agree.
>
> >>
> >> [ snip ]
> >>
> >> > Another important part that we should keep in mind if possible, is the EVPN
> >> > use case. To be able to configure VXLAN tunnels based on the info that we
> >> > will receive.
> >> >
> >> > I'm not sure how far/deep in the actual design you are but maybe the following
> >> > might be helpful in some way. What I had in mind was a sort of plugin that would
> >> > expose the info of bound entities that are interesting in terms of BGP
> >> > (it could be configurable), so mainly FIPs, LBs, GW router IPs. For the import
> >> > part (which would be applicable only for GW LR) we would create entries in SB
> >> > DB similarly as we do currently with Multicast_Group
> >> > (BGP_Routes? EVPN_Tunnels?). Northd could consume those values and configure
> >> > logical flows and encaps as needed.
> >>
> >> While the EVPN part is not a priority for us at this point in time, we
> >> will of course be interested in making sure the work we put into stage
> >> 1 (ovn-controller redistributing FIPs, LBs and GW router IPs) and stage 2
> >> (ovn-controller learning routes) will be consumable for a stage 3.
> >
> > Great, yeah my point during the discussion wasn't about
> > making it available right away just to reiterate that it would be of
> > interest for us and if needed we can of course help with the development
> > process.
>
> Cool stuff, let's do it!
>
> --
> Frode Nordahl
>
> >>
> >> --
> >> Frode Nordahl
> >>
> >> > Let me know if that makes sense.
> >> >
> >> > Thanks,
> >> > Ales
> >> >
> >> > --
> >> > Ales Musil
> >> > Senior Software Engineer - OVN Core
> >> > Red Hat EMEA
> >> > amu...@redhat.com
> >
> > Thanks,
> > Ales
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev